Linux, HPC, and Kubernetes Systems Engineer

Posted on Sep 30, 2024 by WNTD
Wallingford, Oxfordshire, United Kingdom
IT
Immediate Start
Annual Salary
Contract/Project

Job Title: Linux, HPC, and Kubernetes Systems Engineer

Location: Remote, with onsite work in Wallingford as required

Job Type: Contract, 3 months (Inside IR35)

Job Summary: We are looking for a highly skilled Linux, HPC, and Kubernetes Systems Engineer to join our growing team. This position will be responsible for maintaining and troubleshooting High-Performance Computing (HPC) environments, with a focus on Lenovo and Ubiquity platforms, while also managing Kubernetes clusters. The ideal candidate will have strong experience in Linux administration, HPC systems, and Kubernetes, along with a proven ability to solve complex technical issues and optimize infrastructure performance.

Key Responsibilities:

  • Manage and maintain HPC environments with a primary focus on Lenovo and Ubiquity platforms.
  • Install, configure, and troubleshoot Kubernetes clusters in a production environment.
  • Monitor and optimize Linux-based systems, ensuring reliability and performance for HPC and containerized applications.
  • Troubleshoot complex issues in HPC clusters and Kubernetes infrastructure, including hardware, software, networking, and performance-related problems.
  • Manage resource allocation, workload scheduling, and performance tuning for HPC environments.
  • Implement and manage container orchestration using Kubernetes, ensuring scalability and high availability.
  • Automate system processes and improve operational efficiency using scripting (Bash, Python, etc.).
  • Perform system upgrades, apply patches, and monitor security vulnerabilities in Linux, HPC, and Kubernetes environments.
  • Collaborate with cross-functional teams to design, deploy, and optimize infrastructure solutions for both HPC and Kubernetes-based workloads.
  • Provide documentation, training, and technical support to end-users and internal stakeholders.
  • Ensure that backup and recovery strategies are effectively implemented for both HPC and Kubernetes environments.
  • Monitor system health and performance using appropriate tools (e.g., Prometheus, Grafana) and take proactive measures to address potential issues.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
  • Proven experience in Linux system administration (Red Hat, CentOS, or Ubuntu).
  • Strong experience managing HPC systems, particularly with Lenovo and Ubiquity platforms.
  • Extensive hands-on experience with Kubernetes cluster deployment, maintenance, and troubleshooting.
  • Deep understanding of containerization technologies like Docker and Kubernetes.
  • Strong troubleshooting skills across Linux, HPC environments, and Kubernetes infrastructures.
  • Proficiency in scripting languages (Bash, Python) for automation and process improvement.
  • Knowledge of cluster management and workload scheduling software (e.g., SLURM, PBS) for HPC environments.
  • Familiarity with networking protocols, server hardware, storage solutions, and system monitoring tools.
  • Ability to work independently in a fast-paced environment, managing multiple tasks and priorities.

Preferred Skills:

  • Experience with cloud-based Kubernetes deployments (AWS, Azure, GCP).
  • Familiarity with container networking, service discovery, and load balancing (e.g., Istio, Envoy).
  • Knowledge of DevOps tools and methodologies (e.g., Ansible, Terraform).
  • Experience with virtualization and container security practices.
  • Experience working in research, academic, or enterprise-level environments.

Benefits:

  • Competitive salary and benefits package.
  • Health, dental, and vision insurance.
  • Paid time off, holidays, and professional development opportunities.
  • Opportunity to work in a cutting-edge technological environment.

Reference: 2829829272

https://jobs.careeraddict.com/post/95659845
