Linux, HPC, and Kubernetes Systems Engineer
Job Title: Linux, HPC, and Kubernetes Systems Engineer
Location: Remote, with onsite attendance in Wallingford as required
Job Type: Contract, 3 months (inside IR35)
Job Summary: We are looking for a highly skilled Linux, HPC, and Kubernetes Systems Engineer to join our growing team. This role is responsible for maintaining and troubleshooting High-Performance Computing (HPC) environments, with a focus on Lenovo and Ubiquity platforms, and for managing Kubernetes clusters. The ideal candidate will have strong experience in Linux administration, HPC systems, and Kubernetes, along with a proven ability to solve complex technical issues and optimize infrastructure performance.
Key Responsibilities:
- Manage and maintain HPC environments with a primary focus on Lenovo and Ubiquity platforms.
- Install, configure, and troubleshoot Kubernetes clusters in a production environment.
- Monitor and optimize Linux-based systems, ensuring reliability and performance for HPC and containerized applications.
- Troubleshoot complex issues in HPC clusters and Kubernetes infrastructure, including hardware, software, networking, and performance-related problems.
- Manage resource allocation, workload scheduling, and performance tuning for HPC environments.
- Implement and manage container orchestration using Kubernetes, ensuring scalability and high availability.
- Automate system processes and improve operational efficiency using scripting languages (e.g., Bash, Python).
- Perform system upgrades, apply patches, and monitor for security vulnerabilities in Linux, HPC, and Kubernetes environments.
- Collaborate with cross-functional teams to design, deploy, and optimize infrastructure solutions for both HPC and Kubernetes-based workloads.
- Provide documentation, training, and technical support to end-users and internal stakeholders.
- Ensure that backup and recovery strategies are effectively implemented for both HPC and Kubernetes environments.
- Monitor system health and performance using appropriate tools (e.g., Prometheus, Grafana) and take proactive measures to address potential issues.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
- Proven experience in Linux system administration (Red Hat, CentOS, or Ubuntu).
- Strong experience managing HPC systems, particularly with Lenovo and Ubiquity platforms.
- Extensive hands-on experience with Kubernetes cluster deployment, maintenance, and troubleshooting.
- Deep understanding of containerization and orchestration technologies such as Docker and Kubernetes.
- Strong troubleshooting skills across Linux, HPC environments, and Kubernetes infrastructures.
- Proficiency in scripting languages (Bash, Python) for automation and process improvement.
- Knowledge of cluster management and workload scheduling software (e.g., SLURM, PBS) for HPC environments.
- Familiarity with networking protocols, server hardware, storage solutions, and system monitoring tools.
- Ability to work independently in a fast-paced environment, managing multiple tasks and priorities.
Preferred Skills:
- Experience with cloud-based Kubernetes deployments (AWS, Azure, GCP).
- Familiarity with container networking, service discovery, and load balancing (e.g., Istio, Envoy).
- Knowledge of DevOps tools and methodologies (e.g., Ansible, Terraform).
- Experience with virtualization and container security practices.
- Experience working in research, academic, or enterprise-level environments.
Benefits:
- Competitive salary and benefits package.
- Health, dental, and vision insurance.
- Paid time off, holidays, and professional development opportunities.
- Opportunity to work in a cutting-edge technological environment.
Reference: 2829829272
Posted on Sep 30, 2024 by WNTD