This Job Vacancy has Expired!

Site Reliability Engineer: infra at scale for MLOps & MLaaS

Posted on Jan 13, 2022 by techfolk Limited

Bristol, Somerset, United Kingdom
IT
Immediate Start
Annual Salary
Full-Time

We're hiring customer-focused SREs and Systems Engineers/Developers to apply infrastructure support and site reliability engineering approaches to significant projects embracing emerging ML compute technology.

As a platform vendor and MLaaS provider, we offer you the opportunity to work across all sectors - including research organisations, universities, technology vendors and enterprises - encountering a diversity of ecosystems and best practices.

This team helps customers extend their data center and cloud provisioning ecosystems to incorporate our ML compute products, helps define and build pipelines for migration, refines production operations, and provides site reliability engineering expertise, automation and technical support throughout.

It's a mix of greenfield work, solution and product evolution, and technical collaboration. Becoming a hands-on subject matter expert, you'll empower others to develop new capabilities and accomplish things that were not previously possible, embracing emerging advances in machine intelligence.

Of particular interest are your skills applied to any of the following domains: site reliability engineering at scale; grid or cloud computing; HPC/scientific computing; OpenStack admin or development; data center orchestration; SDN/NFV; and/or developing Linux-based systems for novel IP-based protocols - or similar.

We're hiring an all-new team, including a lead engineer, and we'll be pleased to explore the possibilities with you.

A flavour of work within this team

  • Interfacing between customers, industry partners and our domain experts
  • Site reliability engineering for our MLaaS cloud-delivered platform
  • Defining and building effective infrastructure provisioning solutions
  • Designing and implementing compute workload migration pathways
  • Guiding on adapting and optimising software for new processors and systems
  • Designing, building and refining production pipelines and tooling
  • Optionally: contributing to aspects of our SDK product and virtual-IPU tools in Python and/or C++

We're looking for

  • Someone customer-focused and solution-oriented
  • A solid understanding of Computing, Maths or Engineering - accrued through formal education or equivalent applied practice
  • Linux configuration and management with Shell Scripting, Python or similar
  • Optionally: strong Python and/or C++ applied to Linux systems, infrastructure, or back-end development
  • Experience of configuring and managing hardware platforms, and infrastructure for clusters
  • Knowledge of Ethernet and IP Networking standards
  • Production admin skills with two or more of: Kubernetes, Docker, Grid Engine, Slurm, OpenStack, public/private cloud, etc.
  • Comfortable debugging across multi-layer solutions
  • Familiarity with modern CI/CD and orchestration methods
  • An aptitude for troubleshooting and a pragmatic application of engineering rigour: from the basic symptoms through to analysis and resolution with code fixes, workarounds, improved documentation, tutorials, and collaboration with other teams

You may also bring - or may optionally like to gain - skills around

  • Running novel protocols on IP fabrics
  • HPC or hardware acceleration technologies
  • Data center infrastructure, storage, network, security, virtualisation
  • Compilers and Linux Kernel driver development, debugging and system configuration
  • Linux operating systems and memory management

Salary and benefits

  • Compelling salary - talk with us about what you need
  • Stock options in a start-up with high growth potential
  • Flexible and inclusive working environment - UK hours, work at the times that suit you
  • Discretionary relocation assistance
  • Optional four-day week or part-time working
  • Flexible amount of holiday + UK national/public holidays
  • 10% CPD time in your calendar, with supporting budget - in addition to the L&D of your role
  • Matched personal pension | healthcare | life assurance | dental | health cash plan | income protection

About us

Our team is at the forefront of the artificial intelligence revolution, enabling innovators from research and all sectors to expand human potential with technology. From day one you'll be contributing to important and interesting projects, at the forefront of the advanced ML community worldwide. We offer a collaborative, supportive and inclusive environment, where you can learn and flourish on a team with a diversity of perspectives. We're an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.

Please note, we are only considering candidates who have an established right to work in the UK.

Location: central Bristol, Cambridge or London (Euston) - with some discretionary remote working once up to speed

Reference: 1461662416
