Team Lead (Infrastructure & Reliability)
Posted on Oct 9, 2021 by Claremont Consulting Ltd
Team Lead (Infrastructure & Reliability) - Stockholm, Sweden
I am currently working with a global music streaming platform who are looking for an experienced Team Lead to join their Infrastructure & Reliability team in Stockholm, Sweden on a permanent basis.
You'll oversee a team whose main goal is to help other engineers help themselves by providing a modern infrastructure platform that other teams can use to build, deploy, monitor, and debug distributed apps.
Each team is solely responsible for deploying, operating, and monitoring its own services. Your goal is to make this as simple and enjoyable as possible by automating tasks such as observability, logging, and certificate and domain management, and by making it simple to deploy and operate databases and data pipelines, all while ensuring that deployed applications adhere to security best practices.
They strongly believe in infrastructure as code, therefore you'll be working with a variety of cutting-edge technologies and tools, such as Terraform for infrastructure automation, Kubernetes operators, the Grafana Labs stack, and modern distributed cloud service architectures.
As the leader of the infrastructure team, you'll combine vision and knowledge with hands-on technical work. You'll work with the team to define the future roadmap for the infrastructure platform, provide technical leadership and advice, and actively participate in the platform's ongoing development.
Their technology stack ranges from low-latency, high-throughput, highly scalable microservices to massive streaming and batch analytics workloads, and everything in between. Although they are quite polyglot, they mostly work with Go and Scala. Their services communicate over gRPC or REST and are hosted on the Google Cloud Platform, usually on GKE or App Engine.
They employ a variety of storage options, but for transactional data they choose Cloud SQL, Cloud Datastore, or Elasticsearch, and for analytics they prefer BigQuery.
For monitoring, they rely mainly on the Grafana stack, which includes Prometheus, Cortex, and Loki, but they also use Stackdriver and Sentry for distributed tracing and exception reporting.
Their client applications use React or React Native and are written in ES6/TypeScript. Their music players run on a variety of platforms, including embedded and mobile, and share a common SDK developed in C++. They have their own hardware player as well.
As Team Lead for Infrastructure and Reliability you will be responsible for:
*Defining the long-term roadmap and goals for our infrastructure platform
*Providing technical leadership and guidance to the rest of the engineering organization on matters such as infrastructure and observability
*Helping engineering teams apply best practices in cloud security and zero trust
*Leading an engineering team focused on building an infrastructure platform that enables other teams to move quickly, with guard rails to ensure best practices
*Supporting other engineering teams in making their services robust, reliable, and performant
*Ensuring the development organisation measures and follows up on performance and availability metrics
We believe you will bring some of the following experience to the role:
*Experience with cloud infrastructure and modern cloud architectures
*Automating infrastructure at scale using tools such as Terraform
*Modern application runtime and orchestration tools such as Kubernetes and Docker/containerd
*Comfortable reading and writing code in e.g. Go
*Knowledge of different deployment strategies and operating procedures for applications and data pipelines.
*Knowledge of Kubernetes and related tooling such as Operators or cert-manager
*Cloud infrastructure security
*Application security & zero trust
*Experience with monitoring at scale
*Scaling logging and metrics across machines as well as multiple clusters
*Distributed tracing and log correlation
*Scaling logging across clusters
*Expertise in problem-solving and understanding global-scale systems
*Troubleshooting problems and bottlenecks in interactions between multiple services
*Interactions between client mobile apps, websites, and services
You're friendly, pragmatic, professional, communicative, precise, and fun to work with. You are probably comfortable with describing yourself as:
*An analytical problem solver
*Always looking to learn more
*A great communicator, inside the team as well as outside of it
*An open person who says what you mean and means what you say
*Ready to get your hands dirty and join a team of doers!