Site Reliability Engineer - SRE

Posted on Feb 28, 2025 by Investigo Change Solutions
Birmingham, United Kingdom
IT
Immediate Start
£65k - £70k Annual
Full-Time

Site Reliability Engineer

Permanent

£70,000

Midlands based/Hybrid working

As the Site Reliability Engineer, you will join the client's Platform Engineering Team to help build, manage, and support some of the client's core infrastructure.

Key areas of responsibility:

  • Ensuring the platform services meet high standards for availability, reliability, and performance
  • Defining and promoting best practices for observability, incident management, and operational processes
  • Leading incident management efforts
  • Partnering with platform engineers and product teams
  • Developing and maintaining monitoring, logging, and alerting solutions that provide actionable insights into platform health and performance

Key Skills

  • You will have a deep understanding of concepts such as SLAs, SLOs, and error budgets
  • You will have expertise in tools such as Prometheus, Grafana, Loki, or similar
  • You will have experience in leading incident response processes, including root cause analysis and implementing preventative measures
  • You will be proficient in scripting languages (e.g. Python, Bash)
  • You will need to work effectively with cross-functional teams
  • You will be a problem solver

Reference: 2905239213

https://jobs.careeraddict.com/post/100518680

