Site Reliability Engineer
Posted on Dec 2, 2021 by CV-Library
You will be working in a team of site reliability engineers based in the UK and at an offshore location.
* Work closely with the Development and QA teams to set a high benchmark for development and QA.
* Develop and maintain strong collaboration with the Network Reliability team and the Network Design team.
* Identify, track and implement resiliency improvements to deliver 99.999% availability
* Define SLIs, SLOs and error budgets.
* Develop and maintain CI/CD for infrastructure.
* Build software and systems to manage platform infrastructure and applications
* Improve reliability, quality, and time-to-market of our suite of software solutions
* Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
* Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
* Automate as many aspects of platform services as possible so that trivial manual support activity is reduced or eliminated
* Ensure newly deployed systems and services can be integrated with existing monitoring and management tools, so that the performance of the service and deviations from normal are easily anticipated and instrumented
* Provide operational support for middleware, containers and databases.
* Champion an “automate first” attitude, developing continuous integration pipelines to ensure our platforms can scale whilst remaining operationally efficient.
What you'll bring:
* Demonstrated breadth of experience, with a background in technology architecture, design, and development
* Strong background in system administration and architecture
* Ability to make good technical decisions and to convince others as to the merits and reasons for those decisions
* Demonstrable track record in software reliability working within a software delivery team
* Knowledge of emerging technologies and R&D to gain a better understanding
* A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
* Exceptional operational support and engineering experience and the ability to apply that knowledge to solve the complex problem of running platform services and resources reliably at scale
* A blend of skills including sysadmin, security (LDAP, OAuth 2.0), automation (Terraform, Ansible, CI/CD) and the ability to code, with deep knowledge of operating systems, application source code (Python 3.x), networking, and alerting and monitoring (Grafana, Prometheus, Kibana, Elasticsearch)
* Confident in interacting with developers and deep diving into both Application and Infrastructure code
* Demonstrable knowledge and practical experience of managing hybrid infrastructure environments as a consumer of VMware, K8s and Docker, using IaC tools such as Terraform and Ansible
* Strong incident handling and problem resolution skills using ServiceNow, Jira and Confluence to provide a trusted service to our business.
* Broad background knowledge of infrastructure including network, storage, performance and security