Site Reliability Engineer
Posted on Jul 14, 2022 by Request Technology
*Open to sponsorship; must have at least two years left*
*Hybrid: 2 days in office, 3 days remote*
A prestigious company is searching for an Associate Principal Site Reliability Engineer with experience in AWS, Azure, or GCP cloud environments. The role involves working with Linux systems and container technologies such as Docker and Kubernetes, and requires scripting experience, preferably with Python. The company ideally wants someone with 5-8 years of experience in a similar site reliability engineering position.
- Collaborate with development, operations, and infrastructure teams to ensure availability of services and to work through implementation issues
- Develop automation to respond to incidents and prevent problem recurrence
- Create and enhance runbooks to respond to service outages or degradations
- Assess the production readiness of services
- Define and track operational metrics for production performance, reliability, scalability, and availability
- Architect, develop and maintain shared services and tools to improve reliability and reduce toil across the organization
- Bachelor's or Master's degree in Computer Science, Information Systems, or another related field, or equivalent work experience
- 5-8 years of experience in Site Reliability Engineering/DevOps
- The requirements listed are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the primary functions.
- Experience with maintaining and troubleshooting large-scale distributed systems
- Experience with Agile/Scrum methodology
- Experience managing infrastructure in public cloud environments like AWS (preferred), Azure or GCP
- Experience providing visibility using monitoring and alerting tools like Splunk, SignalFx, AppDynamics, Datadog, Stackdriver, Sysdig, Prometheus or Grafana
- Programming/Scripting experience in languages like Java, Bash, Python or Go
- Experience with distributed messaging systems like Kafka, RabbitMQ, or ActiveMQ
- Experience with container orchestration systems like Kubernetes, Mesos, Docker Swarm or Rancher
- Experience using Continuous Integration and Continuous Delivery (CI/CD) tools like Jenkins, Travis, Harness, Spinnaker, AppVeyor, CodeBuild or CodePipeline