Site Reliability Engineer
Posted on Jan 28, 2021 by Request Technology
A prestigious company is seeking a Site Reliability Engineer. The candidate must be an expert with extensive experience in Linux administration, Puppet, Docker, and Python scripting. They will need extensive knowledge of infrastructure and application monitoring tools and must be able to implement IaC concepts using Terraform, Chef, and Puppet. This is a full-time position and can be worked remotely.
- Implement tools and processes necessary to achieve required SLOs for Company Platform.
- Define and implement CI/CD pipelines.
- Automate delivery of platform services using infrastructure as code. Build self-service playbooks for the platform that can be consumed by globally distributed teams at Company.
- Define and implement the incident response management process and deploy the necessary tools.
- Resolve support and escalation issues.
- Conduct post-incident reviews.
- Collaborate with application and business stakeholders to ensure a high-quality product is developed and deployed in production. Work diligently with other engineering teams to ratify the release processes necessary to meet business goals.
- Drive the continuous improvement process.
- Expert knowledge of one of the major public cloud platforms (Azure, AWS, Google Cloud Platform)
- Hands-on programming experience in Python or other object-oriented programming languages.
- Expert knowledge of infrastructure and application monitoring tools: Prometheus, Grafana, DataDog, etc.
- Experience implementing IaC concepts using Terraform, Chef, Puppet.
- Experience with Elasticsearch, Kibana
- Experience administering Databases
- Expert in Linux administration.
- Expert knowledge of Docker, Helm.
- Experience implementing CI/CD for cloud native applications.
- Experience with deploying applications that utilize Service Mesh
- Experience administering Kubernetes clusters.
- Experience defining and implementing incident response management processes.
- Bachelor's degree
- 8+ years' experience in software engineering
- Master's degree - preferred
- Understanding of GitOps principles.
- Experience implementing secure and compliant Kubernetes platforms.
- Experience deploying and managing stateful distributed services in Kubernetes.
- Experience with security scanning tools.
- Experience with intrusion detection systems.
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Working knowledge of Databricks, Team Foundation Server, TeamCity, Octopus Deploy, and DataDog