Software Engineer - Site Reliability Engineer
Posted on Jan 14, 2021 by Epsilon
As a Site Reliability Engineer, you will proactively monitor and improve end-to-end system performance, identifying deficiencies and potential failures throughout our infrastructure. You will build deep, end-to-end knowledge of the complexity of our platform and continuously create improvements and automation to enhance its durability, performance, and maintainability. You are central to the automation of everything at Epsilon.
* Using Full Stack methodologies, develop and maintain scalable alerting, ticketing, and logging tools for debugging and monitoring
* Proactively monitor events, investigate issues, analyze solutions, and drive problems through to resolution, using a wide variety of Ops tools and monitoring platforms to build understanding and enable persistent monitoring of system availability, performance, and capacity
* Maintain our monitoring systems and develop new metrics/monitoring dashboards as additional coverage events become necessary
* Provide support to maintain a high availability environment
* Bachelor's degree in Computer Science or related field
* Good understanding of Linux, Bash and shell scripting
* Knowledge of and experience with network stack, protocols, network management and monitoring tools
* Experience with automation tools: Puppet, Chef, Docker, Jenkins and/or Ansible
* Knowledge of Docker for container orchestration
* Experience with SQL
* Ability to work collaboratively in a fast-paced environment
* Experience working with Agile methodologies, preferably Scrum
* Excited by Big Data technologies and interested in integrating statistics and analytics to make our systems perform even better