Posted on Oct 24, 2020 by IDC Technologies Solutions Ltd
As a Site Reliability Engineer, you will design and implement web applications and REST API services using a microservice-based infrastructure. The new technology stack includes AWS, Docker/K8s, NoSQL and SQL databases, and a range of monitoring tools. Your focus will be on maximizing system uptime.
All team members participate in an on-call rotation. You will build innovative automated solutions and tools to help debug and resolve problems in production and prevent them from recurring. You will also use monitoring data, trend analysis, and Chaos Engineering to proactively find system weaknesses and fix them before they cause production issues.
- Keeping your assigned site or service up and running or getting it back up and running quickly when failure occurs
- Working closely with internal partners and teams to ensure that we ship software that meets security, compliance, SLA, and performance requirements
- Writing, updating, and using documentation, including runbooks/playbooks
- Automating work, including infrastructure needs, testing, failover solutions, and failure mitigation
- Debugging complex problems across an entire stack and creating solid solutions
- Developing CI/CD processes to improve release cadence
- Using Chaos Engineering to test what you build under real-world conditions
Key Skills and Attributes Required
- 7 years' experience with software engineering, software development, or system operations
- Excellent communication skills, both verbal and written
- Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals
- Experience debugging complex problems
- Experience designing, building, and operating large-scale production systems
- Knows Python, Java, Go, Rust, or similar
- Understands networking and messaging, especially between services
- Has hands-on experience using source control (Git, GitHub, GitLab) and feature branching strategies
- Has experience with a variety of open-source databases (Postgres, Redis, etc.)
- Experience with container tooling such as Docker or Kubernetes
- Experience with monitoring and observability tools such as Datadog, Sensu, New Relic, or Nagios
- Experience automating infrastructure, testing, and deployments using tools like Terraform and Helm, and can explain the Infrastructure as Code paradigm
- Experience with configuration management
- Understands the idea behind Chaos Engineering, even if they haven't yet implemented it themselves
It's not expected that any single candidate would have expertise across all of these areas. We're looking for candidates who are particularly strong in a few areas and have some interest and capabilities in others.
At Shell New Energies IT Ops, our mission is to leverage technology, especially the Cloud Native tech stack, to support the Shell Power strategy. We will support bringing clean energy generation, consumption, and trading together in one integrated platform and give our developers and data engineers an opportunity to leverage ready models and tools in ML, data, and analytics. Shell New Energies has a startup culture that emphasizes transparency, collaboration, and career growth, with the ability to work on small, nimble teams. Our team members can create change at scale and have an opportunity to truly disrupt and shape.