Senior Systems Site Reliability Engineer - Spain
Posted on Sep 28, 2021 by PSD Technology Contracts Ltd.
My client, an international organisation providing technology and communications solutions to the aviation sector, is looking for an experienced SRE.
As Senior Systems Site Reliability Engineer (Kubernetes), you will be involved in exciting technical challenges by analysing, troubleshooting, and designing vital services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance. You will be accountable for ensuring that services are designed and delivered to be mission critical with a focus on security, resiliency, scale, and performance.
You will be part of the team responsible for helping to support 24x7 uptime and availability of mission-critical, customer-facing production cloud services distributed across multiple regions. You'll help to create more consistent, automated push-button environments across all tiers; proactively test and tune all aspects of the infrastructure; streamline CI/CD processes; monitor and respond to system notifications and alerts; and continually work to optimize and improve the performance, security and reliability of the systems.
Within this role you will be able to make a significant impact by leading the team and setting the direction of all SRE processes.
The role is currently fully remote but, when restrictions allow, will move to a hybrid model of 3 days in the office and 2 days from home. The role is based in Barcelona, Spain.
- Help build a Site Reliability Engineering culture across the organization by sharing your best practices, approaches, documentation, and code with other engineering teams.
- Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
- Troubleshoot complicated, cross-platform issues spanning OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.
- Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability.
- Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and organizational efficiency.
- Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability.
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it.
- Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general back-end infrastructure.
- Keep up to date with security and proactively identify, diagnose, and solve complex security issues.
- Monitor and analyse infrastructure performance utilising Nagios, ServiceNow and New Relic or other monitoring tools
- Create dashboards
- Some database experience in PostgreSQL, MySQL or NoSQL (MongoDB)
- Strong Kubernetes and Docker experience
- Strong Linux/CentOS/Red Hat Enterprise experience as a Linux Systems Administrator
- Hands-on experience in utilising configuration management tools such as Puppet, Chef or Ansible
- Good understanding of Internet protocols and networking fundamentals
For further information please contact Nick Fraser