Principal DevOps Engineer
Posted on Sep 21, 2021 by Oracle
We are looking for a Site Reliability Engineer to join the HCGBU - Delivery Platform team. The ideal candidate is technically strong and able to persevere through complexity and ambiguity. They've worked directly on services that are highly available, scalable, and redundant. Automation is a core tenet of everything they do. They understand that simple systems are easier to operate and troubleshoot. They can balance speed with iteration and incremental improvements. They've made life easier for other developers and have motivated their teams to make both process and service improvements.
If you are passionate about taking ownership of big technical challenges and producing software solutions that have broad, significant impacts - come join our team!
Candidates should have broad working knowledge across multiple domains, but we love to see specialization as well. The basics we expect are: Networking, Linux Systems Engineering, Software Engineering/Automation, Database Services (big data technologies) and Distributed Systems.
In this role, you will:
As a DevOps engineer within the HCGBU - Delivery Platform team, you will assist in designing and maintaining operational processes for hosting, processing, transforming, and analyzing data. Your first mission will be to work closely with our software developers and Cloud architects to define a sustainable operational model for HCGBU services. This includes mechanisms to scale the systems by way of easy-to-use tooling and automation. You will work in concert with developers to evolve systems/products for better scalability and reliability, and to enable developer velocity. You will also author and maintain operational run books to help reduce mean Time of Incidents (TOI), and be responsible for managing and triaging operational tickets pertaining to the data platform services. An emphasis on driving prioritization and execution of work based on business impact is a must.
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence.
Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services.
Develop designs, architectures, standards, and methods for large-scale distributed systems.
Automate CI/CD pipelines to deliver changes fast in a safe way with zero downtime.
Provide comprehensive, automated tools for product developers so they can self-serve their operations.
Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with other engineers within the HCGBU - Delivery Platform team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
Articulate technical characteristics of services and technology areas and guide development teams to engineer and add capabilities to internal Oracle services.
Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
Utilize a deep understanding of service topology and the dependencies required to troubleshoot issues and define mitigations.
Understand and explain the effect of product architecture decisions on distributed systems.
Serve as part of a 24x7 On Call rotation in support of the HCGBU - Delivery Platform.
Professional curiosity and a desire to develop a deep understanding of services and technologies.
Bachelor's or Master's degree in Computer Science or equivalent related field experience
Experience with Python, Ruby, bash, and other scripting languages
Experience working with fault tolerant, highly available, high throughput, distributed, scalable systems
Aptitude to be a good team player and the desire to learn and implement new Cloud technologies as needed
Excellent organizational, verbal, and written communication skills
5+ years of experience in two or more of the following:
Developing/operating large scale distributed services/applications
System Administration including Linux internals, TCP/IP, DNS, Load balancing technologies
Container administration and development utilizing Kubernetes, Docker, Mesos, or similar
Infrastructure automation through Terraform, Chef, Ansible, Puppet or similar
Big Data Infrastructure including Hadoop, Spark, NoSQL, Object Storage, or similar
Experience with TCP/IP and socket programming
Knowledge of cloud compute technologies, network monitoring, data processing and analytics
Experience with CI/CD pipelines
Proficiency in working with git
Analyze, design, develop, troubleshoot, and debug software programs for commercial or end user applications. Write code, complete programming, and perform testing and debugging of applications.
As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality.
Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 7 years of software engineering or related experience.