IT Capacity & Performance Engineer
Posted on Mar 17, 2020 by Base 3
IT Capacity & Performance Engineer - Distributed Technologies Domain: Windows, Linux, ESX, Storage, Network infra
The Capacity & Performance Management Squad (part of the Production tribe within the Group Technology Services Division) is currently organized around 3 main domains: one domain is responsible for the Mainframe and Tandem technologies, another specializes in the Distributed Technologies, and the third supports Business and Service Capacity Management. The Distributed Technologies domain is responsible for a full set of activities covering both Capacity and Performance aspects. It produces the capacity plan on a yearly basis and performs monthly follow-ups; it collects the demand and forecasts the need in terms of IT infrastructure; it handles capacity and performance operations to prevent Capacity and/or Performance incidents. All of this is done keeping in mind the cost impact (hardware and software) of the IT infrastructure evolution.
The team is also involved in various projects, ensuring that an application is correctly sized and well tuned before going into production. Through these projects, capacity indicators are selected to report on resource usage, quality of service and Business volume evolution. Based on these indicators, the team develops models to correlate resource consumption with volume changes and quality of service.
Your part of the deal
You will work as an IT Capacity Engineer in the Distributed Technologies Domain. This domain is responsible for the capacity and performance management processes covering the Windows, Linux, ESX, Distributed Storage and Network infrastructure. Your speciality is the Distributed Servers area, in particular Linux guests hosted on the ESX virtualization layer deployed on Converged (VCE) and Hyper-converged (Nutanix) infrastructure.
Your main responsibilities will encompass the following activities:
- Take ownership of the Linux capacity and performance management process.
- Regularly monitor the capacity and performance of all the Distributed platforms.
- Identify and investigate capacity and performance issues/problems.
- Report in a comprehensive manner (adapted to the targeted audience) and propose mitigating actions.
- Perform ad-hoc capacity and performance analysis, make recommendations and produce associated reports.
- Contribute to the publication of the yearly capacity plan.
- Organize the monthly capacity follow-up meetings with Technical Product Owners.
- Participate in projects in order to collect and validate the demand, assess the feasibility of the proposed solution and make recommendations on the Design and Sizing based on Business requirements.
- Participate in Performance testing, analyse the performance data, propose changes to optimise resource usage and performance, make recommendations and produce the Performance and Capacity Test Report.
- Build capacity models to correlate Business Volumes with Resource utilisation and Service levels. This will require an in-depth understanding of the Business Drivers and Service Level Agreements.
These activities are carried out using the team's standard monitoring and reporting tools: Linux monitoring (PCP/Tivoli), VMware VROPs, Nutanix Prism, Perfmon, Splunk and SSRS. Whenever needed, you'll have to set up automated data collection and reporting (for the team and the customer).
- You have proven experience (senior level) with capacity management and performance engineering processes (including forecasting) for Linux Operating Systems running on Converged (VCE vBlock) and Hyper-Converged (Nutanix) infrastructure.
- You have experience with some of the following capacity and performance tools: Linux monitoring (PCP/Tivoli), VMware VROPs, Nutanix Prism, SSRS, Splunk.
- You have knowledge of SQL and SQL Server Reporting Services (SSRS) (you are able to write SQL queries and build reports) and have a good level of expertise in Excel.
- Although the main focus will be on Linux, you are able to apply the capacity and performance process to the other technologies handled by the Distributed Domain.
- You have good English communication skills (written and spoken) and you are able to turn complex material into concise, understandable capacity/performance reports adapted to the audience/customers: management, business or technical.
- You have a collaborative mind-set; you are a team player and willing to share knowledge with your colleagues.
- You have an analytical mind-set and an understanding of statistics; you are able to analyse and correlate data.
- You can work autonomously and have leadership skills; you are not afraid of taking ownership and/or initiatives. You are dynamic.
- You are Continuous Improvement oriented and ready to propose and drive improvement initiatives.
- You are ready to extend your knowledge and participate in other capacity management activities of the team.
As a plus:
- You have experience with Cloud capacity management.
- You have experience with REST APIs.
- You have experience in Business and Service Capacity management.
- You have experience in other capacity/performance engineering tools which could bring added value to the team.
- You have experience in Agile and DevOps organizations.