Senior Director - Critical Incident Management
Posted on Nov 14, 2018 by Capital One
At Capital One, we're building a leading information-based technology company. Still founder-led by Chairman and Chief Executive Officer Richard Fairbank, Capital One is on a mission to help our customers succeed by bringing ingenuity, simplicity, and humanity to banking. We measure our efforts by the success our customers enjoy and the advocacy they exhibit. We are succeeding because they are succeeding.
Guided by our shared values, we thrive in an environment where collaboration and openness are valued. We believe that innovation is powered by perspective and that teamwork and respect for each other lead to superior results. We elevate each other and obsess about doing the right thing. Our associates serve with humility and a deep respect for their responsibility in helping our customers achieve their goals and realize their dreams. Together, we are on a quest to change banking for good.
We are actively seeking a highly creative and intellectually curious Senior Director with deep technical expertise in Systems and Applications to join our team! This is an opportunity to build/lead a high-performance team responsible for strategy/roadmap development; design/build/maintain a complex, large-scale Technology Operations Center environment; hone the performance of the environment; and foster the use of expert-level troubleshooting skills by the team. Our Application and System environments play a major role in protecting our company, so ensuring optimal performance of the environment is critical. You will collaborate and innovate with smart and passionate people within Capital One to push the envelope (when necessary) to deliver results that have a direct impact on the company's bottom line, and will manage a team of dynamic and talented professionals who want to learn from your experience and skills.
Working alongside the Lines of Business CIOs, this role is responsible for defining and enforcing the Capital One Problem & Incident standards to be applied across the whole of the Capital One organization.
Ensure that processes related to Critical Incident Management are carried out in accordance with industry best practices.
Ensure that the processes are documented and that these processes are managed to effectively deliver the required performance and availability
Work with other Capital One Technology leaders and technical staff to develop solid approaches to joined-up working across those teams
Drive best practices for efficient incident management
Deliver on enterprise Operational Excellence goals on MTTD, MTTR, etc.
Manage an effective problem management process to identify and drive longer-term solutions that prevent critical incidents from recurring
Develop and promote automated recovery procedures to minimize downtime
Responsible for managing effective stakeholder and regulator communications during critical incidents
Provide thought leadership for developing innovative operational tools which will help promote higher system availability
Actively participate in incident calls and bridges to ensure quick service restoration and effective root cause analysis, remediation and communication.
Build close working relationships with SRE teams across various LOBs to influence a concerted effort to improve system availability
Liaise with other IT Service managers in a highly technical and sensitive environment to influence proposed architectures and technology investments
Work closely with suppliers and third parties to ensure IT industry best practice is followed to deliver solutions efficiently and effectively
Work with design team resources to ensure that specific solution designs meet the operational requirements and best practices
Contribute to the business cases for technology investments
Provide advice and prepare strategic reports and briefings for heads of service, associate directors, and stakeholders with respect to network issues or developments
Ensure value for money is delivered from the allocated budgets
Manage operational teams to ensure coverage 24x7
Manage a Software Engineering team to build and manage operational tools which help in documenting incidents, promoting automation, etc.
Develop a formal "minimum spec" for staff training and ensure that all team staff members achieve this level to maintain a high talent bar.
Work with other team managers to develop mutual understanding of each area and develop methods of cooperation and support to achieve our stated corporate objectives
Responsible for staff management matters, which will include responsibility for supporting appraisals, development of staff, recruitment and where necessary processes such as grievance and disciplinary matters
Responsible for an individual's development on the job and management of their job performance. Work in conjunction with line managers and other job managers to assess and manage confidential information about an individual's performance and capability development
Responsible for ensuring that all staff meet their mandatory (corporate) training obligations. Maintain a live report on all team skills, training obligations, and completion
The post holder will need to ensure that they are kept informed by their team staff on operational matters and develop a team communications process in order that staff can operate independently. The process will inform staff when to escalate and to whom
Maintain a proactive approach to after-incident reporting so that information of this type can easily and readily be fed into service management reports and/or problem management reviews
Maintain proactive data management and analysis to support anomaly detection and the prediction of potential issues and hot spots
Discuss complex technical issues with technicians, engineers, and vendors
Assemble and clearly present technical information in a business-like manner to non-technical personnel
Lead and facilitate communications with people in the immediate department, other departments, and external third parties
Inform and influence senior leaders and peers
This position is an operational role. As such, periodic late night work and participation in a management on-call rotation will be required. At times the late night work may come with minimal advance notice.
Bachelor's degree or military experience
At least 7 years of experience leading technical teams
At least 7 years of experience in software engineering practices
At least 7 years of people management experience
Master's degree in Information Technology or Information Security
10+ years of professional experience in a technical leadership role supporting an enterprise critical incident management function
Background in Agile software development and Scaled Agile frameworks
Strong understanding of Cloud hosted solutions, primarily AWS
Capital One will consider sponsoring a new qualified applicant for employment authorization for this position.