Sr Lead SRE
Company: JPMorganChase
Location: Plano
Posted on: April 1, 2026
Job Description:
We are seeking a Delivery SRE leader who will ensure
security applications are delivered with strong SDLC discipline and
measurable reliability. This role partners closely with Product
Owners and engineering leadership to challenge assumptions, sharpen
the Definition of Done, and bake SRE requirements into design and
build phases. The leader will govern operational readiness, quality
gates, and resilience practices so that every release meets agreed
SLOs and is production ready.
Key Responsibilities:
Define and enforce quality gates across requirements, design, secure coding,
testing, release, and post-production monitoring. Translate
business objectives into clear, testable requirements that include
reliability, availability, performance, security, and
observability. Establish and manage SLOs/SLIs and error budgets;
ensure they are integrated into product roadmaps and delivery
plans. Challenge Product Owners and teams to meet a rigorous,
objective Definition of Done before release. Sample DoD checklist:
SLOs defined and monitored; alerts tuned; runbooks and escalation
paths in place; automated tests (unit, integration, security)
passing; performance and capacity validated; resilience and
failover tested; rollback verified; vulnerability findings
remediated; compliance controls and audit artifacts complete;
documentation and support readiness confirmed. Lead operational
readiness reviews and triage risks; ensure timely remediation and
prevention of recurrence through root-cause analysis and
auto-remediation. Maintain logging, alerting, and monitoring
platforms; ensure dashboards provide health and performance
visibility. Govern CI/CD pipeline controls for security,
reliability, and change management; promote automation to eliminate
toil. Lead and participate in critical incident response (including
outside business hours when needed); drive post-incident reviews
and resilience improvements. Monitor delivery health and
operational KPIs; lead continuous improvement across teams and
products. Oversee capacity planning and resilience management for
large-scale, distributed systems. Partner with engineering on
public cloud best practices (AWS or equivalent) for compute,
storage, networking, messaging, automation (CloudFormation,
Terraform), and data services. Build a culture of collaboration,
reliability, and continuous improvement; coach teams to adopt
DevOps and SRE principles. Partner with regional engineering
leaders to drive operational best practices and consistent
execution. Provide concise, outcome-focused updates to management
and stakeholders; influence decisions across Product, Engineering,
SRE, and Security.
Required Qualifications, Capabilities, and Skills:
Formal training or certification, plus 5 years of experience supporting
critical security-focused applications in large-scale environments
and managing and mentoring teams. Experience with
monitoring/logging tools (e.g., Splunk, AppDynamics) and dashboard
technologies; Splunk Administrator certification desired. Strong
grasp of SDLC, secure development, and DevOps/CI/CD tooling; capable of
implementing top-tier continuous improvement with root-cause
analysis and auto-remediation. Effective under pressure;
accountable, with excellent stakeholder management and
communication skills. This position may require HSA system access.
Enhanced screening (criminal and credit background checks, and/or
other screening) is required prior to employment and annually
thereafter. Global team collaboration with flexibility to engage
during critical incidents outside standard business hours.
Preferred Qualifications, Capabilities, and Skills:
Experience implementing
and managing SLOs/SLIs, error budgets, and operational readiness
reviews for distributed systems, including leading post-incident
analysis and resilience improvements. Deep expertise in public
cloud platforms (AWS or equivalent), infrastructure automation
tools (CloudFormation, Terraform), and capacity planning for
large-scale environments, with a track record of driving DevOps and
SRE adoption across teams.
Keywords: JPMorganChase, The Colony, Sr Lead SRE, IT / Software / Systems, Plano, Texas