Principal Site Reliability Engineer – Remote

Requisition number: 2301142
Job category: Technology
Job location: Basking Ridge, NJ
(Remote considered)

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

We are seeking a Principal Site Reliability Engineer (SRE) to lead the design and implementation of resilient, observable, and high-performing systems across our organization. This role is ideal for a strategic thinker and hands-on technologist who thrives in complex environments and is passionate about reliability, automation, and innovation, especially at the intersection of SRE and AI.

You’ll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities:

  • Observability & Monitoring
    • Lead the implementation and standardization of OpenTelemetry across services to enhance observability and traceability
    • Define and enforce SLIs, SLOs, and error budgets in collaboration with engineering teams
  • Resiliency Engineering
    • Design and execute resiliency tests, disaster recovery (DR) exercises, and chaos engineering game days to proactively identify and mitigate system weaknesses
    • Develop automated failure injection and recovery validation tools
  • CI/CD & Performance Engineering
    • Enhance CI/CD pipelines with automated performance and load testing to ensure reliability and scalability before production deployment
    • Collaborate with DevOps and QA to integrate performance benchmarks into release gates
  • Cloud Architecture & Reliability
    • Drive cloud adoption strategies with a focus on resiliency patterns, multi-region failover, and cost-effective scaling
    • Partner with cloud architects to design fault-tolerant infrastructure and services
  • AI & Innovation in SRE
    • Explore and implement AI-driven solutions for anomaly detection, incident prediction, and intelligent alerting
    • Innovate with AI agents to automate routine SRE tasks and improve incident response efficiency
  • Leadership & Mentorship
    • Serve as a thought leader and mentor for SRE best practices across the organization
    • Lead cross-functional initiatives to improve system reliability, developer productivity, and customer experience

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications:

  • 10+ years of experience in software engineering, DevOps, or SRE roles, with at least 3 years in a principal or lead capacity
  • 5+ years of experience with CI/CD tooling (e.g., Jenkins, GitHub Actions, ArgoCD) 
  • 5+ years of experience with container orchestration in cloud platforms (Azure or OCI preferred)
  • 3+ years of deep expertise in observability and monitoring tools (e.g., OpenTelemetry, Prometheus, Grafana, Datadog)
  • 3+ years of experience with chaos engineering, DR planning, and performance testing

Preferred Qualifications:

  • Experience with service mesh technologies (e.g., Istio, Linkerd)
  • Hands-on experience with infrastructure as code (e.g., Terraform, Pulumi) and automation tools such as Ansible and Helm
  • Familiarity with AI/ML concepts and experience applying them in operational contexts
  • Proven communication and leadership skills

*All employees working remotely will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

Pay is based on several factors including, but not limited to, local labor markets, education, work experience, and certifications. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you’ll find a far-reaching choice of benefits and incentives. The salary for this role will range from $132,200 to $226,600 annually based on full-time employment. We comply with all minimum wage laws as applicable.

Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. The job posting may come down early due to the volume of applicants.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health that are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.

UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.

#OptumTechPJ

Additional Job Information

Requisition number: 2301142

Business segment: Optum

Job level: Director

Willingness to travel: No

Country: US

Overtime status: Exempt

Telecommuter position: Yes