Lead Site Reliability Engineer
Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
Software engineering is the application of engineering to the design, development, implementation, testing and maintenance of software in a systematic way. The roles in this function cover all primary development activity across all technology functions: ensuring we deliver high-quality code for our applications, products and services, understanding customer needs, and developing product roadmaps.
These roles include, but are not limited to, analysis, design, coding, engineering, testing, debugging, standards, methods, tools analysis, documentation, research and development, maintenance, new development, operations and delivery. As with every role in the company, each position requires building quality into every output. The work also includes evaluating new tools, techniques and strategies; automating common tasks; building common utilities to drive organizational efficiency; and bringing a passion for technology and solutions, along with thought leadership on future capabilities and opportunities to apply technology in new and innovative ways. Work is generally self-directed rather than prescribed.
Primary Responsibilities:
- Manage Azure cloud infrastructure and build resilient, self-scaling systems
- Implement solutions to continuously improve operational reliability of the cloud infrastructure
- You will be responsible for the availability, performance, monitoring and infrastructure provisioning of the platform, which comprises cloud infrastructure and on-prem technologies
- Closely partner with Engineering and Technical Support teams to drive resolution of critical issues
- Publish and implement operational standards for all Cloud infrastructure and services
- Work towards reducing operations toil by automating repeatable tasks
- Mentor and develop other members in the SRE subject area
- Manage application deployments using CI/CD tools, code repository, code scanning, artifact repository, compliance scanning, packaging, deployment, and configuration management
- Build Operations Dashboards leveraging tools like Dynatrace, Splunk or Grafana
- Handle incident, change and problem management
- Help provision infrastructure using Terraform
- Enhance platform observability dashboards
- Partner closely with development teams and help address platform-related roadblocks
- Conduct post-mortems after production issues
- React to production deficiencies by continuously implementing automation, self-healing, and real-time monitoring for production systems
- Work with Docker, Kubernetes, Azure cloud, Prometheus, Grafana, Java, Python and many other modern SaaS technologies
- Participate in projects involving people of many different disciplines: Engineering, Cloud, Networking, CI/CD, Project management, Monitoring, alerting etc.
- Stay informed of new technologies and innovate
- Work with less structured, more complex issues
- Serve as a resource to others
Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies with regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.
Required Qualifications:
- Bachelor’s or advanced degree in a related technical field
- 3+ years of IT experience
- 3+ years of DevOps experience
- 2+ years of experience with Infrastructure as Code (Terraform/Ansible/Chef/Puppet)
- 2+ years of experience with Docker and container orchestration (Kubernetes/OpenShift)
- 2+ years of experience with DevOps and CI/CD tools such as Git and Jenkins
- 2+ years of experience in Kafka support
- 2+ years of experience with monitoring tools and technologies (Splunk, Dynatrace, New Relic)
Preferred Qualifications:
- Infrastructure Engineering Experience
- Cloud Experience (Azure/AWS/GCP)
- Automation experience
- Good knowledge of SRE principles
- Hands-on experience with one or more of: YAML, JSON, PowerShell, Bash or Python
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone – of every race, gender, sexuality, age, location and income – deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes – an enterprise priority reflected in our mission.
Additional information about the job opening
Requisition number: 2292094
Business segment: Optum
Travel required: No
Country: IN
Overtime status: Exempt
Remote position: No