Service Reliability Eng - London, N1C 4AG, United Kingdom

Universal Music Group
London

Job Summary:

We are UMG, the Universal Music Group. We are the world’s leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

As a key member of our Global Technical Operations team, you will be responsible for the reliability, scalability, and performance of the critical systems that power a global enterprise. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will be an essential partner to our development, infrastructure, and security teams, driving a culture of resilience and continuous improvement across the organization.

As a Site Reliability Engineer, you won't just be supporting systems; you'll be ensuring the services that connect artists and fans around the globe are always on.

Job Functions:

Key Responsibilities:

System Reliability & Performance:

  • Design, build, and maintain critical services for availability, scalability, and performance.

  • Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution.

  • Monitor infrastructure capacity and performance, and provide analysis and recommendations to improve service delivery.

Automation & Efficiency:

  • Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling.

  • Create and maintain scripts and custom code to support and enhance our operational toolset.

  • Support and optimize CI/CD pipelines to improve deployment speed and reliability.

Incident Management & Collaboration:

  • Participate in an on-call rotation to troubleshoot and mitigate production incidents.

  • Lead post-incident reviews and root cause analyses to implement lasting solutions.

  • Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.

Job Requirements:

Required Experience & Skills:

  • A strong background in systems administration (Linux/Windows) in a large-scale environment.

  • Proficiency in at least one programming language (e.g., Python, Go, Java).

  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure), with a strong preference for AWS.

  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).

  • Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).

  • Proven analytical and problem-solving abilities, with experience working in high-pressure environments.

  • Excellent communication skills and the ability to foster a collaborative team environment.

Preferred Experience & Skills:

  • Bachelor's degree in an IT-related field.

  • Experience managing large-scale, distributed systems for a global organization.

  • Familiarity with IT governance standards like ITIL.

  • Direct experience with ServiceNow for IT service management.

  • Knowledge of chaos engineering, resilience testing, and advanced capacity planning.

Posted 2025-12-24
