Observability Engineer - London, N1C 4AG

Universal Music Group
London

Job Summary:

We are UMG, the Universal Music Group. We are the world's leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

We are seeking a talented and proactive Observability Engineer to join our dynamic global team. You are passionate about data-driven decisions and automation, and committed to continuous improvement. In this pivotal role, you will be instrumental in building, maintaining, and enhancing the comprehensive observability solutions that ensure the reliability, performance, and scalability of our critical IT systems and applications across the globe. You'll work at the intersection of technology and the vibrant world of music, providing deep insights that drive operational excellence and enable rapid response to any challenge.

Job Functions:

  • Design & Implementation: Work with the team to design, implement, and continuously improve a world-class observability stack, encompassing monitoring, logging, tracing, and alerting systems across diverse cloud-native, on-premise, and hybrid environments.

  • Innovate with Tooling: Evaluate, select, and implement leading observability tools and platforms (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, OpenTelemetry), and work with development teams to automate our observability pipelines and alerting mechanisms.

  • Champion Best Practices: Help define, enforce, and advocate for observability standards and best practices across all engineering and operations teams, ensuring consistent instrumentation and visibility throughout our global organization.

  • Develop Robust Monitoring: Create and maintain powerful monitoring solutions, intuitive dashboards, and automated alerts to provide real-time insights into system health, performance, and availability.

  • Drive Incident Resolution: Work with the Operations teams to use telemetry data for swift diagnosis and resolution of incidents, and conduct post-incident reviews to improve our systems.

  • Collaborate & Empower: Partner with other teams across the wider UMG global Technology organization to drive positive change, promoting best practice in observability for their benefit. Work closely with development, SRE, and infrastructure teams to embed observability throughout the entire technology lifecycle, giving teams the insights they need.

  • Optimize Performance: Analyze telemetry data to identify and resolve performance bottlenecks, optimize resource allocation, and fine-tune configurations for peak efficiency.

  • Support Compliance & Security: Contribute to our compliance and security efforts through effective log management and integration with SIEM systems.

  • Other Essential Components:

    Work independently to deliver results while acting as part of a global team to design and implement robust solutions across a complex, hybrid ecosystem of systems.

    Analyze systems based on observability notifications to troubleshoot complex issues and increase the value of UMG's technology investments, working with a mindset of continual improvement.

    Take an active part in documenting and defining processes and best practice.

  • Make UMG the place to be: Take an active part in making the Observability team a positive and respectful place to be. UMG is a place where everyone can bring themselves fully to work and thrive; as a key part of the Technology department, you play an important role in building our culture.

Job Requirements:

  • Experience: 3+ years of hands-on experience in an Observability, Site Reliability Engineering (SRE), or DevOps role, with a dedicated focus on observability.

  • Technical Expertise:

    Strong understanding and practical experience with monitoring, logging, and tracing systems.

    Proficiency with industry-standard observability tools (e.g., Dynatrace, AWS CloudWatch, Prometheus, Grafana, ELK Stack, Splunk, LogicMonitor).

    Strong technical knowledge of major cloud platforms (AWS, Azure, or GCP).

    Solid programming and scripting skills (e.g., Python, Go, Shell, JavaScript) for automation.

    Understanding of distributed systems, microservices architectures, and cloud-native environments. Experience with Docker/Kubernetes and DevOps principles.

    Familiarity with CI/CD pipelines and automation tools (e.g., Ansible, Terraform).

  • Problem-Solving: Exceptional analytical and problem-solving abilities, with a proactive approach to tackling complex technical challenges.

  • Communication: Excellent communication, collaboration, and interpersonal skills, with the ability to clearly articulate technical concepts to diverse audiences.

  • Domain Knowledge: Prior experience supporting critical business applications within a large-scale, global enterprise environment.

  • Security Awareness: Awareness of security best practices and the ability to integrate security monitoring into observability processes.

  • Education: Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

  • Self-motivated with a high degree of initiative and excellent follow-up skills, along with strong analytical and problem-solving skills.

  • Travel may be required but is not part of the regular work schedule.

Desired Qualifications:

  • Advanced Concepts: Experience with chaos engineering, canary/blue-green deployment strategies, capacity planning, data analysis, and networking.

  • Software Engineering: Experience designing and automating observability workloads, with a foundation in software engineering, database administration, and system administration. Scripting, programming, and troubleshooting for observability in Python, Go, Java, and related languages.

  • Certifications: Relevant industry certifications (e.g., AWS Certified DevOps Engineer, Kubernetes certifications).

Posted 2026-04-03
