Skyscanner Mar 14, 2025

Skyscanner’s journey to effective observability

Article Summary

Skyscanner was drowning in observability chaos: multiple vendors, fragmented tools, and engineers losing confidence in their ability to debug production issues.

During COVID-19, Skyscanner's platform team seized the opportunity to completely overhaul their observability stack. They migrated 300+ microservices from a patchwork of specialized vendors and internal systems to a unified approach built on open standards.

Key Takeaways

Critical Insight

Skyscanner transformed observability from a technical burden into a sociotechnical tool that connects 110M travelers to 1,200+ partners with data-driven confidence.

What made adoption stick wasn't technical at all: the team gamified system debugging with the OpenTelemetry Demo.

About This Article

Problem

Skyscanner's monitoring setup was scattered across different vendors for RUM, tracing, and synthetics, plus internal systems running OpenTSDB, Prometheus, and ELK. Engineers couldn't easily connect signals across services, which made production issues harder to debug.

Solution

Skyscanner moved to OpenTelemetry APIs and semantic conventions for traces, metrics, logs, and baggage. They set up a centralized Collector Gateway that sends data via OTLP (the OpenTelemetry Protocol) to New Relic as their single backend.
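To make "connecting signals across services" concrete, here is a minimal sketch of the idea behind trace context and baggage propagation. It uses only the Python standard library and mimics the W3C `traceparent` and `baggage` header formats that OpenTelemetry propagates by default. It is not the OpenTelemetry SDK and not Skyscanner's code. The function names and the baggage keys are hypothetical, chosen only for illustration.

```python
import secrets
from urllib.parse import quote, unquote


def new_traceparent() -> str:
    """Start a new trace: version 00, random trace ID and span ID, sampled flag 01."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Continue a trace downstream: keep the trace ID, mint a new span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def encode_baggage(items: dict[str, str]) -> str:
    """Serialize key/value pairs as a baggage header with percent-encoded values."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in items.items())


def decode_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header back into a dict (member properties are ignored)."""
    items = {}
    for member in header.split(","):
        key, sep, value = member.strip().partition("=")
        if sep:
            # Drop optional ";property" suffixes, then undo the percent-encoding.
            items[key] = unquote(value.split(";")[0])
    return items


if __name__ == "__main__":
    # Hypothetical upstream service attaches context to an outgoing request.
    headers = {
        "traceparent": new_traceparent(),
        "baggage": encode_baggage({"traveller.market": "UK", "experiment": "a b"}),
    }
    # Hypothetical downstream service continues the same trace.
    downstream = child_traceparent(headers["traceparent"])
    print(headers["traceparent"])
    print(downstream)
    print(decode_baggage(headers["baggage"]))
```

Because every service keeps the same trace ID and only mints a new span ID, a backend can stitch spans from different services into one trace, and baggage lets request-scoped attributes travel with the call.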

Impact

With smart sampling strategies on distributed traces, Skyscanner now stores just 4% of the 2M spans and 80K traces it produces per second while keeping full debugging capability. This cut telemetry costs by over 90%.
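The article does not say which sampling strategies Skyscanner uses, so the following is only an illustrative sketch of one common technique: deterministic, ratio-based sampling keyed on the trace ID. Every service that sees the same trace ID reaches the same keep-or-drop decision, so kept traces stay complete. The function names and the choice to use the low 64 bits of the trace ID are assumptions made for this example.

```python
import random

TRACE_ID_BITS = 64  # assumption: decide using only the low 64 bits of the 128-bit trace ID


def keep_trace(trace_id: int, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of all traces, based on the trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    threshold = round(ratio * (1 << TRACE_ID_BITS))
    low_bits = trace_id & ((1 << TRACE_ID_BITS) - 1)
    return low_bits < threshold


def stored_per_second(total_per_second: int, percent: int) -> int:
    """Integer arithmetic for 'store `percent`% of the stream'."""
    return total_per_second * percent // 100


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(keep_trace(t, 0.04) for t in trace_ids)
    print(f"kept {kept / len(trace_ids):.2%} of simulated traces")
    # 4% of 2M spans per second is 80,000 spans per second stored.
    print(stored_per_second(2_000_000, 4), "spans/s stored")
```

A plain ratio sampler like this drops errors and slow requests as readily as healthy ones. Keeping "full debugging capability" at a 4% retention rate would typically need rules that favor interesting traces, which this sketch does not attempt.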