DoorDash Apr 7, 2021

Optimizing OpenTelemetry's Span Processor for High Throughput and Low Latency

Article Summary

DoorDash's OpenTelemetry adoption hit a wall: CPU utilization rose to 72% with tracing enabled, versus 56% without it. That's a costly tax for observability.

DoorDash engineer Santosh Banda details how the team benchmarked six implementations of OpenTelemetry's Batch Span Processor to eliminate the performance overhead. Using the Java Microbenchmark Harness (JMH), they systematically tested queue designs ranging from ArrayBlockingQueue to the LMAX Disruptor to find the best-performing one.
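To make the idea of a batching span processor concrete, here is a minimal sketch of the ArrayBlockingQueue-style baseline, using only the JDK. It is not OpenTelemetry's BatchSpanProcessor and not DoorDash's code: the class name BatchingQueueSketch, the methods onEnd, nextBatch, and droppedCount, and the use of plain strings in place of span objects are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; names and behavior are assumptions,
// not the OpenTelemetry SDK's actual implementation.
final class BatchingQueueSketch {
    private final BlockingQueue<String> queue;   // bounded buffer of finished spans
    private final int maxBatchSize;               // upper bound on spans per export
    private final AtomicLong dropped = new AtomicLong();

    BatchingQueueSketch(int capacity, int maxBatchSize) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.maxBatchSize = maxBatchSize;
    }

    // Called on the application thread. offer() never blocks: when the
    // queue is full the span is dropped and counted instead.
    boolean onEnd(String span) {
        boolean accepted = queue.offer(span);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called by an exporter thread. Waits up to timeoutMillis for the first
    // span, then drains whatever else is queued, up to maxBatchSize in total.
    List<String> nextBatch(long timeoutMillis) {
        List<String> batch = new ArrayList<>(maxBatchSize);
        try {
            String first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                queue.drainTo(batch, maxBatchSize - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        BatchingQueueSketch processor = new BatchingQueueSketch(4, 3);
        for (int i = 0; i < 6; i++) {
            processor.onEnd("span-" + i);
        }
        System.out.println(processor.nextBatch(10));
        System.out.println(processor.nextBatch(10));
        System.out.println("dropped=" + processor.droppedCount());
    }
}
```

With a capacity of 4 and six spans offered, the last two are dropped rather than blocking the caller, and the queued spans come out in two batches of at most three. The producer and consumer contend on this queue's lock and signaling on every span, which is the kind of overhead a microbenchmark comparing queue implementations would measure.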

Key Takeaways

Critical Insight

DoorDash contributed benchmarks and optimizations to OpenTelemetry that eliminated the CPU overhead, making distributed tracing essentially free in production.

The winning solution came from an unexpected library that combined ring buffers with a custom signaling strategy nobody had tried before.
