Optimizing OpenTelemetry's Span Processor for High Throughput and Low Latency
Article Summary
DoorDash's OpenTelemetry adoption hit a wall: 72% CPU utilization versus 56% without tracing. That's a costly tax for observability.
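The two utilization figures give both an absolute and a relative measure of the tracing cost. The gap is an absolute difference of percentage points. Measured against the untraced baseline, it means the service burned roughly 29% more CPU:

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 72\% - 56\% = 16 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{72 - 56}{56} = \frac{16}{56} \approx 0.286 \quad (\approx 28.6\% \text{ more CPU})
\end{aligned}
```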
DoorDash engineer Santosh Banda details how the team benchmarked six different implementations of OpenTelemetry's Batch Span Processor to eliminate the tracing overhead. Using the Java Microbenchmark Harness (JMH), they systematically tested everything from ArrayBlockingQueue to the LMAX Disruptor to find the best-performing design.
Key Takeaways
- Out-of-the-box OpenTelemetry tracing added 16 percentage points of CPU utilization (56% to 72%) on production workloads
- An MpscQueue combined with signal batching delivered the highest throughput and the lowest CPU cost
- Continuous polling by the exporter thread caused expensive context switches
- Signal batching cut exporter wake-ups by notifying the exporter thread only once a full export batch was queued
- The final optimization brought CPU utilization back down to the 56% baseline
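The signal-batching idea can be sketched with only the JDK. This is a minimal illustration under stated assumptions, not DoorDash's or OpenTelemetry's code. `SignalBatchingQueue`, `awaitBatch`, and `signalsSent` are hypothetical names. A `ConcurrentLinkedQueue` with an `AtomicInteger` size counter stands in for a bounded MPSC queue such as JCTools' `MpscArrayQueue`. Producers enqueue without blocking and wake the exporter only once a full batch is waiting. The exporter sleeps with a timeout so that partial batches still get flushed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of signal batching; not the OpenTelemetry BatchSpanProcessor source.
// Assumption: ConcurrentLinkedQueue plus a size counter stands in for a bounded MPSC queue.
final class SignalBatchingQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();
    // Queue depth at which producers wake the exporter; MAX_VALUE means "exporter not waiting".
    private final AtomicInteger itemsNeeded = new AtomicInteger(Integer.MAX_VALUE);
    // Capacity-1 queue used only as a wake-up flag; offer() never blocks and extra signals coalesce.
    private final ArrayBlockingQueue<Boolean> signal = new ArrayBlockingQueue<>(1);
    private final AtomicInteger signalsSent = new AtomicInteger();
    private final int capacity;
    private final int batchSize;

    SignalBatchingQueue(int capacity, int batchSize) {
        this.capacity = capacity;
        this.batchSize = batchSize;
    }

    /** Producer side: non-blocking enqueue; drops the item and returns false when full. */
    boolean offer(T item) {
        int s;
        do {
            s = size.get();
            if (s >= capacity) {
                return false;
            }
        } while (!size.compareAndSet(s, s + 1));
        queue.offer(item);
        // Signal batching: notify the exporter only once a full batch is queued.
        if (s + 1 >= itemsNeeded.get() && signal.offer(Boolean.TRUE)) {
            signalsSent.incrementAndGet();
        }
        return true;
    }

    /** Exporter side: wait for a full batch or the timeout, then drain up to batchSize items. */
    List<T> awaitBatch(long timeout, TimeUnit unit) {
        if (size.get() < batchSize) {
            itemsNeeded.set(batchSize);          // ask producers for a wake-up
            if (size.get() < batchSize) {        // re-check so a concurrent offer is not missed
                try {
                    signal.poll(timeout, unit);  // sleep until signalled or timed out
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status for shutdown
                }
            }
            itemsNeeded.set(Integer.MAX_VALUE);  // no signals needed while draining
        }
        List<T> batch = new ArrayList<>(batchSize);
        T item;
        while (batch.size() < batchSize && (item = queue.poll()) != null) {
            batch.add(item);
            size.decrementAndGet();
        }
        return batch;
    }

    int signalsSent() {
        return signalsSent.get();
    }
}
```

Producers never signal while the exporter is busy draining, so under heavy load the exporter wakes about once per batch rather than once per span. A signal left over from a race only causes one early wake-up, which is harmless. The timeout bounds export latency when traffic is too low to fill a batch.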
Critical Insight
DoorDash contributed its benchmarks and optimizations back to OpenTelemetry. The optimizations eliminated the measured CPU overhead, making distributed tracing essentially free in production.