Datadog May 26, 2023

Network Latency Issues in Mobile Apps

Article Summary

Anatole Beuzon and Bowen Chen of Datadog describe how a seemingly simple deployment alert turned into a months-long debugging effort. Their investigation shows how deceptively complex network latency issues can be.

Datadog's usage estimation service started triggering high-latency alerts on every deployment, regardless of code changes. The team traced the problem through four distinct bottlenecks spanning their entire network stack, from the application layer down to the Linux kernel.

Key Takeaways

Critical Insight

What appeared to be a single network issue required four separate fixes: resolving an Envoy CPU bottleneck, patching a Linux kernel bug, migrating EC2 instances, and implementing graceful pod shutdown hooks.

The team shares specific AWS ENA metrics and Kubernetes configurations that could have caught these issues earlier and saved dozens of debugging hours.
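As a rough illustration of what watching ENA metrics can look like, the sketch below parses the output of `ethtool -S` and reports any non-zero "allowance exceeded" counters. The summary does not name the metrics the team used, so the counter list here is an assumption based on the counters the AWS ENA driver exposes, and the function names and the default `eth0` interface are hypothetical.

```python
import subprocess

# Assumption: these are the ENA driver's allowance counters; the summary
# above does not say which metrics the Datadog team actually monitored.
ENA_ALLOWANCE_COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
)


def parse_ethtool_stats(text):
    """Turn `ethtool -S` output ("name: value" lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def exceeded_counters(stats):
    """Return only the allowance counters that are present and non-zero."""
    return {
        name: stats[name]
        for name in ENA_ALLOWANCE_COUNTERS
        if stats.get(name, 0) > 0
    }


def read_interface_stats(interface="eth0"):
    """Run `ethtool -S` on a (hypothetical) interface name and parse it."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_stats(result.stdout)
```

On an instance with the ENA driver, `exceeded_counters(read_interface_stats())` would return an empty dict when no allowance has been exceeded. These counters are cumulative, so an alert would more likely be built on their rate of change than on their absolute value.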
