Productionizing Envoy Mobile at Lyft
Article Summary
JP Simard from Lyft reveals how the company replaced URLSession and OkHttp across all its mobile apps with a single networking library, and the results weren't what anyone expected.
Lyft spent years migrating its iOS and Android apps to Envoy Mobile, an open-source networking library based on Envoy Proxy. After months of rigorous A/B testing and gradual rollouts starting in December 2021, the apps now handle billions of daily requests through this unified solution.
Key Takeaways
- OOM crashes dropped 69.3% and app hangs fell 47.9% on the iOS Driver app
- ANRs fell 30% on the Android bike and scooter apps
- Success rates vary by up to 10% across carriers using the same library
- Android always uses IPv6 dual-stack sockets for all connections
- Near real-time stats caught at least 3 production incidents missed by existing monitoring
Lyft successfully replaced platform-native networking libraries with Envoy Mobile across all apps, matching or exceeding previous performance while gaining unprecedented observability and cross-platform consistency.
About This Article
Lyft's mobile apps didn't have good visibility into network performance. The team had to use hooks into platform libraries with very low sampling rates, which meant they couldn't detect incidents quickly and missed production issues affecting billions of daily requests.
JP Simard's team used Envoy Mobile's stats system, which comes from Envoy Proxy. It sends comprehensive metrics directly to gRPC or statsd endpoints, giving them near real-time visibility across mobile operations.
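To make the statsd side of this concrete, here is a minimal sketch of what a counter looks like in the plain-text statsd line format and how it could be sent over UDP. This is not Envoy Mobile's API or Lyft's configuration. The function names and the metric name in the usage note are hypothetical, and the default host and port (8125, the conventional statsd port) are assumptions for illustration.

```python
import socket


def format_counter(name: str, value: int = 1, sample_rate: float = 1.0) -> str:
    """Render a counter in the plain-text statsd line format: name:value|c[|@rate]."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        # A sample-rate suffix tells the server to scale the count back up.
        line += f"|@{sample_rate}"
    return line


def send_stat(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one statsd line over UDP, fire-and-forget (host and port are assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Calling `send_stat(format_counter("mobile.http.requests_total"))` would emit the line `mobile.http.requests_total:1|c`. Because each line is small and sent without waiting for a reply, a client can report every event rather than a small sample, which is the property that makes near real-time aggregation practical.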
Envoy Mobile's stats caught at least 3 production incidents in recent months that Lyft's existing observability solutions, which relied on costly analytics events, had missed. The faster detection enabled quicker incident response.