Android Network Observability: Shedding Light on Performance Blind Spots
Article Summary
Shaurya Jaiswal from Slice reveals how his team cut app launch API calls from 28 to 12, improving startup time by 40%. The secret? A custom observability system that exposes what Firebase Performance Monitoring can't see.
Most Android network monitoring tools show you error rates and latency, but they miss the critical details: DNS lookup times, SSL handshake delays, carrier-specific issues, and which APIs are hammering your backend at launch. Slice Engineering built an in-house solution using OkHttp's EventListener to capture every millisecond of the network request lifecycle.
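Those lifecycle stages map directly onto OkHttp's `EventListener` callbacks. Below is a minimal sketch of such a listener, assuming OkHttp 4.x; the `RequestTimings` holder, the stage names, and `TimingEventListener` are illustrative (not Slice's actual code), while the overridden callback signatures are OkHttp's real API:

```kotlin
import java.io.IOException
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.Proxy
import okhttp3.Call
import okhttp3.EventListener
import okhttp3.Handshake
import okhttp3.Protocol
import okhttp3.Response

// Illustrative holder: one wall-clock timestamp per lifecycle stage.
class RequestTimings {
    private val marks = mutableMapOf<String, Long>()
    fun mark(stage: String) { marks[stage] = System.currentTimeMillis() }
    fun durationMs(start: String, end: String): Long? {
        val s = marks[start] ?: return null
        val e = marks[end] ?: return null
        return e - s
    }
}

// One listener (and one timings object) per call; each overridden
// callback below is part of OkHttp's EventListener API.
class TimingEventListener(val timings: RequestTimings) : EventListener() {
    override fun callStart(call: Call) = timings.mark("callStart")
    override fun dnsStart(call: Call, domainName: String) = timings.mark("dnsStart")
    override fun dnsEnd(call: Call, domainName: String, inetAddressList: List<InetAddress>) =
        timings.mark("dnsEnd")
    override fun secureConnectStart(call: Call) = timings.mark("sslStart")
    override fun secureConnectEnd(call: Call, handshake: Handshake?) = timings.mark("sslEnd")
    override fun connectEnd(
        call: Call, inetSocketAddress: InetSocketAddress, proxy: Proxy, protocol: Protocol?
    ) = timings.mark("connectEnd")
    override fun responseHeadersEnd(call: Call, response: Response) = timings.mark("firstByte")
    override fun callEnd(call: Call) = timings.mark("callEnd")
    override fun callFailed(call: Call, ioe: IOException) = timings.mark("callFailed")
}
```

DNS time is then `durationMs("dnsStart", "dnsEnd")`, SSL handshake time is `durationMs("sslStart", "sslEnd")`, and so on. An `EventListener.Factory` passed to `OkHttpClient.Builder().eventListenerFactory(...)` creates one listener per call, so concurrent requests never share timing state.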
Key Takeaways
- Custom EventListener tracks DNS, SSL, connection, and transmission times per request
- Request tagging pattern attaches metrics object throughout entire lifecycle
- Heatmap visualization identified 28 launch APIs, reduced to 12 over 3 releases
- Carrier-specific DNS optimization cut lookup time from 230ms to 60ms (74% faster)
- Sampling under 5% of users delivers the insights without exploding analytics costs
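The sampling guard from the last takeaway can be plain Kotlin. A sketch, assuming a stable user or device ID is available; the function name and hashing scheme are illustrative choices, not Slice's actual code:

```kotlin
// Deterministic per-user sampling: hash a stable user/device ID into
// [0, 100) and collect detailed network metrics only for IDs that land
// below the sample percentage. Because String.hashCode() is stable, a
// given user is consistently in or out of the sample across sessions.
fun shouldCollectMetrics(userId: String, samplePercent: Int = 5): Boolean =
    userId.hashCode().mod(100) < samplePercent
```

Gating the `EventListener` registration on this check keeps the heavy per-request metrics off the hot path for 95% of users while still yielding a representative picture.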
By building granular network observability into their OkHttp and Retrofit stack, Slice reduced app startup time by 40% and identified carrier-specific performance issues that standard tools completely missed.
About This Article
Firebase Performance Monitoring doesn't give you enough detail about what happens during network requests. You can't see the individual stages like DNS lookup, SSL handshake, or request transmission, so it's hard to figure out where slowdowns actually happen across different carriers and network types.
Slice Engineering built NetworkMetricsEventListener using OkHttp's EventListener API to capture timing data at each stage of a request. They added a request tagging pattern so the same metrics object travels with a request through its entire lifecycle and can be enriched from interceptors and error handlers.
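The tagging pattern can be sketched with OkHttp's typed request tags. `NetworkMetrics` and `MetricsReportingInterceptor` are hypothetical names for illustration; `Request.Builder.tag(Class, T)` and `Request.tag(Class)` are OkHttp's real API:

```kotlin
import okhttp3.Interceptor
import okhttp3.Request
import okhttp3.Response

// Hypothetical metrics object that rides along with the request as a typed tag.
class NetworkMetrics {
    var dnsMs: Long = -1
    var sslMs: Long = -1
    var totalMs: Long = -1
    var errorMessage: String? = null
}

// Attach a fresh metrics object when the request is built...
fun tagged(url: String): Request =
    Request.Builder()
        .url(url)
        .tag(NetworkMetrics::class.java, NetworkMetrics())
        .build()

// ...and retrieve the very same object later in the pipeline, here from an
// interceptor that records total elapsed time and any failure message.
class MetricsReportingInterceptor : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        val metrics = chain.request().tag(NetworkMetrics::class.java)
        val start = System.currentTimeMillis()
        try {
            return chain.proceed(chain.request())
        } catch (e: Exception) {
            metrics?.errorMessage = e.message
            throw e
        } finally {
            metrics?.totalMs = System.currentTimeMillis() - start
        }
    }
}
```

Because the tag lives on the `Request`, the same `NetworkMetrics` instance is also reachable from an `EventListener` via `call.request().tag(NetworkMetrics::class.java)`, which is what lets one object accumulate data from every stage and handler.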
The system found that certain carriers had DNS issues causing lookups to take 3-4x longer than normal. After implementing DNS pre-fetching, median DNS lookup time fell from 230ms to 60ms. This cut total request time by 22% and improved user engagement.
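One way to implement such a pre-fetch, sketched with only the Java standard library (the `DnsPrefetcher` name is an assumption, not Slice's code): resolve the app's known API hosts on a background thread at startup, so the first real requests find warm results instead of paying a full lookup.

```kotlin
import java.net.InetAddress
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors

// Hypothetical pre-fetcher: resolves known API hosts in the background
// at app start and keeps the results in an in-process cache.
object DnsPrefetcher {
    private val cache = ConcurrentHashMap<String, List<InetAddress>>()
    private val executor = Executors.newSingleThreadExecutor { r ->
        Thread(r).apply { isDaemon = true } // don't keep the process alive
    }

    fun prefetch(hosts: List<String>) {
        hosts.forEach { host ->
            executor.execute {
                // Failures are ignored: the normal lookup path still works.
                runCatching { cache[host] = InetAddress.getAllByName(host).toList() }
            }
        }
    }

    fun cached(host: String): List<InetAddress>? = cache[host]
}
```

With OkHttp, a cache like this can back a custom `okhttp3.Dns` passed to `OkHttpClient.Builder().dns(...)`, returning cached addresses on a hit and delegating to `Dns.SYSTEM` on a miss.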