Introducing Pulse: Envoy Mobile's stats library
Article Summary
Jingwei Hao of Lyft describes how real-time stats APIs caught a production crash spike at 9:55am, letting engineers ship a hotfix before most users noticed the problem.
Lyft built Pulse, a stats library for Envoy Mobile that brings server-side observability practices (counters and gauges today, with histograms planned) to mobile apps. Unlike traditional crash reporting (minutes of delay) or analytics events (longer delays, lower resolution), Pulse reports time-series data in real time and integrates with PagerDuty and dashboards just like backend services.
Key Takeaways
- Real-time app crash metrics now trigger PagerDuty alerts for mobile on-call engineers
- Pulse supports Counter and Gauge stats, with Histogram support coming next
- Stats flow via gRPC to a StatsD-based service, serialized as Prometheus MetricFamily
- Traditional mobile observability relies on delayed crash reports or low-resolution analytics
- Engineers can now monitor specific interactions, such as taps on the "Request a Ride" button
Pulse makes the case for time-series metrics in mobile development by enabling the same real-time observability that backend engineers take for granted.
About This Article
Mobile teams traditionally used crash reporting tools like Crashlytics, which had minutes-level latency, or analytics systems with even longer delays. This made real-time anomaly detection and rapid incident response nearly impossible.
Jingwei Hao's team at Lyft built Pulse with Counter and Gauge APIs that send time-series data to a gRPC service based on StatsD. The stats are serialized as Prometheus MetricFamily so they work with existing backend observability systems.
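To make the Counter and Gauge model concrete, here is a minimal Python sketch of how a client-side stats store along these lines might work. It is not Pulse's actual API: the class names, the `flush` method, the metric names, and the choice to report counters as per-interval deltas are all assumptions for illustration. The snapshot is a list of plain dicts whose `type` values mirror the `COUNTER` and `GAUGE` members of the Prometheus `MetricType` enum; a real client would build `MetricFamily` messages and send them over gRPC instead.

```python
import threading


class Counter:
    """Counts events; flushed as a delta since the previous flush (assumption)."""

    def __init__(self, name):
        self.name = name
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, count=1):
        with self._lock:
            self._value += count

    def drain(self):
        # Return the accumulated delta and reset it for the next interval.
        with self._lock:
            delta, self._value = self._value, 0
        return delta


class Gauge:
    """Holds the latest value of something that can go up or down."""

    def __init__(self, name):
        self.name = name
        self._value = 0
        self._lock = threading.Lock()

    def set(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value


class StatsStore:
    """Hypothetical registry that hands out stats and snapshots them."""

    def __init__(self):
        self._counters = {}
        self._gauges = {}

    def counter(self, name):
        return self._counters.setdefault(name, Counter(name))

    def gauge(self, name):
        return self._gauges.setdefault(name, Gauge(name))

    def flush(self):
        # Stand-in for serializing to MetricFamily and sending over gRPC.
        snapshot = [
            {"name": c.name, "type": "COUNTER", "value": c.drain()}
            for c in self._counters.values()
        ]
        snapshot += [
            {"name": g.name, "type": "GAUGE", "value": g.read()}
            for g in self._gauges.values()
        ]
        return snapshot


store = StatsStore()
store.counter("request_ride.taps").increment()      # hypothetical metric name
store.counter("request_ride.taps").increment(2)
store.gauge("sessions.active").set(5)               # hypothetical metric name
print(store.flush())
```

Draining counters on each flush keeps every report small and lets the receiving service sum deltas across devices, while gauges simply report their most recent value.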
Mobile on-call engineers now get immediate PagerDuty alerts when metrics spike. This lets them identify problems and deploy hotfixes before users are widely affected. When the app crash metric spiked recently, the team caught and fixed it the same morning.
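The article does not describe how Lyft's alert rules are defined, so the following Python sketch only illustrates the general idea of paging on a spike: compare the latest per-interval crash count against a baseline built from preceding intervals. The function name `is_spike`, the `factor` multiplier, and the `min_count` floor are hypothetical choices, not values from the source.

```python
from statistics import mean


def is_spike(history, latest, factor=3.0, min_count=10):
    """Return True when `latest` is far above the recent baseline.

    history:   crash counts from preceding intervals (assumed input)
    latest:    crash count for the newest interval
    factor:    how many times the baseline counts as a spike (assumed)
    min_count: floor that keeps tiny absolute numbers from paging anyone
    """
    if not history:
        # No baseline yet, so do not alert.
        return False
    baseline = mean(history)
    return latest >= min_count and latest > factor * baseline


# A quiet morning followed by a sudden jump in crashes.
print(is_spike([2, 3, 1, 2], 40))   # True
print(is_spike([2, 3, 1, 2], 4))    # False
```

A rule like this is only useful when the underlying counter arrives within seconds, which is the gap that real-time stats fill compared with delayed crash reports.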