Battery Instrumentation at Facebook
Article Summary
Facebook was flying blind on battery drain across a billion devices. The user-facing battery percentage was no help: too noisy, too coarse, and deliberately smoothed by the operating system rather than a true reading.
Three engineers from Facebook's New York team break down their production battery instrumentation system for iOS and Android. They explain why traditional profiling tools fail at scale and how they built a modeling approach that catches regressions before they ship.
Key Takeaways
- CPU power draw scales roughly as the cube of clock frequency, not linearly
- Radio tail time from network requests varies dramatically by carrier and connection type
- CPU spin detector aggregates stack traces across millions of devices to pinpoint exact regression lines
- Power model uses device XML profiles (Android) and power monitor measurements (iOS)
- An A/B test caught a 10% increase in battery drain before the change shipped to production
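The cubic frequency takeaway can be illustrated with a toy calculation; the frequencies and the derivation constant here are illustrative, not Facebook's numbers:

```python
# Toy illustration of the dynamic-power rule of thumb P ~ f^3.
# Dynamic CPU power is roughly P = C * V^2 * f, and under DVFS the
# voltage rises roughly linearly with frequency, so power grows
# approximately with the cube of the clock frequency.

def relative_power(freq_ghz: float, base_freq_ghz: float = 1.0) -> float:
    """Power draw relative to the base frequency, assuming P ~ f^3."""
    return (freq_ghz / base_freq_ghz) ** 3

# Doubling the clock costs ~8x the power. The work finishes in half
# the time, so energy per task is still ~4x higher, which is why
# racing the CPU at max frequency is not free.
power_ratio = relative_power(2.0)        # 8.0
energy_ratio = power_ratio * 0.5         # 4.0
```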
Facebook built a component-level battery monitoring system that models energy drain without physical noise, catching regressions in hours instead of relying on user complaints.
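The radio tail-time takeaway can be sketched as a simple per-request energy model. The power and tail-duration coefficients below are hypothetical placeholders; in practice they vary by carrier, radio technology, and device, which is exactly why they must be measured:

```python
# Sketch of a radio tail-time energy model. The coefficients are
# invented for illustration; real values depend on carrier and
# connection type (3G vs LTE vs Wi-Fi).

def request_energy_mj(transfer_s: float,
                      active_mw: float = 1200.0,  # hypothetical active power
                      tail_s: float = 5.0,        # hypothetical tail duration
                      tail_mw: float = 600.0) -> float:
    """Energy in millijoules for one request: active transfer plus radio tail."""
    return transfer_s * active_mw + tail_s * tail_mw

# Two small requests spaced apart pay the tail twice; batching them
# into one request pays it once. Coalescing network calls saves
# battery even when the total bytes transferred are unchanged.
separate = 2 * request_energy_mj(0.1)
batched = request_energy_mj(0.2)
```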
About This Article
At Facebook's scale of a billion devices, traditional debugging fell apart. Developers couldn't reproduce issues on their own machines, and with hundreds of engineers shipping code daily, finding which change caused a battery drain became impossible without measuring what actually happened in production.
Facebook built instrumentation that tracked hardware usage at a granular level. They measured CPU at the thread level, network traffic by call site, and camera and location usage by product team. This let them connect hardware drain back to the specific code responsible.
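Conceptually, attribution of this kind reduces to a weighted sum: each component's measured active time multiplied by a per-component power coefficient, in the style of Android's power_profile.xml. The component names, coefficients, and usage figures below are invented for illustration:

```python
# Sketch of component-level energy attribution. The coefficients
# mimic Android power_profile.xml entries (mA while active); the
# values and the per-feature usage numbers are hypothetical.

POWER_PROFILE_MA = {
    "cpu.active": 100.0,
    "radio.active": 200.0,
    "camera": 300.0,
    "gps": 50.0,
}

def attribute_drain_mah(usage_by_component_s: dict) -> float:
    """Estimated drain in mAh from per-component active seconds."""
    return sum(POWER_PROFILE_MA[component] * seconds / 3600.0
               for component, seconds in usage_by_component_s.items())

# Tracking usage per feature in production lets drain be charged
# back to the code responsible rather than to the app as a whole.
feed_drain = attribute_drain_mah({"cpu.active": 120.0, "radio.active": 30.0})
camera_drain = attribute_drain_mah({"camera": 60.0, "cpu.active": 20.0})
```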
The CPU spin detector collected stack traces from millions of devices and found the exact line that caused a regression within hours. Thread-level CPU measurement cut the investigation space dramatically, from every line of code in the system down to specific functions or thread pools.
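The aggregation step can be sketched as counting which stack frames recur across reported spins; with enough reports, the frame shared by most spinning threads dominates. The frame names and report format here are hypothetical, not Facebook's actual schema:

```python
# Sketch of spin-detector aggregation: stack traces reported from
# devices whose threads burned CPU without making progress are
# pooled, and frames are ranked by how many traces contain them.
from collections import Counter

def top_spin_frames(traces, n=3):
    """Rank stack frames by the number of reported traces containing them."""
    counts = Counter(frame for trace in traces for frame in set(trace))
    return counts.most_common(n)

# With millions of reports, the frame (and hence the offending line)
# common to most spinning threads rises to the top within hours.
reports = [
    ("main", "FeedAdapter.bind", "ImageLoader.poll"),
    ("main", "FeedAdapter.bind", "ImageLoader.poll"),
    ("worker", "ImageLoader.poll"),
    ("render", "Composer.draw"),
]
culprit = top_spin_frames(reports, n=1)  # [('ImageLoader.poll', 3)]
```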