Meta Brennan Vincent Oct 30, 2017

Battery Instrumentation at Facebook

Article Summary

Facebook was flying blind on battery drain across a billion devices. The user-facing battery percentage was no help: it is too noisy, too coarse, and inaccurate by design.

Three engineers from Facebook's New York team break down their production battery instrumentation system for iOS and Android. They explain why traditional profiling tools fail at scale and how they built a modeling approach that catches regressions before they ship.

Key Takeaways

Critical Insight

Facebook built a component-level battery monitoring system that models energy drain from measured hardware usage instead of relying on noisy physical battery readings. It catches regressions within hours rather than waiting for user complaints.
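As a rough illustration of what modeling drain from component usage can look like, here is a minimal sketch that assumes a simple linear model: each component's active time multiplied by a fixed power draw. The `POWER_MW` values and the function name are hypothetical and do not come from Facebook's system.

```python
from typing import Mapping

# Hypothetical per-component power draw in milliwatts.
# Real values differ per device and are not given in the article.
POWER_MW = {"cpu": 1000.0, "radio": 800.0, "camera": 1200.0, "gps": 400.0}


def modeled_energy_mj(active_seconds: Mapping[str, float]) -> float:
    """Estimate energy in millijoules as the sum of power (mW) * active time (s)."""
    return sum(POWER_MW[component] * seconds
               for component, seconds in active_seconds.items())


# Seconds each component was active during one hypothetical session.
session = {"cpu": 12.0, "radio": 3.5, "camera": 0.0, "gps": 20.0}
estimate = modeled_energy_mj(session)
```

Because the estimate depends only on counted usage, two builds can be compared without the temperature, battery-age, and rounding effects that distort a percentage readout.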

They open-sourced BatteryMetrix, and they also describe a surprising culprit that consumed 30% CPU while doing nothing visible to users.

About This Article

Problem

At Facebook's scale of a billion devices, traditional debugging fell apart. Developers couldn't reproduce issues on their own machines, and with hundreds of engineers shipping code daily, finding which change caused a battery drain became impossible without measuring what actually happened in production.

Solution

Facebook built instrumentation that tracked hardware usage at a granular level. They measured CPU at the thread level, network traffic by call site, and camera and location usage by product team. This let them connect hardware drain back to the specific code responsible.
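The attribution step can be sketched as follows. This is an illustrative example only: it assumes cumulative per-thread CPU counters are snapshotted twice and that threads can be mapped to an owning team by name prefix. The `THREAD_OWNERS` table and every name in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from thread-name prefix to owning team.
THREAD_OWNERS = {"NewsFeed": "feed", "ImagePipeline": "media", "Net": "networking"}


def owner_of(thread_name):
    """Return the team that owns a thread, or 'unattributed' if no prefix matches."""
    for prefix, team in THREAD_OWNERS.items():
        if thread_name.startswith(prefix):
            return team
    return "unattributed"


def cpu_by_team(before, after):
    """Attribute CPU time used between two snapshots to teams.

    before/after map thread name -> cumulative CPU milliseconds.
    Threads that first appear in `after` are counted from zero.
    """
    usage = defaultdict(float)
    for thread, total_ms in after.items():
        delta = total_ms - before.get(thread, 0.0)
        if delta > 0:
            usage[owner_of(thread)] += delta
    return dict(usage)


before = {"NewsFeed-1": 100.0, "Net-2": 50.0}
after = {"NewsFeed-1": 400.0, "Net-2": 60.0, "ImagePipeline-0": 25.0, "main": 5.0}
report = cpu_by_team(before, after)
```

The same delta-and-attribute pattern applies to network bytes keyed by call site or to camera and location time keyed by product team.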

Impact

The CPU spin detector collected stack traces from millions of devices and found the exact line that caused a regression within hours. Thread-level CPU measurement cut the investigation space dramatically, from every line of code in the system down to specific functions or thread pools.
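One way such a spin detector could work is sketched below. The article does not describe the actual algorithm, so this example assumes a thread is "spinning" when its CPU utilization stays above a threshold for several consecutive sampling windows, and that traces reported from devices are then grouped by their top frame. The thresholds and frame names are hypothetical.

```python
from collections import Counter


def spinning_threads(windows, threshold=0.8, min_consecutive=3):
    """Return threads whose CPU fraction stayed >= threshold for
    at least min_consecutive sampling windows in a row.

    windows maps thread name -> list of CPU fractions (0.0 to 1.0), in time order.
    """
    flagged = []
    for thread, fractions in windows.items():
        run = 0
        for fraction in fractions:
            run = run + 1 if fraction >= threshold else 0
            if run >= min_consecutive:
                flagged.append(thread)
                break
    return flagged


def most_common_top_frame(traces):
    """Given stack traces (lists of frames, innermost first) collected from
    many devices, return (frame, count) for the most frequent top frame."""
    counts = Counter(trace[0] for trace in traces if trace)
    return counts.most_common(1)[0] if counts else None


windows = {"worker": [0.9, 0.95, 0.85, 0.2], "main": [0.9, 0.1, 0.9, 0.9]}
flagged = spinning_threads(windows)

traces = [["Poller.loop", "A.run"], ["Poller.loop", "B.run"], ["Render.draw"]]
hot_frame = most_common_top_frame(traces)
```

Requiring several consecutive busy windows filters out short legitimate bursts, and grouping traces across devices is what lets a single hot frame stand out from the rest.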