Building A Performant iOS Profiler
Article Summary
Indragie Karunaratne from Sentry reveals how they built a production-ready iOS profiler that runs on millions of real user devices. Most profilers only work locally, but this one collects real-world performance data without killing your app's performance.
Sentry's engineering team needed to profile iOS apps in production across different devices and conditions, not just during local development. They built a sampling profiler that could run in-process on user devices while maintaining minimal overhead, navigating iOS sandboxing limitations and async-signal-safety challenges along the way.
Key Takeaways
- Achieved under 5% average CPU overhead on mid-tier iOS devices
- Sampling at 100 Hz (one sample every 10 ms) reliably captures functions that run longer than 10 ms
- Signal handlers failed: delivery was unreliable, and they couldn't profile GCD worker threads
- Mach thread suspend APIs proved more reliable than POSIX signals
- Millions of profiles ingested from production devices over 5 months
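The relationship between the 100 Hz sampling rate and the 10 ms threshold can be checked with a little arithmetic. The sketch below assumes perfectly regular sampling with no timer jitter; the function names are illustrative and do not come from Sentry's code.

```cpp
#include <cassert>
#include <cstdint>

// Time between two consecutive samples, in microseconds.
inline std::int64_t samplingPeriodMicros(std::int64_t frequencyHz) {
    assert(frequencyHz > 0);
    return 1000000 / frequencyHz;
}

// Minimum number of samples guaranteed to land inside a function that runs
// continuously for durationMicros, assuming evenly spaced samples.
// A half-open interval of length d always contains at least floor(d / period)
// ticks of a regular grid, wherever the interval happens to start.
inline std::int64_t guaranteedSamples(std::int64_t durationMicros,
                                      std::int64_t frequencyHz) {
    return durationMicros / samplingPeriodMicros(frequencyHz);
}
```

Under these assumptions, `samplingPeriodMicros(100)` is 10000, a function that runs for 10 ms is guaranteed at least one sample, and a function that runs for 9.999 ms may fall entirely between two samples and be missed.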
Sentry built a sampling profiler that runs in production iOS apps with under 5% CPU overhead by using Mach thread suspend APIs and frame pointer stack walking.
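As a rough illustration of frame pointer stack walking, the sketch below follows the chain of frame records that arm64 and x86_64 code leaves on the stack when frame pointers are enabled. It is a minimal sketch, not Sentry's implementation: `FrameRecord`, `walkFramePointers`, and the stack-bounds parameters are hypothetical names, and the caller is assumed to already know the starting frame pointer and the bounds of the thread's stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With frame pointers enabled, the frame pointer register points at a pair:
// the caller's frame pointer followed by the return address.
struct FrameRecord {
    const FrameRecord *previous;
    std::uintptr_t returnAddress;
};

// Follows the frame pointer chain starting at fp and writes return addresses
// into the caller-supplied buffer. Returns the number of addresses written.
// The loop itself allocates nothing and takes no locks.
inline std::size_t walkFramePointers(const FrameRecord *fp,
                                     std::uintptr_t stackLow,
                                     std::uintptr_t stackHigh,
                                     std::uintptr_t *out,
                                     std::size_t maxDepth) {
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert(out != nullptr || maxDepth == 0);
    std::size_t depth = 0;
    while (depth < maxDepth) {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fp);
        // Stop on null or misaligned frame pointers.
        if (addr == 0 || addr % alignof(FrameRecord) != 0) break;
        // Stop if the frame record does not lie fully inside the stack.
        if (addr < stackLow || addr >= stackHigh ||
            stackHigh - addr < sizeof(FrameRecord)) break;
        if (fp->returnAddress == 0) break;
        out[depth++] = fp->returnAddress;
        const FrameRecord *next = fp->previous;
        // The stack grows downward, so callers live at higher addresses.
        // Rejecting non-increasing pointers also prevents infinite loops.
        if (reinterpret_cast<std::uintptr_t>(next) <= addr) break;
        fp = next;
    }
    return depth;
}
```

Writing into a fixed buffer supplied by the caller keeps the walk free of allocation, which matters when it runs while the sampled thread is suspended.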
About This Article
Sentry needed to collect call stacks from suspended threads without causing deadlocks. If a thread is suspended while it holds a lock, and the sampling code then tries to acquire that same lock, the process deadlocks, because the suspended thread can never release it.
The team kept the code that runs during thread suspension to a minimum, executing only the essential stack capture operations. They called only functions that never take locks, and moved unsafe work, such as collecting thread metadata, to before suspension or after the thread resumed.
This approach made stack walking reliable across all thread types. Sentry could ingest millions of profiles from production devices while staying under the 5% CPU overhead requirement needed for real-world deployment.
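The ordering described above (unsafe work first, then suspend, capture, and resume) can be sketched as follows. This is an illustration under stated assumptions rather than Sentry's code: `ThreadOps` is a hypothetical adapter, and `Sample`, `sampleThread`, and `kMaxFrames` are made-up names. On Apple platforms such an adapter would presumably wrap Mach calls such as `thread_suspend`, `thread_get_state`, and `thread_resume`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kMaxFrames = 128;

struct Sample {
    std::string threadName;                           // gathered before suspension
    std::array<std::uintptr_t, kMaxFrames> frames{};  // preallocated storage
    std::size_t frameCount = 0;
};

// ThreadOps is a hypothetical adapter that must provide:
//   std::string name();   may allocate or take locks
//   bool suspend();       stops the target thread
//   std::size_t captureStack(std::uintptr_t *out, std::size_t max);
//                         must not allocate or take locks
//   void resume();        lets the target thread run again
template <typename ThreadOps>
bool sampleThread(ThreadOps &thread, Sample &sample) {
    // 1. Unsafe work happens while the target thread is still running,
    //    so any lock it holds will eventually be released.
    sample.threadName = thread.name();

    // 2. Suspend the target thread.
    if (!thread.suspend()) return false;

    // 3. Critical section: only the stack capture, written into storage
    //    that was allocated before the thread was suspended.
    sample.frameCount =
        thread.captureStack(sample.frames.data(), sample.frames.size());

    // 4. Resume before doing anything else.
    thread.resume();

    // Sanity check on the adapter, performed after the thread is running again.
    assert(sample.frameCount <= sample.frames.size());
    return true;
}
```

Keeping step 3 down to a single call makes it easier to audit that nothing between suspend and resume can block on a lock held by the suspended thread.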