How We Reduced Our ANR by Three Times
Article Summary
OkCredit slashed their ANR rate by 67% and cold startup time by 70%. Here's how they debugged one of Android's most frustrating problems.
The OkCredit Android team dove deep into Android's source code to understand ANR triggers at the system level. They discovered the surprising connection between cold startup performance and background ANRs, then systematically eliminated bottlenecks across broadcast receivers, services, and the main thread.
Key Takeaways
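- The ANR-WatchDog library pinpointed slow methods better than system traces, which often show the main thread idle in nativePollOnce by the time the dump is taken
- Background app wakeups are 2.3x slower than foreground execution, causing ANRs during FCM and WorkManager execution
- SharedPreferences apply() can cause ANRs: it returns immediately, but the main thread blocks on the pending writes during onPause and onReceive
- An input dispatch ANR triggers only on the next user interaction, not during the blocking operation itself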
- Optimizing App.onCreate reduced both cold startup and background service ANRs
OkCredit achieved a 0.03% ANR rate by treating cold startup optimization as the key to preventing background service timeouts.
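The SharedPreferences takeaway above is easier to see in a small model. The sketch below is plain Java, not Android code: `ApplyBlockingModel`, its `apply(long)` and its `waitToFinish()` are hypothetical stand-ins, assuming the framework behaves roughly as the article describes (the write is scheduled asynchronously, and a later lifecycle callback waits for it on the calling thread).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of "apply() is async, but a lifecycle callback waits for it".
final class ApplyBlockingModel {
    // Stand-in for the background thread that performs disk writes.
    private final ExecutorService diskThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "model-disk-writer");
        t.setDaemon(true);
        return t;
    });
    // One latch per write that has been scheduled but not yet awaited.
    private final Queue<CountDownLatch> pending = new ConcurrentLinkedQueue<>();

    /** Models apply(): schedules a (simulated) disk write and returns at once. */
    void apply(long simulatedWriteMs) {
        CountDownLatch done = new CountDownLatch(1);
        pending.add(done);
        diskThread.execute(() -> {
            sleepQuietly(simulatedWriteMs);
            done.countDown();
        });
    }

    /** Models the lifecycle hook: blocks the caller until all pending writes finish. */
    void waitToFinish() {
        CountDownLatch next;
        while ((next = pending.poll()) != null) {
            try {
                next.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Returns how many milliseconds the given action took on the calling thread. */
    static long millisSpent(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ApplyBlockingModel prefs = new ApplyBlockingModel();
        long applyMs = millisSpent(() -> prefs.apply(300));   // the "cheap" call
        long lifecycleMs = millisSpent(prefs::waitToFinish);  // where the cost lands
        System.out.println("apply() took " + applyMs + " ms");
        System.out.println("lifecycle wait took " + lifecycleMs + " ms");
    }
}
```

In this model the caller of `apply()` pays almost nothing, and the cost of the slow write surfaces later in whichever callback waits for it. That would explain why such ANRs get attributed to onPause or onReceive rather than to the code that called apply().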
About This Article
OkCredit's ANRs were hard to debug in production. The Play Console didn't provide full stack traces, trace dumps came in late, grouping was inconsistent, and the team couldn't see CPU or memory pressure data at the moment an ANR happened.
The team studied Android 12's source code to learn how ANRs get triggered across InputDispatching, Broadcast, Service, and ContentProvider timeouts. They then used the ANR-WatchDog library to capture Java method traces instead of relying on native traces.
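The watchdog technique mentioned here can be sketched in plain Java. This is not ANR-WatchDog's API and not the team's code: `WatchdogSketch`, `checkOnce`, and the "fake-main" executor are hypothetical names, with a single-threaded executor standing in for the Android main thread. The idea is to post a no-op tick to the watched thread, and if the tick has not run within a timeout, capture that thread's Java stack while it is still inside the slow method.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the watchdog idea, using an executor as a fake main thread.
final class WatchdogSketch {
    private final AtomicReference<Thread> watchedThread = new AtomicReference<>();
    // Stand-in for the main thread: one thread draining a task queue in order.
    private final ExecutorService watched = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "fake-main");
        t.setDaemon(true);
        watchedThread.set(t);
        return t;
    });

    /** Queues work on the watched thread, like posting to a main-thread handler. */
    void post(Runnable task) {
        watched.execute(task);
    }

    /**
     * Posts a no-op tick and waits for it to run. Returns null if the tick ran
     * within timeoutMs; otherwise returns the watched thread's current stack.
     */
    StackTraceElement[] checkOnce(long timeoutMs) {
        CountDownLatch tick = new CountDownLatch(1);
        watched.execute(tick::countDown);
        try {
            if (tick.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return null; // thread is responsive
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        // The tick is stuck behind something slow: snapshot the culprit now.
        return watchedThread.get().getStackTrace();
    }

    /** True if any frame in the captured stack belongs to the named method. */
    static boolean mentions(StackTraceElement[] stack, String methodName) {
        if (stack == null) {
            return false;
        }
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Simulated slow main-thread work. */
    static void slowMethod() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        WatchdogSketch dog = new WatchdogSketch();
        System.out.println("idle thread blocked: " + (dog.checkOnce(200) != null));
        dog.post(WatchdogSketch::slowMethod);
        StackTraceElement[] stack = dog.checkOnce(200);
        System.out.println("busy thread blocked: " + (stack != null));
        System.out.println("stack names slowMethod: " + mentions(stack, "slowMethod"));
    }
}
```

Because the snapshot is taken while the thread is still blocked, the stack names the slow Java method. A dump collected later, after the thread has gone idle again, would not.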
They found that ContentProvider ANRs don't show up in production and that background app wakeups during FCM and WorkManager execution were 2.3x slower than foreground execution. Acting on these findings, they cut the ANR rate from 0.47% to 0.03% while keeping cold startup time at 0.66%.