OkCredit Anjal Saneen Oct 18, 2022

How We Reduced Our ANR by Three Times

Article Summary

OkCredit cut its ANR rate by 67% and its cold startup time by 70%. Here's how the team debugged one of Android's most frustrating problems.

The OkCredit Android team dug into the Android framework source code to understand what triggers an ANR at the system level. They found a surprising connection between cold startup performance and background ANRs, then systematically removed bottlenecks in broadcast receivers, services, and the main thread.
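A simplified timing model makes the cold startup connection concrete. It assumes, based on a reading of AOSP, that when a service start or a broadcast to a manifest-declared receiver has to launch a cold process, the system arms the component's timeout before the app's main thread runs `Application.onCreate`, so startup work and the component's own callback share one budget. This is a plain-Java sketch rather than Android code: `AnrTimeout`, `ColdStartBudget` and `exceedsBudget` are hypothetical names, and the timeout values are approximate AOSP defaults from around Android 12 that should be checked against the release you target.

```java
/**
 * Approximate default ANR timeouts in AOSP around Android 12 (assumed values;
 * real builds may scale them, for example on emulators).
 */
enum AnrTimeout {
    INPUT_DISPATCH(5_000),            // input event not finished by the focused window
    BROADCAST_FOREGROUND(10_000),     // serial broadcast on the foreground queue
    BROADCAST_BACKGROUND(60_000),     // serial broadcast on the background queue
    SERVICE_FOREGROUND(20_000),       // service callback scheduled with foreground priority
    SERVICE_BACKGROUND(200_000),      // service callback scheduled with background priority
    CONTENT_PROVIDER_PUBLISH(10_000); // provider publish; assumed to kill the process, not raise an ANR

    final long millis;

    AnrTimeout(long millis) {
        this.millis = millis;
    }
}

final class ColdStartBudget {
    private ColdStartBudget() {}

    /**
     * Hypothetical helper. Models a component that triggers a cold start: its timer is
     * assumed to start before Application.onCreate runs, so both phases share one budget.
     */
    static boolean exceedsBudget(AnrTimeout timeout, long appStartupMs, long componentWorkMs) {
        return appStartupMs + componentWorkMs > timeout.millis;
    }
}
```

For example, `exceedsBudget(AnrTimeout.SERVICE_FOREGROUND, 12_000, 9_000)` is `true` even though neither phase alone comes close to 20 s. Since the article found background wakeups 2.3x slower than foreground ones, `appStartupMs` should be the background figure rather than the one measured on a developer's device.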

Key Takeaways

Critical Insight

OkCredit reached a 0.03% ANR rate by treating cold startup optimization as the key to preventing background service timeouts.
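One common way to shorten cold startup is to defer expensive initialization (SDK setup, database warm-up) until it is first needed instead of doing it in `Application.onCreate`. The summary does not say which techniques OkCredit used, so treat this only as an illustration. `LazyValue` is a hypothetical plain-Java helper in the spirit of Kotlin's `lazy` delegate, and it assumes the initializer never returns `null`.

```java
import java.util.function.Supplier;

/** Hypothetical memoizing supplier: runs the initializer at most once, on the first get(). */
final class LazyValue<T> implements Supplier<T> {
    private Supplier<? extends T> initializer; // cleared after use; only touched under the lock
    private volatile T value;                  // null means "not initialized yet"

    LazyValue(Supplier<? extends T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public T get() {
        T result = value;                 // fast path: one volatile read once initialized
        if (result == null) {
            synchronized (this) {
                result = value;           // re-check under the lock (double-checked locking)
                if (result == null) {
                    result = initializer.get(); // assumed never to return null
                    value = result;
                    initializer = null;   // let the initializer be garbage-collected
                }
            }
        }
        return result;
    }
}
```

With this pattern, `onCreate` only constructs the wrapper; the real cost moves to the first screen or job that calls `get()`, which keeps it out of the shared cold start budget of a background wakeup that never needs it.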

The article includes a deep dive into Android's native InputDispatcher code that reveals why ANRs don't always appear when you'd expect them.
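The InputDispatcher behavior can be illustrated with a toy model. It assumes, consistent with how AOSP's dispatcher tracks outstanding events, that an input-dispatch ANR fires only when an event has been sent to the window and is still unfinished after the timeout (about 5 s by default). A main thread that is blocked while nobody touches the screen therefore produces no input ANR. `InputAnrModel` and `wouldAnr` are hypothetical names; events are assumed to be handled instantly whenever the thread is free, and the no-focused-window case is ignored.

```java
import java.util.List;

/** Toy model (an assumption, not AOSP code) of when an input-dispatch ANR is raised. */
final class InputAnrModel {
    private InputAnrModel() {}

    static final long DISPATCH_TIMEOUT_MS = 5_000; // approximate default input dispatching timeout

    /**
     * The main thread is blocked during [blockStartMs, blockEndMs). An event arriving inside
     * that window cannot be finished until the block ends, so it waits blockEndMs - t.
     * Events outside the window are assumed to be handled immediately.
     */
    static boolean wouldAnr(long blockStartMs, long blockEndMs, List<Long> inputEventTimesMs) {
        for (long t : inputEventTimesMs) {
            if (t < blockStartMs || t >= blockEndMs) {
                continue; // thread was free: event finished at once
            }
            if (blockEndMs - t > DISPATCH_TIMEOUT_MS) {
                return true; // event stayed unfinished past the timeout
            }
        }
        return false; // no event timed out, even if the thread was blocked
    }
}
```

Blocking for 10 s with no input gives `false`; a touch 2 s into the block waits 8 s and gives `true`; a touch 7 s in waits only 3 s and gives `false`.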

About This Article

Problem

OkCredit's ANRs were hard to debug in production. The Play Console didn't provide full stack traces, trace dumps arrived late, ANRs were grouped inconsistently, and the team couldn't see CPU load or memory pressure at the moment an ANR happened.

Solution

The team studied the Android 12 source code to learn how each type of ANR is triggered: input dispatching, broadcast, service, and ContentProvider timeouts. They then used the ANR-WatchDog library to capture Java stack traces of the main thread instead of relying on the system's native trace dumps.
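ANR-WatchDog works by posting a small task to the main thread from a background thread and checking, after a timeout, whether it has run; if it has not, it reports the main thread's Java stack trace. The sketch below reproduces that idea on a plain JVM so it runs without a device: a single-thread executor stands in for the main `Looper`, and `MiniWatchdog`, `WatchdogDemo`, `slowWork` and `captureHang` are hypothetical names, not the library's API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical watchdog: posts a tick to the "main" queue and reports if it does not run in time. */
final class MiniWatchdog extends Thread {
    private final ExecutorService mainQueue; // stands in for a Handler on the main Looper
    private final Thread mainThread;         // thread whose Java stack is captured on a hang
    private final long timeoutMs;
    private final Consumer<StackTraceElement[]> onHang;

    MiniWatchdog(ExecutorService mainQueue, Thread mainThread, long timeoutMs,
                 Consumer<StackTraceElement[]> onHang) {
        super("mini-watchdog");
        setDaemon(true);
        this.mainQueue = mainQueue;
        this.mainThread = mainThread;
        this.timeoutMs = timeoutMs;
        this.onHang = onHang;
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            AtomicBoolean ticked = new AtomicBoolean(false);
            mainQueue.execute(() -> ticked.set(true)); // no-op tick on the main queue
            try {
                Thread.sleep(timeoutMs);
            } catch (InterruptedException e) {
                return; // stopped by the owner
            }
            if (!ticked.get()) {
                // The tick is still queued: the main thread has been busy for >= timeoutMs.
                onHang.accept(mainThread.getStackTrace());
                return; // report once to keep the sketch simple
            }
        }
    }
}

final class WatchdogDemo {
    private WatchdogDemo() {}

    static void slowWork(long ms) {
        try {
            Thread.sleep(ms); // stands in for disk I/O or heavy work on the main thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Blocks a fake main thread for blockMs; returns the captured stack trace, or null if none. */
    static StackTraceElement[] captureHang(long blockMs, long timeoutMs) {
        ExecutorService main = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fake-main");
            t.setDaemon(true);
            return t;
        });
        try {
            Thread mainThread = main.submit(Thread::currentThread).get();
            CountDownLatch reported = new CountDownLatch(1);
            AtomicReference<StackTraceElement[]> trace = new AtomicReference<>();
            MiniWatchdog dog = new MiniWatchdog(main, mainThread, timeoutMs, st -> {
                trace.set(st);
                reported.countDown();
            });
            dog.start();
            main.execute(() -> slowWork(blockMs)); // simulate a long main-thread task
            reported.await(blockMs + 2 * timeoutMs, TimeUnit.MILLISECONDS);
            dog.interrupt();
            dog.join(); // make sure the watchdog stops posting before shutdown
            return trace.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            main.shutdownNow();
        }
    }
}
```

`captureHang(1500, 300)` returns a trace whose frames include `slowWork`, the method actually holding the thread, while `captureHang(50, 300)` returns `null`. That Java frame is the kind of detail the team was missing from the Play Console reports.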

Impact

They found that ContentProvider timeouts don't show up as ANRs in production, and that background wakeups to handle FCM messages and WorkManager jobs were 2.3x slower than the same work in the foreground. This led them to cut the ANR rate from 0.47% to 0.03% while keeping the slow cold start rate at 0.66%.