JioHotstar Android App — Road to 99.9% CFUR
Article Summary
JioHotstar raised its Android app's crash-free user rate from 99.5% to 99.8% by tracking down memory consumers that did not show up in the usual crash rankings. Here's how the team did it.
With millions of users and 40M+ concurrent viewers during live events, the JioHotstar team tracks Crash Free User Rate (CFUR) closely. The article describes their systematic approach to eliminating the Out Of Memory (OOM) crashes that were blocking the path to 99.9% CFUR.
Key Takeaways
- Shifted focus from the top 10 to the top 25 crashes, uncovering an OOM pattern affecting 0.3% of users
- Used custom logs to trace user journeys, revealing that switching content in offline mode triggered the crashes
- Memory profiling exposed Lottie cache (8MB), failed network retries, and ViewModel leaks
- Cleared third-party caches, halted offline API calls, refactored event handling to fix leaks
- All top 10 OOM crashes eliminated, pushing CFUR above 99.8% post-deployment
By profiling specific user journeys and aggressively managing third-party library memory, JioHotstar eliminated OOM crashes affecting 0.3% of users, crashes that traditional top-10 prioritization had missed.
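The journey tracing mentioned in the takeaways can be illustrated with a bounded breadcrumb log that keeps only the most recent steps a user took before a crash. This is a minimal sketch in plain Java, not the team's implementation: the summary does not describe their log format, so the class name `JourneyLog`, the capacity, and the event strings are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded breadcrumb log: keeps only the most recent journey events. */
final class JourneyLog {
    private final int capacity;
    private final Deque<String> events = new ArrayDeque<>();

    JourneyLog(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one step of the user journey, evicting the oldest step when full. */
    synchronized void record(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    /** Returns a copy of the retained steps, oldest first, e.g. to attach to a crash report. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}

final class JourneyLogDemo {
    public static void main(String[] args) {
        JourneyLog log = new JourneyLog(3);
        log.record("open_downloads");       // hypothetical event names
        log.record("go_offline");
        log.record("play_download_1");
        log.record("switch_to_download_2"); // evicts "open_downloads"
        System.out.println(log.snapshot());
    }
}
```

Bounding the log keeps the tracing itself from adding memory pressure, which matters when the crashes under investigation are OOMs.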
About This Article
JioHotstar's Android app kept crashing when trying to allocate just 2064 bytes, even though 3MB of memory was free. The heap was hitting its 512MB limit during large live events that pulled in over 40 million concurrent users, and the problem was worse on devices with less RAM.
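To put those figures in perspective: if the 3MB of free memory is measured against the 512MB heap limit, the heap was roughly 99.4% full when the allocation failed. The sketch below shows one way to compute heap usage and headroom with the standard `java.lang.Runtime` methods. The class `HeapHeadroom` and the 0.9 threshold are illustrative assumptions; the summary does not say how the team monitored heap usage.

```java
/** Hypothetical helper for estimating how close a process is to its heap limit. */
final class HeapHeadroom {
    /** Bytes currently in use on the managed heap. */
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Bytes left before used heap reaches the runtime's maximum heap size. */
    static long headroomBytes(Runtime rt) {
        return rt.maxMemory() - usedBytes(rt);
    }

    /** True when used heap exceeds the given fraction (0.0 to 1.0) of the limit. */
    static boolean isNearLimit(long usedBytes, long maxBytes, double threshold) {
        return usedBytes > (long) (maxBytes * threshold);
    }
}

final class HeapHeadroomDemo {
    public static void main(String[] args) {
        long max = 512L * 1024 * 1024;      // 512MB heap limit from the article
        long used = max - 3L * 1024 * 1024; // 3MB free, as reported in the article
        // Checks the article's figures against an assumed 90% warning threshold.
        System.out.println(HeapHeadroom.isNearLimit(used, max, 0.9));
        // Headroom of the current process, for comparison.
        System.out.println(HeapHeadroom.headroomBytes(Runtime.getRuntime()));
    }
}
```

A check like this could be logged alongside journey events so that a crash report shows how heap usage grew along the way.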
The team used Android Profiler to analyze memory usage on offline user journeys and found several culprits. Lottie was holding onto 8MB in cache, EmojiCompat was keeping 352KB in memory, and failed network retries in offline mode were continuously creating OkHttp objects like SegmentPool and CipherSuite.
They fixed the crashes by clearing the Lottie cache when activities were destroyed, turning off EmojiCompat with a feature flag, stopping API calls when offline, and rewriting how ViewModels handled events. All ten of the top OOM crashes went away and the crash-free user rate climbed above 99.8%.
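Two of these fixes, releasing a library cache on teardown and skipping network calls while offline, can be sketched in plain Java without any Android dependencies. Everything named here is a hypothetical stand-in: `AnimationCache` plays the role of a third-party cache such as Lottie's, `PlayerScreen.onDestroy()` mimics an activity's destroy callback, and `OfflineGate` is an assumed wrapper, not the team's code or any library's API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical process-wide cache standing in for a third-party library cache. */
final class AnimationCache {
    private static final Map<String, byte[]> ENTRIES = new ConcurrentHashMap<>();

    static void put(String key, byte[] data) {
        ENTRIES.put(key, data);
    }

    static int size() {
        return ENTRIES.size();
    }

    /** Drops every entry so the garbage collector can reclaim the memory. */
    static void clear() {
        ENTRIES.clear();
    }
}

/** Hypothetical screen whose teardown hook releases the shared cache. */
final class PlayerScreen {
    void onCreate() {
        AnimationCache.put("loading_spinner", new byte[1024]);
    }

    void onDestroy() {
        // Without this call the static cache would outlive the screen.
        AnimationCache.clear();
    }
}

/** Runs a network call only while online; when offline the call is never invoked. */
final class OfflineGate {
    private final BooleanSupplier isOnline;
    private int skippedCalls;

    OfflineGate(BooleanSupplier isOnline) {
        this.isOnline = isOnline;
    }

    <T> Optional<T> callIfOnline(Supplier<T> networkCall) {
        if (!isOnline.getAsBoolean()) {
            skippedCalls++; // counted so skipped work stays visible in logs
            return Optional.empty();
        }
        return Optional.ofNullable(networkCall.get());
    }

    int skippedCalls() {
        return skippedCalls;
    }
}
```

With a gate like this, a caller in offline mode would write something like `gate.callIfOnline(() -> api.fetch())` and receive an empty result immediately, so no request objects are built for calls that cannot succeed. The connectivity check is injected as a `BooleanSupplier` so the gate can be tested without a real network.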