Reclaiming Terabytes: Optimizing Android image caching with TLRU
Article Summary
Grab's engineering team reclaimed terabytes of storage across 100+ million Android devices. Their secret? A clever twist on a decades-old caching algorithm.
Grab's superapp faced a growing problem: the image cache was eating up user storage. Users at the 90th percentile were hitting the 100MB cache limit, and stale promotional images could sit on disk for months. The team evolved their standard LRU cache into TLRU (Time-Aware LRU), adding time-based eviction while preserving the benefits of traditional size-based management.
Key Takeaways
- Users at P95 saw a storage reduction of about 50MB after the TLRU rollout
- Cache hit ratio stayed within the 3pp (percentage point) target, with no increase in infrastructure cost
- Modified Glide's DiskLruCache journal to track last access timestamps
- Backward-compatible migration preserved all existing cache data
- Three core attributes: TTL expiration, minimum size threshold, and maximum size
By adding time awareness to LRU caching, Grab reclaimed terabytes globally while maintaining user experience and keeping server costs flat.
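The three attributes named in the takeaways can be illustrated with a small in-memory model. This is a minimal sketch, not Grab's or Glide's code: the class `TlruSketch`, its method names, and the exact rule for the minimum threshold (expired entries are only dropped while the cache would still hold at least the minimum afterwards) are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of a time-aware LRU (TLRU) index. */
public class TlruSketch {
    /** Hypothetical per-entry record: size on disk and last access time. */
    static final class CacheEntry {
        final long sizeBytes;
        long lastAccessMillis;

        CacheEntry(long sizeBytes, long lastAccessMillis) {
            this.sizeBytes = sizeBytes;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final long ttlMillis;     // TTL expiration
    private final long minSizeBytes;  // minimum size threshold
    private final long maxSizeBytes;  // maximum size
    private long totalBytes = 0;

    // accessOrder = true keeps the least recently used entry first.
    private final LinkedHashMap<String, CacheEntry> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    public TlruSketch(long ttlMillis, long minSizeBytes, long maxSizeBytes) {
        this.ttlMillis = ttlMillis;
        this.minSizeBytes = minSizeBytes;
        this.maxSizeBytes = maxSizeBytes;
    }

    /** Adds or replaces an entry, then applies both eviction rules. */
    public void put(String key, long sizeBytes, long nowMillis) {
        CacheEntry old = entries.put(key, new CacheEntry(sizeBytes, nowMillis));
        if (old != null) {
            totalBytes -= old.sizeBytes;
        }
        totalBytes += sizeBytes;
        trim(nowMillis);
    }

    /** Records a read: moves the entry to the most recently used end. */
    public boolean get(String key, long nowMillis) {
        CacheEntry entry = entries.get(key);
        if (entry == null) {
            return false;
        }
        entry.lastAccessMillis = nowMillis;
        return true;
    }

    /** Evicts from the least recently used end until neither rule applies. */
    public void trim(long nowMillis) {
        Iterator<Map.Entry<String, CacheEntry>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            CacheEntry entry = it.next().getValue();
            // Size rule: classic LRU, evict while over the maximum size.
            boolean overMax = totalBytes > maxSizeBytes;
            // Time rule (assumed semantics): evict an expired entry only if
            // the cache still holds at least the minimum afterwards.
            boolean expired = nowMillis - entry.lastAccessMillis > ttlMillis
                    && totalBytes - entry.sizeBytes >= minSizeBytes;
            if (!overMax && !expired) {
                break; // later entries were accessed more recently
            }
            totalBytes -= entry.sizeBytes;
            it.remove();
        }
    }

    public long totalBytes() {
        return totalBytes;
    }

    public boolean contains(String key) {
        return entries.containsKey(key);
    }
}
```

Because entries are kept in access order, the oldest last-access time is always at the head, so a single pass from the head can serve both the size rule and the time rule. One limitation of this sketch is that a large expired entry at the head can block eviction of smaller expired entries behind it once the minimum threshold is near.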
About This Article
Grab's image cache didn't remove old files based on time, so stale promotional images and outdated content sat on disk indefinitely. Even when these images hadn't been used in months, they still took up space. Users at the 90th percentile kept hitting the 100MB capacity limit.
Grab modified Glide's DiskLruCache to track when each entry was last accessed by adding timestamps to the READ and CLEAN operations in the journal format. This let the system automatically delete entries that had not been accessed for more than 20 days, while always keeping at least 20MB of cached data.
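To make the journal change concrete, the sketch below parses journal lines in which READ and CLEAN carry a last access timestamp. In the standard DiskLruCache journal, a CLEAN line is `CLEAN key size...` and a READ line is `READ key`. The position of the timestamp (a trailing field), the class `JournalLineSketch`, and the fallback value used for legacy lines are all assumptions for illustration; the source does not specify the exact format Grab chose.

```java
/** Hypothetical parser for journal lines with an optional trailing timestamp. */
public class JournalLineSketch {
    /** One parsed journal record. */
    public static final class Record {
        public final String op;
        public final String key;
        public final long lastAccessMillis;

        Record(String op, String key, long lastAccessMillis) {
            this.op = op;
            this.key = key;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    /**
     * Parses a single journal line.
     *
     * @param valueCount     number of size fields on a CLEAN line
     * @param fallbackMillis timestamp assigned to legacy lines that have none
     *                       (assumed migration strategy)
     */
    public static Record parse(String line, int valueCount, long fallbackMillis) {
        String[] parts = line.split(" ");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected journal line: " + line);
        }
        String op = parts[0];
        String key = parts[1];

        int timestampIndex;
        switch (op) {
            case "CLEAN":
                if (parts.length < 2 + valueCount) {
                    throw new IllegalArgumentException("missing sizes: " + line);
                }
                timestampIndex = 2 + valueCount; // after the size fields
                break;
            case "READ":
                timestampIndex = 2; // directly after the key
                break;
            default:
                timestampIndex = -1; // DIRTY and REMOVE carry no timestamp here
        }

        long lastAccess = fallbackMillis;
        if (timestampIndex >= 0 && parts.length > timestampIndex) {
            lastAccess = Long.parseLong(parts[timestampIndex]);
        }
        return new Record(op, key, lastAccess);
    }

    /** Formats a READ line in the assumed timestamped layout. */
    public static String formatRead(String key, long nowMillis) {
        return "READ " + key + " " + nowMillis;
    }
}
```

Treating the timestamp as optional is one way to keep old journals readable: a legacy line such as `CLEAN abc 832` still parses and simply receives the fallback time, so existing cache entries survive the upgrade.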
The TLRU implementation freed up terabytes of storage across 100+ million devices. Users at the P95 level saw their cache shrink by about 50MB. Cache hit ratios stayed within acceptable ranges and infrastructure costs didn't increase.