How We Applied Client-Side Caching
Article Summary
DoorDash makes millions of ML predictions per second, but its Redis-based feature store had become a major cost and scalability bottleneck. Could client-side caching be the answer?
The DoorDash ML platform team tackled their gigascale feature store performance problem by implementing pod-local caching in their Sibyl Prediction Service. They used network traffic simulation to validate the approach before rolling it out to production with rigorous safety checks.
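Pod-local caching of this kind boils down to a bounded LRU map inside each service pod. As a minimal sketch using only the JDK, a synchronized `LinkedHashMap` in access order behaves as an LRU cache; the class and method names below are illustrative, and one reason the team preferred Caffeine is that a synchronized map like this serializes all access under high concurrency.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch (illustrative, not DoorDash's code).
// Caffeine offers the same semantics with much better concurrent throughput.
class LruCache<K, V> {
    private final Map<K, V> map;

    LruCache(int capacity) {
        // accessOrder=true: iteration order is least-recently-used first
        this.map = Collections.synchronizedMap(
            new LinkedHashMap<K, V>(capacity, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    // Evict the LRU entry once we exceed capacity
                    return size() > capacity;
                }
            });
    }

    V get(K key) { return map.get(key); }
    void put(K key, V value) { map.put(key, value); }
    int size() { return map.size(); }
}
```

In production such a cache would sit in front of the Redis lookup: check the pod-local map first and fall through to Redis only on a miss.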
Key Takeaways
- Achieved 70%+ cache hit rate, closely matching offline simulations
- Used tcpdump and pcap to capture live Redis traffic for simulation
- Chose Caffeine library over Guava and LinkedHashMap for thread-safe LRU caching
- Implemented dry-run rollout with parallel code paths to ensure correctness
- Observed that the cache saturated after just 15 minutes at a 1M-entry capacity
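The dry-run rollout in the takeaways above can be sketched as follows: keep serving from the proven Redis path while running the new cached path in parallel, and count any disagreements before trusting the cache. All names here are hypothetical, not DoorDash's actual code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical dry-run checker: the cached path runs alongside the
// authoritative Redis path, and mismatches are counted for alerting.
class DryRunChecker {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicLong mismatches = new AtomicLong();

    // redisValue is the result from the existing, authoritative code path.
    String fetch(String key, String redisValue) {
        String cached = cache.get(key);
        if (cached != null && !Objects.equals(cached, redisValue)) {
            mismatches.incrementAndGet();   // a real system would log/alert
        }
        cache.put(key, redisValue);
        return redisValue;   // dry run: always serve the proven path
    }

    long mismatchCount() { return mismatches.get(); }
}
```

Only once the mismatch count stays at zero over real traffic would the cached path be promoted to serving.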
Critical Insight
Client-side caching delivered 70%+ hit rates in production, improving latency and reliability while reducing load on multi-TB Redis clusters.
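The offline step that predicted these hit rates can be sketched by replaying a captured key trace through an LRU set and counting hits. The real trace came from tcpdump/pcap captures of live Redis requests; the helper below and its names are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical hit-rate simulator: replay a sequence of feature-store keys
// through an LRU set of bounded capacity and report the fraction of hits.
class HitRateSimulator {
    static double simulate(List<String> trace, int capacity) {
        LinkedHashSet<String> lru = new LinkedHashSet<>();  // insertion order = recency
        long hits = 0;
        for (String key : trace) {
            if (lru.remove(key)) {
                hits++;                                  // hit: re-add below as most recent
            } else if (lru.size() >= capacity) {
                lru.remove(lru.iterator().next());       // miss on a full cache: evict LRU
            }
            lru.add(key);                                // key is now most recently used
        }
        return (double) hits / trace.size();
    }
}
```

Sweeping `capacity` over a captured trace is what lets hit rate be estimated per cache size before any production rollout.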