Garbage Collection Optimization: High Throughput and Low Latency Java Applications
Article Summary
LinkedIn Engineering cut their tail latency by 75% through systematic garbage collection tuning. Here's their playbook for high-performance Java apps.
Swapnil Ghike shares how LinkedIn's team optimized GC settings for their next-generation feed data platform serving thousands of requests per second. The article breaks down their methodical approach from baseline measurements to production-ready configuration.
Key Takeaways
- ParNew/CMS outperformed G1 collector for their workload, showing lower CPU and memory overhead
- Tuning CMSInitiatingOccupancyFraction to 92 and MaxTenuringThreshold to 2 reduced pause frequency
- Setting ParGCCardsPerStrideChunk to 32768 dropped young gen pauses from 80ms to 50ms
- Final config achieved 60ms p99.9 latency with CMS cycles running just once per hour
- AlwaysPreTouch flag and vm.swappiness=0 eliminated runtime page faulting penalties
LinkedIn achieved 40-60ms GC pauses every 3 seconds and 60ms p99.9 latency through data-driven tuning of ParNew/CMS settings.
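To make the takeaways concrete, the sketch below gathers the JVM flags they name into a single launch command. It is an illustration, not LinkedIn's actual configuration: the class name `GcFlagsSketch`, the jar name, and the choice to set `-Xms` equal to `-Xmx` are assumptions. `ParGCCardsPerStrideChunk` is a diagnostic option, so `-XX:+UnlockDiagnosticVMOptions` has to precede it. CMS was removed in JDK 14, so these flags only apply to older JVMs. `vm.swappiness=0` is a Linux kernel setting rather than a JVM flag, so it is not part of the command.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: collects the JVM flags named in the takeaways into one launch command. */
final class GcFlagsSketch {

    static List<String> flags() {
        return Arrays.asList(
            "-Xms40g", "-Xmx40g",                     // 40GB heap; pinning -Xms to -Xmx is an assumption
            "-XX:+UseConcMarkSweepGC",                // CMS for the old generation, ParNew for the young
            "-XX:CMSInitiatingOccupancyFraction=92",  // start a CMS cycle at 92% old gen occupancy
            "-XX:MaxTenuringThreshold=2",             // promote objects that survive 2 young collections
            "-XX:+UnlockDiagnosticVMOptions",         // must come before the diagnostic flag below
            "-XX:ParGCCardsPerStrideChunk=32768",     // larger card-table chunks per GC worker task
            "-XX:+AlwaysPreTouch"                     // touch heap pages at startup instead of at runtime
        );
    }

    static String commandLine(String jar) {
        return "java " + String.join(" ", flags()) + " -jar " + jar;
    }

    public static void main(String[] args) {
        // The jar name is a placeholder for illustration only.
        System.out.println(commandLine("feed-service.jar"));
    }
}
```

These values were tuned for one workload, so they are a starting point to measure against rather than defaults to copy.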
About This Article
LinkedIn's feed platform had a garbage collection problem. Young generation pauses were hitting 80ms, and old generation collections kept triggering unpredictably. The issue came down to long-lived cached objects piling up in a 32GB heap, with the CMS initiation threshold set at 70%.
Swapnil Ghike's team analyzed verbose GC logs with Naarad and gclogviewer to spot patterns. They increased the heap to 40GB and tuned card table scanning by setting ParGCCardsPerStrideChunk to 32768. This setting controls the granularity of the card-scanning tasks handed to the GC worker threads, and the larger value improved how the work was distributed among them.
The new settings cut young generation pauses to 40-60ms, occurring about every three seconds. Old generation collections dropped to about once per hour. The platform could then serve thousands of requests per second with a p99.9 latency of 60ms.
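The team worked from verbose GC logs. As a lighter, in-process complement (not something the article describes), the standard `java.lang.management` API exposes cumulative collection counts and times per collector, which is enough for a rough before-and-after comparison when trying settings like these. The class name `GcBaselineSketch` and its helpers are hypothetical. The averages are coarse and cannot replace log analysis for percentiles such as p99.9.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: prints cumulative GC counts and times for the running JVM. */
final class GcBaselineSketch {

    /** Average time per collection in ms, or 0 when the count is zero or undefined. */
    static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            long total = gc.getCollectionTime();  // approximate accumulated ms, -1 if undefined
            sb.append(gc.getName())
              .append(": collections=").append(count)
              .append(", totalMs=").append(total)
              .append(", avgMs=").append(averageMillis(count, total))
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling `report()` at the start and end of a load test and comparing the two outputs gives the number of collections and the time spent in them over that window.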