Shopify Oct 14, 2011

Most Memory Leaks Are Good

Article Summary

Shopify Engineering shares a war story: their app servers were hitting 16GB+ memory usage, crashing with Errno::ENOMEM errors, and requiring constant restarts. The culprit? Not where they expected.

This 2011 Shopify Engineering post walks through a production memory leak crisis that stumped the team for weeks. Despite using Ruby profiling tools like Memprof and analyzing 2+ million live objects, the leak remained elusive until old-fashioned debugging revealed the truth.

Key Takeaways

Critical Insight

The memory leak turned out to be in a C extension that Ruby profiling tools couldn't detect, solved only by reproducing production conditions locally and trusting hunches.
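To see why a C-extension leak is invisible to Ruby-level tooling, it helps to look at what those tools actually measure: objects on the Ruby heap. A minimal sketch (using the standard objspace library, not the 2011-era Memprof the post describes):

```ruby
require 'objspace'

GC.start
before = ObjectSpace.count_objects[:T_STRING]

# Allocate 1,000 Ruby strings and keep them alive.
strings = Array.new(1000) { "x" * 100 }

after = ObjectSpace.count_objects[:T_STRING]
puts after - before  # at least 1,000 new T_STRING slots are visible

# Memory malloc'd inside a C extension never shows up in these
# counts, which is why a leak in a C extension stayed invisible
# to Ruby-level profilers.
```

Any memory a C extension allocates directly with malloc bypasses the Ruby heap entirely, so object counts and heap dumps stay flat while the process keeps growing.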

The team's initial approach with VM dumps and MongoDB analysis of 2 million objects led them down the wrong path entirely.

About This Article

Problem

Shopify's production app servers kept running out of memory, with usage growing past the machines' 16GB of physical RAM. The team resorted to rebooting servers periodically to keep things running, but couldn't pin down the cause.

Solution

They reproduced the problem locally by simulating a bad memcached connection while hammering the cache with loop { Rails.cache.write(rand(10**10).to_s, rand(10**10).to_s) }. Once the leak was traced to the memcached client's C extension, they switched to a different memcached client library.
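The reproduction loop can be sketched in a form that runs without a Rails app or a memcached server; here a plain Hash stands in for Rails.cache (a hypothetical stand-in), and the loop is bounded rather than infinite:

```ruby
# A plain Hash stands in for Rails.cache so this runs standalone;
# the post ran the real loop against a broken memcached connection.
cache = {}

# Bounded version of the post's infinite reproduction loop:
# write random ten-digit keys and values as fast as possible.
10_000.times do
  cache[rand(10**10).to_s] = rand(10**10).to_s
end

puts cache.size  # close to 10,000, minus any rare key collisions
```

The point of the loop is not the data itself but the write pressure: driving the cache client as hard as possible made the C extension's per-write leak large enough to observe locally within minutes instead of weeks.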

Impact

Memory usage stabilized after the switch, and production servers stopped needing constant restarts. The Errno::ENOMEM crashes that had been affecting the infrastructure went away.
