Map Performance Comparison and Judy Data Structure Alternatives
Article Summary
Marko Kevac from Bumble's Platform team benchmarked 6 associative array implementations to replace their aging Judy data structure. The winner? Not what you'd expect when memory matters.
Bumble's C/C++ services handle hundreds of thousands of requests per second with hundreds of gigabytes in memory. Their team tested hash tables and trees from Google, C++ STL, and others against their longtime Judy implementation, measuring both speed and memory overhead with 10 million random lookups.
Key Takeaways
- Google's dense_hash_map was fastest but used excessive memory
- spp::sparse_hash_map delivered best speed-to-memory ratio with 1-bit overhead
- Standard std::hash outperformed xxhash and t1ha for integer keys
- jemalloc allocator boosted speed but increased memory consumption
- Judy trees enable sorted iteration; hash tables sacrifice this for speed
spp emerged as the top Judy replacement, balancing performance with reasonable memory overhead for high-throughput services.
About This Article
Badoo's Judy implementation hadn't been updated in years, so the Platform team needed to figure out if newer options could handle the same workload. They were processing hundreds of thousands of requests per second while keeping hundreds of gigabytes in memory.
Marko Kevac tested six hash table options: Judy, std::unordered_map, google::sparse_hash_map, google::dense_hash_map, and spp::sparse_hash_map. He used 32-bit integer keys and 64-bit pointer values, running 10 million random lookups on an Intel i7-6700K with GCC 6.3.0 and -O3 optimization.
spp::sparse_hash_map worked well as a Judy replacement. It matched Judy's random-get performance while using only 1 bit of memory overhead per entry. Badoo built a C++ wrapper around it and started planning production tests on their high-demand services.