Klarna | Itamar B | Nov 15, 2022

6 Lessons Learned from Optimizing the Performance of a Node.js Service

Article Summary

Klarna's A/B testing platform needed single-digit millisecond latency at the 99.9th percentile (p99.9). Under load, however, its Node.js service saw response times spike to several seconds.

The team built a performance testing pipeline to catch issues before they reached production. Load testing exposed hidden bottlenecks that standard monitoring had missed entirely.
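The summary does not say how the pipeline decides whether a run passes. As a minimal sketch, assuming nearest-rank percentiles over per-request latencies in milliseconds and a hypothetical 10 ms p99.9 budget (the edge of the single-digit range), a pass/fail gate could look like this. `percentile`, `passesLatencyGate`, and `P999_BUDGET_MS` are illustrative names, not Klarna's.

```typescript
// Nearest-rank percentile: the smallest sample such that at least a
// fraction q of all samples are less than or equal to it.
function percentile(samplesMs: readonly number[], q: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (!(q > 0 && q <= 1)) throw new RangeError("q must be in (0, 1]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric, not lexicographic
  const rank = Math.ceil(q * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical budget: p99.9 must stay below 10 ms (single-digit milliseconds).
const P999_BUDGET_MS = 10;

function passesLatencyGate(samplesMs: readonly number[]): boolean {
  return percentile(samplesMs, 0.999) < P999_BUDGET_MS;
}

// Usage: 999 fast requests (2 ms) and one 1.5 s spike.
const samples: number[] = [];
for (let i = 0; i < 1000; i++) samples.push(i === 0 ? 1500 : 2);
console.log(percentile(samples, 0.999), passesLatencyGate(samples)); // 2 true
```

With 1,000 samples the nearest-rank p99.9 is the 999th smallest value, so one multi-second outlier still passes this gate while two fail it. A gate on tail latency therefore needs far more than 1,000 requests per run before its verdict means much.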

Key Takeaways

Critical Insight

Six optimization lessons took a Node.js service from unpredictable multi-second latency spikes to consistent sub-millisecond performance under sustained load.

The team's DNS cache honored record TTLs instead of caching addresses indefinitely, avoiding a failure mode that could have broken the entire service whenever the load balancer was redeployed.

About This Article

Problem

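Itamar B's team found that the StatsD client was resolving its target hostname for every outgoing message. This left tens of thousands of UV_GETADDRINFO requests queued at once. Node.js runs these getaddrinfo lookups on libuv's thread pool, which has only four threads by default, so the requests piled up and overwhelmed the event loop even though CPU and memory usage stayed low.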
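To see why low CPU and memory usage can coexist with multi-second stalls, here is a back-of-the-envelope queueing estimate. It assumes every lookup takes the same time and that queued requests are served in order; the 20,000-request burst and 1 ms per lookup are hypothetical figures, not numbers from the article. The default pool size of four (overridable through the `UV_THREADPOOL_SIZE` environment variable) is treated as an upper bound on concurrent lookups, since libuv can reserve part of the pool for other work.

```typescript
// Back-of-the-envelope model: a burst of getaddrinfo requests drains from
// libuv's thread pool in "waves" of at most `concurrency` lookups at a time.
// Assumes every lookup takes the same time and requests are served in order.
const DEFAULT_UV_THREADPOOL_SIZE = 4; // libuv default; override with UV_THREADPOOL_SIZE

function estimateDrainMs(
  queuedLookups: number,
  lookupLatencyMs: number,
  concurrency: number = DEFAULT_UV_THREADPOOL_SIZE,
): number {
  const waves = Math.ceil(queuedLookups / concurrency);
  return waves * lookupLatencyMs; // time until the last queued lookup finishes
}

// Hypothetical burst: 20,000 queued lookups at 1 ms each on the default pool.
const drainMs = estimateDrainMs(20000, 1);
console.log(`~${drainMs} ms before the last lookup completes`); // ~5000 ms before the last lookup completes
```

The estimate grows linearly with the queue length, so resolving once per hostname instead of once per message attacks the problem at its source rather than merely enlarging the pool.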

Solution

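They fixed it by adding proper DNS caching outside the client: they monkey-patched Node.js's dns module so that resolved addresses are cached only for as long as their TTL values allow. They preferred this to the StatsD client's own DNS caching, which caches addresses indefinitely and therefore could not cope when a load balancer redeployment changed the address behind the hostname.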
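The summary does not include the patch itself, so the following is only a sketch of how a TTL-respecting cache can work, not Klarna's code. Node's `dns.lookup()` (getaddrinfo) does not report TTLs, whereas `dns.resolve4(hostname, { ttl: true }, callback)` returns `{ address, ttl }` records, so the sketch assumes a resolver of that shape is injected. It handles only a simplified IPv4 `lookup(hostname, callback)` form; a real drop-in for `dns.lookup` must also accept its `options` argument and call back asynchronously. It additionally coalesces concurrent lookups for the same hostname, an addition not mentioned in the source. `createTtlLookup` and `ResolveWithTtl` are hypothetical names.

```typescript
// Shape of one record from dns.resolve4(hostname, { ttl: true }, cb):
// an IPv4 address plus its TTL in seconds.
interface RecordWithTtl {
  address: string;
  ttl: number;
}

// Hypothetical injected resolver; in Node.js it could wrap
// dns.resolve4(hostname, { ttl: true }, cb).
type ResolveWithTtl = (
  hostname: string,
  cb: (err: Error | null, records: RecordWithTtl[]) => void,
) => void;

// Simplified lookup callback: IPv4 only, no options argument.
type LookupCallback = (err: Error | null, address?: string, family?: number) => void;

interface CacheEntry {
  address: string;
  expiresAt: number; // epoch milliseconds
}

function createTtlLookup(resolve: ResolveWithTtl, now: () => number = Date.now) {
  const cache = new Map<string, CacheEntry>();
  // Callbacks waiting on a resolution already in flight, so a burst of
  // messages triggers one query per hostname instead of one per message.
  const pending = new Map<string, LookupCallback[]>();

  // Note: calls back synchronously on a cache hit, unlike dns.lookup.
  return function lookup(hostname: string, cb: LookupCallback): void {
    const hit = cache.get(hostname);
    if (hit !== undefined && hit.expiresAt > now()) {
      cb(null, hit.address, 4);
      return;
    }
    const waiting = pending.get(hostname);
    if (waiting !== undefined) {
      waiting.push(cb);
      return;
    }
    pending.set(hostname, [cb]);
    resolve(hostname, (err, records) => {
      const callbacks = pending.get(hostname) ?? [];
      pending.delete(hostname);
      if (err !== null || records.length === 0) {
        const failure = err ?? new Error(`no A records for ${hostname}`);
        for (const waiter of callbacks) waiter(failure);
        return;
      }
      const { address, ttl } = records[0];
      // Expire the entry when the record's TTL runs out, so a redeployed
      // load balancer's new address is picked up on the next lookup.
      cache.set(hostname, { address, expiresAt: now() + ttl * 1000 });
      for (const waiter of callbacks) waiter(null, address, 4);
    });
  };
}
```

Expiring entries by TTL is what separates this from indefinite caching: a cache hit costs no query at all, yet a changed address is noticed within one TTL.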

Impact

The DNS caching fix significantly reduced the number of active requests in the queue. It removed the bottleneck that had caused response times to spike for several seconds during sustained load tests of Klarna's A/B testing platform.