Grab Aug 4, 2022

Performance Bottlenecks in Go Apps

Article Summary

Grab's engineering team discovered that their Go apps were throttled when allocated 1.94 CPU cores but ran well at 2 cores. The culprit? An interaction between the Kubernetes Vertical Pod Autoscaler (VPA) and Go's GOMAXPROCS setting.

Grab's real-time data platform team (Coban) runs stream processing pipelines on Kubernetes with vertical pod autoscaling. While debugging consumer lag issues on their SinktoS3 pipeline, they uncovered a critical performance trap affecting Go applications.

Key Takeaways

Critical Insight

A 0.06-core difference (1.94 vs 2.0) caused severe performance degradation because GOMAXPROCS only takes whole-number values: a 1.94-core allocation rounds down to 1, leaving almost half of the allocated CPU unused.
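
The round-down effect can be illustrated with a minimal Go sketch. The helper `effectiveProcs` is hypothetical (it is not part of the Go runtime or of Grab's code) and assumes the fractional allocation is simply floored to a whole number of cores with a minimum of 1.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveProcs is a hypothetical helper that models the round-down
// described in the summary: a fractional CPU allocation is floored to a
// whole number of cores, never below 1. It is an assumption for
// illustration, not a Go runtime API.
func effectiveProcs(cpuCores float64) int {
	procs := int(math.Floor(cpuCores))
	if procs < 1 {
		return 1
	}
	return procs
}

func main() {
	for _, cores := range []float64{1.94, 2.0} {
		procs := effectiveProcs(cores)
		// stranded is the share of the allocation that cannot be used
		// once the value has been floored.
		stranded := cores - float64(procs)
		fmt.Printf("allocated=%.2f cores, usable=%d, stranded=%.2f\n", cores, procs, stranded)
	}
}
```

Under these assumptions, 1.94 cores yields 1 usable core while 2.0 cores yields 2, which is why such a small difference in allocation can halve the available parallelism.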

The article includes detailed tables showing exactly when VPA recommendations will throttle your Go apps and when they'll thrive.

About This Article

Problem

Grab's SinktoS3 pipeline started lagging badly when VPA cut CPU allocation below 2 cores. The pod had enough resources to handle Kafka-to-S3 transfers, but performance still tanked.
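
One way to check what the Go runtime is working with inside a pod is to print its scheduler settings at startup. This is a minimal sketch using only the standard `runtime` package; the helper name `schedulerSnapshot` is hypothetical, and how GOMAXPROCS gets its value in a given deployment (environment variable, library, or runtime default) is outside what the summary states.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerSnapshot is a hypothetical helper that returns the current
// GOMAXPROCS setting and the number of logical CPUs usable by the process.
// Passing 0 to runtime.GOMAXPROCS reads the value without changing it.
func schedulerSnapshot() (procs int, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerSnapshot()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, cpus)
}
```

Comparing the printed GOMAXPROCS value against the pod's CPU allocation shows how many cores the application can actually run on in parallel.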

Solution

Grab set a minimum of 2 cores for VPA and pointed to VPA v0.13's integer CPU allocation feature, available in Kubernetes 1.25 and later, as the longer-term fix. Whole-number allocations ensure GOMAXPROCS matches the CPU the pod actually receives, instead of a fractional amount that rounds down and wastes capacity.
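
The idea of a whole-number allocation can be sketched as follows. The helper `integerCPURecommendation` is hypothetical and is not VPA's implementation; it assumes a fractional recommendation is rounded up to the next whole core and that an optional floor (`minCores`) is applied.

```go
package main

import (
	"fmt"
	"math"
)

// integerCPURecommendation is a hypothetical helper that turns a fractional
// CPU recommendation into a whole number of cores. Assumptions: the value is
// rounded up, and the result is never below minCores.
func integerCPURecommendation(recommended float64, minCores int) int {
	cores := int(math.Ceil(recommended))
	if cores < minCores {
		return minCores
	}
	return cores
}

func main() {
	// Sample recommendations, including the 1.94-core case from the summary.
	for _, rec := range []float64{0.7, 1.94, 2.0, 2.3} {
		fmt.Printf("recommended=%.2f -> allocated=%d\n", rec, integerCPURecommendation(rec, 2))
	}
}
```

With a floor of 2, a 1.94-core recommendation becomes 2 cores, so the allocation and the integer value seen by the Go scheduler agree.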

Impact

After setting a minimum 2-core VPA limit, CPU utilization jumped to 95%. The pipeline processed more records than the day before and cleared the backlog.