Profile-Guided Optimization
Article Summary
Grab's engineering team just unlocked 30% memory savings and 38% storage reduction with a simple compiler flag. Here's how Profile-Guided Optimization (PGO) delivered massive gains with minimal code changes.
The Grab AI Platform team experimented with PGO, a technique where production CPU profiles guide compiler optimizations in Go 1.20+. They tested it across multiple services including their open-source TalariaDB time-series database to measure real-world impact.
Key Takeaways
- TalariaDB saw 10% CPU reduction and 30% memory savings after enabling PGO
- Events ingested per CPU jumped from 1.1M to 1.7M
- Storage for queued events dropped by 38% (7GB reduction)
- Profile duration matters: a 59-second profile failed to yield gains, while a 6-minute profile succeeded
- ROI varies wildly: 30% gains for some services, only 5% for others
PGO delivered 10-30% resource savings on high-throughput services with just a compiler flag change, but the gains depend heavily on service characteristics and profiling approach.
About This Article
Grab's engineering team wanted to squeeze more performance out of their Go applications beyond what the compiler already provided. They knew profile-guided optimization could help, but needed a way to figure out which services would actually benefit before investing engineering time in the effort.
The team set up PGO by collecting 6-minute CPU profiles from production services using pprof, then rebuilt their Docker images with the `-pgo=./talaria.pgo` build flag. This approach let them apply profiles to both main binaries and Go plugins in services like TalariaDB.
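The collect-then-rebuild workflow can be sketched as shell commands. The hostname, port, and file paths below are placeholders; the `-pgo` flag and the `/debug/pprof/profile` endpoint are standard parts of the Go toolchain and `net/http/pprof`:

```shell
# Capture a 6-minute (360 s) CPU profile from the running service's
# pprof HTTP endpoint (host and port are hypothetical).
curl -o talaria.pgo \
  "http://talaria.internal:6060/debug/pprof/profile?seconds=360"

# Rebuild with the profile; the compiler uses it to guide inlining
# and other optimizations on the observed hot paths.
go build -pgo=./talaria.pgo ./cmd/talaria

# Alternatively, commit the profile as default.pgo in the main
# package directory and build normally: -pgo=auto is the default
# behavior since Go 1.21.
```

Since a container rebuild bakes the profile into the image, the profile needs refreshing periodically as hot paths drift.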
TalariaDB saw real improvements, but when they tried the same approach on Catwalk, they only got a 5% CPU gain. It turned out PGO works differently depending on how each service is built. For their monorepo setup, they also needed to add more support to their build pipeline to make PGO practical across the board.