Microsoft | Saumye Srivastava | Jul 20, 2021

Spotting Latency Regressions Ahead of Time at Teams Mobile

Article Summary

Microsoft Teams Mobile merges 50+ commits daily from 350+ developers. How do they catch performance regressions before users feel the pain?

The Teams Mobile team built "Bouncer," an automated performance testing system that detects latency regressions as small as 15% on a commit-by-commit basis. This deep dive from Microsoft engineer Saumye Srivastava reveals the technical architecture behind catching slowdowns before they ship.

Key Takeaways

Critical Insight

Teams Mobile now catches performance regressions within hours of code merge instead of during production rollout, letting engineers focus on speed improvements rather than firefighting.

Making the measurements stable enough to trust involved Termux, Magisk root access, and some careful work with Android's thermal engine, a layer most teams never touch.

About This Article

Problem

Teams Mobile's Android codebase gets 50+ commits daily from 350+ developers. When latency regressions show up in production dashboards, it's hard to figure out which code change caused them. This often delays releases and fixes.

Solution

Saumye Srivastava's team built Bouncer to run instrumentation tests before and after each mainline commit. The tests run through Microsoft Hydralab on physical Pixel 4A devices, and Bouncer compares the median latency across 9 iterations for each build. It flags a regression when the post-commit median exceeds the pre-commit median by more than a threshold in the 10-15% range.
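
The before-and-after comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bouncer's actual code: the function names, the sample latencies, and the 15% default threshold are all hypothetical and chosen only to show the median-based check.

```python
from statistics import median


def relative_change(baseline_ms, candidate_ms):
    """Fractional change of the candidate median relative to the baseline median."""
    base = median(baseline_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (median(candidate_ms) - base) / base


def is_regression(baseline_ms, candidate_ms, threshold=0.15):
    """True when the candidate median is slower than the baseline by more than threshold."""
    return relative_change(baseline_ms, candidate_ms) > threshold


# Hypothetical latencies in milliseconds: 9 iterations per build.
before = [412, 405, 398, 420, 401, 409, 415, 403, 407]  # build before the commit
after = [489, 476, 481, 502, 470, 493, 485, 478, 480]   # build after the commit

print(f"median before: {median(before)} ms")
print(f"median after:  {median(after)} ms")
print(f"change: {relative_change(before, after):.1%}")
print(f"regression flagged: {is_regression(before, after)}")
```

Using the median rather than the mean keeps a single slow iteration from triggering an alert, which fits the article's emphasis on cutting false positives.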

Impact

False positives dropped significantly. Performance tests went from consistently failing to mostly passing. When a red flag does appear, engineers can trust it's a real regression, so they don't have to manually adjust thresholds after every alert.