Fixing Performance Regressions Before They Happen
Article Summary
Angus Croll from Netflix reveals how his team slashed false performance alerts by 90% while catching more real regressions. The secret? They stopped using static thresholds entirely.
Netflix's TVUI team runs performance tests on 1,700+ device types serving 222 million members. Their old approach with static memory thresholds created constant false alarms and missed subtle regressions. They needed a smarter way to detect performance issues before code shipped to production.
Key Takeaways
- Reduced alerts from 100+ to 10 per month with 90% fewer false positives
- Anomaly detection flags values 4 standard deviations above recent 40-run mean
- Changepoint detection uses e-divisive algorithm to spot distribution pattern shifts
- Running tests 3x and taking minimum value filters device noise effectively
- Dynamic thresholds adapt automatically, eliminating manual threshold adjustments
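The anomaly rule in the takeaways above can be sketched in a few lines. This is a minimal illustration of the described statistic (mean plus 4 standard deviations over the last 40 runs); the function and parameter names are illustrative, not Netflix's actual API.

```python
import statistics

def is_anomaly(history, value, window=40, n_sigma=4):
    """Flag `value` if it exceeds the recent-window mean by n_sigma std devs.

    Sketch of the rule described in the article: compare the newest
    measurement against the mean and standard deviation of the last
    `window` runs. Names and defaults are illustrative only.
    """
    recent = history[-window:]
    mean = statistics.mean(recent)
    stdev = statistics.stdev(recent)
    return value > mean + n_sigma * stdev
```

Because the threshold is derived from recent data, it rises and falls with the test's natural variance, which is what eliminates the manual tuning mentioned above.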
By replacing static thresholds with statistical anomaly and changepoint detection, Netflix now catches genuine performance regressions earlier with 90% fewer false alerts.
About This Article
Under the old system, Netflix's TVUI team had to set a threshold manually for each test variation. Because the work was time-consuming and picking the right values was difficult, only 30% of variations ever got custom thresholds.
Netflix built two detection methods. The first compares each new result against the mean and standard deviation of the last 40 runs, flagging values more than 4 standard deviations above the mean. The second applies e-divisive changepoint detection to the 100 most recent test runs to spot shifts in the data distribution.
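The changepoint method can be sketched with the energy-distance statistic that e-divisive is built on. This is a simplified single-changepoint version under stated assumptions; the full e-divisive algorithm recurses to find multiple changepoints and uses a permutation test for significance, neither of which is shown here.

```python
def mean_abs_diff(a, b):
    # average pairwise |x - y| between samples a and b
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

def energy_stat(left, right):
    # energy distance between two samples: large when their
    # distributions differ, near zero when they match
    return (2 * mean_abs_diff(left, right)
            - mean_abs_diff(left, left)
            - mean_abs_diff(right, right))

def best_changepoint(series, min_size=5):
    """Return the split index maximizing the size-weighted energy statistic.

    Simplified sketch of the e-divisive idea: try every split of the
    recent run history and keep the one where the two halves look most
    like different distributions. `min_size` avoids tiny segments.
    """
    best_idx, best_score = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        left, right = series[:i], series[i:]
        # weight by segment sizes, as in the scaled e-divisive statistic
        w = len(left) * len(right) / (len(left) + len(right))
        score = w * energy_stat(left, right)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

Because the statistic compares whole distributions rather than single values, it can catch gradual shifts that never trip a per-run threshold.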
Running each test 3 times and taking the minimum value per build cut the total number of validation runs while keeping detection accuracy the same. This let the team test more broadly without generating more alerts.
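The noise-filtering step above amounts to a one-liner. This sketch assumes measurements where transient device noise only inflates values (e.g. memory or latency), so the minimum of repeated runs is the cleanest estimate; the function name is illustrative.

```python
def filter_noise(runs_per_build):
    """Collapse each build's repeated runs into one value.

    Sketch of the described approach: run each test several times
    (3 in the article) and keep the minimum, discarding runs inflated
    by transient device noise.
    """
    return [min(runs) for runs in runs_per_build]
```

For example, `filter_noise([[120, 101, 115], [99, 130, 104]])` reduces two builds' triplicate runs to `[101, 99]`, so the detectors see one stable value per build.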