Netflix | Angus Croll | Jan 24, 2022

Fixing Performance Regressions Before They Happen

Article Summary

Angus Croll from Netflix reveals how his team slashed false performance alerts by 90% while catching more real regressions. The secret? They stopped using static thresholds entirely.

Netflix's TVUI team runs performance tests on 1,700+ device types serving 222 million members. Their old approach with static memory thresholds created constant false alarms and missed subtle regressions. They needed a smarter way to detect performance issues before code shipped to production.

Key Takeaways

Critical Insight

By replacing static thresholds with statistical anomaly and changepoint detection, Netflix now catches genuine performance regressions earlier with 90% fewer false alerts.

The team is now decoupling their detection logic to release as an open-source library that works for any sequential quantitative data, not just performance metrics.

About This Article

Problem

Netflix's TVUI team had to manually set thresholds for each test variation. Only 30% of variations got custom thresholds because the work was time-consuming and it was hard to pick the right values.

Solution

Netflix built two detection methods. The first flags a new result as anomalous when it falls too far, measured in standard deviations, from the mean of the last 40 runs. The second applies e-divisive changepoint detection to the 100 most recent test runs to spot shifts in the data's distribution.
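The two checks can be sketched as follows. This is a minimal illustration, not Netflix's implementation: the sigma cutoff is an assumed value, and the changepoint function is a naive mean-shift scan standing in for the e-divisive method, which compares whole distributions rather than just means.

```python
import statistics

def is_anomaly(history, latest, n_sigma=3, window=40):
    """Flag `latest` if it deviates from the recent-run distribution.

    Illustrative z-score check against the mean and standard deviation
    of the last `window` results; `n_sigma` is an assumed cutoff.
    """
    recent = history[-window:]
    mean = statistics.mean(recent)
    stdev = statistics.pstdev(recent)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > n_sigma

def changepoint(series, margin=5):
    """Naive mean-shift changepoint: pick the split index that maximizes
    the gap between the left and right means. A toy stand-in for
    e-divisive, which detects shifts in the full distribution.
    """
    best_idx, best_gap = None, 0.0
    for i in range(margin, len(series) - margin):  # keep both sides non-trivial
        gap = abs(statistics.mean(series[:i]) - statistics.mean(series[i:]))
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx

# Stable memory readings (MB) with run-to-run noise, then a suspicious spike.
runs = [310 + (i % 5) for i in range(40)]
print(is_anomaly(runs, 360))   # large jump relative to recent noise -> True

# A level shift midway through the series is reported as a changepoint.
print(changepoint([10] * 50 + [20] * 50))  # -> 50
```

Anomaly detection catches one-off outliers in the newest run; the changepoint scan catches sustained shifts that individual-run checks would miss.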

Impact

Running each test 3 times per build and taking the minimum value cut the total number of validation runs while keeping detection accuracy the same. This let the team test more builds without generating more alerts.
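The per-build aggregation amounts to collapsing each build's repeated runs into one number. A minimal sketch, where the function name and data are hypothetical but the min-of-3 rule is the one the article describes:

```python
def per_build_metric(samples, runs_per_build=3):
    """Collapse repeated runs of each build into a single value by
    taking the minimum, reducing run-to-run noise before detection.
    """
    return [min(samples[i:i + runs_per_build])
            for i in range(0, len(samples), runs_per_build)]

# Three builds, three memory readings (MB) each; hypothetical values.
raw = [315, 309, 312, 318, 308, 311, 307, 310, 316]
print(per_build_metric(raw))  # -> [309, 308, 307]
```

Feeding the detection methods one low-noise value per build, instead of every raw run, is what keeps alert volume flat as test volume grows.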