How We Improved Performance Score Accuracy
Article Summary
Edward Gou from Sentry reveals why their performance scores were lying to developers. A single slow pageload could tank your entire app's score, even when 99% of users had fast experiences.
Sentry's Performance Score condenses multiple Web Vitals (LCP, FCP, FID, TTFB, CLS) into a 0-100 rating based on real user data. But the original calculation had a fatal flaw: it aggregated each metric across pageloads first and only then scored the aggregate, so a single outlier could skew the aggregate and misrepresent what most users actually experienced.
Key Takeaways
- Old method: 2 fast pageloads plus 1 slow outlier scored 26/100
- New method: Same data now scores 68/100, reflecting actual user experience
- Scores calculated per pageload first, then averaged to prevent outlier skew
- Dynamic weighting adjusts when browsers don't support certain Web Vitals
- Most teams will see score increases after the fix deploys
By scoring individual pageloads and then averaging, instead of aggregating metrics first, Sentry stopped outliers from dragging down scores that should have reflected mostly positive user experiences.
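To make the outlier effect concrete, here is a minimal Python sketch that scores a single metric (LCP) both ways. The control points (2,500 ms scoring 0.9 and 4,000 ms scoring 0.5), the helper names, and the use of the mean as the "aggregate" are all assumptions for illustration, not Sentry's published values, so the output will not reproduce the article's 26/100 and 68/100 figures.

```python
import math
from statistics import mean

# Assumed control points for LCP in milliseconds (illustrative only):
# a value at LCP_P10_MS scores 0.9, a value at LCP_MEDIAN_MS scores 0.5.
LCP_P10_MS = 2500.0
LCP_MEDIAN_MS = 4000.0
Z_P10 = 1.2815515655446004  # standard normal quantile for 0.90


def lcp_score(value_ms: float) -> float:
    """Complementary log-normal CDF: near 1.0 when fast, near 0.0 when slow."""
    mu = math.log(LCP_MEDIAN_MS)
    sigma = (math.log(LCP_MEDIAN_MS) - math.log(LCP_P10_MS)) / Z_P10
    z = (math.log(value_ms) - mu) / sigma
    # 1 - Phi(z) written via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))


def aggregate_then_score(samples):
    # Old approach (sketch): collapse the metric to one number, then score it.
    return 100 * lcp_score(mean(samples))


def score_then_average(samples):
    # New approach (sketch): score each pageload, then average bounded scores.
    return 100 * mean(lcp_score(s) for s in samples)


pageloads = [1000.0, 1000.0, 30000.0]  # two fast loads, one slow outlier
print(round(aggregate_then_score(pageloads), 1))
print(round(score_then_average(pageloads), 1))
```

With these made-up inputs the aggregate-first score lands near zero, while the per-pageload average stays around two thirds, because the outlier can only pull its own pageload down to 0.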
About This Article
Sentry's Performance Scores originally used static weights across the five Web Vitals (LCP 30%, FID 30%, CLS 15%, FCP 15%, TTFB 10%). Because Safari doesn't support LCP, an app's LCP value might come from a single Chrome pageload while the other vitals drew on 100 Safari pageloads, yet that one LCP sample still carried its full 30% weight.
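The following Python sketch shows one way dynamic weighting could work for a single pageload. It assumes the remaining base weights are rescaled proportionally so they sum to 1, which is a guess at the mechanism (the article only says missing Web Vitals are skipped); `dynamic_weights` and `pageload_score` are hypothetical names.

```python
from typing import Dict, Iterable, Optional

# Static base weights as listed in the article.
BASE_WEIGHTS: Dict[str, float] = {
    "lcp": 0.30, "fid": 0.30, "cls": 0.15, "fcp": 0.15, "ttfb": 0.10,
}


def dynamic_weights(present: Iterable[str]) -> Dict[str, float]:
    """Hypothetical: rescale the base weights of recorded vitals to sum to 1."""
    present = list(present)
    total = sum(BASE_WEIGHTS[v] for v in present)
    if total == 0:
        raise ValueError("pageload has no recorded Web Vitals")
    return {v: BASE_WEIGHTS[v] / total for v in present}


def pageload_score(vital_scores: Dict[str, Optional[float]]) -> float:
    """Combine per-vital scores in [0, 1] into a 0-100 pageload score.

    Vitals whose value is None (not reported by the browser) are skipped.
    """
    present = [v for v, s in vital_scores.items() if s is not None]
    weights = dynamic_weights(present)
    return 100 * sum(weights[v] * vital_scores[v] for v in present)


# Hypothetical pageload from a browser that does not report LCP.
example = {"lcp": None, "fid": 0.9, "cls": 1.0, "fcp": 0.8, "ttfb": 0.7}
print(round(pageload_score(example), 1))
```

Here the 30% that LCP would have carried is spread across the four vitals that were actually recorded, rather than being filled by data from other pageloads.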
Sentry changed how it calculates Performance Scores. Instead of aggregating Web Vitals first and then scoring them, it now scores each individual pageload: every Web Vital recorded on that pageload is scored with a complementary log-normal CDF, and those scores are combined using weights that are dynamically adjusted to skip any Web Vitals the pageload is missing. Each pageload score is bounded between 0 and 100, and the pageload scores are then averaged.
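Written out as formulas, the per-pageload scheme might look like the following sketch. Here $w_v$ are the base weights listed above, $P_i$ is the set of Web Vitals recorded on pageload $i$, and $N$ is the number of pageloads. The per-vital parameters $\mu_v$ and $\sigma_v$ are not given in the article; one common choice, used by Lighthouse, sets $\mu_v$ to the log of a median control point and picks $\sigma_v$ so that a p10 control point scores 0.9.

```latex
s_v(x) = 1 - \Phi\!\left(\frac{\ln x - \mu_v}{\sigma_v}\right)
       = \tfrac{1}{2}\,\operatorname{erfc}\!\left(\frac{\ln x - \mu_v}{\sigma_v \sqrt{2}}\right)

S_i = 100 \cdot \frac{\sum_{v \in P_i} w_v \, s_v(x_{i,v})}{\sum_{v \in P_i} w_v}

S = \frac{1}{N} \sum_{i=1}^{N} S_i
```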
Because each pageload score is bounded between 0 and 100, a single slow outlier can lower an average over N pageloads by at most 100/N points, so outliers have far less impact on the final score. This makes scores consistent and accurate whether you're looking at app-level data, drilling down to a specific page, or examining a single pageload.
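A short Python sketch of why per-pageload scoring keeps the levels consistent: if every view is an average of the same bounded pageload scores, the app-level score equals the page-level scores weighted by pageload count. The page paths and scores below are invented, and `page_scores` and `app_score` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical per-pageload scores, each already bounded to 0-100.
pageloads: List[Tuple[str, float]] = [
    ("/checkout", 92.0), ("/checkout", 88.0), ("/checkout", 4.0),  # one slow outlier
    ("/home", 97.0), ("/home", 95.0),
]


def page_scores(rows: List[Tuple[str, float]]) -> Dict[str, Tuple[float, int]]:
    """Page-level view: mean pageload score and pageload count per page."""
    by_page: Dict[str, List[float]] = defaultdict(list)
    for page, score in rows:
        by_page[page].append(score)
    return {page: (mean(scores), len(scores)) for page, scores in by_page.items()}


def app_score(rows: List[Tuple[str, float]]) -> float:
    """App-level view: mean over every pageload."""
    return mean(score for _, score in rows)


pages = page_scores(pageloads)
# Weighting each page score by its pageload count recovers the app-level score.
rolled_up = sum(avg * n for avg, n in pages.values()) / sum(n for _, n in pages.values())
print(pages)
print(app_score(pageloads), rolled_up)
```

Because no level re-aggregates raw metric values, drilling from app to page to a single pageload only changes which bounded scores are averaged, not how they are computed.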