How We Improved Performance Score Accuracy
Article Summary
Edward Gou from Sentry reveals why their performance scores were lying to developers. A single slow pageload could tank your entire app's score, even when 99% of users had fast experiences.
Sentry's Performance Score condenses five Web Vitals (LCP, FCP, FID, TTFB, CLS) into a single 0-100 rating based on real user data. But the original calculation had a critical flaw: it aggregated each metric across all pageloads first, then scored the aggregates. A single extreme outlier could therefore drag every per-metric average into "poor" territory and misrepresent the actual user experience.
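To make the scoring concrete, here is a minimal sketch of a weighted vitals score. The helper names, the weights, and the linear good-to-poor ramp are illustrative assumptions, not Sentry's actual code (their real implementation scores each vital against a log-normal curve); the threshold values are the commonly published Web Vitals good/poor cutoffs:

```typescript
// Hypothetical helper, not Sentry's actual code. Each vital is scored
// 0-1 against good/poor thresholds, then combined by weight into a
// 0-100 score.

type Vital = "lcp" | "fcp" | "fid" | "ttfb" | "cls";

// Illustrative weights; Sentry's real weights may differ.
const WEIGHTS: Record<Vital, number> = {
  lcp: 0.3,
  fcp: 0.15,
  fid: 0.3,
  ttfb: 0.1,
  cls: 0.15,
};

// Commonly published [good, poor] cutoffs. LCP/FCP/FID/TTFB in ms,
// CLS unitless.
const THRESHOLDS: Record<Vital, [number, number]> = {
  lcp: [2500, 4000],
  fcp: [1800, 3000],
  fid: [100, 300],
  ttfb: [800, 1800],
  cls: [0.1, 0.25],
};

// Linear ramp between good and poor; real implementations use
// log-normal curves, but a ramp keeps the sketch short.
function scoreVital(vital: Vital, value: number): number {
  const [good, poor] = THRESHOLDS[vital];
  if (value <= good) return 1;
  if (value >= poor) return 0;
  return 1 - (value - good) / (poor - good);
}

function performanceScore(measurements: Record<Vital, number>): number {
  let total = 0;
  for (const vital of Object.keys(WEIGHTS) as Vital[]) {
    total += WEIGHTS[vital] * scoreVital(vital, measurements[vital]);
  }
  return Math.round(total * 100);
}
```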
Key Takeaways
- Old method: 2 fast pageloads plus 1 slow outlier scored 26/100
- New method: Same data now scores 68/100, reflecting actual user experience
- Scores are now calculated per pageload first, then averaged, preventing outlier skew (see the sketch after this list)
- Dynamic weighting adjusts when browsers don't support certain Web Vitals (sketched after the Critical Insight below)
- Most teams will see score increases after the fix deploys
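Using the hypothetical helpers from the sketch above, the difference between the two methods comes down to where the averaging happens. This comparison won't reproduce the article's exact 26 and 68 figures, since those come from Sentry's real scoring curves, but it shows the same effect:

```typescript
// Reuses the hypothetical Vital, WEIGHTS, and performanceScore helpers
// from the sketch above.
type Measurements = Record<Vital, number>;

// Old method: average each raw vital across pageloads, then score once.
// One severe outlier drags every per-vital average toward "poor".
function oldMethodScore(pageloads: Measurements[]): number {
  const averaged = {} as Measurements;
  for (const vital of Object.keys(WEIGHTS) as Vital[]) {
    averaged[vital] =
      pageloads.reduce((sum, p) => sum + p[vital], 0) / pageloads.length;
  }
  return performanceScore(averaged);
}

// New method: score each pageload first, then average the scores.
// The outlier becomes just one bad score among many good ones.
function newMethodScore(pageloads: Measurements[]): number {
  const scores = pageloads.map((p) => performanceScore(p));
  return Math.round(scores.reduce((sum, s) => sum + s, 0) / scores.length);
}

// Two fast pageloads plus one severe outlier (values are made up):
const fast: Measurements = { lcp: 1200, fcp: 900, fid: 20, ttfb: 200, cls: 0.02 };
const slow: Measurements = { lcp: 30000, fcp: 20000, fid: 2000, ttfb: 10000, cls: 1.0 };

console.log(oldMethodScore([fast, fast, slow])); // 0  - outlier dominates
console.log(newMethodScore([fast, fast, slow])); // 67 - reflects the fast majority
```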
Critical Insight
By scoring each pageload individually and then averaging, rather than aggregating metrics first, Sentry stopped outliers from unfairly tanking performance scores that should have reflected mostly positive user experiences.
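The dynamic weighting mentioned in the takeaways can be sketched the same way: if a browser never reports a given vital, that vital's weight is dropped and the remaining weights are renormalized so the score still lands on the full 0-100 scale. This sketch reuses the hypothetical Vital, WEIGHTS, and scoreVital helpers from the first example:

```typescript
// Vitals the browser never reported are simply absent from the input.
function partialPerformanceScore(
  measurements: Partial<Record<Vital, number>>,
): number {
  const reported = (Object.keys(WEIGHTS) as Vital[]).filter(
    (vital) => measurements[vital] !== undefined,
  );
  // Renormalize so the weights of the reported vitals sum to 1.
  const weightSum = reported.reduce((sum, v) => sum + WEIGHTS[v], 0);
  if (weightSum === 0) return 0; // nothing was reported
  let total = 0;
  for (const vital of reported) {
    total += (WEIGHTS[vital] / weightSum) * scoreVital(vital, measurements[vital]!);
  }
  return Math.round(total * 100);
}

// A pageload from a browser that reports only FCP and TTFB still gets
// a full-scale 0-100 score:
console.log(partialPerformanceScore({ fcp: 900, ttfb: 200 })); // 100
```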