A/B Testing using Google’s Staged Rollouts
Article Summary
Twitch discovered that Google Play's staged rollouts create biased user groups, breaking standard A/B testing assumptions. Here's how they adapted.
The Twitch Science Team explored using Google Play's staged rollout feature for mobile experiments instead of traditional in-app A/B logic. They found users who updated early were significantly more active than non-updaters, violating the randomization principle that makes A/B tests valid.
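The randomization problem described above can be checked by comparing the two groups on a metric measured before the rollout began. The following is a minimal Python sketch of such a pre-period balance check using a permutation test. It is not the Twitch team's code; the function name `permutation_pvalue` and the sample numbers are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def permutation_pvalue(updaters, non_updaters, n_perm=1000, seed=0):
    """Two-sided permutation test on a pre-rollout metric.

    updaters / non_updaters: per-user values (e.g. sessions per week)
    measured BEFORE the rollout. Under true random assignment the
    observed gap should look like a typical shuffled gap.
    """
    rng = random.Random(seed)
    observed = mean(updaters) - mean(non_updaters)
    pooled = list(updaters) + list(non_updaters)
    n = len(updaters)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up pre-period session counts, for illustration only.
early_updaters = [10, 11, 12, 13, 14, 15, 16, 17]
late_or_never = [1, 2, 3, 4, 5, 6, 7, 8]
gap, p_value = permutation_pvalue(early_updaters, late_or_never)
print(gap, p_value)
```

A small p-value on a pre-rollout metric suggests the groups already differed before any treatment, which is the situation the article describes.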
Key Takeaways
- Early updaters showed higher engagement before the experiment even started
- Adoption was slow: only 8.3% of users had updated within 1 week, and it took 2 weeks to reach 10%
- Time-series analysis with the CausalImpact R package found a +4% increase in sessions
- Difference-in-differences analysis with bootstrapping confirmed a +5.4% session lift at 95% confidence
- Update history and user tenure, not geography or device, predicted group assignment
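To make the difference-in-differences takeaway concrete, here is a minimal Python sketch of a bootstrapped estimate with a percentile confidence interval. It is an illustration under stated assumptions, not the analysis Twitch ran: the function names `did_estimate` and `bootstrap_did`, the per-user `(pre, post)` data layout, and the sample values are all hypothetical, and the estimate is an absolute difference rather than the percentage lift reported above.

```python
import random
from statistics import mean

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

def bootstrap_did(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference-in-differences.

    treated / control: lists of (pre, post) session counts, one pair
    per user. Users are resampled with replacement within each group.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        estimates.append(did_estimate(
            [pre for pre, _ in t], [post for _, post in t],
            [pre for pre, _ in c], [post for _, post in c],
        ))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up data: updaters start more active, as the article observed.
updated = [(10, 13), (12, 14), (9, 12), (11, 12), (14, 17), (8, 10)]
not_updated = [(5, 6), (4, 4), (6, 7), (5, 5), (3, 4), (6, 6)]
print(bootstrap_did(updated, not_updated))
```

Subtracting each group's own pre-period level removes the baseline gap between early updaters and everyone else, under the assumption that both groups would otherwise have followed parallel trends. If the interval excludes zero, the lift is unlikely to be resampling noise.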
Critical Insight
Staged rollouts enable testing major UI changes without complex branching logic, but require time-series and difference-in-differences methods instead of standard A/B analysis.