DragonCrawl: Generative AI for High-Quality Mobile Testing
Article Summary
Uber's mobile testing was broken: engineers spent 30-40% of their time maintaining test scripts that failed with every UI change. So the company built an AI that tests apps the way a human would.
Uber's Developer Platform team created DragonCrawl, a system that uses large language models to execute mobile tests across 3,000+ simultaneous experiments and 50+ languages. Instead of following brittle scripts, it adapts to UI changes on its own by understanding screen context and test goals expressed in natural language.
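To make the idea concrete, here is a minimal sketch of goal-driven action selection: embed the test goal and every actionable element on the screen, then pick the element most similar to the goal. The bag-of-words "embedding" and the `choose_action` helper are illustrative stand-ins invented for this example; DragonCrawl uses a real MPNet sentence encoder, not word counts.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" used only for illustration;
    # a stand-in for a real MPNet sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def choose_action(goal: str, screen_elements: list[str]) -> str:
    # Rank every actionable element on the current screen by its
    # similarity to the natural-language test goal, then pick the best.
    return max(screen_elements, key=lambda e: cosine(embed(goal), embed(e)))

elements = ["Enter destination", "View profile", "Promotions"]
print(choose_action("request a trip by entering a destination", elements))
# → Enter destination
```

Because the selection is driven by meaning rather than hard-coded element IDs, renaming or rearranging UI elements does not break the test as long as the intent of each element stays recognizable.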
Key Takeaways
- Blocked 10 high-priority bugs in 3 months while saving thousands of developer hours
- 99%+ stability with zero maintenance across 85 cities and multiple device types
- Uses a compact 110M-parameter MPNet model, roughly 1,000x smaller than GPT-3.5
- Handles adversarial cases: restarted the app when payments failed and retried going online for 5 minutes
- Achieves Precision@1 of 97.23% when choosing the correct UI action from screen context
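Precision@1 here means the fraction of screens where the model's top-ranked candidate action is the correct one. A small sketch of how such a metric is computed (the function name and the toy data are assumptions for illustration, not Uber's evaluation code):

```python
def precision_at_1(ranked_choices: list[list[str]], correct: list[str]) -> float:
    # Fraction of screens where the model's top-ranked UI action
    # matches the ground-truth action.
    hits = sum(1 for ranked, truth in zip(ranked_choices, correct)
               if ranked[0] == truth)
    return hits / len(correct)

# Hypothetical ranked action lists per screen vs. ground-truth actions.
ranked = [["tap_request", "tap_menu"], ["tap_confirm", "tap_back"],
          ["tap_menu", "tap_pay"], ["tap_pay", "tap_confirm"]]
truth = ["tap_request", "tap_confirm", "tap_pay", "tap_pay"]
print(precision_at_1(ranked, truth))  # → 0.75
```

At DragonCrawl's reported 97.23%, fewer than 3 in 100 screens get a wrong first choice, which is what allows long multi-step trip flows to complete reliably.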
Critical Insight
DragonCrawl made it possible to test Uber's core trip flow across 85 cities and 50+ languages without manual test maintenance, coverage that was previously infeasible at their scale.