Better Android Testing at Airbnb (Part 6)
Article Summary
Airbnb's Android test suite adds up to about 2 hours of test time, yet a full run finishes in minutes. Here's the CI infrastructure that makes it possible.
In the final part of their testing series, Airbnb's Android team reveals how they orchestrate automated tests at scale. This covers test generation, Firebase integration, and the tooling that ties their entire testing framework together.
Key Takeaways
- Kotlin scripts auto-generate JUnit tests by parsing Fragment classes with AST
- CI runs tests only for the modules a change affects, cutting Firebase costs and test time
- Flank shards 2 hours of tests into parallel runs finishing in minutes
- Custom tooling auto-posts Firebase failures and Happo diffs to PRs
- Pixel 3 devices run tests 2x faster than original Pixel hardware
Airbnb built a fully automated CI pipeline that generates tests, detects changes, handles Firebase outages, and surfaces results directly in PRs without manual intervention.
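To make the "2 hours of tests, finished in minutes" arithmetic concrete, here is a minimal sketch of duration-based sharding in Python. It is an illustration only: the greedy longest-first strategy, the `shard_tests` and `wall_clock_seconds` helpers, and the test durations are all assumptions for this example, not a description of how Flank or Airbnb's tooling actually balances shards.

```python
import heapq


def shard_tests(durations, shard_count):
    """Assign each test to the currently least-loaded shard, longest tests first.

    This is a generic greedy heuristic chosen for illustration; it is not
    claimed to be the algorithm Flank uses.
    """
    # Heap entries are (total_seconds, shard_index), so the lightest shard pops first.
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]
    longest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in longest_first:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


def wall_clock_seconds(durations, shards):
    """The run takes as long as its slowest shard."""
    return max((sum(durations[t] for t in shard) for shard in shards), default=0.0)


# Hypothetical data: 240 tests of 30 seconds each, i.e. 2 hours of serial test time.
durations = {f"test_{i}": 30.0 for i in range(240)}
shards = shard_tests(durations, shard_count=40)
print(wall_clock_seconds(durations, shards) / 60)  # prints 3.0 (minutes)
```

With these made-up numbers, 40 parallel shards turn 120 minutes of serial test time into a 3-minute run, ignoring device startup and upload overhead. Real suites have uneven test durations, which is why balancing by recorded duration matters more than splitting by test count.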
About This Article
Airbnb's integration tests ran one at a time on Firebase Test Lab, which meant long waits for results. When tests failed across multiple shards, developers couldn't easily tell which ones had actually broken.
Eli Hart's team wrote CI tooling that reads Flank's JUnit reports, stores the artifacts in Buildkite, and comments on pull requests with links straight to the failed Firebase test matrices.
Instead of digging through CI logs, developers click a PR comment and land on the failing Firebase matrix. This reduces confusion and support requests, and tests keep running across multiple shards without any slowdown.
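The report-to-comment step can be sketched roughly as follows. This is a hedged illustration in Python, not Airbnb's tooling: the `failed_tests` and `build_pr_comment` helpers, the sample report, the test names, and the matrix URL are all hypothetical. The sketch also assumes the caller already knows which Firebase matrix URL belongs to each suite, and it stops at building the comment text rather than posting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report; suite and test names are invented for illustration.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="shard_0" tests="2" failures="1">
    <testcase classname="com.example.BookingTest" name="opensCalendar"/>
    <testcase classname="com.example.BookingTest" name="showsPrice">
      <failure>expected price label</failure>
    </testcase>
  </testsuite>
</testsuites>"""


def failed_tests(junit_xml):
    """Return (suite, classname, test name) for every failed or errored test case."""
    root = ET.fromstring(junit_xml)
    failures = []
    for suite in root.iter("testsuite"):
        for case in suite.findall("testcase"):
            if case.find("failure") is not None or case.find("error") is not None:
                failures.append(
                    (suite.get("name", ""), case.get("classname", ""), case.get("name", ""))
                )
    return failures


def build_pr_comment(failures, matrix_urls):
    """Format failures as a PR comment body, linking each one to its matrix.

    matrix_urls maps a suite name to a Firebase matrix URL; how that mapping
    is obtained is outside this sketch.
    """
    if not failures:
        return "All Firebase tests passed."
    lines = [f"{len(failures)} Firebase test failure(s):"]
    for suite, classname, name in failures:
        url = matrix_urls.get(suite, "(no matrix link recorded)")
        lines.append(f"- {classname}.{name} in {suite}: {url}")
    return "\n".join(lines)


failures = failed_tests(SAMPLE_REPORT)
comment = build_pr_comment(failures, {"shard_0": "https://example.com/matrix/1"})
print(comment)
```

Grouping failures by suite and attaching one link per entry keeps the comment short even when many shards report at once, which is the property the article credits with reducing confusion.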