Netflix App Testing At Scale
Article Summary
Netflix runs 1,000+ device tests on every PR for an app with 1M lines of code across 400+ modules. Here's how they evolved their testing strategy over 14 years.
Ken Yee, Senior Engineer at Netflix, shares how the streaming giant tests its Android app at massive scale. The team of ~50 engineers supports devices running Android 7.0 through the latest OS release, covering everything from Android Go devices to premium foldables.
Key Takeaways
- Disbanded dedicated SDET team; feature developers now own all testing layers
- Custom device lab with cellular tower for network testing and frame drop verification
- PageObject pattern abstracts tests from implementation during XML to Compose migration
- Automated tooling identifies PR authors who break tests and promotes stable tests
- Testing pyramid looks like hourglass due to heavy physical device usage
Netflix balances speed and coverage by running narrow grid tests on PRs (under 30 min) and full grid tests post-merge (under 120 min), accepting that device tests will always have some flakiness at scale.
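The PageObject takeaway above can be illustrated with a minimal sketch. It is not Netflix's code: `FakeUi`, `LoginPage`, the view IDs, and the test tags are all hypothetical names, and the fake UI stands in for a real toolkit so the example runs on a plain JVM. The point it shows is that the test body depends only on the page object interface, so the same check passes whether the screen is backed by XML views or by Compose.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a real UI toolkit (hypothetical): stores text per element key. */
class FakeUi {
    private final Map<String, String> text = new HashMap<>();
    private final String emailKey, buttonKey, greetingKey;

    FakeUi(String emailKey, String buttonKey, String greetingKey) {
        this.emailKey = emailKey;
        this.buttonKey = buttonKey;
        this.greetingKey = greetingKey;
    }

    void type(String key, String value) { text.put(key, value); }

    /** Simulates the app reacting to the sign-in button. */
    void click(String key) {
        if (key.equals(buttonKey)) {
            text.put(greetingKey, "Welcome, " + text.getOrDefault(emailKey, ""));
        }
    }

    String read(String key) { return text.getOrDefault(key, ""); }
}

/** The page object: tests talk to this interface only. */
interface LoginPage {
    void signIn(String email);
    String greeting();
}

/** Finds elements by XML view IDs (illustrative IDs). */
class XmlLoginPage implements LoginPage {
    private final FakeUi ui = new FakeUi("R.id.email", "R.id.sign_in", "R.id.greeting");
    public void signIn(String email) { ui.type("R.id.email", email); ui.click("R.id.sign_in"); }
    public String greeting() { return ui.read("R.id.greeting"); }
}

/** Finds elements by Compose test tags (illustrative tags). */
class ComposeLoginPage implements LoginPage {
    private final FakeUi ui = new FakeUi("emailField", "signInButton", "greetingText");
    public void signIn(String email) { ui.type("emailField", email); ui.click("signInButton"); }
    public String greeting() { return ui.read("greetingText"); }
}

class LoginFlowCheck {
    /** The test logic never mentions view IDs or test tags. */
    static boolean signInShowsGreeting(LoginPage page) {
        page.signIn("user@example.com");
        return page.greeting().equals("Welcome, user@example.com");
    }
}

class PageObjectDemo {
    public static void main(String[] args) {
        // The same check runs against both implementations.
        System.out.println(LoginFlowCheck.signInShowsGreeting(new XmlLoginPage()));
        System.out.println(LoginFlowCheck.signInShowsGreeting(new ComposeLoginPage()));
    }
}
```

When a screen moves from XML to Compose, only the page object implementation changes. Tests written against `LoginPage` stay as they are.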
About This Article
Netflix's 1M-line codebase relied on legacy RxJava streams and asynchronous code that made unit tests flaky. Tests would leave global state behind or race across threads, which blocked CI pipelines.
Ken Yee's team brought in Test Dispatchers for Kotlin Coroutines and Test Schedulers for RxJava to control time in a predictable way. They also used dependency injection to stop state from leaking between test methods.
Unit tests written as plain JVM tests now run 10x faster than tests that depend on Hilt or Robolectric. Netflix can run about 1,000 device tests within the 30-minute PR window and get more predictable results, though some device-test flakiness remains at this scale.
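The time-control idea described above can be sketched without any library. The `VirtualTimeScheduler` below is a hand-rolled stand-in, not RxJava's `TestScheduler` or the coroutines test dispatcher, and `DebouncedSearch` is a hypothetical class under test. It assumes the code under test receives its scheduler through the constructor, which is also what keeps state from leaking between tests. The test then moves a virtual clock forward explicitly instead of sleeping on real threads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hand-rolled virtual clock: tasks run only when the test advances time. */
class VirtualTimeScheduler {
    private static final class Task {
        final long dueAt;
        final long seq;
        final Runnable action;
        Task(long dueAt, long seq, Runnable action) {
            this.dueAt = dueAt;
            this.seq = seq;
            this.action = action;
        }
    }

    // Ordered by due time, then by insertion order for ties.
    private final PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparingLong((Task t) -> t.dueAt).thenComparingLong(t -> t.seq));
    private long now = 0;
    private long nextSeq = 0;

    void schedule(long delayMs, Runnable action) {
        queue.add(new Task(now + delayMs, nextSeq++, action));
    }

    /** Runs every task due within the next ms milliseconds, in order. */
    void advanceTimeBy(long ms) {
        long target = now + ms;
        while (!queue.isEmpty() && queue.peek().dueAt <= target) {
            Task t = queue.poll();
            now = t.dueAt;
            t.action.run();
        }
        now = target;
    }
}

/** Hypothetical code under test: emits a query after 300 ms of quiet. */
class DebouncedSearch {
    final List<String> emitted = new ArrayList<>();
    private final VirtualTimeScheduler scheduler; // injected, never global
    private int latestToken = 0;

    DebouncedSearch(VirtualTimeScheduler scheduler) { this.scheduler = scheduler; }

    void onQueryChanged(String query) {
        final int token = ++latestToken;
        scheduler.schedule(300, () -> {
            // Only the most recent query survives the debounce window.
            if (token == latestToken) emitted.add(query);
        });
    }
}

class VirtualTimeDemo {
    public static void main(String[] args) {
        VirtualTimeScheduler scheduler = new VirtualTimeScheduler();
        DebouncedSearch search = new DebouncedSearch(scheduler);

        search.onQueryChanged("str");
        scheduler.advanceTimeBy(100);      // virtual time: 100 ms
        search.onQueryChanged("stranger");
        scheduler.advanceTimeBy(299);      // virtual time: 399 ms, nothing emitted yet
        System.out.println(search.emitted); // prints []
        scheduler.advanceTimeBy(1);        // virtual time: 400 ms
        System.out.println(search.emitted); // prints [stranger]
    }
}
```

Because each test builds its own scheduler and passes it in, the outcome depends only on the calls the test makes, not on thread timing or on what an earlier test left behind.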