Netflix • Jose Alcérreca • Apr 17, 2025

Netflix App Testing At Scale

Article Summary

Netflix runs 1,000+ device tests on every PR for an app with 1M lines of code across 400+ modules. Here's how they evolved their testing strategy over 14 years.

Ken Yee, Senior Engineer at Netflix, shares how the streaming giant tests its Android app at massive scale. The team of ~50 engineers supports devices from Android 7.0 to the latest OS, ranging from Android Go devices to premium foldables.

Key Takeaways

Disbanded dedicated SDET team; feature developers now own all testing layers
Custom device lab with cellular tower for network testing and frame drop verification
PageObject pattern abstracts tests from implementation during XML to Compose migration
Automated tooling identifies PR authors who break tests and promotes stable tests
Testing pyramid looks more like an hourglass due to heavy physical device usage
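
The PageObject takeaway can be illustrated with a small sketch. This is not Netflix's code: the `UiDriver` interface, the `FakeLoginUi` class, the `LoginPage` page object, and every element id are hypothetical names invented for the example, and the fake UI exists only so the sketch runs on a plain JVM. The point it tries to show is that the test talks to `LoginPage` alone, so swapping an XML-backed driver for a Compose-backed one would leave the test untouched.

```java
import java.util.HashMap;
import java.util.Map;

class PageObjectSketch {

    // Hypothetical minimal UI driver: everything a page object needs from the UI toolkit.
    interface UiDriver {
        void type(String elementId, String text);
        void tap(String elementId);
        String textOf(String elementId);
    }

    // In-memory fake (an assumption for this sketch) standing in for a real
    // XML- or Compose-backed driver.
    static final class FakeLoginUi implements UiDriver {
        private final Map<String, String> fields = new HashMap<>();

        @Override public void type(String elementId, String text) {
            fields.put(elementId, text);
        }

        @Override public void tap(String elementId) {
            if (elementId.equals("sign_in_button")) {
                boolean ok = "secret".equals(fields.get("password_field"));
                fields.put("status_label",
                        ok ? "Welcome, " + fields.get("email_field") : "Invalid credentials");
            }
        }

        @Override public String textOf(String elementId) {
            return fields.getOrDefault(elementId, "");
        }
    }

    // Page object: the only class that knows element ids and interaction order.
    static final class LoginPage {
        private final UiDriver ui;

        LoginPage(UiDriver ui) { this.ui = ui; }

        LoginPage signIn(String email, String password) {
            ui.type("email_field", email);
            ui.type("password_field", password);
            ui.tap("sign_in_button");
            return this;
        }

        String statusMessage() { return ui.textOf("status_label"); }
    }

    public static void main(String[] args) {
        // The "test" reads as user intent and never mentions element ids.
        LoginPage page = new LoginPage(new FakeLoginUi());
        System.out.println(page.signIn("user@example.com", "secret").statusMessage());
    }
}
```

If the screen were reimplemented, only the `UiDriver` implementation (and possibly the ids inside `LoginPage`) would need to change; tests written against `signIn` and `statusMessage` would stay as they are.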

Critical Insight

Netflix balances speed and coverage by running narrow grid tests on PRs (under 30 min) and full grid tests post-merge (under 120 min), accepting that device tests will always have some flakiness at scale.

Their approach to handling test flakiness relies on custom automation that goes beyond what standard CI/CD pipelines provide.

About This Article

Problem

Netflix's 1M-line codebase relied on legacy RxJava streams and asynchronous code that made unit tests flaky. Tests would leave global state behind or race across threads, which blocked CI pipelines.

Solution

Ken Yee's team brought in Test Dispatchers for Kotlin Coroutines and Test Schedulers for RxJava to control time in a predictable way. They also used dependency injection to stop state from leaking between test methods.
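
To make the idea of controlling time concrete, here is a minimal plain-Java sketch of a virtual-time scheduler. It is a hand-rolled stand-in for the concept behind coroutine test dispatchers and RxJava test schedulers, not either library's API and not Netflix's code. The `Scheduler` interface, `VirtualTimeScheduler`, `SearchDebouncer`, and the 300 ms delay are all hypothetical. The scheduler is passed in through the constructor, so each test builds its own instance and no state carries over between tests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class VirtualTimeSketch {

    // Abstraction injected into production code instead of a real timer or thread pool.
    interface Scheduler {
        void schedule(long delayMillis, Runnable task);
    }

    // Test double: queued tasks run only when the test advances the virtual clock.
    static final class VirtualTimeScheduler implements Scheduler {
        private static final class Entry {
            final long dueAt;
            final long seq;
            final Runnable task;
            Entry(long dueAt, long seq, Runnable task) {
                this.dueAt = dueAt;
                this.seq = seq;
                this.task = task;
            }
        }

        // Ordered by due time, then by insertion order for ties.
        private final PriorityQueue<Entry> queue = new PriorityQueue<>(
                Comparator.comparingLong((Entry e) -> e.dueAt).thenComparingLong(e -> e.seq));
        private long now = 0;
        private long nextSeq = 0;

        @Override public void schedule(long delayMillis, Runnable task) {
            queue.add(new Entry(now + delayMillis, nextSeq++, task));
        }

        // Runs every task due within the next `millis` of virtual time, in order.
        void advanceTimeBy(long millis) {
            long target = now + millis;
            while (!queue.isEmpty() && queue.peek().dueAt <= target) {
                Entry next = queue.poll();
                now = next.dueAt;
                next.task.run();
            }
            now = target;
        }
    }

    // Hypothetical class under test: emits a query only after 300 ms without changes.
    static final class SearchDebouncer {
        private final Scheduler scheduler;
        private final List<String> emitted = new ArrayList<>();
        private int latestToken = 0;

        SearchDebouncer(Scheduler scheduler) { this.scheduler = scheduler; }

        void onQueryChanged(String query) {
            int token = ++latestToken;
            scheduler.schedule(300, () -> {
                if (token == latestToken) emitted.add(query); // skip superseded queries
            });
        }

        List<String> emitted() { return emitted; }
    }

    public static void main(String[] args) {
        VirtualTimeScheduler scheduler = new VirtualTimeScheduler();
        SearchDebouncer debouncer = new SearchDebouncer(scheduler);

        debouncer.onQueryChanged("str");
        scheduler.advanceTimeBy(100);
        debouncer.onQueryChanged("stranger");
        scheduler.advanceTimeBy(299);
        System.out.println(debouncer.emitted()); // prints []

        scheduler.advanceTimeBy(1);
        System.out.println(debouncer.emitted()); // prints [stranger]
    }
}
```

Because no real clock or background thread is involved, the test never sleeps and the outcome does not depend on thread timing, which is the property the test dispatchers and test schedulers mentioned above are meant to provide.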

Impact

Plain JVM unit tests run about 10x faster than tests that depend on Hilt or Robolectric. Netflix can run about 1,000 device tests within the 30-minute PR window, with far more consistent results than before.