Seamlessly Swapping the API backend of the Netflix Android app
Article Summary
Netflix Android engineers migrated 170 API endpoints from a monolith to a new microservice without users noticing. Here's how they pulled it off over a year.
The Netflix Android team moved their Backend for Frontend (BFF) from a centralized Java monolith to a standalone Node.js microservice. This gave them full ownership of the request lifecycle, better observability, and faster local development, with no impact on users.
Key Takeaways
- Built a three-tier testing strategy: functional tests, replay testing with production traffic, and automated canaries
- Tracked latency metrics by UI screen to catch regressions before production rollout
- Gained distributed tracing with Zipkin to debug performance across microservices
- Switched endpoint code from Java to JavaScript despite the team's unfamiliarity with the language
- Traded some latency for observability when removing dependencies on the monolith's cache
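Replay testing, the second tier above, checks that the new service returns the same data as the monolith for real requests. The following is a minimal sketch, not Netflix's tooling. It assumes each backend can be called as a function that returns parsed JSON, and that a few fields legitimately differ between backends and are ignored. Those fields, the hypothetical `requestId` and `serverTimestamp`, are illustrative, as are the function names.

```python
"""Replay-test sketch: send the same recorded request to two backends
and diff their JSON responses. All names here are hypothetical."""
from typing import Any, Callable

# Hypothetical: fields assumed to differ legitimately between backends.
IGNORED_FIELDS = {"requestId", "serverTimestamp"}

Backend = Callable[[dict], Any]


def diff_json(old: Any, new: Any, path: str = "$") -> list[str]:
    """Return readable paths where two JSON-like values differ."""
    diffs: list[str] = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            if key in IGNORED_FIELDS:
                continue  # skip fields expected to vary per request
            if key not in old:
                diffs.append(f"{path}.{key}: missing in old")
            elif key not in new:
                diffs.append(f"{path}.{key}: missing in new")
            else:
                diffs.extend(diff_json(old[key], new[key], f"{path}.{key}"))
    elif isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            diffs.append(f"{path}: length {len(old)} != {len(new)}")
        # Compare the overlapping prefix element by element.
        for i, (a, b) in enumerate(zip(old, new)):
            diffs.extend(diff_json(a, b, f"{path}[{i}]"))
    elif old != new:
        diffs.append(f"{path}: {old!r} != {new!r}")
    return diffs


def replay(requests: list[dict], old_backend: Backend,
           new_backend: Backend) -> dict[str, list[str]]:
    """Replay each recorded request against both backends; collect mismatches."""
    report: dict[str, list[str]] = {}
    for req in requests:
        diffs = diff_json(old_backend(req), new_backend(req))
        if diffs:
            report[req["id"]] = diffs
    return report


if __name__ == "__main__":
    recorded = [{"id": "r1", "screen": "home"}, {"id": "r2", "screen": "search"}]

    def old(req: dict) -> dict:
        return {"screen": req["screen"], "rows": [1, 2], "requestId": "a"}

    def new(req: dict) -> dict:
        rows = [1, 2] if req["screen"] == "home" else [1]
        return {"screen": req["screen"], "rows": rows, "requestId": "b"}

    print(replay(recorded, old, new))  # {'r2': ['$.rows: length 2 != 1']}
```

Reporting mismatches as JSON paths rather than a pass/fail flag makes it easier to see whether a difference is a real bug or a field that belongs in the ignore list.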
Netflix migrated all Android API endpoints to a new microservice with comprehensive testing infrastructure that prevented user-facing issues during the year-long transition.
About This Article
When the Netflix Android team moved from the monolithic API service to a standalone microservice, it lost the monolith's local cache of video metadata. Data that used to be served from that cache now required network calls, which slowed down a small portion of requests even after optimization work.
The team set up distributed tracing with Zipkin to map out which microservice calls were running in sequence during each request. This gave them visibility into the full request chain and made it easy to pinpoint which team owned each performance problem.
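The following sketch shows the kind of trace analysis described above. Given the spans of one trace, it groups a parent's child calls into stages. Calls in the same stage overlap in time. Each new stage begins only after every earlier call has finished. A long run of single-call stages points to calls that run in sequence and could be parallelized.

This is a minimal sketch, not Netflix's code. It assumes spans are dicts shaped like Zipkin's v2 JSON model, with `id`, `parentId`, `name`, `localEndpoint.serviceName`, and `timestamp` and `duration` in microseconds. The function names, service names, and short span IDs are hypothetical.

```python
"""Sketch: group a parent span's child calls (Zipkin v2-style dicts) into
parallel stages so sequential calls stand out. Names are hypothetical."""


def service(span: dict) -> str:
    # Zipkin v2 spans carry the service name under localEndpoint.serviceName.
    return span.get("localEndpoint", {}).get("serviceName", "unknown")


def sequential_stages(spans: list[dict], parent_id: str) -> list[list[str]]:
    """Children in the same stage overlap in time (parallel); a new stage
    starts only when every earlier child has already finished."""
    children = sorted(
        (s for s in spans if s.get("parentId") == parent_id),
        key=lambda s: s["timestamp"],  # assumes timestamp is present
    )
    stages: list[list[str]] = []
    stage_end = None  # latest end time (microseconds) of any child so far
    for span in children:
        start = span["timestamp"]
        end = start + span["duration"]
        if stage_end is None or start >= stage_end:
            stages.append([])  # nothing still running: new sequential stage
        stages[-1].append(f"{service(span)}:{span['name']}")
        stage_end = end if stage_end is None else max(stage_end, end)
    return stages


if __name__ == "__main__":
    trace = [
        {"id": "1", "name": "get /home", "timestamp": 0, "duration": 300,
         "localEndpoint": {"serviceName": "android-bff"}},
        {"id": "2", "parentId": "1", "name": "get rows", "timestamp": 10,
         "duration": 100, "localEndpoint": {"serviceName": "rows-service"}},
        {"id": "3", "parentId": "1", "name": "get user", "timestamp": 20,
         "duration": 50, "localEndpoint": {"serviceName": "user-service"}},
        {"id": "4", "parentId": "1", "name": "get metadata", "timestamp": 120,
         "duration": 150, "localEndpoint": {"serviceName": "metadata-service"}},
    ]
    print(sequential_stages(trace, "1"))
    # [['rows-service:get rows', 'user-service:get user'],
    #  ['metadata-service:get metadata']]
```

Because each stage is labeled with the downstream service name, the output also points at which owning team to talk to about a slow sequential call.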
Automated canary analysis with Kayenta caught a 4-5% increase in homepage latency before the change reached all users. This gave the team time to investigate and fix the issues before the full production rollout.
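The idea behind catching a per-screen latency regression in a canary can be sketched as a comparison between baseline and canary clusters. This simplified check is not Kayenta's algorithm. Kayenta's judge runs statistical comparisons across many metrics, while this sketch compares only median latency per UI screen against a fixed, hypothetical 3% threshold. The function name and the sample numbers are made up for illustration.

```python
"""Simplified canary check: flag UI screens whose median latency in the
canary exceeds the baseline by more than a threshold. Not Kayenta's judge."""
from statistics import median


def latency_regressions(
    baseline: dict[str, list[float]],
    canary: dict[str, list[float]],
    threshold: float = 0.03,  # hypothetical: flag >3% median increase
) -> dict[str, float]:
    """Return {screen: relative change in median latency} for regressed screens."""
    flagged: dict[str, float] = {}
    for screen, base_samples in baseline.items():
        canary_samples = canary.get(screen)
        if not base_samples or not canary_samples:
            continue  # no data to compare for this screen
        base_med = median(base_samples)
        if base_med <= 0:
            continue  # avoid dividing by zero on degenerate data
        change = (median(canary_samples) - base_med) / base_med
        if change > threshold:
            flagged[screen] = round(change, 4)
    return flagged


if __name__ == "__main__":
    # Made-up latency samples in milliseconds, keyed by UI screen.
    baseline = {"home": [200, 210, 190, 205], "search": [120, 118, 125]}
    canary = {"home": [209, 219, 199, 214], "search": [121, 117, 126]}
    print(latency_regressions(baseline, canary))  # {'home': 0.0444}
```

Keying the samples by screen mirrors the per-screen latency tracking described earlier. A regression on one screen, such as the homepage, is then not diluted by unaffected screens.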