Netflix | Rohan Dhruva, Ed Ballot | Sep 8, 2020

Seamlessly Swapping the API backend of the Netflix Android app

Article Summary

Netflix Android engineers migrated 170 API endpoints from a monolith to a new microservice without users noticing. Here's how they pulled it off over a year.

The Netflix Android team moved their Backend for Frontend (BFF) from a centralized Java monolith to a standalone Node.js microservice. This gave them full ownership of the request lifecycle, better observability, and faster local development while maintaining zero user impact.

Key Takeaways

Critical Insight

Netflix migrated all Android API endpoints to a new microservice, relying on comprehensive testing infrastructure to prevent user-facing issues during the year-long transition.

The team discovered previously hidden performance gains while investigating what looked like regressions in their canary reports.

About This Article

Problem

When the Netflix Android endpoints moved out of the monolithic API service into a standalone microservice, they lost the monolith's local cache of video metadata. Data that had been served from that cache now required network calls, which slowed a small portion of requests even after optimization work.
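To make the cost of losing a local cache concrete, here is a minimal sketch that counts simulated network round trips with and without an in-process cache. The `MetadataService` class, its `fetch` method, and the video IDs are hypothetical stand-ins for illustration, not Netflix's actual services or data.

```python
from functools import lru_cache


class MetadataService:
    """Hypothetical remote service; each fetch() stands for one network round trip."""

    def __init__(self):
        self.calls = 0

    def fetch(self, video_id):
        self.calls += 1
        return {"id": video_id, "title": f"Title {video_id}"}


def titles(video_ids, fetch):
    """Resolve a list of video IDs to titles using the given fetch function."""
    return [fetch(video_id)["title"] for video_id in video_ids]


# The same few videos are requested repeatedly, as on a page with overlapping rows.
requested = [1, 2, 1, 3, 2, 1]

# Without a cache, every lookup becomes a network call.
uncached = MetadataService()
titles(requested, uncached.fetch)

# With an in-process cache, only the first lookup per video goes over the network.
cached = MetadataService()
cached_fetch = lru_cache(maxsize=1024)(cached.fetch)
titles(requested, cached_fetch)

print(uncached.calls, cached.calls)
```

In this toy setup, the uncached path makes one call per lookup while the cached path makes one call per distinct video, which is the kind of difference that shows up as added latency once the cache is gone.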

Solution

The team set up distributed tracing with Zipkin to map out which microservice calls were running in sequence during each request. This gave them visibility into the full request chain and made it easy to pinpoint which team owned each performance problem.
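As a rough illustration of what a trace makes visible, the sketch below takes spans shaped like Zipkin v2 span JSON (`id`, `parentId`, `timestamp` and `duration` in microseconds, `localEndpoint.serviceName`) and lists sibling calls that ran one after another instead of in parallel. The trace contents, the service names, and the `sequential_pairs` helper are assumptions made up for this example; this is not the team's actual tooling.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical trace for one homepage request (all values are illustrative).
trace = [
    {"id": "a", "name": "get /home", "timestamp": 0, "duration": 100_000,
     "localEndpoint": {"serviceName": "android-bff"}},
    {"id": "b", "parentId": "a", "name": "get profile", "timestamp": 5_000,
     "duration": 20_000, "localEndpoint": {"serviceName": "profile-service"}},
    {"id": "c", "parentId": "a", "name": "get rows", "timestamp": 30_000,
     "duration": 40_000, "localEndpoint": {"serviceName": "rows-service"}},
    {"id": "d", "parentId": "a", "name": "get artwork", "timestamp": 35_000,
     "duration": 30_000, "localEndpoint": {"serviceName": "artwork-service"}},
]


def sequential_pairs(spans):
    """Return (earlier, later) service-name pairs for sibling spans that do not overlap."""
    children = defaultdict(list)
    for span in spans:
        parent = span.get("parentId")  # root spans carry no parentId
        if parent is not None:
            children[parent].append(span)

    pairs = []
    for siblings in children.values():
        siblings.sort(key=lambda s: s["timestamp"])
        for first, second in combinations(siblings, 2):
            # Sequential means the later span starts only after the earlier one has ended.
            if first["timestamp"] + first["duration"] <= second["timestamp"]:
                pairs.append((first["localEndpoint"]["serviceName"],
                              second["localEndpoint"]["serviceName"]))
    return pairs


pairs = sequential_pairs(trace)
print(pairs)
```

Here the rows and artwork calls overlap, so they are not reported, while both of them wait on the profile call. Because each span carries a service name, a pair like this points directly at the services (and therefore the teams) involved.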

Impact

Canary deployments with Kayenta metrics caught a 4-5% increase in homepage latency before the change reached all users. The team had time to investigate and fix the issues before completing the rollout.
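The idea behind a canary check can be sketched as a comparison of latency samples from a baseline group and a canary group. The example below is a deliberate simplification that compares medians against a fixed threshold; Kayenta's own judging applies statistical tests across many metrics. The `judge` function, the 2% threshold, and the sample latencies are all invented for illustration and are not Netflix measurements.

```python
from statistics import median


def relative_change(baseline, canary):
    """Relative change of the canary median versus the baseline median."""
    base = median(baseline)
    return (median(canary) - base) / base


def judge(baseline, canary, max_increase=0.02):
    """Return ("pass" | "fail", change); max_increase is an assumed threshold."""
    change = relative_change(baseline, canary)
    verdict = "fail" if change > max_increase else "pass"
    return verdict, change


# Made-up homepage latencies in milliseconds.
baseline_ms = [200, 210, 190, 205, 195]
canary_ms = [209, 220, 199, 214, 204]

verdict, change = judge(baseline_ms, canary_ms)
print(verdict, f"{change:.1%}")
```

With these sample numbers the canary median is 4.5% above the baseline median, so the check fails and the rollout would pause while the regression is investigated.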