Netflix Dec 13, 2018

Performance comparison of video coding standards: an adaptive streaming perspective

Article Summary

Joel Sole and the Netflix encoding team reveal why codec comparison studies often contradict each other—and how they're fixing the problem. Their findings challenge conventional wisdom about H.264, H.265, and VP9 performance.

Netflix's video algorithms team published a comprehensive analysis of video codec performance from an adaptive streaming perspective. Unlike traditional codec comparisons that use short clips at fixed resolutions, they tested full-length Netflix titles across 10 different resolutions using their Dynamic Optimizer framework and VMAF quality metric.

Key Takeaways

Critical Insight

Netflix's adaptive streaming methodology reveals dramatically different codec performance rankings than traditional fixed-resolution tests, with results varying by up to 13 percentage points depending on testing approach.

The article reveals exactly which encoder settings and content characteristics caused the biggest performance gaps between testing methodologies.

About This Article

Problem

Video codec comparison studies often reach different conclusions because they use different testing methods, encoder settings, and metrics. One study might show codec A is 15% better, while another claims codec B is 10% better.

Solution

Netflix's team switched to VMAF measured at display resolution, after scaling each encode up to the size it is shown at, instead of at encoding resolution. They also started pooling per-frame scores with a harmonic mean (HVMAF), which gives more weight to low-quality outlier frames than a simple average would. This approach better matches what viewers actually see.
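The effect of harmonic mean pooling is easier to see with numbers. The sketch below is illustrative only, not Netflix's implementation: the function names, the sample per-frame scores, and the offset of 1.0 (added here so that a frame scored 0 does not cause a division by zero) are all assumptions made for this example.

```python
def arithmetic_mean(scores):
    """Simple average of per-frame quality scores."""
    return sum(scores) / len(scores)


def harmonic_mean(scores, offset=1.0):
    """Harmonic mean of per-frame quality scores.

    The offset is an assumption of this sketch: it shifts every score
    before taking reciprocals so a score of 0 cannot divide by zero,
    and is subtracted again from the result.
    """
    shifted = [s + offset for s in scores]
    return len(shifted) / sum(1.0 / s for s in shifted) - offset


# Hypothetical clip: nine good frames and one badly degraded frame.
frame_scores = [95.0] * 9 + [20.0]

simple = arithmetic_mean(frame_scores)
pooled = harmonic_mean(frame_scores)

print(f"arithmetic mean: {simple:.2f}")
print(f"harmonic mean:   {pooled:.2f}")
```

With these made-up scores the single bad frame pulls the harmonic mean well below the simple average, so a brief quality drop is not averaged away. When every frame has the same score, the two means agree.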

Impact

When Netflix tested HVMAF across their full catalog, VP9 encoders showed 12% better bitrate savings in the high-quality range compared to traditional PSNR-based testing. The choice of metric directly changes which codec looks best.