How Discord Handles Two and a Half Million Concurrent Voice Users Using WebRTC
Article Summary
Discord handles 2.6 million concurrent voice users with just 850 servers across 13 regions. Here's how they built a custom WebRTC architecture that scales.
Discord's engineering team shares their technical approach to real-time voice and video communication. Written by Staff Engineer Jozsef Vass, this deep dive reveals how they modified WebRTC to serve millions of gamers while keeping IP addresses private and bandwidth costs low.
Key Takeaways
- Custom C++ media engine bypasses WebRTC's SDP/ICE/DTLS, making handshakes about 90% smaller (roughly 1 KB vs. 10 KB)
- Salsa20 encryption replaces DTLS/SRTP for faster performance, and silence suppression stops audio from being sent when no one is talking
- Client-server architecture prevents IP leaks and enables moderation at scale
- Homegrown SFU bridges native and browser clients while dropping packets from muted users
- Elixir-based signaling servers with etcd service discovery enable zero-downtime failovers
Discord serves 220 Gbps of voice traffic by customizing WebRTC's lower-level APIs and building specialized infrastructure that prioritizes gaming use cases over standard implementations.
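The SFU takeaway above can be illustrated with a minimal Python sketch of selective forwarding. This is not Discord's implementation: the `Channel` and `Participant` names, the in-memory `inbox` standing in for a network socket, and the mute flag are all hypothetical, chosen only to show the idea of relaying each packet to every other participant and dropping packets from muted senders.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    user_id: str
    muted: bool = False
    # Stand-in for a network socket: packets "delivered" to this user.
    inbox: list = field(default_factory=list)


class Channel:
    """Toy selective forwarding unit for one voice channel (hypothetical)."""

    def __init__(self):
        self.participants = {}

    def join(self, user_id):
        self.participants[user_id] = Participant(user_id)

    def set_muted(self, user_id, muted):
        self.participants[user_id].muted = muted

    def forward(self, sender_id, packet):
        """Relay a packet to every other participant.

        Returns the number of copies sent. Packets from a muted sender
        are dropped at the server, so they cost no egress bandwidth.
        """
        sender = self.participants[sender_id]
        if sender.muted:
            return 0
        sent = 0
        for participant in self.participants.values():
            if participant.user_id != sender_id:
                participant.inbox.append((sender_id, packet))
                sent += 1
        return sent
```

Because every packet passes through the server, clients only ever see the server's address, which is how a client-server design avoids exposing participants' IP addresses to each other.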
About This Article
Discord needed to handle multiparty voice communication across web, desktop, and mobile without exposing user IP addresses. They also wanted to avoid peer-to-peer networking, whose cost grows quickly with group size and becomes prohibitive for large group channels that can have as many as 1,000 participants.
Jozsef Vass's team built a custom C++ media engine on top of WebRTC's native library. They replaced the standard SDP/ICE/DTLS handshakes with a minimal exchange of roughly 1,000 bytes and switched to Salsa20 encryption. They also added silence suppression, which cuts bandwidth and CPU usage when no one is talking.
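To make the silence suppression idea concrete, here is a minimal Python sketch that skips audio frames whose energy falls below a threshold. It is an illustration under stated assumptions, not Discord's method: the RMS energy test, the `SILENCE_RMS_THRESHOLD` value, and the function names are hypothetical, and a production engine would typically use a proper voice activity detector.

```python
import math

# Hypothetical threshold for 16-bit PCM samples; a real system would tune this.
SILENCE_RMS_THRESHOLD = 500.0


def rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def frames_to_send(frames, threshold=SILENCE_RMS_THRESHOLD):
    """Keep only frames loud enough to be worth encoding and sending."""
    return [frame for frame in frames if rms(frame) >= threshold]
```

Frames that are filtered out never reach the encoder, the encryption step, or the network, which is where the bandwidth and CPU savings come from.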
Discord now handles 2.6 million concurrent voice users across 850 voice servers in 13 regions. The system sustains 220 Gbps of egress traffic and a packet throughput of 120 Mpps (million packets per second). Their custom SFU bridges native and browser applications by translating between different encryption and transport protocols.
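The figures above imply some rough per-server and per-user averages, worked out in the short Python calculation below. These are back-of-the-envelope numbers only: they assume load is spread evenly across servers and users, and that the bandwidth and packet figures describe the same traffic, neither of which the article states.

```python
# Figures quoted in the article.
users = 2_600_000
servers = 850
egress_bps = 220e9  # 220 Gbps
packets_per_sec = 120e6  # 120 Mpps

# Derived averages (assuming an even spread, which is a simplification).
users_per_server = users / servers
egress_per_user_kbps = egress_bps / users / 1e3
packets_per_user_per_sec = packets_per_sec / users
avg_packet_bytes = egress_bps / 8 / packets_per_sec

print(f"users per server:     {users_per_server:.0f}")  # 3059
print(f"egress per user:      {egress_per_user_kbps:.1f} kbps")  # 84.6 kbps
print(f"packets per user/sec: {packets_per_user_per_sec:.1f}")  # 46.2
print(f"average packet size:  {avg_packet_bytes:.0f} bytes")  # 229 bytes
```

In other words, each server carries on the order of 3,000 users, and each user receives well under 100 kbps on average.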