Distributed Job Scheduler: Zero to 20k Concurrent Jobs
Article Summary
Glance built a distributed job scheduler that went from zero to handling 20,000+ concurrent jobs. Here's how they did it with Redis primitives and a queue-per-stage architecture.
Ricky Mondal, Senior Engineer at Glance, shares the technical journey of building a fault-tolerant job scheduler to fetch and process real-time content from multiple publishing partners. The system needed to handle millions of tasks across distributed nodes while maintaining high availability.
Key Takeaways
- Used Redis sorted sets, pub/sub, and Lua scripts for atomic state transitions (a minimal sketch follows this list)
- Broke workflows into dedicated queues: XML parsing, content creation, asset upload, moderation
- Implemented polling workers that check every second, supplemented by real-time notifications via Redis pub/sub (see the worker sketch below)
- Built retry mechanisms with exponential backoff and distributed locking to prevent duplicate execution (see the retry sketch below)
- Achieved parallel and sequential processing across multiple worker instances for scalability
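The article does not specify an implementation language, so the sketches below use Python with the redis-py client; all key names, channel names, and helpers (`schedule`, `claim_due_jobs`, and so on) are illustrative assumptions rather than Glance's actual identifiers. This first sketch shows the sorted-set pattern: jobs are scheduled with their run-at timestamp as the score, and a Lua script atomically moves due jobs into a processing set, so claiming a job is a single state transition that no two workers can both win.

```python
import json
import time

import redis

r = redis.Redis(decode_responses=True)

# The Lua script runs atomically inside Redis: it pops jobs whose score
# (run-at time) has passed and parks them in a processing set in one step,
# so a crash between "read" and "claim" cannot lose or double-claim a job.
CLAIM_DUE = r.register_script("""
local due = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, ARGV[2])
for _, job in ipairs(due) do
    redis.call('ZREM', KEYS[1], job)
    redis.call('ZADD', KEYS[2], ARGV[1], job)
end
return due
""")

def schedule(job_id: str, payload: dict, run_at: float) -> None:
    """Add a job to the scheduled sorted set with its run-at time as the score."""
    r.zadd("jobs:scheduled", {json.dumps({"id": job_id, **payload}): run_at})

def claim_due_jobs(batch_size: int = 100) -> list[dict]:
    """Atomically move all currently due jobs to the processing set and return them."""
    now = time.time()
    raw = CLAIM_DUE(keys=["jobs:scheduled", "jobs:processing"], args=[now, batch_size])
    return [json.loads(job) for job in raw]
```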
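A second sketch, under the same assumptions, shows how a per-stage worker might combine one-second polling with a pub/sub wake-up. Each workflow stage (XML parsing, content creation, asset upload, moderation) gets its own list-backed queue, and a notification channel lets idle workers react immediately instead of waiting out the full polling interval. Running several copies of `run_worker` against one queue gives parallel processing within a stage, while chaining the stage queues keeps the overall workflow sequential.

```python
import json
from typing import Callable

import redis

r = redis.Redis(decode_responses=True)

# One dedicated list-backed queue per workflow stage, as described above.
STAGE_QUEUES = [
    "queue:xml-parsing",
    "queue:content-creation",
    "queue:asset-upload",
    "queue:moderation",
]

def run_worker(queue: str, handle: Callable[[dict], None]) -> None:
    """Process jobs from one stage queue.

    The worker polls the queue roughly once per second, but a pub/sub
    notification on the queue's channel wakes it as soon as a producer
    pushes new work, so idle latency stays low without busy-waiting.
    """
    pubsub = r.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(f"notify:{queue}")
    while True:
        # Drain everything currently on the queue.
        while (raw := r.lpop(queue)) is not None:
            handle(json.loads(raw))
        # Wait up to 1 second, or less if a notification arrives first.
        pubsub.get_message(timeout=1.0)

def enqueue(queue: str, job: dict) -> None:
    """Push a job onto a stage queue and notify any waiting worker."""
    r.rpush(queue, json.dumps(job))
    r.publish(f"notify:{queue}", "new-job")
```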
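Finally, a rough sketch of the retry and locking behaviour; the lock TTL, backoff base, and max-attempts values are guesses for illustration, not figures from the article. A `SET NX EX` lock ensures only one worker executes a given job at a time, and a failed job is pushed back onto the scheduled sorted set with an exponentially growing delay.

```python
import json
import random
import time
from typing import Callable

import redis

r = redis.Redis(decode_responses=True)

MAX_ATTEMPTS = 5        # illustrative value, not from the article
BASE_DELAY_SECONDS = 2  # illustrative value, not from the article

def execute_once(job: dict, work: Callable[[dict], None]) -> None:
    """Run a job under a distributed lock, retrying with exponential backoff.

    SET NX EX acquires a per-job lock that expires automatically, so a
    crashed worker cannot hold it forever; if another worker already owns
    the lock, the job is being executed elsewhere and we skip it.
    """
    lock_key = f"lock:job:{job['id']}"
    if not r.set(lock_key, "1", nx=True, ex=60):
        return  # another worker is already executing this job
    try:
        work(job)
    except Exception:
        attempt = job.get("attempt", 0) + 1
        if attempt < MAX_ATTEMPTS:
            # Exponential backoff with jitter: ~2s, 4s, 8s, ...
            delay = BASE_DELAY_SECONDS * (2 ** (attempt - 1)) + random.uniform(0, 1)
            retry_at = time.time() + delay
            r.zadd("jobs:scheduled", {json.dumps({**job, "attempt": attempt}): retry_at})
    finally:
        r.delete(lock_key)
```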
Critical Insight
The team scaled from basic queue processing to 20k+ concurrent jobs by combining Redis primitives, dedicated worker queues, and robust fault tolerance mechanisms.