Glance Ricky Mondal Nov 22, 2023

Distributed Job Scheduler: Zero to 20k Concurrent Jobs

Article Summary

Glance built a distributed job scheduler that went from zero to handling 20,000+ concurrent jobs. Here's how they did it with Redis locks, Lua scripts, and dedicated worker queues.

Ricky Mondal, Senior Engineer at Glance, shares the technical journey of building a fault-tolerant job scheduler to fetch and process real-time content from multiple publishing partners. The system needed to handle millions of tasks across distributed nodes while maintaining high availability.

Key Takeaways

Critical Insight

The team scaled from basic queue processing to 20k+ concurrent jobs by combining Redis primitives, dedicated worker queues, and robust fault tolerance mechanisms.

The article reveals specific Redis commands and Lua script patterns that made atomic operations possible at scale.

About This Article

Problem

Multiple workers in Glance's distributed system could pick up the same job at the same time, leading to duplicate processing and inconsistent data.

Solution

Ricky Mondal's team added a lock with an expiry time that a worker must acquire before taking on a job, so a lock held by a crashed worker eventually frees itself. They also used Lua scripts to apply state changes atomically, which prevented race conditions.
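
The pattern described above can be sketched roughly as follows. This is an illustrative sketch, not Glance's actual code: it assumes a redis-py-style client object (one exposing `set(..., nx=True, ex=...)` and `eval(script, numkeys, ...)`), and the key names `job:<id>:lock` and `job:<id>`, the `state` field, and all function names are hypothetical.

```python
# Sketch only: key layout, field names, and function names are assumptions.
# `client` is assumed to behave like a redis-py `redis.Redis` instance.

# Lua runs atomically inside Redis. Delete the lock only if this worker
# still owns it, so a worker whose lock expired cannot remove another
# worker's lock.
RELEASE_LOCK_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""

# Change the job state only if the caller holds the lock AND the job is
# still in the expected state. Both checks and the write happen in one
# atomic step.
TRANSITION_LUA = """
if redis.call('GET', KEYS[1]) ~= ARGV[1] then
    return 0
end
if redis.call('HGET', KEYS[2], 'state') ~= ARGV[2] then
    return 0
end
redis.call('HSET', KEYS[2], 'state', ARGV[3])
return 1
"""


def lock_key(job_id):
    return f"job:{job_id}:lock"  # hypothetical key layout


def job_key(job_id):
    return f"job:{job_id}"  # hypothetical hash holding a 'state' field


def try_acquire(client, job_id, worker_id, ttl_seconds=30):
    """Take the job lock if nobody holds it; the lock expires after ttl_seconds."""
    # SET key value NX EX ttl: succeeds only when the key does not exist.
    return bool(client.set(lock_key(job_id), worker_id, nx=True, ex=ttl_seconds))


def transition(client, job_id, worker_id, expected_state, new_state):
    """Atomically move the job from expected_state to new_state."""
    result = client.eval(
        TRANSITION_LUA, 2,
        lock_key(job_id), job_key(job_id),
        worker_id, expected_state, new_state,
    )
    return result == 1


def release(client, job_id, worker_id):
    """Release the lock, but only if this worker still owns it."""
    return client.eval(RELEASE_LOCK_LUA, 1, lock_key(job_id), worker_id) == 1
```

A worker would call `try_acquire` first, skip the job if it returns `False`, then call `transition` (for example from `"pending"` to `"running"`), do the work, and finally call `release`. The expiry time is a trade-off: it has to be longer than the slowest expected job, or a second worker may acquire the lock while the first is still running.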

Impact

With the locks and Lua scripts in place, duplicate job execution stopped and the system stayed consistent. This let the team scale to 20,000+ concurrent jobs without data corruption or conflicts.