Distributed Job Scheduler: Zero to 20k Concurrent Jobs
Article Summary
Glance built a distributed job scheduler that went from zero to handling 20,000+ concurrent jobs. Here's how they did it with Redis and smart architecture.
Ricky Mondal, Senior Engineer at Glance, shares the technical journey of building a fault-tolerant job scheduler to fetch and process real-time content from multiple publishing partners. The system needed to handle millions of tasks across distributed nodes while maintaining high availability.
Key Takeaways
- Used Redis sorted sets, pub/sub, and Lua scripts for atomic state transitions
- Broke workflows into dedicated queues: XML parsing, content creation, asset upload, moderation
- Implemented polling workers that check for due jobs every second, plus real-time notifications via Redis pub/sub
- Built retry mechanisms with exponential backoff and distributed locking to prevent duplicate execution
- Achieved parallel and sequential processing across multiple worker instances for scalability
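The sorted-set scheduling and per-second polling described above can be sketched in a few lines. This is a minimal in-process illustration, not Glance's actual code: a Python stdlib heap stands in for a Redis sorted set, and the comments note the Redis commands (ZADD, ZRANGEBYSCORE) that would replace it in a real deployment.

```python
import heapq

class DelayedQueue:
    """In-memory stand-in for a Redis sorted set keyed by run time."""

    def __init__(self):
        self._heap = []  # entries are (run_at, job_id) pairs

    def schedule(self, job_id, run_at):
        # Redis equivalent: ZADD jobs <run_at> <job_id>
        heapq.heappush(self._heap, (run_at, job_id))

    def pop_due(self, now):
        # Redis equivalent: ZRANGEBYSCORE jobs -inf <now>, followed by ZREM.
        # A polling worker would call this roughly once per second.
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due

q = DelayedQueue()
q.schedule("job-1", run_at=100)
q.schedule("job-2", run_at=50)
print(q.pop_due(now=60))  # → ['job-2'] (job-1 is not yet due)
```

Scoring jobs by their scheduled timestamp is what lets a single sorted set serve both immediate and delayed work: a worker only ever asks for members whose score is at or below the current time.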
The team scaled from basic queue processing to 20k+ concurrent jobs by combining Redis primitives, dedicated worker queues, and robust fault tolerance mechanisms.
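One of the fault-tolerance mechanisms mentioned above, retry with exponential backoff, is simple enough to show directly. The function below is a generic sketch of the pattern (with full jitter, a common variant), not the article's implementation; the base delay and cap are illustrative values.

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay before retry `attempt` (0-indexed): grows 1s, 2s, 4s, ...
    capped at `cap`, with full jitter to avoid synchronized retries."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Upper bound of the delay window per attempt:
delays = [min(60.0, 1.0 * 2 ** a) for a in range(7)]
print(delays)  # → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

The jitter matters at 20k+ concurrent jobs: without it, a burst of failures would have every worker retrying at the same instant, hammering the downstream partner API in waves.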
About This Article
Glance had a problem where multiple workers in their distributed system could pick up the same job at the same time, leading to duplicate processing and inconsistent data.
Ricky Mondal's team added a distributed lock with an expiry time that a worker must acquire before taking on a job. They also used Lua scripts to apply state transitions atomically on the Redis server, which eliminated race conditions between workers.
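The lock-with-expiry idea maps onto Redis's `SET key value NX PX ttl`: the write succeeds only if the key is absent or expired, so exactly one worker wins. The sketch below simulates that semantic with a small in-memory class (the article does not show its code, so names like `try_claim` are illustrative); in production the same call would go to a real Redis instance.

```python
import threading
import time

class FakeRedis:
    """Tiny in-memory stand-in for the one Redis call this pattern needs."""

    def __init__(self):
        self._data = {}           # key -> (value, expiry_deadline)
        self._lock = threading.Lock()

    def set_nx_px(self, key, value, ttl_ms):
        # Redis equivalent: SET key value NX PX ttl_ms
        # Succeeds only if the key is absent or its TTL has lapsed.
        with self._lock:
            now = time.monotonic()
            entry = self._data.get(key)
            if entry is None or entry[1] <= now:
                self._data[key] = (value, now + ttl_ms / 1000)
                return True
            return False

def try_claim(r, job_id, worker_id, ttl_ms=30_000):
    """Return True if this worker won the job lock; losers skip the job.
    The TTL ensures a crashed worker's lock frees itself automatically."""
    return r.set_nx_px(f"lock:{job_id}", worker_id, ttl_ms)

r = FakeRedis()
print(try_claim(r, "job-42", "worker-a"))  # → True: first claim wins
print(try_claim(r, "job-42", "worker-b"))  # → False: lock already held
```

The expiry is the fault-tolerance half of the design: if the winning worker dies mid-job, the lock lapses on its own and another worker can retry, so no job is stranded behind a dead owner.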
With the locks and Lua scripts in place, duplicate job execution was eliminated and job state stayed consistent. This let the system scale to 20,000+ concurrent jobs without data corruption or conflicts.