How Meetup Scales Notification Queue Consumers
Article Summary
Meetup sends 8-10 million notifications daily. Its notification queue kept backing up, so messages were delivered late or not at all.
Elle Mundy, an SRE at Meetup, shares how her team debugged their AWS SQS autoscaling strategy through three iterations. What started as a simple metric swap turned into a calculus problem that exposed deeper architectural issues.
Key Takeaways
- First attempt: scaling on the age of the oldest message wasted money when messages got stuck
- Second attempt: scaling on queue depth worked better but reacted too late during spikes
- Final solution: a custom Lambda calculates the load-to-capacity ratio every minute
- The new metric revealed they needed 3x more consumer tasks than expected
- It also exposed a hidden bottleneck: the system ran out of lock keys for deduplication
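To illustrate why a queue-depth alarm can react later than a ratio-based one during a spike, here is a minimal simulation sketch. The traffic numbers, the consumer capacity, the depth threshold, and the function name `first_alarm_minutes` are all hypothetical and are not taken from Meetup's setup.

```python
def first_alarm_minutes(arrivals, capacity_per_minute, depth_threshold):
    """Return (minute the ratio alarm fires, minute the depth alarm fires).

    Each value is a 0-based minute index, or None if that alarm never fires.
    """
    depth = 0
    ratio_minute = None
    depth_minute = None
    for minute, sent in enumerate(arrivals):
        # Consumers can drain at most capacity_per_minute messages each minute.
        depth = max(0, depth + sent - capacity_per_minute)
        # Ratio alarm: incoming load exceeds what consumers can handle.
        if ratio_minute is None and sent / capacity_per_minute > 1.0:
            ratio_minute = minute
        # Depth alarm: the backlog has already grown past the threshold.
        if depth_minute is None and depth > depth_threshold:
            depth_minute = minute
    return ratio_minute, depth_minute


# Hypothetical traffic: steady 800 msgs/min, then a spike to 1,500 msgs/min.
arrivals = [800, 800, 800, 1500, 1500, 1500, 1500, 1500]
ratio_minute, depth_minute = first_alarm_minutes(
    arrivals, capacity_per_minute=1000, depth_threshold=2000
)
print(ratio_minute, depth_minute)
```

With these made-up numbers, the ratio alarm fires in the first minute of the spike, while the depth alarm has to wait until the backlog has accumulated past its threshold.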
Critical Insight
By creating a custom metric that divides messages sent by messages received, Meetup now scales proactively before queues back up instead of reacting after notifications are already delayed.
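The ratio itself can be sketched in a few lines. This is only an illustration: the summary does not say how Meetup's Lambda turns the ratio into a task count, so the proportional scale-out rule, the task bounds, and the function names `load_to_capacity_ratio` and `desired_task_count` are assumptions. Scale-in logic is deliberately left out.

```python
import math


def load_to_capacity_ratio(messages_sent, messages_received):
    """Messages sent divided by messages received over the same window."""
    if messages_received <= 0:
        # No receives in the window: treat any incoming traffic as overload.
        return math.inf if messages_sent > 0 else 0.0
    return messages_sent / messages_received


def desired_task_count(current_tasks, ratio, min_tasks=1, max_tasks=50):
    """Assumed rule: scale out proportionally when the ratio exceeds 1."""
    if math.isinf(ratio):
        return max_tasks
    if ratio <= 1.0:
        # Consumers are keeping up; this sketch does not scale in.
        return max(min_tasks, min(max_tasks, current_tasks))
    target = math.ceil(current_tasks * ratio)
    return max(min_tasks, min(max_tasks, target))


# Hypothetical one-minute window: 1,500 messages sent, 1,000 received.
ratio = load_to_capacity_ratio(1500, 1000)
print(ratio, desired_task_count(current_tasks=4, ratio=ratio))
```

A ratio above 1 means messages are arriving faster than consumers are pulling them, which is the signal to add tasks before the backlog becomes visible as delayed notifications.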