Recovering from Crashes with Safe Mode
Article Summary
Lyft engineers faced a nightmare scenario: a bad feature flag could crash the app on every launch, trapping users in a crash loop that only an emergency hotfix could end and costing revenue in the meantime. They built Safe Mode to break the cycle.
Michael Rebello from Lyft Engineering shares how they created an automated recovery system that detects crash loops caused by bad configuration changes and prevents users from getting stuck in an unusable app state.
Key Takeaways
- Safe Mode tracks which feature flags the app consumes and detects crashes that occur before launch completes
- It resets problematic flags to safe defaults, letting users continue using the app
- Dashboards and PagerDuty alarms alert engineers within minutes of incidents
- Lyft rolled it out gradually, using a shadow mode to avoid false positives
- Since launch, it has already prevented multiple hotfixes and saved real revenue
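The detect-and-reset loop described in the takeaways can be sketched roughly as follows. This is a minimal illustration and not Lyft's implementation: the `SafeMode` class, its method names, the JSON state file, and the `CRASH_THRESHOLD` of two consecutive failed launches are all assumptions made for the example. The idea is to persist a "launch in progress" marker along with the flags consumed so far. If the next launch finds the marker still set, the previous launch never finished, and after enough such failures the flags consumed during the failed launch fall back to their defaults.

```python
import json
from pathlib import Path

# Assumed value: consecutive unfinished launches before flags are reset.
CRASH_THRESHOLD = 2


class SafeMode:
    """Hypothetical crash-loop guard; all names here are illustrative."""

    def __init__(self, state_path, defaults, remote_overrides):
        self.state_path = Path(state_path)
        self.defaults = defaults                 # safe built-in flag values
        self.overrides = dict(remote_overrides)  # remotely configured values
        self.state = self._load()
        self.disabled = set(self.state["disabled"])

    def _load(self):
        if self.state_path.exists():
            return json.loads(self.state_path.read_text())
        return {"launching": False, "failed_launches": 0,
                "consumed": [], "disabled": []}

    def _save(self):
        self.state["disabled"] = sorted(self.disabled)
        self.state_path.write_text(json.dumps(self.state))

    def begin_launch(self):
        if self.state["launching"]:
            # The previous launch never reached launch_completed(): treat as a crash.
            self.state["failed_launches"] += 1
        else:
            self.state["failed_launches"] = 0
        if self.state["failed_launches"] >= CRASH_THRESHOLD:
            # Fall back to defaults for every flag read during the failed launch.
            self.disabled.update(self.state["consumed"])
            self.state["failed_launches"] = 0
        self.state["launching"] = True
        self.state["consumed"] = []
        self._save()

    def flag(self, name):
        # Record consumption before returning, so a crash right after is attributable.
        if name not in self.state["consumed"]:
            self.state["consumed"].append(name)
            self._save()
        if name in self.disabled:
            return self.defaults[name]
        return self.overrides.get(name, self.defaults[name])

    def launch_completed(self):
        # Launch finished without crashing: clear the marker and the failure count.
        self.state["launching"] = False
        self.state["failed_launches"] = 0
        self._save()
```

In this sketch the app would call `begin_launch()` as early as possible, read flags through `flag()`, and call `launch_completed()` once the first screen is usable. Resetting every flag consumed during the failed launch is deliberately coarse: it trades precision for a guaranteed exit from the loop. A shadow mode like the one mentioned above could run the same bookkeeping and only log what it would have reset.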
Critical Insight
Lyft's Safe Mode automatically recovers from configuration-induced crash loops, avoiding hotfixes and keeping the app usable for affected users while engineers resolve the incident.