Mobile Crash Reporting at LinkedIn
Article Summary
LinkedIn built their own crash reporting system instead of using third-party tools. Here's why that decision paid off.
Staff Engineer Ramanathan Muthukaruppan shares how LinkedIn's mobile team designed and scaled an internal crash reporting platform. The system processes crashes across iOS and Android, integrates with their experimentation platform, and helps teams decide whether to ramp new releases.
Key Takeaways
- Built custom crash reporting to capture LiX experiment data without exposing it to third-party vendors
- Reduced query time from 120 seconds to 5 seconds through index optimization
- Normalized crash data across iOS and Android into a single Avro schema
- Paginated both crash metadata and crash details after an initial caching approach failed to scale
- Used Kafka, Elasticsearch, and Pinot for real-time crash processing and analytics
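The cross-platform normalization in the takeaways above can be sketched as follows. This is a hypothetical illustration: the field names and payload shapes are assumptions, not LinkedIn's actual schema, and the output dict stands in for a record written against their shared Avro schema.

```python
# Hypothetical sketch: map platform-specific crash payloads onto one
# common record shape, so a single downstream pipeline (Kafka ->
# Elasticsearch/Pinot) can index crashes from both platforms uniformly.
# All field names here are illustrative assumptions.

def normalize_crash(platform: str, payload: dict) -> dict:
    """Flatten an iOS or Android crash payload into one common record."""
    if platform == "ios":
        return {
            "platform": "ios",
            "app_version": payload["bundle_version"],
            "os_version": payload["os"],
            "crash_type": payload["exception_type"],
            "stack_trace": payload["threads"][0]["frames"],
        }
    if platform == "android":
        return {
            "platform": "android",
            "app_version": payload["version_name"],
            "os_version": payload["sdk_int"],
            "crash_type": payload["throwable_class"],
            "stack_trace": payload["stacktrace"].splitlines(),
        }
    raise ValueError(f"unknown platform: {platform}")

ios_crash = {
    "bundle_version": "9.27.1",
    "os": "17.4",
    "exception_type": "EXC_BAD_ACCESS",
    "threads": [{"frames": ["0 Foo.swift:42", "1 Bar.swift:7"]}],
}
android_crash = {
    "version_name": "4.1.903",
    "sdk_int": "34",
    "throwable_class": "java.lang.NullPointerException",
    "stacktrace": "at com.example.Foo.bar(Foo.kt:42)\nat com.example.Baz.qux(Baz.kt:7)",
}

records = [normalize_crash("ios", ios_crash), normalize_crash("android", android_crash)]
# Both records now expose the same set of fields regardless of platform.
assert records[0].keys() == records[1].keys()
```

The payoff of this shape is that queries, dashboards, and experiment integrations only ever deal with one record layout, no matter which mobile platform produced the crash.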
LinkedIn's custom crash reporting system now provides fast, secure insights that integrate directly with their A/B testing platform to accelerate release decisions.
About This Article
A Samza data processing job at LinkedIn ran out of memory after consuming only 16K crash events. The default configuration allowed up to 50K events to be buffered in memory, but crash payloads, which carry full stack traces and detailed device information, are far larger than typical events, so the job exhausted its heap well before hitting the 50K count limit.
The team lowered the in-memory event buffer from 50K to 10K events. This prevented memory exhaustion while keeping crash data processing fast enough to handle the workload.
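The article doesn't name the exact property that was changed. Samza's documented knob for bounding buffered messages per input system is `systems.<system-name>.samza.fetch.threshold`, whose default is in fact 50,000 messages, so a change along these lines is a plausible sketch (the system name `kafka` is an assumption about their setup):

```properties
# Cap how many messages Samza buffers in memory per input system.
# The default of 50000 assumes small messages; crash payloads with
# full stack traces are much larger, so the cap is lowered to 10000.
systems.kafka.samza.fetch.threshold=10000
```

Lowering the threshold trades a little prefetch throughput for a hard bound on memory, which is the right trade when individual messages are large.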
The configuration change eliminated the out-of-memory failures in the Samza job. Crash data is now processed continuously and reliably across LinkedIn's mobile platforms without interruption.