LinkedIn Ramanathan Muthukaruppan May 4, 2017

Mobile Crash Reporting at LinkedIn

Article Summary

LinkedIn built its own crash reporting system instead of using third-party tools. Here's why that decision paid off.

Staff Engineer Ramanathan Muthukaruppan shares how LinkedIn's mobile team designed and scaled an internal crash reporting platform. The system processes crashes across iOS and Android, integrates with their experimentation platform, and helps teams decide whether to ramp new releases.

Key Takeaways

Critical Insight

LinkedIn's custom crash reporting system delivers crash insights quickly and securely, and it integrates directly with the company's A/B testing platform to speed up release decisions.

The team's creative solution for testing the system while onboarding SREs is worth stealing for your own infrastructure projects.

About This Article

Problem

A Samza data processing job at LinkedIn ran out of memory after consuming 16K crash events. The default configuration allowed up to 50K events in memory, but each crash payload carries stack traces and other detailed information, so the job exhausted its memory well before reaching that limit.

Solution

The team lowered the in-memory event buffer from 50K to 10K events. This prevented memory exhaustion while keeping crash data processing fast enough to handle the workload.
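
The reasoning behind the new limit can be sketched with a little arithmetic. This is a minimal illustration, not LinkedIn's code. It assumes that memory use grows roughly linearly with the number of buffered events and that crash payloads are of similar size. The function names `is_below_failure_point` and `headroom` are hypothetical. The only figures taken from the article are the 16K failure point, the 50K default, and the 10K setting.

```python
# Figures reported in the article (event counts, not bytes).
OOM_AT_EVENTS = 16_000    # the job ran out of memory after consuming this many events
DEFAULT_BUFFER = 50_000   # default in-memory event limit
LOWERED_BUFFER = 10_000   # limit after the fix


def is_below_failure_point(buffer_limit: int, oom_at: int = OOM_AT_EVENTS) -> bool:
    """Return True if the buffer cap is reached before the observed failure point."""
    return buffer_limit < oom_at


def headroom(buffer_limit: int, oom_at: int = OOM_AT_EVENTS) -> float:
    """Fraction of the observed failure point left unused when the buffer is full.

    Assumption (not from the article): memory scales linearly with event count.
    A negative result means the cap lies beyond the failure point, so the cap
    offers no protection against running out of memory.
    """
    if oom_at <= 0:
        raise ValueError("oom_at must be positive")
    return 1.0 - buffer_limit / oom_at


if __name__ == "__main__":
    for limit in (DEFAULT_BUFFER, LOWERED_BUFFER):
        print(limit, is_below_failure_point(limit), headroom(limit))
```

Under these assumptions, the 50K default sits far beyond the point where the job failed, while a 10K cap stops buffering with about 37.5% of that margin to spare. The article does not name the setting that was changed. Apache Samza's Kafka system consumer documents `systems.<system-name>.samza.fetch.threshold` (default 50000) as its message-count buffer limit, which may be the one meant here.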

Impact

The configuration change stopped the out-of-memory failures in the Samza job. Crash data is now processed continuously and reliably across LinkedIn's mobile platforms.