Shopify Aug 14, 2020

Building Reliable Mobile Applications

Article Summary

Shopify's POS app processes billions of dollars in sales every year. Unlike with a typical app, downtime means merchants can't make sales at all.

Shopify's mobile team explains how they built reliability into a mission-critical retail app, where the usual mobile constraints (slow app store reviews, delayed user updates) make quick fixes nearly impossible. The article traces their move from ad-hoc releases to a structured release process that scaled with the team.

Key Takeaways

Critical Insight

By combining automated testing, weekly release trains, dedicated on-call rotation, and staged rollouts, Shopify scaled their POS team while improving reliability for merchants processing billions annually.

Their ChatOps bot automatically documents incidents in real time during outages, and new engineers volunteer for on-call duty as a way to speed up their onboarding.

About This Article

Problem

As Shopify's POS team expanded from a small group to multiple teams, manual code reviews slowed things down. It became hard to find reviewers who knew the specific code areas well enough.

Solution

Shopify split the codebase into components and assigned each one to a team. They used Code Owners to route reviews automatically to the right people. They also built tophat, a deployment tool that runs with a single command and automates the manual testing process.
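
To make the review routing idea concrete, here is a minimal sketch of how path-based ownership rules can map changed files to reviewing teams. It is not Shopify's configuration: the team handles, directory names, and the `owner_for` / `reviewers_for` helpers are all hypothetical, and the glob matching is a simplification of what a real Code Owners file supports. The one behavior borrowed from GitHub's CODEOWNERS convention is that the last matching rule wins.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership rules: (glob pattern, owning team).
# As in a CODEOWNERS file, later rules override earlier ones.
OWNERSHIP_RULES = [
    ("*", "@example/pos-core"),                # fallback owner
    ("checkout/*", "@example/checkout-team"),  # assumed component
    ("hardware/*", "@example/hardware-team"),  # assumed component
]


def owner_for(path, rules=OWNERSHIP_RULES):
    """Return the team owning `path`, using the last rule that matches."""
    owner = None
    for pattern, team in rules:
        # fnmatchcase lets "*" span "/" too, so "checkout/*" covers subfolders.
        if fnmatchcase(path, pattern):
            owner = team
    return owner


def reviewers_for(changed_paths, rules=OWNERSHIP_RULES):
    """Return the sorted set of teams that should review a change."""
    owners = {owner_for(path, rules) for path in changed_paths}
    owners.discard(None)
    return sorted(owners)
```

A change touching `checkout/cart/Total.kt` and `hardware/Scanner.kt` would be routed to both component teams, while a file outside any component falls back to the default owner.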

Impact

The tophat tool cut out repetitive steps like saving work, pulling changes, building locally, and deploying to devices. Code review cycles got faster as a result.
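
The steps listed above can be pictured as a single scripted sequence. The sketch below is an illustration only, not the actual tophat tool: `plan_tophat` and `run_plan` are hypothetical helpers, the git commands are one plausible way to save work and pull a branch, and the build and deploy commands are left to the caller because the summary does not say what tophat runs.

```python
import subprocess


def plan_tophat(branch, build_cmd, deploy_cmd):
    """Return the ordered commands for testing `branch` on a device.

    build_cmd and deploy_cmd are supplied by the caller (assumption:
    they depend on the platform and project setup).
    """
    return [
        ["git", "stash", "push", "--include-untracked"],  # save current work
        ["git", "fetch", "origin", branch],               # pull the changes
        ["git", "checkout", "--detach", "FETCH_HEAD"],    # switch to them
        list(build_cmd),                                  # build locally
        list(deploy_cmd),                                 # deploy to a device
    ]


def run_plan(plan, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in plan:
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
```

Wrapping the sequence in one entry point is what removes the repetition: a reviewer passes a branch name and gets a build on a device without running each step by hand.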