How We Automated iOS Unit Test Generation with AI
Article Summary
Kush Agrawal of Duolingo describes how the team generated 85,000 lines of iOS test code with minimal manual effort. The bottleneck in software development has flipped: it is no longer writing code, but verifying that the code works.
Duolingo ships iOS releases weekly and adds tens of thousands of lines of code each month. To keep pace, the team built an autonomous AI pipeline on Claude Code and Temporal workflows. The pipeline generates unit tests, manages each PR through its entire lifecycle, and auto-heals CI failures. Over 17 weeks, the system carried 250 PRs from creation to merge with minimal human intervention.
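The PR lifecycle management can be pictured as a small state machine that advances each open PR one step at a time. The following is a minimal Python sketch under stated assumptions, not Duolingo's Temporal workflow. The `PullRequest` fields, the reviewer assignment by PR number, the three-attempt healing budget, closing a PR once that budget is spent, and the 14-day staleness threshold are all hypothetical. `heal_ci` stands in for whatever step hands CI output back to the agent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class PRState(Enum):
    OPEN = auto()
    MERGED = auto()
    CLOSED = auto()


@dataclass
class PullRequest:
    number: int
    ci_passing: bool
    approved: bool = False
    days_idle: int = 0
    heal_attempts: int = 0
    reviewer: Optional[str] = None
    state: PRState = PRState.OPEN


def step(
    pr: PullRequest,
    reviewers: List[str],
    heal_ci: Callable[[PullRequest], bool],
    max_heal_attempts: int = 3,  # hypothetical retry budget
    stale_after_days: int = 14,  # hypothetical staleness threshold
) -> PullRequest:
    """Advance an open PR by one lifecycle step (hypothetical policy)."""
    if pr.state is not PRState.OPEN:
        return pr
    if pr.reviewer is None:
        # Spread PRs across reviewers deterministically by PR number.
        pr.reviewer = reviewers[pr.number % len(reviewers)]
    if not pr.ci_passing:
        if pr.heal_attempts < max_heal_attempts:
            pr.heal_attempts += 1
            # Auto-heal attempt, e.g. give the CI log back to the agent and re-run CI.
            pr.ci_passing = heal_ci(pr)
        else:
            pr.state = PRState.CLOSED  # retry budget exhausted
        return pr
    if pr.approved:
        pr.state = PRState.MERGED
    elif pr.days_idle >= stale_after_days:
        pr.state = PRState.CLOSED  # stale: no approval within the window
    return pr
```

Calling `step` on a schedule walks each PR toward merge or closure. In a Temporal-based system, logic like this would presumably be split across workflow and activity code rather than a single function.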
Key Takeaways
- 250 test PRs merged in 17 weeks, adding 4,460 test functions across 233 classes
- MVVM test coverage more than tripled, from 9% to 30%; repository test coverage rose 352%
- 76% of PRs passed CI on the first attempt; the rest went through auto-healing
- The system prioritizes files by scoring their type, coverage gap, size, and module priority
- A PR lifecycle workflow auto-assigns reviewers, fixes CI failures, and closes stale PRs
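File prioritization can be sketched as a score that combines the four factors in the list above: file type, coverage gap, size, and module priority. The article names these factors but does not give the formula. The type weights, the `ideal_lines` size preference, and the multiplicative combination below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-type weights; the article names the factor, not the values.
TYPE_WEIGHT = {"ViewModel": 1.0, "Repository": 0.9, "DataSource": 0.8}
DEFAULT_TYPE_WEIGHT = 0.3


@dataclass(frozen=True)
class SourceFile:
    path: str
    kind: str               # e.g. "ViewModel", "Repository", "View"
    coverage: float         # current line coverage, 0.0 to 1.0
    lines: int              # file size in lines
    module_priority: float  # 0.0 to 1.0, assigned per module


def score(f: SourceFile, ideal_lines: int = 300) -> float:
    type_w = TYPE_WEIGHT.get(f.kind, DEFAULT_TYPE_WEIGHT)
    gap = 1.0 - f.coverage  # uncovered fraction of the file
    # Size factor peaks at ideal_lines: tiny files add little coverage,
    # and very large files are hard to cover in a single PR.
    n = max(f.lines, 1)
    size_w = min(n / ideal_lines, ideal_lines / n)
    # Module priority scales the score between 0.5x and 1.0x.
    return type_w * gap * size_w * (0.5 + 0.5 * f.module_priority)


def rank(files: List[SourceFile], k: int) -> List[SourceFile]:
    """Return the k highest-scoring files, best first."""
    return sorted(files, key=score, reverse=True)[:k]
```

A multiplicative score means any single weak factor, such as a file that is already well covered, pulls the file down the queue. A weighted sum would be a reasonable alternative if the factors should trade off more freely.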
Duolingo's autonomous test-generation pipeline more than tripled MVVM test coverage in its iOS app, generating 85,000 lines of test code across 250 PRs with minimal human intervention. It shows that AI can carry a PR through its full lifecycle, from creation to merge.
About This Article
Duolingo's iOS codebase was growing by tens of thousands of lines each month. Generating tests with an LLM exposed several underlying problems. UserClient was so tightly coupled that mocks could not be injected. The generated tests also mixed up the Swift Testing and XCTest frameworks. On top of that, Swift 6 Sendable violations blocked tests for several repository classes.
Kush Agrawal's team used Claude Code to validate generated code locally and identified recurring failure patterns across two batches, which had merge rates of 47-48%. They then fixed the root causes one at a time. UserClient was migrated to protocol-based injection, enforced by a linter. Prompts were rewritten to state the framework choice explicitly. Finally, @Sendable guidance was added to address Swift 6 compatibility.
These fixes improved CI pass rates across the entire codebase, not just for pipeline-generated PRs. Test coverage rose 352% for repositories, 203% for ViewModels, and 192% for DataSources. Overall MVVM coverage went from 9% to 30%, with tests added across 233 unique classes.
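Protocol-based injection means the code under test depends on an interface rather than on the concrete UserClient, so a test can supply a mock. The article's code is Swift. The following sketch shows the same pattern with Python's `typing.Protocol`, plus a toy lint check that flags direct construction. `UserClientProtocol`, `ProfileViewModel`, `fetch_username`, `MockUserClient`, and `lint_direct_construction` are hypothetical names, not Duolingo's. In Swift, the equivalent would be a `protocol` that UserClient conforms to, with the lint check presumably implemented as a custom linter rule.

```python
import re
from typing import Protocol


class UserClientProtocol(Protocol):
    """Hypothetical interface that the concrete UserClient would satisfy."""

    def fetch_username(self, user_id: int) -> str: ...


class ProfileViewModel:
    # Depends on the protocol, not on a concrete client, so tests can inject a fake.
    def __init__(self, client: UserClientProtocol) -> None:
        self._client = client

    def greeting(self, user_id: int) -> str:
        return f"Hi, {self._client.fetch_username(user_id)}!"


class MockUserClient:
    """Test double: satisfies the protocol structurally and records calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: list[int] = []

    def fetch_username(self, user_id: int) -> str:
        self.calls.append(user_id)
        return self.name


# Toy lint rule: flag direct `UserClient(...)` construction in Swift source.
# The \b boundary keeps names like MockUserClient from matching.
DIRECT_INIT = re.compile(r"\bUserClient\s*\(")


def lint_direct_construction(swift_source: str) -> list[int]:
    """Return 1-based line numbers that construct UserClient directly."""
    return [
        i
        for i, line in enumerate(swift_source.splitlines(), start=1)
        if DIRECT_INIT.search(line)
    ]
```

A test can then build `ProfileViewModel(MockUserClient("Duo"))` without touching a real client, while the lint check keeps new code from reintroducing the tight coupling.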