Duolingo • Jun 24, 2026

How We Automated iOS Unit Test Generation with AI

Article Summary

Kush Agrawal of Duolingo describes how the team generated 85,000 lines of iOS test code with minimal manual effort. The takeaway: the bottleneck in software development has shifted from writing code to verifying that it works.

Duolingo ships an iOS release every week, and the codebase grows by tens of thousands of lines each month. To keep testing on pace, the team built an autonomous AI pipeline on Claude Code and Temporal workflows that generates unit tests, manages each PR through its entire lifecycle, and auto-heals CI failures. Over 17 weeks, the system merged 250 PRs with minimal human intervention.
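The article does not publish the pipeline's code, so the following is only a minimal sketch of the PR lifecycle it describes, written as a pure Swift state machine. The types `PRState` and `PREvent`, the `transition(_:on:maxHealAttempts:)` function, and the default cap of three heal attempts are all hypothetical. In the real system these steps run as Temporal workflows that invoke Claude Code, which this sketch does not model.

```swift
// Hypothetical model of one test PR's lifecycle; not Duolingo's Temporal workflow.
enum PRState: Equatable {
    case awaitingCI(healAttempts: Int)  // PR open, waiting on a CI run
    case inReview                       // CI green, reviewer assigned
    case merged
    case closed(reason: String)
}

enum PREvent {
    case ciPassed
    case ciFailed
    case approved
    case staleTimeout
}

/// Pure transition function. `maxHealAttempts` is an assumed cap on automatic fixes.
func transition(_ state: PRState, on event: PREvent, maxHealAttempts: Int = 3) -> PRState {
    switch (state, event) {
    case (.awaitingCI, .ciPassed):
        return .inReview  // the real workflow would auto-assign a reviewer here
    case (.awaitingCI(let attempts), .ciFailed):
        // Auto-heal: have the agent patch the failure and re-run CI, up to the cap.
        return attempts < maxHealAttempts
            ? .awaitingCI(healAttempts: attempts + 1)
            : .closed(reason: "CI still failing after \(attempts) heal attempts")
    case (.inReview, .approved):
        return .merged
    case (.awaitingCI, .staleTimeout), (.inReview, .staleTimeout):
        return .closed(reason: "stale")
    default:
        return state  // events that don't apply in this state are ignored
    }
}

var state = PRState.awaitingCI(healAttempts: 0)
for event in [PREvent.ciFailed, .ciPassed, .approved] {
    state = transition(state, on: event)
}
print(state)  // merged
```

Keeping the transition logic pure, with side effects such as re-running CI or assigning reviewers left to the caller, makes each rule easy to unit test; a durable workflow engine like Temporal would then persist the state between events.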

Key Takeaways

250 test PRs merged in 17 weeks, adding 4,460 test functions across 233 classes
MVVM test coverage more than tripled, from 9% to 30%; repository coverage rose 352%
76% of PRs passed CI on the first attempt; the rest went through automatic CI healing
The system prioritizes files by scoring file type, coverage gap, size, and module priority
A PR lifecycle workflow auto-assigns reviewers, fixes CI failures, and closes stale PRs
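The article lists the four scoring factors but not how they are weighted or combined. The sketch below assumes a simple multiplicative score in Swift; the `SourceFile` and `FileKind` types, `priorityScore(_:)`, every weight, and the preference for mid-sized files are illustrative assumptions, not Duolingo's values.

```swift
// Hypothetical prioritization score. The article names the four factors; the
// weights and the way they are combined here are illustrative assumptions.
enum FileKind {
    case viewModel, repository, dataSource, other
}

struct SourceFile {
    let path: String
    let kind: FileKind
    let lineCount: Int
    let coverage: Double        // current line coverage, 0.0...1.0
    let modulePriority: Double  // assumed team-assigned weight, 0.0...1.0
}

/// Higher score means the file should get generated tests sooner.
func priorityScore(_ file: SourceFile) -> Double {
    // File type: assume MVVM layers are worth more than everything else.
    let typeWeight: Double
    switch file.kind {
    case .viewModel, .repository, .dataSource: typeWeight = 1.0
    case .other: typeWeight = 0.3
    }
    // Coverage gap: untested code scores highest; clamp bad inputs to 0...1.
    let coverageGap = 1.0 - min(max(file.coverage, 0.0), 1.0)
    // Size: assume mid-sized files fit best in one PR.
    let sizeFit: Double
    switch file.lineCount {
    case ..<30: sizeFit = 0.2
    case 30...400: sizeFit = 1.0
    default: sizeFit = 0.5
    }
    // Module priority: scale between 0.5 and 1.0 so it nudges rather than dominates.
    return typeWeight * coverageGap * sizeFit * (0.5 + 0.5 * file.modulePriority)
}

let files = [
    SourceFile(path: "Constants.swift", kind: .other, lineCount: 20, coverage: 0.0, modulePriority: 0.5),
    SourceFile(path: "ProfileViewModel.swift", kind: .viewModel, lineCount: 180, coverage: 0.1, modulePriority: 1.0),
]
let ranked = files.sorted { priorityScore($0) > priorityScore($1) }
print(ranked.map(\.path))  // ["ProfileViewModel.swift", "Constants.swift"]
```

Multiplying the factors means a fully covered file scores zero no matter how important its module is, which fits the goal of targeting coverage gaps; a weighted sum would instead let module priority keep such files in the queue.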

Critical Insight

Duolingo's autonomous test generation pipeline more than tripled MVVM test coverage in the iOS app, generating 85,000 lines of test code across 250 PRs with minimal human intervention. The result shows that AI can handle the full PR lifecycle, from creation to merge.

Generating the tests turned out not to be the real challenge. Something else became the new bottleneck, and the team is addressing it with a second AI reviewer.

About This Article

Problem

Duolingo's iOS codebase was growing by tens of thousands of lines each month. Running LLM-generated tests against it exposed several architectural problems. UserClient was so tightly coupled to the code that used it that mocks could not be injected. Tests also mixed up the Swift Testing and XCTest frameworks. And Swift 6 Sendable violations blocked testing for several repository classes.

Solution

Kush Agrawal's team used Claude Code to validate generated tests locally and identified recurring failure patterns across two batches, which had merge rates of 47–48%. They then fixed the root causes one at a time: UserClient was migrated to protocol-based injection, enforced by a linter; prompts were rewritten to make the choice of test framework explicit; and @Sendable guidance was added for Swift 6 compatibility.
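A minimal sketch of what protocol-based injection with Sendable-safe mocks can look like. The names `User`, `UserClientProtocol`, `fetchUser(id:)`, `MockUserClient`, and `ProfileViewModel` are hypothetical, since the article does not show Duolingo's UserClient API, and the client is kept synchronous so the example runs as a plain script (a real network client would more likely be `async`).

```swift
// Hypothetical types; the article does not show Duolingo's UserClient API.
struct User: Equatable, Sendable {
    let id: Int
    let name: String
}

/// The abstraction that both the production client and test doubles conform to.
/// Refining Sendable lets conforming values cross concurrency boundaries in Swift 6.
protocol UserClientProtocol: Sendable {
    func fetchUser(id: Int) throws -> User
}

/// Test double whose behavior is a @Sendable closure, so the struct itself stays Sendable.
struct MockUserClient: UserClientProtocol {
    let handler: @Sendable (Int) throws -> User

    func fetchUser(id: Int) throws -> User {
        try handler(id)
    }
}

/// Depends on the protocol and receives the client through its initializer,
/// instead of constructing a concrete UserClient internally.
final class ProfileViewModel {
    private let userClient: any UserClientProtocol
    private(set) var displayName = ""

    init(userClient: any UserClientProtocol) {
        self.userClient = userClient
    }

    func load(userID: Int) {
        do {
            displayName = try userClient.fetchUser(id: userID).name
        } catch {
            displayName = "Unavailable"
        }
    }
}

let vm = ProfileViewModel(userClient: MockUserClient { id in User(id: id, name: "Duo") })
vm.load(userID: 7)
print(vm.displayName)  // Duo
```

Because the view model receives its client through the initializer instead of constructing one itself, a test can pass a `MockUserClient` with whatever behavior it needs, and the `@Sendable` closure keeps the mock a valid `Sendable` type under Swift 6's strict concurrency checking. A linter rule, not shown here, could then flag any direct construction of the concrete client.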

Impact

These fixes improved CI pass rates across the entire codebase, not just for the pipeline's PRs. Repository test coverage rose 352%, ViewModel coverage 203%, and DataSource coverage 192%. Overall MVVM coverage went from 9% to 30% across 233 unique classes.
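As a quick check on the word "tripled", the reported MVVM numbers (a 21-percentage-point absolute gain) work out as follows:

```latex
\frac{30\% - 9\%}{9\%} = \frac{21}{9} \approx 2.33 \quad (\text{a relative increase of about } 233\%),
\qquad
\frac{30\%}{9\%} \approx 3.33
```

The article does not state whether the per-layer figures are relative increases; if they are, a 352% rise means repository coverage ended at about 1 + 3.52 = 4.52 times its starting level.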