On-Device Generative AI APIs: ML Kit and Gemini Nano
Article Summary
Caren Chang, Chengji Yan, and Taj Darra from Google announced four new ML Kit APIs that make on-device AI significantly easier for Android developers, letting them integrate Gemini Nano without manual prompt engineering or fine-tuning.
Google released on-device GenAI APIs as part of ML Kit, bringing Gemini Nano capabilities to Android apps through high-level, ready-to-use interfaces. The APIs handle summarization, proofreading, rewriting, and image description entirely on-device, with no server costs or internet dependency.
Key Takeaways
- Summarization API's benchmark score rose from 77.2 to 92.1 with LoRA fine-tuning
- Processes input at 510 tokens/second and generates output at 11 tokens/second on a Pixel 9 Pro
- Works offline with zero API costs and local data processing for privacy
- Envision app now summarizes documents for blind users in production
Google's new ML Kit APIs deliver production-ready Gemini Nano integration with benchmark quality scores above 84 across all four use cases, requiring just a few lines of code.
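The article doesn't include code, but a summarization call looks roughly like the sketch below. The identifiers (`SummarizerOptions`, `Summarization.getClient`, `SummarizationRequest`, the `com.google.mlkit.genai` package) follow the ML Kit GenAI APIs as publicly announced, but treat the exact names and signatures as assumptions to verify against the current ML Kit documentation.

```kotlin
// Sketch of an on-device summarization call with the ML Kit GenAI APIs.
// Class and method names are assumed from the public announcement; check
// the ML Kit docs for the exact, current API surface.
import android.content.Context
import com.google.mlkit.genai.summarization.Summarization
import com.google.mlkit.genai.summarization.SummarizationRequest
import com.google.mlkit.genai.summarization.SummarizerOptions

suspend fun summarizeArticle(context: Context, articleText: String): String {
    // Configure the summarizer: article input, bullet-point output.
    val options = SummarizerOptions.builder(context)
        .setInputType(SummarizerOptions.InputType.ARTICLE)
        .setOutputType(SummarizerOptions.OutputType.THREE_BULLETS)
        .setLanguage(SummarizerOptions.Language.ENGLISH)
        .build()
    val summarizer = Summarization.getClient(options)

    // Run inference entirely on-device: no network call, no per-request cost.
    val request = SummarizationRequest.builder(articleText).build()
    return summarizer.runInference(request).await().summary
}
```

In production you would also check model/feature availability before the first inference, since Gemini Nano is only present on supported devices.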
About This Article
Android developers needed a way to add Gemini Nano to their apps for tasks like summarization, proofreading, rewriting, and image description, without doing manual prompt engineering or fine-tuning themselves.
Google created GenAI APIs with four components: the Gemini Nano base model, API-specific LoRA adapter models, inference parameters optimized for each API, and an evaluation pipeline that uses LLM raters, statistical metrics, and human raters to check quality.
The proofreading API scored 90.2 on benchmarks, rewriting scored 84.1, and image description reached 92.3. All of these exceed the base model's scores, and the system still generates text at 11 tokens per second on a Pixel 9 Pro.
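The throughput figures imply a rough latency budget for a summarization request. A minimal sketch: the 510 tokens/second (input processing) and 11 tokens/second (generation) rates come from the article; the 1,000-token document and 50-token summary are hypothetical values for illustration.

```kotlin
// Estimate end-to-end latency from the reported Pixel 9 Pro throughput:
// input is consumed at 510 tokens/s, output is generated at 11 tokens/s.
fun estimateLatencySeconds(
    inputTokens: Int,
    outputTokens: Int,
    inputTokensPerSec: Double = 510.0,   // from the article
    outputTokensPerSec: Double = 11.0,   // from the article
): Double = inputTokens / inputTokensPerSec + outputTokens / outputTokensPerSec

fun main() {
    // Hypothetical example: a 1,000-token document summarized into 50 tokens.
    val seconds = estimateLatencySeconds(inputTokens = 1000, outputTokens = 50)
    println(seconds)  // ≈ 6.5 (1000/510 + 50/11)
}
```

The arithmetic shows that output generation, not input processing, dominates latency at these rates, which is why short, structured outputs (like bullet summaries) keep on-device inference feeling responsive.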