On-Device Generative AI APIs: ML Kit and Gemini Nano
Article Summary
Caren Chang, Chengji Yan, and Taj Darra from Google announced four new ML Kit APIs that make on-device AI significantly easier for Android developers, letting them integrate Gemini Nano without manual prompt engineering or fine-tuning.
Google released on-device GenAI APIs as part of ML Kit, bringing Gemini Nano capabilities to Android apps through high-level, ready-to-use interfaces. The APIs handle summarization, proofreading, rewriting, and image description entirely on-device, with no server costs or internet dependency.
Key Takeaways
- Summarization API's benchmark score rose from 77.2 to 92.1 with LoRA fine-tuning
- Processes input at 510 tokens/second and generates output at 11 tokens/second on a Pixel 9 Pro
- Works offline with zero API costs and local data processing for privacy
- Envision app now summarizes documents for blind users in production
Google's new ML Kit APIs deliver production-ready Gemini Nano integration with benchmark quality scores above 84 across all four use cases, requiring just a few lines of code.
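The article doesn't include code, but a summarization call looks roughly like the sketch below. The identifiers (`SummarizerOptions`, `Summarization.getClient`, `SummarizationRequest`, the `com.google.mlkit.genai` package) follow the ML Kit GenAI APIs as publicly announced, but treat the exact names and signatures as assumptions to verify against the current ML Kit documentation.

```kotlin
// Sketch of an on-device summarization call with the ML Kit GenAI APIs.
// Class and method names are assumed from the public announcement; check
// the ML Kit docs for the exact, current API surface.
import android.content.Context
import com.google.mlkit.genai.summarization.Summarization
import com.google.mlkit.genai.summarization.SummarizationRequest
import com.google.mlkit.genai.summarization.SummarizerOptions

suspend fun summarizeArticle(context: Context, articleText: String): String {
    // Configure the summarizer: article input, bullet-point output.
    val options = SummarizerOptions.builder(context)
        .setInputType(SummarizerOptions.InputType.ARTICLE)
        .setOutputType(SummarizerOptions.OutputType.THREE_BULLETS)
        .setLanguage(SummarizerOptions.Language.ENGLISH)
        .build()
    val summarizer = Summarization.getClient(options)

    // Run inference entirely on-device: no network call, no per-request cost.
    val request = SummarizationRequest.builder(articleText).build()
    return summarizer.runInference(request).await().summary
}
```

In production you would also check model/feature availability before the first inference, since Gemini Nano is only present on supported devices.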
About This Article
Android developers needed a way to add Gemini Nano to their apps for tasks like summarization, proofreading, rewriting, and image description, without doing manual prompt engineering or fine-tuning themselves.
Google created GenAI APIs with four components: the Gemini Nano base model, API-specific LoRA adapter models, inference parameters optimized for each API, and an evaluation pipeline that uses LLM raters, statistical metrics, and human raters to check quality.
The proofreading API scored 90.2 on benchmarks, rewriting scored 84.1, and image description reached 92.3. All of these exceed the base model's scores, and the system still generates text at 11 tokens per second on a Pixel 9 Pro.
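The throughput figures imply a rough latency budget for a summarization request. A minimal sketch: the 510 tokens/second (input processing) and 11 tokens/second (generation) rates come from the article; the 1,000-token document and 50-token summary are hypothetical values for illustration.

```kotlin
// Estimate end-to-end latency from the reported Pixel 9 Pro throughput:
// input is consumed at 510 tokens/s, output is generated at 11 tokens/s.
fun estimateLatencySeconds(
    inputTokens: Int,
    outputTokens: Int,
    inputTokensPerSec: Double = 510.0,   // from the article
    outputTokensPerSec: Double = 11.0,   // from the article
): Double = inputTokens / inputTokensPerSec + outputTokens / outputTokensPerSec

fun main() {
    // Hypothetical example: a 1,000-token document summarized into 50 tokens.
    val seconds = estimateLatencySeconds(inputTokens = 1000, outputTokens = 50)
    println(seconds)  // ≈ 6.5 (1000/510 + 50/11)
}
```

The arithmetic shows that output generation, not input processing, dominates latency at these rates, which is why short, structured outputs (like bullet summaries) keep on-device inference feeling responsive.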