Google · Caren Chang · Aug 22, 2025

Gemini Nano with On-Device ML Kit and GenAI APIs

Article Summary

Caren Chang, Joanna Huang, and Chengji Yan from Google explain how they pushed Gemini Nano v3 to 940 tokens per second while keeping quality consistent across devices. The secret? LoRA adapters and rigorous evals behind the scenes.

Google just launched Gemini Nano v3 on Pixel 10 devices, accessible through ML Kit GenAI APIs. The team explains their approach to maintaining consistent quality as they upgrade models: combining evaluation pipelines across languages with feature-specific LoRA adapter training on top of the base model.
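The LoRA idea described here is worth making concrete: a frozen base weight matrix gets a trainable low-rank update per feature, so the base model can be swapped without disturbing shipped behavior. Below is a minimal conceptual sketch in NumPy; the dimensions, names, and initialization are illustrative assumptions, not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 8, 8, 2                    # toy sizes; real ranks vary
W = rng.standard_normal((d_out, d_in))         # frozen base-model weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, init 0

def adapted_forward(x, W, A, B, scale=1.0):
    """Base output plus the low-rank adapter update B @ A @ x."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero the adapter is a no-op, so the adapted
# model starts out identical to the base model; training then moves
# only A and B while W stays frozen.
assert np.allclose(adapted_forward(x, W, A, B), W @ x)
```

The key property for version upgrades is that only the small A and B matrices are feature-specific; retraining them against a new base model restores the API's expected behavior without touching the base weights.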

Key Takeaways

Critical Insight

Google's GenAI APIs now deliver 84% faster prefix processing with Gemini Nano v3, while adapter training keeps results consistent for developers across model upgrades.

The article shares benchmark data comparing Gemini Nano v2 and v3 that shows where the real speed gains come from.

About This Article

Problem

Google needed to keep Gemini Nano working consistently across different model versions and device hardware. The GenAI APIs had to meet quality standards no matter which model version a user's device ran.

Solution

The team built evaluation pipelines with LLM-based raters, statistical metrics, and human raters for each supported language. They then trained feature-specific LoRA adapters that sat on top of the Gemini Nano base model to maintain API quality.
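A pipeline like this typically reduces to a per-language quality gate that combines the three signal types the article names. The sketch below is a hypothetical illustration: the weights, thresholds, and scores are invented, and the real pipeline's aggregation logic is not described in the article.

```python
# Hypothetical per-language quality gate combining LLM-rater,
# statistical-metric, and human-rater scores (all in [0, 1]).
# Weights and the threshold are assumptions for illustration.
def passes_quality_bar(scores, weights=(0.4, 0.3, 0.3), threshold=0.8):
    llm, stat, human = scores
    combined = weights[0] * llm + weights[1] * stat + weights[2] * human
    return combined >= threshold

# Toy scores per supported language: (llm_rater, statistical, human).
per_language = {
    "en": (0.92, 0.88, 0.90),
    "ja": (0.75, 0.70, 0.72),
}
gate = {lang: passes_quality_bar(s) for lang, s in per_language.items()}
# → {"en": True, "ja": False}: a failing language blocks the upgrade
# until its feature adapter is retrained.
```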

Impact

Image encoding got faster, dropping from 0.8 seconds on Pixel 9 Pro to 0.6 seconds on Pixel 10 Pro. Text-to-text prefix speed nearly doubled from 510 tokens per second to 940 tokens per second. Results stayed consistent across model upgrades.
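The quoted percentages follow directly from these figures; a quick arithmetic check (numbers taken from the article) confirms the 84% prefix-speed claim:

```python
# Figures from the article's benchmarks.
prefix_v2, prefix_v3 = 510, 940    # tokens/second, text-to-text prefix
image_p9, image_p10 = 0.8, 0.6     # seconds, image encoding

prefix_gain = prefix_v3 / prefix_v2 - 1   # 940/510 - 1 ≈ 0.84 → 84% faster
image_gain = image_p9 / image_p10 - 1     # 0.8/0.6 - 1 ≈ 0.33 → ~33% faster

print(f"prefix: {prefix_gain:.0%}, image encoding: {image_gain:.0%}")
```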