Gemini Nano with On-Device ML Kit and GenAI APIs
Article Summary
Caren Chang, Joanna Huang, and Chengji Yan from Google reveal how they made Gemini Nano v3 reach 940 tokens/second while keeping quality consistent across devices. The secret? LoRA adapters and rigorous evals behind the scenes.
Google just launched Gemini Nano v3 on Pixel 10 devices, accessible through ML Kit GenAI APIs. The team explains their approach to maintaining consistent quality as they upgrade models: combining evaluation pipelines across languages with feature-specific LoRA adapter training on top of the base model.
Key Takeaways
- Prefix speed jumped from 510 tokens/second (Pixel 9 Pro) to 940 tokens/second (Pixel 10 Pro)
- LoRA adapters ensure API quality stays consistent across model versions
- Image encoding dropped from 0.8 to 0.6 seconds between device generations
- Eval pipeline uses LLM raters, statistical metrics, and human review
Google's GenAI APIs now deliver 84% faster prefix processing with Gemini Nano v3, while adapter training keeps results consistent for developers across model upgrades.
About This Article
Google needed to keep Gemini Nano working consistently across different model versions and device hardware. The GenAI APIs had to meet quality standards no matter which model version a user's device ran.
The team built evaluation pipelines with LLM-based raters, statistical metrics, and human raters for each supported language. They then trained feature-specific LoRA adapters that sat on top of the Gemini Nano base model to maintain API quality.
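The adapter mechanism described above can be illustrated with a minimal sketch. A LoRA adapter adds a low-rank, feature-specific update on top of a frozen base weight, which is why each API can be re-tuned against a new base model without touching the model itself. This is a generic numpy illustration of the technique, not Google's implementation; all names and dimensions are hypothetical, and real adapters operate on far larger transformer weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2   # hypothetical sizes; r is the low rank of the adapter
alpha = 16                 # standard LoRA scaling hyperparameter

# Frozen base-model weight, shared by every feature
W_base = rng.standard_normal((d_out, d_in))

def lora_forward(x, W, A, B, alpha, r):
    """Apply the frozen weight plus a feature-specific low-rank update (alpha/r) * B @ A."""
    return x @ (W + (alpha / r) * (B @ A)).T

# A feature-specific adapter (e.g. for one GenAI API); only A and B are trained.
# B starts at zero, so training begins exactly at the base model's behavior.
A_feature = rng.standard_normal((r, d_in)) * 0.01
B_feature = np.zeros((d_out, r))

x = rng.standard_normal((1, d_in))
y_base = x @ W_base.T
y_adapted = lora_forward(x, W_base, A_feature, B_feature, alpha, r)

# With B = 0 the adapter is a no-op: the output matches the base model exactly
assert np.allclose(y_base, y_adapted)
```

Because only the small A and B matrices are trained per feature, each API's quality can be re-validated against the eval pipeline whenever the base model is upgraded.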
Image encoding got faster, dropping from 0.8 seconds on Pixel 9 Pro to 0.6 seconds on Pixel 10 Pro. Text-to-text prefix speed nearly doubled from 510 tokens per second to 940 tokens per second. Results stayed consistent across model upgrades.
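The headline speedup figures follow directly from the reported numbers, as a quick check shows:

```python
# Figures reported in the article
prefix_v9, prefix_v10 = 510, 940   # prefix tokens/second: Pixel 9 Pro vs Pixel 10 Pro
encode_v9, encode_v10 = 0.8, 0.6   # image encoding seconds: Pixel 9 Pro vs Pixel 10 Pro

prefix_speedup = prefix_v10 / prefix_v9 - 1        # ~0.843, i.e. the "84% faster" figure
encode_reduction = (encode_v9 - encode_v10) / encode_v9   # 0.25, i.e. 25% less encode latency
```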