Gemini Nano with On-Device ML Kit and GenAI APIs
Article Summary
Caren Chang, Joanna Huang, and Chengji Yan from Google reveal how they pushed Gemini Nano v3 to 940 tokens/second while keeping quality consistent across devices. The secret? LoRA adapters and rigorous evals behind the scenes.
Google just launched Gemini Nano v3 on Pixel 10 devices, accessible through ML Kit GenAI APIs. The team explains their approach to maintaining consistent quality as they upgrade models: combining evaluation pipelines across languages with feature-specific LoRA adapter training on top of the base model.
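The LoRA approach mentioned above is worth unpacking: instead of retraining the base model for each feature, a small low-rank update is trained on top of frozen weights. The sketch below is a generic illustration of that technique (not Google's implementation); all names and dimensions are illustrative, and it assumes the standard LoRA formulation where the update is a product of two small matrices scaled by alpha/rank.

```python
import numpy as np

# Generic LoRA forward-pass sketch (illustrative, not Google's code):
# the frozen base weight W is augmented with a low-rank update B @ A,
# so each feature ships a small adapter instead of a new base model.
d_out, d_in, rank = 8, 8, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))        # frozen base weights
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection (zero-init)
alpha = 16                                # LoRA scaling hyperparameter

def lora_forward(x):
    # Base output plus the scaled low-rank correction.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapter starts as an exact no-op,
# which is why quality on the base model is preserved at adapter init.
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B are trained (rank * (d_in + d_out) parameters per layer), an adapter can be retrained cheaply whenever the base model is upgraded, which is what lets API behavior stay consistent across model versions.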
Key Takeaways
- Prefix speed jumped from 510 to 940 tokens/second on Pixel 10 Pro
- LoRA adapters ensure API quality stays consistent across model versions
- Image encoding dropped from 0.8 to 0.6 seconds between model generations
- Eval pipeline uses LLM raters, statistical metrics, and human review
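The headline percentages follow directly from the figures above; a quick check (using only the numbers reported in this summary):

```python
# Figures reported for Gemini Nano v3 on Pixel 10 Pro.
prefix_old, prefix_new = 510, 940    # prefix speed, tokens/second
encode_old, encode_new = 0.8, 0.6    # image encoding time, seconds

prefix_speedup = (prefix_new - prefix_old) / prefix_old
encode_reduction = (encode_old - encode_new) / encode_old

print(f"Prefix processing: {prefix_speedup:.0%} faster")   # 84% faster
print(f"Image encoding:    {encode_reduction:.0%} shorter") # 25% shorter
```

The 84% figure cited in the Critical Insight below is the relative throughput gain, (940 - 510) / 510.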
Critical Insight
Google's GenAI APIs now deliver 84% faster prefix processing with Gemini Nano v3 while using adapter training to guarantee developers get consistent results across model upgrades.