Building an AI-Powered Note-Taking App in React Native: Part 3 (Local RAG)
Article Summary
Jakub Mroz from Software Mansion just shipped a fully local RAG pipeline in React Native. No cloud APIs, no data leaks, just on-device AI that actually works.
This is Part 3 of a series building a privacy-first AI note-taking app. The team integrates LLaMA 3.2 1B SpinQuant with React Native ExecuTorch and React Native RAG to enable natural language chat with your notes, entirely offline.
Key Takeaways
- LLaMA 3.2 1B SpinQuant runs locally for mobile RAG
- RAG retrieves semantically similar notes, then generates grounded answers
- Token streaming provides smooth conversational UX without cloud latency
- Complete privacy: all embeddings, retrieval, and generation happen on-device
You can now build ChatGPT-style interfaces in React Native that run completely offline and keep user data private.
About This Article
Building conversational AI on mobile means grounding LLM responses in user data while keeping everything private. Software Mansion needed to retrieve semantically similar note chunks and filter out results with similarity scores below a 0.2 threshold, all without relying on cloud infrastructure.
The team built a RAG service using React Native ExecuTorch to run LLaMA 3.2 1B SpinQuant locally. They integrated the text vector store from Part 1 and created a prompt generator that categorizes retrieved results into three tiers: 'Highly relevant' (>0.6), 'Relevant' (>0.4), and 'Slightly relevant' (>0.2).
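The thresholding and tiering described above can be sketched in plain TypeScript. This is a minimal illustration, not the article's actual code: the `RetrievedChunk` shape, `tierLabel`, and `buildContext` are hypothetical names, and it assumes retrieval returns chunks with a similarity score in [0, 1].

```typescript
interface RetrievedChunk {
  text: string;
  similarity: number; // assumed cosine similarity in [0, 1]
}

// Bucket a retrieved chunk into the three tiers the article describes.
function tierLabel(similarity: number): string {
  if (similarity > 0.6) return "Highly relevant";
  if (similarity > 0.4) return "Relevant";
  return "Slightly relevant"; // callers drop scores <= 0.2 before this
}

// Drop weak matches, then render a labeled context section for the prompt.
function buildContext(chunks: RetrievedChunk[]): string {
  return chunks
    .filter((c) => c.similarity > 0.2)
    .sort((a, b) => b.similarity - a.similarity)
    .map((c) => `[${tierLabel(c.similarity)}] ${c.text}`)
    .join("\n");
}
```

Labeling each chunk by tier lets the prompt tell the model how much weight to give each piece of context, rather than presenting all retrieved notes as equally trustworthy.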
Users can now chat with their notes entirely on-device with token-by-token streaming. There's no cloud latency and no data leaves the phone. The AI assistant loads the full RAG pipeline on demand and stops gracefully when the screen closes.
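The streaming-and-graceful-stop pattern can be sketched with an async generator and an `AbortSignal`. This is an assumption-laden simulation, not the React Native ExecuTorch API: `generateTokens` stands in for the on-device LLM runtime, and `streamAnswer` shows how a UI callback and a cancel signal (e.g. fired when the chat screen unmounts) fit together.

```typescript
// Hypothetical token source standing in for the on-device LLM runtime.
async function* generateTokens(tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) {
    yield t; // a real runtime would decode one token per step
  }
}

// Stream tokens into a UI callback; an AbortSignal stops generation
// cleanly mid-answer, e.g. when the user navigates away.
async function streamAnswer(
  tokens: string[],
  onToken: (partial: string) => void,
  signal: AbortSignal
): Promise<string> {
  let answer = "";
  for await (const token of generateTokens(tokens)) {
    if (signal.aborted) break; // graceful stop, no half-consumed state
    answer += token;
    onToken(answer); // UI re-renders with the growing answer
  }
  return answer;
}
```

In a React Native screen, the `AbortController` would typically be created in an effect and aborted in its cleanup function, so closing the screen cancels generation automatically.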