AIDrivingHelper: On-Device RAG Coaching for Driving Apps
Android module that uses retrieval-augmented generation (RAG) entirely on-device to answer natural-language questions about past trips and deliver personalized driving coaching. Built to plug into apps that already record trip data, with voice input and voice output.
What this is
This is an add-on coaching layer, not a standalone trip tracking app.
Apps like DriveKit (DriveQuant), Zendrive, eDriving Mentor, and Cambridge Mobile Telematics already record every trip and collect driving events: speeding, hard braking, phone usage, rapid acceleration. They have the data. What they often lack is a natural-language interface that lets the driver actually have a conversation about their behavior.
That is what this module adds. Once the host app has trip data, this layer lets the driver ask questions like "Was I speeding more at night?" or "How was my braking this week?" and get back a real, context-aware coaching response grounded in their actual trips.
What it does
The entire pipeline runs on-device. Semantic search over stored trips, retrieval of the most relevant ones, inference via a local LLM, and voice output. No cloud. No trip data ever leaves the phone.
Architecture
User voice query (SpeechRecognizer)
↓
GeckoEmbeddingModel (on-device, 768-dim)
↓
SqliteVectorStore (semantic search over stored trips)
↓
Top-4 most relevant trips retrieved
↓
Gemma 3 1B IT (on-device LLM) + retrieved context
↓
Coaching response (text + TTS output)
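The retrieval step in the middle of this pipeline can be sketched in plain Kotlin. This is a minimal, self-contained illustration of what SqliteVectorStore does conceptually (cosine similarity over stored embeddings, top-k selection); the `TripRecord` type and tiny vectors are assumptions for the example, not the app's real schema, and real Gecko embeddings are 768-dimensional.

```kotlin
import kotlin.math.sqrt

// A stored trip summary plus its embedding vector (768-dim in the real app;
// tiny vectors here purely for illustration).
data class TripRecord(val summary: String, val embedding: FloatArray)

// Cosine similarity between two equal-length vectors.
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Return the k trips most semantically similar to the query embedding,
// mirroring the vector store's top-k = 4 retrieval.
fun retrieveTopK(query: FloatArray, trips: List<TripRecord>, k: Int = 4): List<TripRecord> =
    trips.sortedByDescending { cosine(query, it.embedding) }.take(k)
```

The retrieved summaries are then concatenated into the LLM's context, so the model only ever sees the trips that matched the question.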
MVVM:
- LLMViewModel manages LLM state and trip insertion
- LLMInferenceChain sets up the full RAG chain (embedder, vector store, LLM)
- TripEntity / TripDao handle Room persistence
- RAGChatUI is the Compose chat interface with voice input
Tech Stack
| Layer | Technology |
|---|---|
| Language | Kotlin |
| UI | Jetpack Compose + Material 3 |
| On-device LLM | MediaPipe tasks-genai 0.10.23 (Gemma 3 1B IT int4) |
| RAG Framework | Google AI Edge local-agents 0.2.0 |
| Embeddings | GeckoEmbeddingModel (768-dim, on-device) |
| Vector Store | SqliteVectorStore (local semantic search) |
| Database | Room 2.6.1 |
| Voice I/O | Android SpeechRecognizer + TextToSpeech API |
| Async | Kotlin Coroutines + Guava bridge |
RAG Setup
GeckoEmbeddingModel (768-dim semantic embeddings)
+
SqliteVectorStore (persistent vector DB, on-device)
+
DefaultSemanticTextMemory
↓
RetrievalAndInferenceChain (top-k=4, question-answering task)
↓
MediaPipeLlmBackend (Gemma 3 1B IT, CPU)
Retrieval is configured with top-k = 4: the four most semantically similar trips are retrieved and injected as context before inference. This keeps the context window focused and reduces the risk of the model hallucinating about trips it was not given.
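A minimal sketch of the context-injection step, assuming the retrieved trips are available as plain summary strings. The `buildPrompt` helper is illustrative, not the local-agents API; the chain performs this assembly internally.

```kotlin
// Hypothetical prompt assembly: the retrieved trip summaries are concatenated
// into a context block that precedes the user's question, so the model can
// only ground its answer in the trips it was given.
fun buildPrompt(systemPrompt: String, retrieved: List<String>, question: String): String =
    buildString {
        appendLine(systemPrompt)
        appendLine("Relevant trips:")
        retrieved.forEachIndexed { i, trip -> appendLine("${i + 1}. $trip") }
        appendLine("Question: $question")
    }
```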
Prompt Engineering
The system prompt instructs the model to:
- Act as a driving coach
- Use second-person language ("you were speeding", not "the driver was speeding")
- Only reference information from the retrieved trips, not general knowledge
- Not guess or fill in gaps
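The four rules above can live in a single system-prompt constant. The exact wording here is illustrative (the app's real prompt may differ), but it encodes each rule:

```kotlin
// Illustrative system prompt covering all four coaching rules.
val SYSTEM_PROMPT = """
    You are a driving coach. Address the driver directly in the second person
    ("you were speeding", never "the driver was speeding"). Base every statement
    only on the retrieved trips provided as context, not on general knowledge.
    If the trips do not contain the answer, say so; do not guess or fill in gaps.
""".trimIndent()
```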
Temperature: 1.0 / Top-P: 0.95 / Top-K: 64
Demo Trip Data
Seven synthetic trips are pre-loaded so the system works out of the box:
- Speeding events on highway and in residential areas
- Phone usage events
- Sudden braking events
- Hard acceleration events
Each trip is embedded and stored in SqliteVectorStore on first run.
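The seed data can be as simple as a list of summaries, one per trip, covering the event types above. Field names and summaries here are assumptions for illustration, not the app's real TripEntity schema:

```kotlin
// Illustrative demo data: seven synthetic trips covering the four event types.
data class DemoTrip(val id: Int, val summary: String)

val demoTrips = listOf(
    DemoTrip(1, "Highway trip with two speeding events above the limit"),
    DemoTrip(2, "Residential trip with speeding in a 25 mph zone"),
    DemoTrip(3, "Evening commute with several minutes of phone usage"),
    DemoTrip(4, "Morning trip with one sudden braking event"),
    DemoTrip(5, "Night drive with repeated sudden braking"),
    DemoTrip(6, "Short trip with hard acceleration from a stoplight"),
    DemoTrip(7, "Weekend trip with hard acceleration on a merge ramp")
)
```

On first run, each summary would be embedded and inserted into the vector store.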
Voice Interaction
- Input: Android SpeechRecognizer via microphone button in the chat UI
- Output: Android TextToSpeech API reads the coaching response aloud
- Flow: Speak, recognize, embed query, retrieve trips, run LLM, speak response
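The flow above can be expressed as a pure function over injected steps, which keeps the orchestration testable without Android. On device, `recognize` would wrap SpeechRecognizer, `answer` would run the RAG chain, and `speak` would wrap TextToSpeech; the function shape is an assumption, not the module's actual API.

```kotlin
// One voice turn: recognize speech, answer via RAG, speak the response.
// The three steps are injected so this logic has no Android dependency.
fun voiceTurn(
    recognize: () -> String,        // SpeechRecognizer result
    answer: (String) -> String,     // embed -> retrieve -> LLM
    speak: (String) -> Unit         // TextToSpeech output
): String {
    val query = recognize()
    val response = answer(query)
    speak(response)
    return response
}
```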
What I Learned
- RAG on Android is completely practical. The Google AI Edge local-agents library handles the hardest parts: embedding, vector search, chain orchestration. It is more capable than I expected for a mobile SDK.
- Semantic search beats keyword search. Gecko embeddings surface contextually relevant trips even when the driver's phrasing does not match stored data word for word.
- The Guava-Coroutines bridge is the one tricky dependency. kotlinx-coroutines-guava is needed because local-agents uses ListenableFuture internally. Miss this and nothing compiles.
- Voice and RAG are a natural pair. Spoken queries are vague and conversational. That is exactly the kind of input where semantic retrieval is strongest.
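For reference, the bridge is a single Gradle dependency. This is a Kotlin DSL sketch; the version shown is illustrative, so match it to the Coroutines version your project already uses:

```kotlin
dependencies {
    // Bridges ListenableFuture (used internally by local-agents) to suspend
    // functions via .await(); without it the chain calls do not compile.
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-guava:1.8.1")
}
```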