MemoryVault: Search Any File by Meaning, Not by Name

What it does

Type "warm beach vacation" into MemoryVault and it finds beach photos, text notes, and PDF entries that are semantically close to that query. Not because any file is labelled "beach". Because Gemini Embedding 2 converted the content of each file into 3,072 numbers and placed similar-meaning content near each other in that space.

Upload anything: a photo, a PDF, a text note, an audio clip. Each file is converted to a vector the moment it is uploaded. When you search, the query goes through the same model, becomes the same kind of vector, and ChromaDB finds whatever is closest.

You can search by text or by uploading a query image. Either way, everything flows through the same embedding pipeline and gets compared in the same vector space.

What is supported: JPG, PNG, WebP, PDF, TXT, MP3, WAV
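
Because `types.Part.from_bytes` needs a MIME type for every non-text upload, the backend has to map file extensions to MIME types somewhere. The sketch below is one way that could look; the names `SUPPORTED_MIME_TYPES` and `mime_type_for` are hypothetical and not taken from the MemoryVault source, and accepting `.jpeg` alongside `.jpg` is an assumption.

```python
from pathlib import Path

# Extension -> MIME type for the formats listed above.
# Hypothetical helper, not the project's actual code.
SUPPORTED_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",  # assumption: .jpeg is treated the same as .jpg
    ".png": "image/png",
    ".webp": "image/webp",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
}


def mime_type_for(filename: str) -> str:
    """Return the MIME type for a supported upload, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}") from None
```

Rejecting unknown extensions before calling the embedding API keeps unsupported files from costing a request.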

Architecture

Upload: file (photo / PDF / text / audio)
        ↓
FastAPI backend (port 8000)
        ↓
Gemini Embedding 2 API
        ↓
3,072-dimensional vector
        ↓
ChromaDB (chroma_db/, stored locally on disk)

Search: text query or image query
        ↓
FastAPI backend
        ↓
Gemini Embedding 2 API (same model, same space)
        ↓
Query vector
        ↓
ChromaDB cosine similarity search
        ↓
Top 5 matches ranked by similarity score

The key insight in that diagram: both upload and search go through the same model. That is why a text query can find a photo. They land in the same vector space. Proximity in that space is proximity in meaning.
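
To make the search half of the diagram concrete, here is a minimal brute-force sketch of "rank stored vectors by cosine distance to the query vector". It is a stand-in for what ChromaDB does with its index, not ChromaDB's implementation, and the 3-dimensional vectors and file names are made up for illustration (real embeddings have 3,072 values).

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_matches(query_vector, stored, k=5):
    """Return up to k (file_id, distance) pairs, closest first.

    `stored` maps a file id to its embedding vector.
    """
    scored = [
        (file_id, cosine_distance(query_vector, vec))
        for file_id, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = best match
    return scored[:k]


# Toy vectors, invented for this example only.
stored = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "trip_notes.txt": [0.8, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
results = top_matches(query, stored, k=2)
```

The photo and the text note rank together here for the same reason they do in the app: the ranking only sees vectors, not file types.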

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js (TypeScript) |
| Backend | FastAPI (Python) |
| Embedding model | Gemini Embedding 2 (gemini-embedding-2, GA) |
| Vector database | ChromaDB (local, persistent) |
| Styling | Tailwind CSS |
| SDK | google-genai >= 1.0.0 |
| PDF parsing | pypdf |
| HTTP client | Axios |

About Gemini Embedding 2

Gemini Embedding 2 (gemini-embedding-2) is an API-only model from Google. It is the first embedding model to natively handle text, images, audio, PDFs, and video in a single unified vector space without stitching separate encoders together.

One API call returns a list of 3,072 floats. That list is the embedding. The same call works whether you pass it a JPEG, a PDF page, a text string, or an audio clip.

The model is accessed through the Gemini API at ai.google.dev. The Vertex AI path does not support the asia-south1 region for this model, so the Gemini API is the correct integration point.

Important: The correct SDK is google-genai, not the older google-generativeai package. The older package does not support gemini-embedding-2 and the failure is silent.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_KEY")

# Same call for text
result = client.models.embed_content(
    model="gemini-embedding-2",
    contents="warm beach vacation"
)

# Same call for an image ("beach.jpg" is a placeholder path)
with open("beach.jpg", "rb") as f:
    image_bytes = f.read()
image_part = types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg")
result = client.models.embed_content(
    model="gemini-embedding-2",
    contents=image_part
)

vector = result.embeddings[0].values  # list of 3,072 floats

How ChromaDB Stores and Searches

ChromaDB is a vector database that, in this project, runs as a local binary store on disk. No server needed. As vectors are added, ChromaDB maintains an HNSW index that allows approximate nearest-neighbor search across millions of vectors in milliseconds.

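The distance metric is cosine distance, which ranges from 0 to 2. Two identical vectors produce a distance of 0; vectors pointing in exactly opposite directions produce 2. Completely unrelated vectors in natural language space typically land around 0.7 to 1.0. The app converts distance to a similarity percentage for display, dividing by 2 so that the full range maps onto 0 to 100%: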

similarity % = (1 - distance / 2) × 100

In practice, all real-world embeddings sit between 60% and 95% similarity against any query, because nothing in natural language space is ever truly "opposite". A 90%+ match is very strong. Below 65% is weak enough to filter.
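
A short sketch of that conversion and the 65% cutoff follows. Since cosine distance is 1 minus cosine similarity, the formula is equivalent to (1 + cos θ) / 2 × 100, so a cosine similarity of 0.2 maps to 60% and 0.9 maps to 95%. The function names and the sample distances are hypothetical, chosen only to illustrate the arithmetic.

```python
def similarity_percent(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a 0-100 similarity percentage."""
    return (1.0 - distance / 2.0) * 100.0


def filter_weak(matches, min_percent=65.0):
    """Keep (file_id, distance) pairs whose similarity is at least min_percent.

    Returns (file_id, percent) pairs with the percentage rounded to 1 decimal.
    """
    kept = []
    for file_id, distance in matches:
        percent = similarity_percent(distance)
        if percent >= min_percent:
            kept.append((file_id, round(percent, 1)))
    return kept


# Sample distances invented for this example.
matches = [("beach.jpg", 0.18), ("trip_notes.txt", 0.40), ("invoice.pdf", 0.90)]
strong = filter_weak(matches)
```

With these sample distances, the first two entries pass the cutoff at 91% and 80%, while the third converts to 55% and is dropped.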

What I Learned

The model is API-only, not in GCP Console. Gemini Embedding 2 does not appear in Vertex AI Model Garden for anyone, not just users in India. It is an API endpoint, not a deployable model card. The Gemini API is the right path.
Use the new SDK. google-generativeai is deprecated and does not support gemini-embedding-2. Switching to google-genai and updating the import pattern (from google import genai) is required.
Text tends to beat images in similarity scores. A file containing the word "beach" will typically score higher against a beach query than an actual beach photo will. Text is explicit. Images require the model to infer meaning from pixels. Both work, but text wins on raw precision.
Cosine similarity compresses into a narrow range. Because all real-world content has positive correlation in the embedding space, scores cluster between roughly 60% and 95%. A one-word query like "beach" does not create enough distance from "mountain" to make clean separation possible. Descriptive phrases like "warm tropical beach with palm trees" produce much better gaps between relevant and irrelevant results.
ChromaDB is local and immediate. No infrastructure, no cloud, no API key. It writes binary files to a folder. The entire vector store for this project is a few MB on disk.