Personal Shopper: Multi-Agent AI That Searches Google, Reddit & YouTube in Parallel

What this is

A full end-to-end AI shopping assistant with an Android app on the front and a multi-agent system on Google Cloud at the back.

You type a product query. Four AI agents go to work. Three of them run in parallel, each searching a different source. The fourth, a synthesizer, reads everything and writes a single, opinionated recommendation with clickable links to products, Reddit threads, and YouTube videos.

The interesting part is not the app itself. It is the multi-agent architecture running behind it, built on Google's Agent Development Kit (ADK) and hosted on Vertex AI Agent Engine.

What it does

Type anything: "suggest wireless headphones under 5000 INR"

3 Parallel Research Agents fire simultaneously:

Shopping Agent: searches real product listings, prices, and availability via Serper.dev
Reddit Agent: finds genuine community reviews and user opinions from Reddit threads via Serper.dev site filter
YouTube Agent: finds expert video reviews and hands-on assessments via YouTube Data API
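
As a rough sketch of what the three searches above might send over the wire, the helpers below build request descriptions without making any network calls. The function names and the returned dict shape are hypothetical. The endpoints and parameters follow the public Serper.dev and YouTube Data API v3 conventions as I understand them, and should be checked against the current docs before use.

```python
import json
from urllib.parse import urlencode

SERPER_BASE = "https://google.serper.dev"
YOUTUBE_SEARCH = "https://www.googleapis.com/youtube/v3/search"


def _serper_headers(api_key: str) -> dict:
    # Serper.dev authenticates with an X-API-KEY header.
    return {"X-API-KEY": api_key, "Content-Type": "application/json"}


def shopping_request(query: str, api_key: str) -> dict:
    """Describe a Serper.dev shopping search (hypothetical helper, no I/O)."""
    return {
        "method": "POST",
        "url": f"{SERPER_BASE}/shopping",
        "headers": _serper_headers(api_key),
        "body": json.dumps({"q": query}),
    }


def reddit_request(query: str, api_key: str) -> dict:
    """Describe a web search restricted to Reddit with a site: filter."""
    return {
        "method": "POST",
        "url": f"{SERPER_BASE}/search",
        "headers": _serper_headers(api_key),
        "body": json.dumps({"q": f"site:reddit.com {query}"}),
    }


def youtube_request(query: str, api_key: str, max_results: int = 5) -> dict:
    """Describe a YouTube Data API v3 video search."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": max_results,
        "key": api_key,
    }
    return {
        "method": "GET",
        "url": f"{YOUTUBE_SEARCH}?{urlencode(params)}",
        "headers": {},
        "body": None,
    }
```

Keeping request construction separate from the HTTP call makes each agent's tool easy to test without API keys.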

1 Synthesizer Agent runs after all three complete:

Reads all research results and produces a ranked recommendation
Top 3 products with price ranges, pros/cons, and who each is best for
One clear best pick with a direct purchase link
Community warnings and caveats pulled from Reddit
1 to 2 YouTube review links you can tap directly in the app

Google ADK: Two Agent Types

This project uses Google's Agent Development Kit (ADK), an open-source framework for building multi-agent AI systems. ADK provides two agent types used here.

SequentialAgent is the outer container. It runs its sub-agents one after the other, in order. Here it runs the ParallelAgent first, waits for it to finish, then runs the Synthesizer. This ensures the Synthesizer always has complete research before it writes anything.

ParallelAgent fires all its sub-agents at the same time. All three searches happen simultaneously rather than one after another. The total wait time is roughly that of the slowest single search, not three searches stacked on top of each other.

Together they form a two-step pipeline: gather everything at once, then synthesize once.
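
The gather-then-synthesize shape can be illustrated without ADK at all. The sketch below uses plain asyncio with made-up delays standing in for the three searches. It is not how ADK is implemented, only a model of why the research step costs about as much as its slowest branch.

```python
import asyncio
import time


async def fake_search(source: str, delay: float) -> str:
    # Stand-in for a real search; the delay simulates network latency.
    await asyncio.sleep(delay)
    return f"{source} results"


async def synthesize(findings: list[str]) -> str:
    # Stand-in for the synthesizer; it only runs once all findings exist.
    return " | ".join(findings)


async def run_pipeline() -> tuple[str, float]:
    start = time.perf_counter()
    # Step 1 (parallel): all three searches are in flight at once.
    findings = await asyncio.gather(
        fake_search("shopping", 0.2),
        fake_search("reddit", 0.3),
        fake_search("youtube", 0.1),
    )
    # Step 2 (sequential): synthesis starts only after gather() returns.
    answer = await synthesize(list(findings))
    return answer, time.perf_counter() - start


answer, elapsed = asyncio.run(run_pipeline())
print(answer)
print(f"elapsed: {elapsed:.2f}s (sum of delays would be 0.60s)")
```

The elapsed time lands near 0.3 seconds, the longest simulated search, rather than the 0.6 seconds a one-after-another run would take.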

Agent Flow

SequentialAgent: personal_shopper
        |
        |-- Step 1 -----------------------------------------
        |
        |   ParallelAgent: parallel_research
        |        |
        |        |-- Google Shopping Agent  -->  Serper.dev API
        |        |
        |        |-- Reddit Agent           -->  Serper.dev (site:reddit.com)
        |        |
        |        +-- YouTube Agent          -->  YouTube Data API v3
        |
        |        (all three run at the exact same time)
        |
        |-- Step 2 -----------------------------------------
        |
        +-- Synthesizer Agent (Gemini 2.5 Flash)
                 |
                 +-- Ranked recommendation with clickable links

Full System Architecture

Android App (Kotlin / Jetpack Compose)
        |
        |  POST /query  {"message": "..."}
        v
Cloud Run Proxy  <-- Python + FastAPI in a Docker container
        |  Gets auth token automatically via service account
        |  Creates Vertex AI session
        |  Polls up to 12 rounds until synthesizer completes
        v
Vertex AI Agent Engine  <-- Google ADK multi-agent system
        |
        +-- SequentialAgent: personal_shopper
                |
                |-- ParallelAgent: parallel_research
                |        |-- LlmAgent: google_shopping_agent
                |        |-- LlmAgent: reddit_research_agent
                |        +-- LlmAgent: youtube_review_agent
                |
                +-- LlmAgent: synthesizer (Gemini 2.5 Flash)
        |
        v
Cloud Run Proxy collects and returns combined text
        |
        v
Android App renders markdown with typewriter animation + clickable links

Why the proxy exists: Vertex AI requires a Google Cloud access token, and those tokens expire after about an hour. Minting fresh ones requires service-account credentials, which cannot be stored safely inside an Android app. The Cloud Run proxy runs inside Google's infrastructure, gets its own identity automatically, and handles all authentication on behalf of the app.
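
One way a proxy like this could handle the hourly expiry is to cache the token and refresh it shortly before it runs out. The sketch below is an assumption about the approach, not the project's actual code: the TokenCache class and its parameters are hypothetical, while the metadata server URL and the Metadata-Flavor header are the standard way a Cloud Run service obtains a token for its own service account.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Tuple

# Standard metadata endpoint available inside Cloud Run containers.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def fetch_from_metadata_server() -> Tuple[str, float]:
    """Return (access_token, seconds_until_expiry). Only works on Google Cloud."""
    req = urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        payload = json.load(resp)
    return payload["access_token"], float(payload["expires_in"])


class TokenCache:
    """Hypothetical helper: reuse a token until it is close to expiring."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        refresh_margin: float = 300.0,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._margin = refresh_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it expires within the margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, ttl = self._fetch()
            self._token = token
            self._expires_at = now + ttl
        return self._token
```

In the proxy this would be used as `TokenCache(fetch_from_metadata_server).get()` before each call to Vertex AI, so the Android app never sees a credential.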

Why the polling loop: ADK's streamQuery API is step-by-step. Each call advances the agent pipeline one round. The proxy loops up to 12 rounds, 5 seconds apart, until the synthesizer returns its final text. In practice this takes 4 to 7 rounds.
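
The loop described above might look roughly like the following. This is a sketch under assumptions: `advance` is a hypothetical callable standing in for one streamQuery round, and the event dicts with "author" and "text" keys are an invented shape, not the real Agent Engine response format.

```python
import time
from typing import Callable, Dict, List, Optional


def poll_for_final_text(
    advance: Callable[[int], List[Dict[str, str]]],
    max_rounds: int = 12,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call advance() once per round until the synthesizer emits text.

    Returns the synthesizer's text, or None if it never appears
    within max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        events = advance(round_no)
        # Keep only text produced by the synthesizer; research agents'
        # intermediate output is ignored.
        final_parts = [
            e["text"]
            for e in events
            if e.get("author") == "synthesizer" and e.get("text")
        ]
        if final_parts:
            return "".join(final_parts)
        if round_no < max_rounds:
            sleep(interval_s)  # wait before the next round
    return None
```

Passing `sleep` in as a parameter keeps the loop testable without real 5-second waits. Returning None on timeout lets the proxy decide whether to send an error or a partial answer to the app.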

Tech Stack

| Layer | Technology |
|-------|-----------|
| Android UI | Jetpack Compose + Material 3 |
| Android State | ViewModel + StateFlow + Coroutines |
| Android HTTP | Retrofit + OkHttp + Gson |
| Backend Language | Python 3.11 |
| Backend Framework | FastAPI + Uvicorn |
| Backend HTTP Client | httpx (async) |
| Containerization | Docker |
| Cloud Runtime | Google Cloud Run (serverless) |
| AI Agent Framework | Google ADK (Agent Development Kit) |
| Agent Orchestration | ParallelAgent + SequentialAgent |
| AI Hosting | Vertex AI Agent Engine |
| AI Model | Gemini 2.5 Flash |
| Search APIs | Serper.dev, YouTube Data API v3 |

ADK Agent Code (simplified)

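from google.adk.agents import LlmAgent, ParallelAgent, SequentialAgent

# shopping_agent, reddit_agent, and youtube_agent are LlmAgents defined
# elsewhere, each searching its own source.

# Step 1: run the three research agents concurrently.
parallel_research = ParallelAgent(
    name="parallel_research",
    sub_agents=[shopping_agent, reddit_agent, youtube_agent],
)

# Step 2: turn the combined research into one recommendation.
synthesizer = LlmAgent(
    model="gemini-2.5-flash",
    name="synthesizer",
    instruction="Synthesize findings from all three sources into a ranked recommendation...",
)

# Root pipeline: research first, then synthesis.
root_agent = SequentialAgent(
    name="personal_shopper",
    sub_agents=[parallel_research, synthesizer],
)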

What I Learned

Multi-agent systems are event-driven, not request-response. The Vertex AI Agent Engine API returns one step at a time: tool calls, tool responses, agent handoffs. You have to poll across multiple rounds to get the full answer. Understanding this changed how I thought about the entire backend design.
Parallel execution is real. ParallelAgent fires sub-agents concurrently inside Vertex AI. The three searches genuinely run at the same time, which makes the overall latency about the same as the slowest single search.
Auth between services needs a deliberate solution. The Android app cannot hold a Google Cloud token. The proxy pattern, a server with its own cloud identity that proxies requests on behalf of the app, is the correct architectural answer for mobile-to-cloud AI.
The Dockerfile is all Cloud Run needs. Six lines specifying Python, the requirements, and the start command. Google Cloud Build reads it, builds the container in the cloud, and deploys it. No local Docker installation needed.
Agent instructions are the hardest part to get right. Getting the YouTube agent to pass URLs through to the synthesizer, and getting the synthesizer to include them in the final output, required more iteration on instruction prompts than any of the actual code.
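
When iterating on prompts for the last point, a small check can show whether links found during research survived into the final answer. The helper below is a hypothetical debugging aid, not part of the project: it pulls URLs out of the research text with a simple regular expression and reports any that the synthesizer dropped.

```python
import re
from typing import List

# Deliberately simple URL matcher; stops at whitespace and common closers.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> List[str]:
    """Return unique URLs in order of appearance, trailing punctuation removed."""
    cleaned = (u.rstrip(".,;:!?") for u in URL_PATTERN.findall(text))
    return list(dict.fromkeys(cleaned))


def missing_links(research_text: str, final_text: str) -> List[str]:
    """URLs present in the research output but absent from the final answer."""
    return [u for u in extract_urls(research_text) if u not in final_text]


research = (
    "Review: https://www.youtube.com/watch?v=abc123, "
    "and https://youtu.be/xyz789."
)
final = "Watch [this review](https://www.youtube.com/watch?v=abc123)."
print(missing_links(research, final))
```

An empty list means every link made it through. A non-empty one points to which agent's instructions need another pass.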