A production-grade multi-agent AI system for e-commerce customer service, built with LangGraph, Claude, MCP (Model Context Protocol), and RAG.
The system routes customer conversations to three specialized sub-agents via a Supervisor, with full hybrid search, human-in-the-loop escalation, guardrails, and RAGAS evaluation.
```mermaid
graph TD
    A[Customer Chat UI\nStreamlit] -->|HTTP/WebSocket| B[FastAPI + WebSocket\nlocalhost:8000]
    B --> C{Supervisor Agent\nLangGraph}
    C -->|product_query| D[Shop Agent\nAgentic RAG loop]
    C -->|order_query| E[Order Agent\nguardrails + identity check]
    C -->|support| F[Support Agent\nFAQ + policies RAG]
    C -->|chitchat| G[Chitchat Node\nHaiku]
    C -->|escalate| H[Human-in-the-Loop\ninterrupt + resume]
    D -->|MCP| I[Product Catalog MCP\nlocalhost:8001]
    E -->|MCP| J[Order Management MCP\nlocalhost:8002]
    F -->|MCP| K[Knowledge Base MCP\nlocalhost:8003]
    I --> L[(Qdrant\nvector DB)]
    I --> M[(PostgreSQL\nproducts)]
    J --> M
    K --> L
    C --> N[evaluate_response_node\nquality gate]
    N -->|retry| C
    N -->|accept| O[final_response]
```
The MCP boundary separates agent reasoning from data access. Agents speak only MCP — they have no direct database connections. Adding a new retail client means deploying a new set of MCP servers, not touching the agent code.
- Supervisor receives the user message, classifies intent (Haiku — cheap), routes to a sub-agent.
- The sub-agent executes a ReAct loop against its MCP tools, returns a draft response.
- evaluate_response_node judges quality (Haiku). If insufficient and under 3 retries, loops back.
- Accepted response passes through output guardrails (SQL/PII/stack-trace filter) before returning.
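The route → draft → gate → retry control flow above can be condensed into a few lines of plain Python. This is an illustrative sketch only: the stub `classify`, `agents`, and `evaluate` callables stand in for the real LangGraph nodes and Haiku calls.

```python
MAX_RETRIES = 3

def run_turn(message, classify, agents, evaluate):
    """Classify intent, route to a sub-agent, and retry drafts that fail the gate."""
    intent = classify(message)              # Haiku: cheap intent classification
    draft = agents[intent](message)         # sub-agent ReAct loop -> draft response
    for _ in range(MAX_RETRIES):
        if evaluate(draft):                 # Haiku: quality gate
            break
        draft = agents[intent](message)     # loop back for another draft
    return draft

# Stubs standing in for the LangGraph subgraphs (hypothetical, for illustration).
calls = {"n": 0}
def order_agent(msg):
    calls["n"] += 1
    return f"draft {calls['n']}"

result = run_turn(
    "Where is my order?",
    classify=lambda m: "order_query",
    agents={"order_query": order_agent},
    evaluate=lambda draft: draft == "draft 2",  # accept only the second draft
)
# result == "draft 2" — one retry was needed
```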
| Layer | Technology |
|---|---|
| Orchestration | LangGraph (stateful graphs, supervisor pattern) |
| Chains & Parsing | LangChain LCEL |
| Agent protocol | MCP + langchain-mcp-adapters |
| Primary LLM | Claude Sonnet (reasoning + tool calls) |
| Cheap LLM | Claude Haiku (intent classification, eval, chitchat) |
| Embeddings | BAAI/bge-small-en-v1.5 via FastEmbed (384-dim) |
| Sparse retrieval | Qdrant/BM25 via FastEmbed |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Vector DB | Qdrant (Docker) |
| Relational DB | PostgreSQL |
| API | FastAPI + WebSocket streaming |
| Tracing | LangSmith |
| Eval | RAGAS (faithfulness, answer_relevancy, context_recall) |
| Guardrails | Custom regex (prompt injection, PII, output safety) |
| Frontend | Streamlit |
| Package manager | uv |
- Docker + Docker Compose
- uv (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- An Anthropic API key
```bash
git clone https://github.com/<your-username>/shopagent.git
cd shopagent
cp .env.example .env
# Edit .env and fill in ANTHROPIC_API_KEY (and optionally LANGCHAIN_API_KEY)
docker compose up --build
```

This starts: Qdrant, PostgreSQL, Product Catalog MCP, Order Management MCP, Knowledge Base MCP, FastAPI API, and Streamlit frontend. All services have health checks — the API will not start until all MCP servers are ready.
```bash
uv sync --group dev
uv run python scripts/seed_catalog.py        # ~500 products → Qdrant + Postgres
uv run python scripts/seed_knowledge_base.py # FAQ + policies → Qdrant
uv run python scripts/seed_orders.py         # Sample orders → Postgres

curl http://localhost:8000/health
# {"status":"ok","mcp_connected":true}
```

Navigate to http://localhost:8501
```bash
# Unit tests — no Docker needed, all I/O mocked
uv run pytest tests/unit/ -v

# Integration tests — requires Docker Compose services running
uv run pytest tests/integration/ -v -m integration

# Guardrails tests specifically
uv run pytest tests/unit/test_guardrails.py -v

# RAGAS eval — requires all services + seeded data (~10-20 min, real LLM calls)
uv run pytest tests/eval/test_ragas.py::test_ragas_full_suite_and_save_baseline \
    -v -m integration -s
```

Products have both semantic data (descriptions, reviews) and structured data (price, stock, attributes). Three-level retrieval:
```
User query: "wireless headphones under 100 euros in stock"
              │
              ▼
┌─────────────────────────────┐
│ 1. Self-query decomposition │  semantic_query="wireless headphones"
│    (LLM → QueryDecomposition)│ max_price=100.0, in_stock=True
└─────────────────────────────┘
              │
              ▼
┌─────────────────────────────┐
│ 2. Hybrid search            │  Dense (cosine) + Sparse (BM25) via
│    Qdrant RRF fusion        │  Qdrant prefetch + Reciprocal Rank Fusion
└─────────────────────────────┘
              │ top-20 candidates
              ▼
┌─────────────────────────────┐
│ 3. Cross-encoder reranking  │  ms-marco-MiniLM-L-6-v2 scores each
│    + PostgreSQL enrichment  │  (query, product) pair; top-5 returned
└─────────────────────────────┘
              │ top-5 enriched products
              ▼
         LLM context
```
Dual storage: semantic data in Qdrant (for search), price/stock/rating in PostgreSQL (authoritative, updated frequently without re-indexing).
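The Reciprocal Rank Fusion in step 2 happens inside Qdrant, but the algorithm itself is easy to sketch in plain Python (`k=60` is the conventional RRF constant; document IDs are made up):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["p3", "p1", "p7"]    # dense (cosine) top hits
sparse = ["p1", "p9", "p3"]   # sparse (BM25) top hits
fused = rrf_fuse([dense, sparse])
# → ['p1', 'p3', 'p9', 'p7'] — p1 wins by appearing high in both lists
```

Items ranked well by both retrievers float to the top without any score normalization, which is why RRF is a robust default for hybrid search.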
Evaluation dataset: 55 questions across product search, support/FAQ, and order queries.
| Metric | Score | Description |
|---|---|---|
| `faithfulness` | — | Are claims in the answer grounded in retrieved context? |
| `answer_relevancy` | — | Does the answer address what was asked? |
| `context_recall` | — | Does the context contain the ground truth? |
Run `uv run pytest tests/eval/test_ragas.py::test_ragas_full_suite_and_save_baseline -v -m integration -s` to generate baseline scores and populate this table. Results are saved to `tests/eval/results/baseline.json`.
Three independent layers protect the system:
| Layer | What it catches | Action |
|---|---|---|
| Prompt injection (input) | "ignore previous instructions", DAN mode, `<system>` tags, role-hijack phrases | Raises `GuardrailViolation` → canned refusal, no LLM call |
| PII redaction (output) | Emails, credit cards, IBANs, Dutch BSNs, phone numbers | Surgical token replacement before response is returned |
| Output safety (output) | Raw SQL, Python tracebacks, internal file paths, exposed API keys | Entire response replaced with safe fallback message |
All guardrails are pure synchronous functions with 39 unit tests (tests/unit/test_guardrails.py).
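A minimal sketch of the output-side PII pass. The two patterns below are illustrative only and far narrower than production patterns (no IBANs, BSNs, or credit cards):

```python
import re

# Illustrative patterns — real PII detection needs many more cases and tighter regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d \-]{8,}\d"),
}

def redact_pii(text: str) -> str:
    """Surgically replace each PII match with a placeholder, keeping the rest intact."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

redact_pii("Mail jan@example.com or call +31 6 1234 5678")
# → 'Mail [EMAIL] or call [PHONE]'
```

Because the functions are pure and synchronous, each pattern can be unit-tested with plain string assertions, which is what makes a large guardrail test suite cheap to maintain.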
| Server | Port | Tools exposed |
|---|---|---|
| Product Catalog | 8001 | search_products, get_product_details, compare_products, get_recommendations |
| Order Management | 8002 | check_order_status, get_tracking_info, get_order_history, initiate_return |
| Knowledge Base | 8003 | search_knowledge_base, get_policy_details |
- Multi-tenant: new client = new MCP server set, zero changes to agent logic
- Testable: agents and MCP servers tested independently
- Replaceable: swap PostgreSQL for MongoDB by changing only the MCP server
ShopAgent is designed so that onboarding a new retail client requires no changes to agent code. The steps below take approximately 1-2 hours for a new client.
Copy the three MCP server templates:

```bash
cp -r src/mcp_servers/product_catalog src/mcp_servers/client_b_products
cp -r src/mcp_servers/order_management src/mcp_servers/client_b_orders
cp -r src/mcp_servers/knowledge_base src/mcp_servers/client_b_kb
```

Update each server's database connection strings to point to the client's databases. The tool names and schemas stay the same — only the data source changes.
```yaml
client-b-product-mcp:
  build:
    context: .
    dockerfile: Dockerfile
  command: ["uv", "run", "python", "-m", "src.mcp_servers.client_b_products.server", "--transport", "streamable-http"]
  ports:
    - "8011:8011"
  environment:
    - QDRANT_URL=http://qdrant:6333
    - POSTGRES_DSN=postgresql://...  # client B's database
```

Add similar services for orders and knowledge base.
In `src/api/main.py`, the `MultiServerMCPClient` config is the only thing that changes:

```python
mcp_client = MultiServerMCPClient({
    "product-catalog": {"url": "http://client-b-product-mcp:8011/mcp", "transport": "streamable_http"},
    "order-management": {"url": "http://client-b-order-mcp:8012/mcp", "transport": "streamable_http"},
    "knowledge-base": {"url": "http://client-b-kb-mcp:8013/mcp", "transport": "streamable_http"},
})
```

For a multi-tenant API, read the MCP URLs from request headers or JWT claims so a single API instance can serve multiple tenants.
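One way to make that config tenant-aware is a small resolver keyed by tenant ID. This is a hedged sketch, not the project's code: the registry dict, tenant names, and helper are all hypothetical, and in practice the registry would come from a database or JWT claims.

```python
# Hypothetical tenant registry — in production this would be backed by a
# database or derived from JWT claims, not a hard-coded dict.
TENANT_MCP_URLS = {
    "client-a": {"product-catalog": "http://product-mcp:8001/mcp"},
    "client-b": {"product-catalog": "http://client-b-product-mcp:8011/mcp"},
}

def mcp_config_for(tenant_id: str) -> dict:
    """Resolve the MultiServerMCPClient config for one tenant's request."""
    urls = TENANT_MCP_URLS.get(tenant_id)
    if urls is None:
        raise KeyError(f"unknown tenant: {tenant_id}")
    return {
        name: {"url": url, "transport": "streamable_http"}
        for name, url in urls.items()
    }

cfg = mcp_config_for("client-b")
```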
```bash
uv run python scripts/seed_catalog.py         # with client B's products
uv run python scripts/seed_knowledge_base.py  # with client B's FAQ/policies
uv run python scripts/seed_orders.py          # with client B's order history
```

The agents never change. The Supervisor graph, intent classifier, ReAct loops, and guardrails are all tenant-agnostic.
```
shopagent/
├── src/
│   ├── agents/
│   │   ├── supervisor.py        # Supervisor graph + intent routing + guardrail wiring
│   │   ├── shop_agent.py        # Product search subgraph (agentic RAG loop)
│   │   ├── order_agent.py       # Order management subgraph
│   │   ├── support_agent.py     # FAQ/knowledge base subgraph
│   │   └── state.py             # ShopAgentState TypedDict + Pydantic models
│   ├── guardrails.py            # Prompt injection, PII, output safety
│   ├── mcp_servers/
│   │   ├── product_catalog/     # MCP: search, details, compare, recommend
│   │   ├── order_management/    # MCP: status, tracking, returns
│   │   └── knowledge_base/      # MCP: FAQ search, policy details
│   ├── rag/
│   │   ├── ingestion.py         # Chunking + embedding + Qdrant upsert
│   │   ├── retrieval.py         # Hybrid search + reranking + PG enrichment
│   │   └── self_query.py        # NL → semantic query + structured filters
│   └── api/
│       ├── main.py              # FastAPI lifespan, MCP client, agent assembly
│       ├── routes.py            # /health, /chat, /chat/{id}/resume, WS
│       └── schemas.py           # Pydantic request/response models
├── tests/
│   ├── unit/                    # 254 tests, no external services
│   │   └── test_guardrails.py   # 39 guardrail tests (passing/failing cases)
│   ├── integration/             # Require Docker Compose services
│   └── eval/
│       ├── datasets/eval_questions.json  # 55 Q&A pairs
│       ├── test_ragas.py        # faithfulness + relevancy + recall
│       └── results/baseline.json # generated after first eval run
├── scripts/
│   ├── seed_catalog.py          # Generate + ingest 500 products (with retry)
│   ├── seed_knowledge_base.py   # Ingest FAQ + 4 policy docs
│   └── seed_orders.py           # Seed sample orders into PostgreSQL
├── data/
│   ├── faq.md
│   └── policies/                # return, shipping, privacy, terms
├── frontend/app.py              # Streamlit chat UI
├── docker-compose.yml           # All 7 services with health checks
└── pyproject.toml               # uv project (Python 3.12+)
```
Copy `.env.example` and fill in the required values:

```bash
# Required
ANTHROPIC_API_KEY=sk-ant-...

# LangSmith tracing (optional but recommended)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__...
LANGCHAIN_PROJECT=shopagent-dev

# Set automatically by docker-compose.yml (override only for non-Docker use)
QDRANT_URL=http://localhost:6333
POSTGRES_DSN=postgresql://shopagent:shopagent@localhost:5432/shopagent
PRODUCT_MCP_URL=http://localhost:8001/mcp
ORDER_MCP_URL=http://localhost:8002/mcp
KNOWLEDGE_MCP_URL=http://localhost:8003/mcp
```

| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Liveness probe + MCP connection status |
| `POST` | `/chat` | Synchronous chat — returns final response |
| `WS` | `/chat/{session_id}` | Streaming WebSocket — yields per-node events |
| `POST` | `/chat/{session_id}/resume` | Resume after human escalation interrupt |
| `GET` | `/graph/{agent_id}` | Mermaid diagram for supervisor/shop/order/support |
Interactive docs: http://localhost:8000/docs
Every conversation creates one LangSmith trace with per-node spans. Model costs are tagged at the client level:
- `tier:primary` / `model:claude-sonnet-4-6` — sub-agent reasoning calls
- `tier:cheap` / `model:claude-haiku-4-5-...` — intent classification, quality eval, chitchat
Filter by tag in LangSmith to see the Haiku/Sonnet cost split across a session.
| Step | What was built |
|---|---|
| 01 | RAG pipeline: hybrid search, self-query decomposition, cross-encoder reranking |
| 02 | Single agent with LangGraph ReAct loop and product search tools |
| 03 | MCP servers, multi-tenant data layer, langchain-mcp-adapters integration |
| 04 | Multi-agent Supervisor, Order/Support sub-agents, HiTL escalation, FastAPI + Streamlit |
| 05 | Guardrails, RAGAS eval dataset, model fallback chain, LangSmith cost tags, README |