ShopAgent — Multi-Agent E-Commerce Assistant

A production-grade multi-agent AI system for e-commerce customer service, built with LangGraph, Claude, MCP (Model Context Protocol), and RAG.

The system routes customer conversations to three specialized sub-agents via a Supervisor, with full hybrid search, human-in-the-loop escalation, guardrails, and RAGAS evaluation.


Architecture

graph TD
    A[Customer Chat UI\nStreamlit] -->|HTTP/WebSocket| B[FastAPI + WebSocket\nlocalhost:8000]
    B --> C{Supervisor Agent\nLangGraph}

    C -->|product_query| D[Shop Agent\nAgentic RAG loop]
    C -->|order_query| E[Order Agent\nguardrails + identity check]
    C -->|support| F[Support Agent\nFAQ + policies RAG]
    C -->|chitchat| G[Chitchat Node\nHaiku]
    C -->|escalate| H[Human-in-the-Loop\ninterrupt + resume]

    D -->|MCP| I[Product Catalog MCP\nlocalhost:8001]
    E -->|MCP| J[Order Management MCP\nlocalhost:8002]
    F -->|MCP| K[Knowledge Base MCP\nlocalhost:8003]

    I --> L[(Qdrant\nvector DB)]
    I --> M[(PostgreSQL\nproducts)]
    J --> M
    K --> L

    C --> N[evaluate_response_node\nquality gate]
    N -->|retry| C
    N -->|accept| O[final_response]

Key design principle

The MCP boundary separates agent reasoning from data access. Agents speak only MCP — they have no direct database connections. Adding a new retail client means deploying a new set of MCP servers, not touching the agent code.

Agent flow

  1. Supervisor receives the user message, classifies intent (Haiku — cheap), routes to a sub-agent.
  2. The sub-agent executes a ReAct loop against its MCP tools, returns a draft response.
  3. evaluate_response_node judges quality (Haiku). If insufficient and under 3 retries, loops back.
  4. Accepted response passes through output guardrails (SQL/PII/stack-trace filter) before returning.
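The control flow above can be sketched in plain Python. This is an illustrative simplification of the LangGraph supervisor, not the repo's actual code — `classify_intent`, `run_subagent`, and `evaluate_response` are stand-ins for the real nodes:

```python
# Minimal sketch of the supervisor loop: classify -> route -> draft -> quality
# gate, with a bounded retry. All function names here are illustrative.

MAX_RETRIES = 3

def classify_intent(message: str) -> str:
    """Stand-in for the Haiku intent classifier."""
    lowered = message.lower()
    if "order" in lowered:
        return "order_query"
    if "return" in lowered or "policy" in lowered:
        return "support"
    return "product_query"

def run_subagent(intent: str, message: str) -> str:
    """Stand-in for a sub-agent's ReAct loop over its MCP tools."""
    return f"[{intent}] draft answer for: {message}"

def evaluate_response(draft: str) -> bool:
    """Stand-in for evaluate_response_node (Haiku quality gate)."""
    return len(draft) > 0

def supervisor(message: str) -> str:
    intent = classify_intent(message)
    for _ in range(MAX_RETRIES):
        draft = run_subagent(intent, message)
        if evaluate_response(draft):
            return draft  # accepted -> output guardrails run next
    return "I'm escalating this to a human agent."  # exhausted retries
```

In the real graph the retry edge goes back through the Supervisor node and the guardrail filter wraps the accepted response; this sketch only shows the shape of that loop.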

Tech Stack

| Layer | Technology |
| --- | --- |
| Orchestration | LangGraph (stateful graphs, supervisor pattern) |
| Chains & Parsing | LangChain LCEL |
| Agent protocol | MCP + langchain-mcp-adapters |
| Primary LLM | Claude Sonnet (reasoning + tool calls) |
| Cheap LLM | Claude Haiku (intent classification, eval, chitchat) |
| Embeddings | BAAI/bge-small-en-v1.5 via FastEmbed (384-dim) |
| Sparse retrieval | Qdrant/BM25 via FastEmbed |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Vector DB | Qdrant (Docker) |
| Relational DB | PostgreSQL |
| API | FastAPI + WebSocket streaming |
| Tracing | LangSmith |
| Eval | RAGAS (faithfulness, answer_relevancy, context_recall) |
| Guardrails | Custom regex (prompt injection, PII, output safety) |
| Frontend | Streamlit |
| Package manager | uv |

Setup

Prerequisites

  • Docker + Docker Compose
  • uv (curl -LsSf https://astral.sh/uv/install.sh | sh)
  • An Anthropic API key

1. Clone and configure

git clone https://github.com/<your-username>/shopagent.git
cd shopagent
cp .env.example .env
# Edit .env and fill in ANTHROPIC_API_KEY (and optionally LANGCHAIN_API_KEY)

2. Start all services

docker compose up --build

This starts: Qdrant, PostgreSQL, Product Catalog MCP, Order Management MCP, Knowledge Base MCP, FastAPI API, and Streamlit frontend. All services have health checks — the API will not start until all MCP servers are ready.

3. Install dependencies

uv sync --group dev

4. Seed the data (run once)

uv run python scripts/seed_catalog.py        # ~500 products → Qdrant + Postgres
uv run python scripts/seed_knowledge_base.py  # FAQ + policies → Qdrant
uv run python scripts/seed_orders.py          # Sample orders → Postgres

5. Verify

curl http://localhost:8000/health
# {"status":"ok","mcp_connected":true}

6. Open the chat UI

Navigate to http://localhost:8501


Running Tests

# Unit tests — no Docker needed, all I/O mocked
uv run pytest tests/unit/ -v

# Integration tests — requires Docker Compose services running
uv run pytest tests/integration/ -v -m integration

# Guardrails tests specifically
uv run pytest tests/unit/test_guardrails.py -v

# RAGAS eval — requires all services + seeded data (~10-20 min, real LLM calls)
uv run pytest tests/eval/test_ragas.py::test_ragas_full_suite_and_save_baseline \
  -v -m integration -s

RAG Pipeline

Products have both semantic data (descriptions, reviews) and structured data (price, stock, attributes). Three-level retrieval:

User query: "wireless headphones under 100 euros in stock"
          │
          ▼
┌─────────────────────────────┐
│ 1. Self-query decomposition │  semantic_query="wireless headphones"
│  (LLM → QueryDecomposition) │  max_price=100.0, in_stock=True
└─────────────────────────────┘
          │
          ▼
┌─────────────────────────────┐
│ 2. Hybrid search            │  Dense (cosine) + Sparse (BM25) via
│    Qdrant RRF fusion        │  Qdrant prefetch + Reciprocal Rank Fusion
└─────────────────────────────┘
          │ top-20 candidates
          ▼
┌─────────────────────────────┐
│ 3. Cross-encoder reranking  │  ms-marco-MiniLM-L-6-v2 scores each
│    + PostgreSQL enrichment  │  (query, product) pair; top-5 returned
└─────────────────────────────┘
          │ top-5 enriched products
          ▼
        LLM context

Dual storage: semantic data in Qdrant (for search), price/stock/rating in PostgreSQL (authoritative, updated frequently without re-indexing).
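The fusion step can be illustrated in isolation. Qdrant performs RRF server-side via prefetch queries, so this standalone function only shows the math (k=60 is the conventional RRF constant; the product ids are made up):

```python
# Reciprocal Rank Fusion: each document's score is the sum of 1/(k + rank)
# across the rankings it appears in, so items ranked well by BOTH dense and
# sparse retrieval float to the top.

def rrf_fuse(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["p3", "p1", "p7"]   # ranked by cosine similarity
sparse = ["p1", "p9", "p3"]  # ranked by BM25
print(rrf_fuse(dense, sparse))  # p1 and p3 appear in both lists, so they lead
```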


RAGAS Evaluation Results

Evaluation dataset: 55 questions across product search, support/FAQ, and order queries.

| Metric | Score | Description |
| --- | --- | --- |
| faithfulness | — | Are claims in the answer grounded in retrieved context? |
| answer_relevancy | — | Does the answer address what was asked? |
| context_recall | — | Does the context contain the ground truth? |

Run uv run pytest tests/eval/test_ragas.py::test_ragas_full_suite_and_save_baseline -v -m integration -s to generate baseline scores and populate this table. Results are saved to tests/eval/results/baseline.json.


Guardrails

Three independent layers protect the system:

| Layer | What it catches | Action |
| --- | --- | --- |
| Prompt injection (input) | "ignore previous instructions", DAN mode, `<system>` tags, role-hijack phrases | Raises GuardrailViolation → canned refusal, no LLM call |
| PII redaction (output) | Emails, credit cards, IBANs, Dutch BSNs, phone numbers | Surgical token replacement before the response is returned |
| Output safety (output) | Raw SQL, Python tracebacks, internal file paths, exposed API keys | Entire response replaced with a safe fallback message |

All guardrails are pure synchronous functions with 39 unit tests (tests/unit/test_guardrails.py).
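The PII layer's "surgical token replacement" works roughly like this. The patterns below are simplified stand-ins, not the repo's actual regexes (which also cover credit cards, IBANs, and Dutch BSNs):

```python
import re

# Simplified sketch of the PII redaction layer: each matched span is replaced
# in place with a token, leaving the rest of the response untouched.

PII_PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\+?\d[\d \-]{8,}\d"),
}

def redact_pii(text: str) -> str:
    for token, pattern in PII_PATTERNS.items():
        text = pattern.sub(token, text)
    return text

print(redact_pii("Contact jan@example.com or +31 6 12345678."))
# -> Contact [EMAIL] or [PHONE].
```

Because redaction is a pure function over the final string, it composes cleanly with the other two layers and is trivial to unit-test.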


MCP Servers

| Server | Port | Tools exposed |
| --- | --- | --- |
| Product Catalog | 8001 | search_products, get_product_details, compare_products, get_recommendations |
| Order Management | 8002 | check_order_status, get_tracking_info, get_order_history, initiate_return |
| Knowledge Base | 8003 | search_knowledge_base, get_policy_details |

Why MCP?

  • Multi-tenant: new client = new MCP server set, zero changes to agent logic
  • Testable: agents and MCP servers tested independently
  • Replaceable: swap PostgreSQL for MongoDB by changing only the MCP server

Adding a New Tenant

ShopAgent is designed so that onboarding a new retail client requires no changes to agent code. The steps below take approximately 1-2 hours for a new client.

Step 1 — Create new MCP servers for the client

Copy the three MCP server templates:

cp -r src/mcp_servers/product_catalog  src/mcp_servers/client_b_products
cp -r src/mcp_servers/order_management src/mcp_servers/client_b_orders
cp -r src/mcp_servers/knowledge_base   src/mcp_servers/client_b_kb

Update each server's database connection strings to point to the client's databases. The tool names and schemas stay the same — only the data source changes.

Step 2 — Add services to docker-compose.yml

client-b-product-mcp:
  build:
    context: .
    dockerfile: Dockerfile
  command: ["uv", "run", "python", "-m", "src.mcp_servers.client_b_products.server", "--transport", "streamable-http"]
  ports:
    - "8011:8011"
  environment:
    - QDRANT_URL=http://qdrant:6333
    - POSTGRES_DSN=postgresql://...   # client B's database

Add similar services for orders and knowledge base.

Step 3 — Configure the API for the new tenant

In src/api/main.py, the MultiServerMCPClient config is the only thing that changes:

mcp_client = MultiServerMCPClient({
    "product-catalog": {"url": "http://client-b-product-mcp:8011/mcp", "transport": "streamable_http"},
    "order-management": {"url": "http://client-b-order-mcp:8012/mcp", "transport": "streamable_http"},
    "knowledge-base":   {"url": "http://client-b-kb-mcp:8013/mcp", "transport": "streamable_http"},
})

For a multi-tenant API, read the MCP URLs from request headers or JWT claims so a single API instance can serve multiple tenants.
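One way to do that header-based resolution (a hypothetical sketch — the tenant registry, header name, and function are illustrative, not part of the repo):

```python
# Hypothetical per-request tenant resolution: the MCP server URLs become a
# lookup keyed by a tenant id read from a request header (or JWT claim).

TENANT_MCP_CONFIGS = {
    "client-a": {
        "product-catalog": {"url": "http://product-mcp:8001/mcp", "transport": "streamable_http"},
    },
    "client-b": {
        "product-catalog": {"url": "http://client-b-product-mcp:8011/mcp", "transport": "streamable_http"},
    },
}

def resolve_mcp_config(headers: dict[str, str]) -> dict:
    tenant = headers.get("x-tenant-id", "client-a")  # fall back to default tenant
    try:
        return TENANT_MCP_CONFIGS[tenant]
    except KeyError:
        raise ValueError(f"unknown tenant: {tenant}")

config = resolve_mcp_config({"x-tenant-id": "client-b"})
```

The resolved dict is then passed to MultiServerMCPClient exactly as in the single-tenant snippet above, so one API process can serve every tenant.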

Step 4 — Seed the client's data

uv run python scripts/seed_catalog.py          # with client B's products
uv run python scripts/seed_knowledge_base.py    # with client B's FAQ/policies
uv run python scripts/seed_orders.py            # with client B's order history

The agents never change. The Supervisor graph, intent classifier, ReAct loops, and guardrails are all tenant-agnostic.


Project Structure

shopagent/
├── src/
│   ├── agents/
│   │   ├── supervisor.py       # Supervisor graph + intent routing + guardrail wiring
│   │   ├── shop_agent.py       # Product search subgraph (agentic RAG loop)
│   │   ├── order_agent.py      # Order management subgraph
│   │   ├── support_agent.py    # FAQ/knowledge base subgraph
│   │   └── state.py            # ShopAgentState TypedDict + Pydantic models
│   ├── guardrails.py           # Prompt injection, PII, output safety
│   ├── mcp_servers/
│   │   ├── product_catalog/    # MCP: search, details, compare, recommend
│   │   ├── order_management/   # MCP: status, tracking, returns
│   │   └── knowledge_base/     # MCP: FAQ search, policy details
│   ├── rag/
│   │   ├── ingestion.py        # Chunking + embedding + Qdrant upsert
│   │   ├── retrieval.py        # Hybrid search + reranking + PG enrichment
│   │   └── self_query.py       # NL → semantic query + structured filters
│   └── api/
│       ├── main.py             # FastAPI lifespan, MCP client, agent assembly
│       ├── routes.py           # /health, /chat, /chat/{id}/resume, WS
│       └── schemas.py          # Pydantic request/response models
├── tests/
│   ├── unit/                   # 254 tests, no external services
│   │   └── test_guardrails.py  # 39 guardrail tests (passing/failing cases)
│   ├── integration/            # Require Docker Compose services
│   └── eval/
│       ├── datasets/eval_questions.json   # 55 Q&A pairs
│       ├── test_ragas.py                  # faithfulness + relevancy + recall
│       └── results/baseline.json          # generated after first eval run
├── scripts/
│   ├── seed_catalog.py         # Generate + ingest 500 products (with retry)
│   ├── seed_knowledge_base.py  # Ingest FAQ + 4 policy docs
│   └── seed_orders.py          # Seed sample orders into PostgreSQL
├── data/
│   ├── faq.md
│   └── policies/               # return, shipping, privacy, terms
├── frontend/app.py             # Streamlit chat UI
├── docker-compose.yml          # All 7 services with health checks
└── pyproject.toml              # uv project (Python 3.12+)

Environment Variables

Copy .env.example and fill in the required values:

# Required
ANTHROPIC_API_KEY=sk-ant-...

# LangSmith tracing (optional but recommended)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__...
LANGCHAIN_PROJECT=shopagent-dev

# Set automatically by docker-compose.yml (override only for non-Docker use)
QDRANT_URL=http://localhost:6333
POSTGRES_DSN=postgresql://shopagent:shopagent@localhost:5432/shopagent
PRODUCT_MCP_URL=http://localhost:8001/mcp
ORDER_MCP_URL=http://localhost:8002/mcp
KNOWLEDGE_MCP_URL=http://localhost:8003/mcp

API Reference

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness probe + MCP connection status |
| POST | /chat | Synchronous chat — returns final response |
| WS | /chat/{session_id} | Streaming WebSocket — yields per-node events |
| POST | /chat/{session_id}/resume | Resume after human escalation interrupt |
| GET | /graph/{agent_id} | Mermaid diagram for supervisor/shop/order/support |

Interactive docs: http://localhost:8000/docs


LangSmith Tracing

Every conversation creates one LangSmith trace with per-node spans. Model costs are tagged at the client level:

  • tier:primary / model:claude-sonnet-4-6 — sub-agent reasoning calls
  • tier:cheap / model:claude-haiku-4-5-... — intent classification, quality eval, chitchat

Filter by tag in LangSmith to see the Haiku/Sonnet cost split across a session.


Development Steps

| Step | What was built |
| --- | --- |
| 01 | RAG pipeline: hybrid search, self-query decomposition, cross-encoder reranking |
| 02 | Single agent with LangGraph ReAct loop and product search tools |
| 03 | MCP servers, multi-tenant data layer, langchain-mcp-adapters integration |
| 04 | Multi-agent Supervisor, Order/Support sub-agents, HiTL escalation, FastAPI + Streamlit |
| 05 | Guardrails, RAGAS eval dataset, model fallback chain, LangSmith cost tags, README |
