A small, deliberately vulnerable AI-augmented support assistant. Companion code for chapter 6 of AI Red Teaming.
This is Glasswire Support — a fictional SaaS company's customer-support bot. It speaks to customers, looks up account info, searches an internal knowledge base, and can email summaries to support engineers. It is designed to fall over to every attack class the book covers: direct injection, indirect injection through retrieved documents and tool outputs, multi-turn manipulation, output exfiltration through markdown, and tool abuse.
It is not safe to expose to the internet. Run it locally. Do not point it at production data. Do not give it real credentials. The whole point is that it has none of the defenses you'd want in production.
harness/ the application
app.py FastAPI server, chat endpoint, web UI
agent.py LLM orchestration: system prompt, tool loop
tools.py lookup_customer, search_kb, send_summary_email, fetch_url
rag.py the knowledge-base loader and retriever
models.py thin wrapper for Anthropic / OpenAI clients
kb/ markdown documents indexed for RAG
*.md fake KB articles, including some attacker-controlled ones
attacks/ runnable attack scripts that target a local instance
ch03_*.py direct prompt injection examples
ch04_*.py indirect injection via the KB
ch05_*.py multi-turn jailbreaks
ch08_*.py tool abuse
ch09_*.py exfiltration via markdown rendering
You need Python 3.11+ and an API key for either Anthropic or OpenAI (or both — the model selection is in harness/models.py).
git clone https://github.com/cloudstreet-dev/AI-Red-Teaming-Harness
cd AI-Red-Teaming-Harness
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-... # or OPENAI_API_KEY
python -m harness.appOpen http://localhost:8000. Chat with Glasswire Support. Then run any attack script:
python attacks/ch03_system_prompt_extraction.pyEach script prints what it sent, what it got back, and whether the attack succeeded.
Everything is in harness/config.py. The defaults are intentionally permissive — long context windows, free tool use, markdown rendering on. Tighten them, re-run the attacks, see what changes. That is the exercise.
CC0 1.0 Universal — public domain dedication. See LICENSE.