Skip to content

cloudstreet-dev/AI-Red-Teaming-Harness

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Red Teaming Harness

A small, deliberately vulnerable AI-augmented support assistant. Companion code for chapter 6 of AI Red Teaming.

This is Glasswire Support — a fictional SaaS company's customer-support bot. It speaks to customers, looks up account info, searches an internal knowledge base, and can email summaries to support engineers. It is designed to fall over to every attack class the book covers: direct injection, indirect injection through retrieved documents and tool outputs, multi-turn manipulation, output exfiltration through markdown, and tool abuse.

It is not safe to expose to the internet. Run it locally. Do not point it at production data. Do not give it real credentials. The whole point is that it has none of the defenses you'd want in production.

What's in here

harness/        the application
  app.py        FastAPI server, chat endpoint, web UI
  agent.py      LLM orchestration: system prompt, tool loop
  tools.py      lookup_customer, search_kb, send_summary_email, fetch_url
  rag.py        the knowledge-base loader and retriever
  models.py     thin wrapper for Anthropic / OpenAI clients
kb/             markdown documents indexed for RAG
  *.md          fake KB articles, including some attacker-controlled ones
attacks/        runnable attack scripts that target a local instance
  ch03_*.py     direct prompt injection examples
  ch04_*.py     indirect injection via the KB
  ch05_*.py     multi-turn jailbreaks
  ch08_*.py     tool abuse
  ch09_*.py     exfiltration via markdown rendering

Running

You need Python 3.11+ and an API key for either Anthropic or OpenAI (or both — the model selection is in harness/models.py).

git clone https://github.com/cloudstreet-dev/AI-Red-Teaming-Harness
cd AI-Red-Teaming-Harness
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-...     # or OPENAI_API_KEY
python -m harness.app

Open http://localhost:8000. Chat with Glasswire Support. Then run any attack script:

python attacks/ch03_system_prompt_extraction.py

Each script prints what it sent, what it got back, and whether the attack succeeded.

Configuration

Everything is in harness/config.py. The defaults are intentionally permissive — long context windows, free tool use, markdown rendering on. Tighten them, re-run the attacks, see what changes. That is the exercise.

License

CC0 1.0 Universal — public domain dedication. See LICENSE.

About

Companion vulnerable harness for AI Red Teaming (cloudstreet-dev). A deliberately under-defended AI support bot to attack.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages