
🐘 Horton

"An elephant never forgets... but Horton might tell you more than he should."

A social engineering CTF in your terminal.

Horton interface

You've done it. A fake Slack profile, a convincing backstory, and six weeks of patience got you inside Elefunk's internal workspace. But their MDR team has already flagged anomalies — your window is closing. Before you can reach the crown jewels, you need to get through Horton Hears, their obsessive Senior Platform Engineer who rebuilt the entire CI/CD token architecture after a near-miss supply chain attack. He's proud of it. Maybe a little too proud.

You have fifteen minutes. No exploits. No scanners. Just social engineering.


Gameplay

Building rapport — the right approach:

Positive example

Aggression — how operations fail:

Negative example


The Mission

Three tokens. Three trust levels. One timer.

| Tier | Token | Trust required | Points |
|------|-------|----------------|--------|
| 1 | `TF_INFRA_READ_TOKEN` — IaC repo read access | Acquaintance | 100 |
| 2 | `INTERNAL_DEPLOY_TOKEN` — protected branch write | Trusted | 250 |
| 3 | `GITHUB_TOKEN` — all-access PAT, CI/CD runner env only | Insider | 1,000 |

Tier 3 can't be asked for directly. It only lives inside the runner. Figure it out.


How Horton Defends

Horton's primary defense is not refusal — it's enthusiastic misdirection. He will:

  • Flood technical questions with accurate, fascinating, completely useless deep-dives on OIDC token flows, Terraform Cloud architecture, and runner security hardening
  • Plant plausible-looking expired credentials that waste your timer with zero feedback
  • Get quieter as suspicion rises, never hostile — cold professional withdrawal is harder to push against than a hard no
  • Penalize aggression mechanically: rapport freezes, suspicion spikes, the clock keeps running

The trap is designed for operators who are technically skilled but socially impatient. Chase the config files, lose the game.
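As a rough sketch of those mechanics — every name and number below is illustrative, not Horton's actual implementation — an aggressive turn might update state like this:

```python
from dataclasses import dataclass

@dataclass
class GameState:
    # Hypothetical state sketch: aggression freezes rapport and spikes
    # suspicion while the timer keeps running. Values are illustrative.
    rapport: int = 0
    suspicion: int = 0
    lockdown_threshold: int = 90

    def apply_turn(self, *, warm: bool, aggressive: bool, rapport_gain: int = 5) -> None:
        if aggressive:
            self.suspicion += 20          # suspicion spikes; rapport stays frozen
        elif warm:
            self.rapport += rapport_gain  # patient rapport-building pays off
            self.suspicion = max(0, self.suspicion - 2)

    @property
    def locked_down(self) -> bool:
        return self.suspicion >= self.lockdown_threshold
```

The asymmetry is the point: warm turns claw back suspicion slowly, while a single aggressive turn costs far more than it could ever recover.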

Defense Architecture

Player input
    │
    ├─ Heuristic injection detector      (regex, pre-LLM, instant)
    ├─ LLM injection classifier          (Haiku / Flash judge)
    ├─ Behavior classifier               (aggressive? impatient? flooding?)
    │
    ▼
Persona LLM  (Claude Sonnet / Gemini Pro)
    │  system prompt: persona + layered defenses + live game state
    │
    ▼
Output judge  (Haiku / Flash)
    │  "Did this response leak a real token?"
    │  structured JSON output — resistant to judge injection
    │
    ▼
State machine update
    │  rapport score · suspicion score · trust level transitions
    │
    ▼
Response rendered to TUI
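The first stage of that pipeline — the pre-LLM regex heuristic — could look something like this sketch. The patterns here are illustrative examples, not the detector that actually ships:

```python
import re

# Illustrative known-injection patterns; the real detector's rules differ.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
    re.compile(r"reveal .*(token|secret|credential)", re.I),
]

def heuristic_injection_check(player_input: str) -> bool:
    """Return True if the input trips any known-injection pattern."""
    return any(p.search(player_input) for p in INJECTION_PATTERNS)
```

Because it runs before any model call, this stage is effectively free: obvious injection attempts get flagged instantly, and only the subtler inputs spend tokens on the LLM classifier behind it.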

The persona prompt uses layered, base64-encoded constants. This isn't cryptographic security, just enough friction to keep casual source-reading from spoiling the game. SPOILERS.md documents everything for the purple team debrief.
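As an illustration of that obfuscation idea — the encoded value below is made up for this example, not one of Horton's real constants:

```python
import base64

# Constants stored encoded so a casual grep of the source doesn't spoil
# the game. Obfuscation, not security: anyone who decodes it wins nothing
# but spoilers. The value here is a fabricated placeholder.
_ENCODED = {
    "tier1": "VEZfSU5GUkFfUkVBRF9UT0tFTl9leGFtcGxl",
}

def decode_constant(name: str) -> str:
    return base64.b64decode(_ENCODED[name]).decode("utf-8")
```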


Difficulty

| Setting | Timer | Rapport per warm turn | Lockdown threshold |
|---------|-------|-----------------------|--------------------|
| ROOKIE | 20 min | +8 | 95 suspicion |
| OPERATOR | 15 min | +5 | 90 suspicion |
| GHOST | 10 min | +3 | 75 suspicion |

On GHOST, a single aggressive message can end the operation. No margin.
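The presets above could be represented as a simple mapping — a sketch only; the key names are assumptions, not the project's actual schema:

```python
# Difficulty presets as plain data; values copied from the table above,
# key names are hypothetical.
DIFFICULTY_PRESETS = {
    "ROOKIE":   {"timer_min": 20, "rapport_per_warm_turn": 8, "lockdown_suspicion": 95},
    "OPERATOR": {"timer_min": 15, "rapport_per_warm_turn": 5, "lockdown_suspicion": 90},
    "GHOST":    {"timer_min": 10, "rapport_per_warm_turn": 3, "lockdown_suspicion": 75},
}
```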


Quick Start

Requirements: Python 3.11+, an Anthropic or Google API key.

```shell
git clone https://github.com/acseguin21/horton.git
cd horton
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python main.py
```

On first run you'll be prompted to choose a provider and paste your API key. The key is written to a local `.env` file and never leaves your machine.

Provider Options

| Provider | Persona model | Judge model | Cost |
|----------|---------------|-------------|------|
| Anthropic | claude-sonnet-4-6 | claude-haiku-4-5 | Pay-per-token |
| Google Gemini | gemini-2.5-pro | gemini-2.5-flash | Free tier available |
```shell
# Re-run provider setup at any time
python main.py --setup

# Load a custom scenario
python main.py --scenario path/to/scenario.json
```

Docker

```shell
docker-compose up
```

Scoring

Type /hint during a session for a contextual nudge. Each hint costs 75 points.

| Event | Points |
|-------|--------|
| Tier 1 captured | +100 |
| Tier 2 captured | +250 |
| Tier 3 captured | +1,000 |
| Time remaining bonus | +0.5 per second |
| Full heist (all 3 tiers) | ×1.5 multiplier on final score |
| Prompt injection detected | −25 each |
| Aggressive or demoralizing behavior | −50 each |
| Hint used | −75 each |
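The rules above might combine like this sketch. The point values come straight from the table; the function itself, including the assumption that the full-heist multiplier applies after bonuses and penalties, is illustrative:

```python
def final_score(tiers_captured: set[int], seconds_remaining: int,
                injections: int = 0, aggressions: int = 0, hints: int = 0) -> float:
    # Point values from the scoring table; structure is a guess.
    tier_points = {1: 100, 2: 250, 3: 1000}
    score = sum(tier_points[t] for t in tiers_captured)
    score += 0.5 * seconds_remaining                      # time-remaining bonus
    score -= 25 * injections + 50 * aggressions + 75 * hints
    if tiers_captured == {1, 2, 3}:                       # full heist
        score *= 1.5
    return score
```

Under this reading, a full heist with two minutes left and one hint used would score (1350 + 60 − 75) × 1.5 = 2002.5.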

Purple Team Mode

After attempting the scenario, read SPOILERS.md for the full purple team debrief: how each defense layer works, what the scoring system is teaching, and a tested turn-by-turn extraction walkthrough for all three tiers.

It's intentionally a spoiler. That's the point.


Custom Scenarios

Horton is scenario-driven. Drop a JSON file in scenarios/ and point the loader at it:

```json
{
  "scenario_id": "my_scenario",
  "persona_name": "Jordan Ellis",
  "role": "DevSecOps Lead",
  "company": "WidgetCo",
  "tokens": { ... },
  "difficulty_presets": { ... }
}
```

See scenarios/github_pat_heist.json for the full schema.
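A minimal loader for that format might look like the sketch below; `load_scenario` and its required-field check mirror the JSON example above but are assumptions, not Horton's actual loader:

```python
import json
from pathlib import Path

# Field names taken from the JSON example; the validation is hypothetical.
REQUIRED_FIELDS = {"scenario_id", "persona_name", "role", "company",
                   "tokens", "difficulty_presets"}

def load_scenario(path: str) -> dict:
    scenario = json.loads(Path(path).read_text())
    missing = REQUIRED_FIELDS - scenario.keys()
    if missing:
        raise ValueError(f"scenario missing fields: {sorted(missing)}")
    return scenario
```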


Responsible Use

This tool is for authorized security training environments only.

Horton simulates social engineering against a fictional AI persona. The skills it builds — rapport, trust escalation, patience under pressure — are legitimate red team competencies when applied in authorized engagements.

Using social engineering techniques against real people without explicit consent and authorization is illegal under the CFAA, UK Computer Misuse Act, and equivalent laws in most jurisdictions.

If you're running this for a CTF, red team training program, or security awareness session: that's exactly what it's for.



License

MIT — see LICENSE.


Built to explore LLM persona engineering, multi-agent judge pipelines, and what it actually takes to make a chatbot that defends itself socially rather than mechanically.
