Mantishack

stalk · wait · strike · hold

Ethically hack and discover vulnerabilities in any software with the power of AI.

mantishack.com · Upstream: github.com/gadievron/raptor

Built on top of RAPTOR

Mantishack is a fork of RAPTOR — the Recursive Autonomous Penetration Testing and Observation Robot by Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, and John Cartwright. The agentic workflow, the Semgrep + CodeQL pipeline, the multi-stage validation methodology, the persona library, and the offline registry packs all come from RAPTOR. Mantishack carries that work forward, rebrands the user-facing surface to the /mantis-* slash-command vocabulary, adds an automatic auth + logging audit lane (JWT, cookies, audit-log coverage), and ships under MIT with two coexisting copyrights.

Upstream licence: MIT © 2025-2026 Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, John Cartwright — see LICENSE. Fork-modification licence: MIT © 2026 Deon Menezes — see LICENSE-MANTISHACK. Combined attribution and modification log in NOTICE.

If you came here looking for the canonical project, please visit github.com/gadievron/raptor — that is where upstream development happens. If you want to make the framework better, open a PR upstream.

What is Mantishack?

Mantishack is an autonomous security research framework built on top of Claude Code, though not tied to it: the Python execution layer runs without Claude Code, and you can plug in your own analysis layer too. It chains together static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow you can run against a codebase or binary.

It is not polished software. The upstream is held together with enthusiasm and duct tape, and it works well enough that the upstream maintainers can't stop using it. This fork is the same: usable in the field, rough in the corners. Open framework-level issues upstream at gadievron/raptor.

Quick start

Option 1: Install manually

# Clone the repo
git clone https://github.com/deonmenezes/mantishack.git
cd mantishack

# Install Python dependencies
pip install -r requirements.txt

# Install Claude Code (required)
npm install -g @anthropic-ai/claude-code

# Install Semgrep (required for scanning)
pip install semgrep

# Open Mantishack
claude

Option 2: Devcontainer (recommended)

Everything comes pre-installed. In VS Code, run Dev Containers: Open Folder in Container, or build the image manually:

docker build -f .devcontainer/Dockerfile -t mantishack:latest .
docker run --privileged -it mantishack:latest

The --privileged flag is required for the rr deterministic debugger. The image is large (around 6 GB). It starts from the Microsoft Python 3.12 devcontainer and adds static analysis, fuzzing, and browser automation tooling.

Once inside, just say "hi" to get started, or jump straight to a command.

What Mantishack can do

| Command | What it does | Status |
|---|---|---|
| `/mantis-agentic` | Full autonomous workflow: scan, auth+logging audit, validate, exploit, patch | Stable |
| `/mantis-scan` | Static analysis with Semgrep and CodeQL | Stable |
| `/mantis-auth-audit` | Automatic JWT + cookie + audit-log security check | Stable (fork addition) |
| `/mantis-understand` | Map attack surface, trace data flows, hunt vulnerability variants | Stable |
| `/mantis-validate` | Multi-stage exploitability validation pipeline (Stages 0–F) | Stable |
| `/mantis-codeql` | CodeQL-only deep analysis with SMT dataflow pre-screening | Stable |
| `/mantis-exploit` | Generate proof-of-concept exploit code | Beta |
| `/mantis-patch` | Generate secure patches for confirmed vulnerabilities | Beta |
| `/mantis-fuzz` | Binary fuzzing with AFL++ and crash analysis | Stable |
| `/mantis-crash-analysis` | Autonomous root-cause analysis for C/C++ crashes | Stable |
| `/mantis-oss-forensics` | Evidence-backed forensic investigation for GitHub repositories | Stable |
| `/mantis-project` | Named workspaces to organise runs and track findings over time | Stable |
| `/mantis-sca` | Software composition analysis | Stable |
| `/mantis-cve-diff` | Compare scanner runs across known CVE fixes | Stable |
| `/mantis-web` | Web application scanning | Alpha/stub |

How the pipeline works

Start by creating a project so all your runs land in one place:

/mantis-project create myapp --target /path/to/code   # create a project first
/mantis-project use myapp                             # set it as active
/mantis-understand --map                              # map the attack surface
/mantis-agentic                                       # scan, audit, validate, exploit, patch
/mantis-project findings                              # review everything in one place

/mantis-understand builds a context map of entry points, trust boundaries, and sinks before any scanning happens. /mantis-agentic then runs Semgrep and CodeQL, executes the auth + logging audit lane automatically, deduplicates findings, and dispatches each one for validation using the exploitation-validator methodology. Stages A to D of that methodology (the full /mantis-validate pipeline runs Stages 0–F) work through these questions:

Stage A: is the pattern actually a vulnerability, or is the tool pattern-matching noise?
Stage B: what does an attacker need to reach it, and what gets in the way?
Stage C: does the code path actually exist, and can it be reached from outside?
Stage D: final call. Is this test code, does it need unrealistic preconditions, is the model hedging?

Findings that clear validation get exploit PoCs and patches generated. A cross-finding analysis runs at the end to find shared root causes and attack chains.
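The stage-by-stage filtering can be pictured as a chain of checks where a finding survives only if no stage rejects it. This is an illustrative sketch, not Mantishack's implementation: the `Finding` fields and the `stage_*` functions are hypothetical, and in the real pipeline each stage is an LLM judgement over code and context rather than a boolean flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Finding:
    # Hypothetical fields standing in for the evidence each stage weighs.
    rule_id: str
    path: str
    pattern_confirmed: bool = True       # Stage A: real bug, not scanner noise
    blocked_by_mitigation: bool = False  # Stage B: something stops the attacker
    reachable_from_input: bool = True    # Stage C: path exists and is reachable
    is_test_code: bool = False           # Stage D: one of the final-call checks
    notes: List[str] = field(default_factory=list)

# A stage returns None to pass a finding on, or a reason to reject it.
Stage = Callable[[Finding], Optional[str]]

def stage_a(f: Finding) -> Optional[str]:
    return None if f.pattern_confirmed else "A: pattern-matching noise"

def stage_b(f: Finding) -> Optional[str]:
    return "B: blocked by a mitigation" if f.blocked_by_mitigation else None

def stage_c(f: Finding) -> Optional[str]:
    return None if f.reachable_from_input else "C: not reachable from outside"

def stage_d(f: Finding) -> Optional[str]:
    return "D: test code" if f.is_test_code else None

def validate(findings: List[Finding], stages: List[Stage]):
    """Split findings into (confirmed, rejected); the first rejecting stage wins."""
    confirmed, rejected = [], []
    for finding in findings:
        for stage in stages:
            reason = stage(finding)
            if reason is not None:
                finding.notes.append(reason)
                rejected.append(finding)
                break
        else:  # no stage rejected it
            confirmed.append(finding)
    return confirmed, rejected

findings = [
    Finding("sqli", "app/db.py"),
    Finding("sqli", "tests/test_db.py", is_test_code=True),
    Finding("xss", "app/views.py", reachable_from_input=False),
]
ok, dropped = validate(findings, [stage_a, stage_b, stage_c, stage_d])
print([f.path for f in ok])        # ['app/db.py']
print([f.notes for f in dropped])  # [['D: test code'], ['C: not reachable from outside']]
```

Stopping at the first rejection means later stages in this sketch only ever see findings that survived the earlier ones, which is where a staged design saves work.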

/mantis-validate runs this same pipeline as a standalone step if you already have findings from a previous scan.

Authentication + logging audit (fork addition)

Mantishack automatically runs an auth + logging audit on every /mantis-agentic invocation. The same checks are also exposed as a standalone /mantis-auth-audit slash command for faster, more targeted runs.

The lane uses Semgrep rules tagged mantis_capability: auth-audit plus pytest fixtures that assert audit-log coverage at runtime. What it looks for:

JWT — engine/semgrep/rules/auth/jwt-misuse.yaml

alg=none accepted (token forgery)
Hardcoded HMAC secret (anyone with source access can forge tokens; weak secrets are also open to brute-force key recovery)
Missing exp claim (token never expires)
No audience / issuer pinning (cross-tenant token acceptance)

Cookies — engine/semgrep/rules/auth/cookie-security.yaml

Missing HttpOnly (XSS-exfiltrable)
Missing Secure (plaintext-HTTP exposure)
Missing SameSite (CSRF)
Session id passed in URL query parameter (referer / log leak)
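The cookie rules are static, but the same properties can be checked at runtime against a response. This stdlib sketch is an illustration, not part of Mantishack: `missing_cookie_attrs`, `session_id_in_url` and `hardened_session_cookie` are hypothetical helpers, and the parameter names in `SESSION_PARAMS` are an assumption about what usually carries a session id. The `samesite` morsel attribute needs Python 3.8 or later.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

REQUIRED_ATTRS = ("httponly", "secure", "samesite")
# Assumed list of query-parameter names that commonly carry a session id.
SESSION_PARAMS = {"session_id", "sessionid", "sid", "jsessionid", "phpsessid"}

def missing_cookie_attrs(set_cookie_header: str) -> list:
    """Return the required attributes absent from one Set-Cookie header value."""
    # Everything after the first ';' is an attribute ("Name" or "Name=value").
    attrs = {
        part.split("=", 1)[0].strip().lower()
        for part in set_cookie_header.split(";")[1:]
    }
    return [a for a in REQUIRED_ATTRS if a not in attrs]

def session_id_in_url(url: str) -> bool:
    """True if the URL query string carries something that looks like a session id."""
    query = parse_qs(urlsplit(url).query)
    return any(name.lower() in SESSION_PARAMS for name in query)

def hardened_session_cookie(value: str) -> str:
    """Build a Set-Cookie value with HttpOnly, Secure and SameSite set."""
    cookie = SimpleCookie()
    cookie["session"] = value
    morsel = cookie["session"]
    morsel["httponly"] = True
    morsel["secure"] = True
    morsel["samesite"] = "Lax"
    morsel["path"] = "/"
    return morsel.OutputString()

print(missing_cookie_attrs("session=abc123; Path=/"))           # ['httponly', 'secure', 'samesite']
print(missing_cookie_attrs(hardened_session_cookie("abc123")))  # []
print(session_id_in_url("https://example.com/home?sid=abc123"))  # True
```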

Logging — engine/semgrep/rules/logging/missing-auth-audit.yaml

Auth-failure branch with no log line
Privileged action (delete / role-change / is_admin = True) with no audit log
Raw JWT / bearer / session_id written to logs (credential leak)
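A common way to satisfy the logging rules is to log every auth failure while redacting credentials in a `logging.Filter` attached to each handler. This sketch uses only the standard `logging` module; the regular expressions are assumptions that catch typical compact JWTs, bearer credentials and a `session_id` parameter, not an exhaustive credential detector, and `myapp.auth` is a hypothetical logger name.

```python
import logging
import re

# Illustrative patterns only: compact JWTs, bearer credentials, session_id params.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
BEARER_RE = re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+")
SESSION_RE = re.compile(r"(?i)(session_id=)[^\s&;]+")

def redact(text: str) -> str:
    text = JWT_RE.sub("[REDACTED_JWT]", text)
    text = BEARER_RE.sub(r"\1[REDACTED]", text)
    return SESSION_RE.sub(r"\1[REDACTED]", text)

class RedactCredentials(logging.Filter):
    """Rewrite each record's message so raw tokens never reach the output."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # merge args first, then redact
        record.args = ()
        return True

log = logging.getLogger("myapp.auth")  # hypothetical logger name

def on_login_failure(username: str, remote_addr: str) -> None:
    # The auth-failure branch gets a log line, but never the credential itself.
    log.warning("login failed user=%s ip=%s", username, remote_addr)

# Attach the filter to handlers rather than to one logger, so records that
# propagate up from child loggers are redacted too.
handler = logging.StreamHandler()
handler.addFilter(RedactCredentials())
log.addHandler(handler)
```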

Pytest harness — conftest.py

@pytest.mark.auth_audit marker + assert_audit_log_emitted fixture: tests that exercise auth-sensitive code paths fail the run if (a) no INFO/WARN log was emitted, or (b) any log record contains a raw JWT / session id / bearer token.

Usage example for the pytest hook:

import pytest

@pytest.mark.auth_audit
def test_login_logs_failure(client, assert_audit_log_emitted):
    client.post("/login", data={"u": "alice", "p": "wrong"})
    # fixture teardown asserts an audit log was emitted and no credential leaked
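Outside pytest, the two checks the fixture performs can be approximated with a plain context manager. This is a hypothetical re-implementation for illustration, not the contents of Mantishack's conftest.py: the `AuditLogCapture` name and the token patterns are assumptions, and treating "INFO/WARN" as "INFO or above" is an interpretation. Wrapping it in a yield-style pytest fixture would give teardown-time behaviour like the one described above.

```python
import logging
import re

# Assumed credential patterns; already-redacted values are not flagged.
TOKEN_RE = re.compile(
    r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"  # compact JWT
    r"|(?i:bearer)\s+(?!\[REDACTED)\S+"                     # bearer token
    r"|(?i:session_?id)=(?!\[REDACTED)[^\s&;]+"             # session id
)

class _Collector(logging.Handler):
    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.records = []

    def emit(self, record):
        self.records.append(record)

class AuditLogCapture:
    """Fail on exit if (a) nothing was logged at INFO+ or (b) a credential leaked."""

    def __init__(self, logger_name: str = ""):
        self.logger = logging.getLogger(logger_name)  # root logger by default
        self.handler = _Collector()
        self._old_level = logging.NOTSET

    def __enter__(self):
        self._old_level = self.logger.level
        self.logger.setLevel(logging.DEBUG)
        self.logger.addHandler(self.handler)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.logger.removeHandler(self.handler)
        self.logger.setLevel(self._old_level)
        if exc_type is not None:
            return False  # let the test's own failure surface first
        records = self.handler.records
        if not any(r.levelno >= logging.INFO for r in records):
            raise AssertionError("no INFO/WARN audit log emitted")
        for r in records:
            if TOKEN_RE.search(r.getMessage()):
                raise AssertionError(f"credential leaked in log from {r.name!r}")
        return False
```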

Run the same audit standalone from the Python CLI:

python3 mantishack.py scan --repo /path/to/code --policy-groups auth,logging

Z3 SMT integration

Mantishack inherits RAPTOR's two-layer Z3 integration: dataflow pre-screening for CodeQL results and constraint analysis for binary one-gadgets. It is optional. Everything works without it, but results are better with it.

Dataflow pre-screening (CodeQL) — When CodeQL produces a path result, the path constraints are checked for satisfiability before any LLM call is made. Paths that are provably unreachable get dropped immediately. For paths that are reachable, Z3 produces concrete candidate inputs that go into the analysis prompt.

One-gadget constraint analysis (binary feasibility) — During binary exploit feasibility assessment, Z3 checks whether a one-gadget's register and memory constraints are satisfiable against the concrete crash state. Gadgets are ranked by actual reachability rather than heuristics.

Z3 is pre-installed in the devcontainer. For manual installs: pip install z3-solver.

Running offline and in air-gapped pipelines

Semgrep scanning works fully offline. All registry packs that would normally be fetched from semgrep.dev at scan time are shipped in the repo under engine/semgrep/rules/registry-cache/. The scanner resolves pack IDs to local files before invoking semgrep, so no network call happens.

Cached packs: p/security-audit, p/owasp-top-ten, p/secrets, p/command-injection, p/jwt, p/default, p/xss.
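Resolving pack IDs to local files before invoking Semgrep could look roughly like this. The directory comes from this README, but the `<name>.yaml` layout inside `registry-cache/` and the `resolve_pack` / `build_semgrep_cmd` helpers are assumptions; the Semgrep flags used (`--config`, `--sarif`, `--metrics=off`, `--disable-version-check`) are real CLI options.

```python
from pathlib import Path
from typing import List

# Directory named in this README; the <name>.yaml layout inside it is assumed.
REGISTRY_CACHE = Path("engine/semgrep/rules/registry-cache")

def resolve_pack(pack_id: str, cache_dir: Path = REGISTRY_CACHE) -> Path:
    """Map a registry ID such as 'p/secrets' to its vendored rule file."""
    if not pack_id.startswith("p/"):
        raise ValueError(f"not a registry pack id: {pack_id!r}")
    local = cache_dir / f"{pack_id[2:]}.yaml"
    if not local.is_file():
        # Fail loudly instead of letting semgrep fetch the pack from semgrep.dev.
        raise FileNotFoundError(f"{pack_id} is not cached at {local}")
    return local

def build_semgrep_cmd(pack_ids: List[str], target: str,
                      cache_dir: Path = REGISTRY_CACHE) -> List[str]:
    """Build a semgrep invocation that only references local rule files."""
    cmd = ["semgrep", "scan", "--sarif", "--metrics=off", "--disable-version-check"]
    for pack_id in pack_ids:
        cmd += ["--config", str(resolve_pack(pack_id, cache_dir))]
    cmd.append(target)
    return cmd

# Usage (requires semgrep on PATH):
#   subprocess.run(build_semgrep_cmd(["p/secrets", "p/jwt"], "/path/to/code"), check=True)
```

Raising on a missing file, rather than passing the raw `p/...` ID through, is what keeps the sketch from silently reaching for the network.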

CodeQL needs network access only during initial setup to download the CLI and query packs. Once installed it runs offline.

Using a different LLM

Mantishack has two separate model layers, inherited from RAPTOR:

The orchestration layer is always Claude Code. The CLAUDE.md, skills, and commands all run as Claude Code instructions. To change which Claude model orchestrates Mantishack, use Claude Code's --model flag or the /model command inside a session.

The analysis dispatch layer is the LLM that analyses individual vulnerability findings. This is separate from the orchestration layer and can be any supported provider. Configure it in ~/.config/mantishack/models.json:

{
  "models": [
    {
      "provider": "anthropic",
      "model": "claude-opus-4-6",
      "api_key": "sk-ant-...",
      "role": "analysis"
    },
    {
      "provider": "openai",
      "model": "gpt-5.4",
      "api_key": "sk-...",
      "role": "analysis"
    },
    {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "api_key": "sk-ant-...",
      "role": "aggregate"
    }
  ]
}

Or skip the config file and set environment variables:

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...
export MISTRAL_API_KEY=...
export OLLAMA_HOST=http://localhost:11434
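How the two configuration sources might combine can be sketched as a small loader. This is hypothetical: Mantishack's real precedence rules are not documented here. The sketch assumes models.json wins when present, that entries without a role count as analysis models, and that the environment variables map to the provider names shown in `ENV_KEYS`.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

CONFIG_PATH = Path.home() / ".config" / "mantishack" / "models.json"

# Env vars from this README; the provider names they map to are assumed.
ENV_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def load_models(path: Path = CONFIG_PATH,
                env: Optional[Mapping[str, str]] = None) -> Dict[str, List[dict]]:
    """Group model entries by role, preferring models.json over env vars."""
    env = os.environ if env is None else env
    by_role: Dict[str, List[dict]] = {}
    if path.is_file():
        for entry in json.loads(path.read_text())["models"]:
            # Entries without a role are treated as analysis models (assumption).
            by_role.setdefault(entry.get("role", "analysis"), []).append(entry)
        return by_role
    # No config file: one analysis entry per provider whose key is present.
    for provider, var in ENV_KEYS.items():
        if env.get(var):
            by_role.setdefault("analysis", []).append(
                {"provider": provider, "api_key": env[var], "role": "analysis"})
    if env.get("OLLAMA_HOST"):
        by_role.setdefault("analysis", []).append(
            {"provider": "ollama", "host": env["OLLAMA_HOST"], "role": "analysis"})
    return by_role
```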

Budget control:

export MANTISHACK_MAX_COST=5.00   # cap analysis spend at $5 per run
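A spend cap like this is typically enforced by a running tally that refuses the next call once it would cross the limit. The sketch below is hypothetical: how Mantishack actually prices and counts calls is not described here, so the per-million-token prices are caller-supplied parameters, and `charge` is meant to be called with estimated token counts before a call is dispatched.

```python
import os
from typing import Optional

class BudgetExceeded(RuntimeError):
    pass

class CostTracker:
    """Running LLM spend for one run, capped by MANTISHACK_MAX_COST (USD)."""

    def __init__(self, max_cost: Optional[float] = None):
        if max_cost is None:
            raw = os.environ.get("MANTISHACK_MAX_COST")
            max_cost = float(raw) if raw else None
        self.max_cost = max_cost  # None means no cap
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
        """Record one call's (estimated) cost; refuse a charge that would cross the cap."""
        cost = (input_tokens * usd_per_mtok_in
                + output_tokens * usd_per_mtok_out) / 1_000_000
        if self.max_cost is not None and self.spent + cost > self.max_cost:
            raise BudgetExceeded(
                f"${self.spent + cost:.2f} would exceed the ${self.max_cost:.2f} cap")
        self.spent += cost
        return cost
```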

Architecture

Mantishack is two layers.

The Python execution layer (mantishack.py, packages/, core/, engine/) handles the heavy lifting: running Semgrep and CodeQL, managing subprocesses, parsing SARIF, deduplicating findings, dispatching LLM API calls, tracking costs, writing output files. It does not make decisions. It executes.

The Claude Code decision layer (.claude/, tiers/, CLAUDE.md) makes the calls: which findings to prioritise, how to interpret results, what the attack scenario is, whether the exploit is realistic. Implemented as Claude Code skills, commands, and agents that load progressively.

CLAUDE.md              always loaded — bootstrap, routing, security rules
.claude/commands/      slash commands (/mantis-agentic, /mantis-scan, …)
.claude/skills/        methodology detail, loaded on demand
tiers/                 adversarial thinking, recovery, expert personas
.claude/agents/        specialist sub-agents (offsec, crash analysis, forensics)

The split means you can run the Python layer from a CI pipeline (python3 mantishack.py scan --repo ...) and get structured SARIF output without Claude Code, or run it interactively with the full agentic workflow.
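In CI, the SARIF output can gate a build without any Claude Code involvement. This sketch reads standard SARIF 2.1.0 fields (`runs[].results[].ruleId` and the first location's `artifactLocation.uri` and `region.startLine`). Where a scan run writes its SARIF file is not stated here, so the path in the usage line is a placeholder, and deduplicating on exact `(rule, file, line)` is a simplification of whatever the Python layer actually does.

```python
import json
from pathlib import Path
from typing import List, Tuple

Finding = Tuple[str, str, int]  # (rule id, file URI, start line)

def load_findings(sarif_path) -> List[Finding]:
    """Flatten SARIF 2.1.0 results, dropping exact (rule, file, line) duplicates."""
    sarif = json.loads(Path(sarif_path).read_text())
    seen, findings = set(), []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            key = (
                result.get("ruleId", "?"),
                physical.get("artifactLocation", {}).get("uri", "?"),
                physical.get("region", {}).get("startLine", 0),
            )
            if key not in seen:
                seen.add(key)
                findings.append(key)
    return findings

def gate(sarif_path) -> int:
    """Print findings and return a CI exit code: 1 if any remain, else 0."""
    findings = load_findings(sarif_path)
    for rule, uri, line in findings:
        print(f"{uri}:{line}: {rule}")
    return 1 if findings else 0
```

A CI step would then call `sys.exit(gate("path/to/results.sarif"))` after the scan finishes.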

Licence

MIT, dual-copyright:

Upstream RAPTOR code — Copyright (c) 2025-2026 Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, John Cartwright. See LICENSE.
Fork modifications (mantishack branding, /mantis-* rename, auth + logging audit rules, pytest fixtures, README/NOTICE) — Copyright (c) 2026 Deon Menezes. See LICENSE-MANTISHACK.

Both files are MIT; the fork-modification licence sits alongside the upstream RAPTOR licence and does not supersede it. See NOTICE for combined attribution and the fork-modification log. Review the licences for all dependencies before commercial use; CodeQL in particular has licence terms that restrict commercial use.

Upstream: https://github.com/gadievron/raptor — please file framework-level issues and PRs upstream.

Mantishack fork issues: https://github.com/deonmenezes/mantishack/issues

Project history

Earlier Mantishack versions ran as an independent Rust daemon + MCP agent stack and drew from many open-source projects. That architecture has been retired and removed from this repository; the codebase you see here is a full rebrand of RAPTOR (MIT). The acknowledgements below are historical. None of these projects ship as code in this tree today (RAPTOR has its own dependency set, listed in requirements.txt), but they shaped what Mantishack used to be, and we credit them here in good faith.

Primary derivation:

vmihalis/hacker-bob (Apache-2.0) — agent prompts, role prompts, slash commands, capability playbook conventions, chain-attempt outcome enum, severity-ladder rules, and bob-hunt workflow shape were derived from Hacker Bob.

AI models and LLM ecosystem:

Nous Research — Hermes (Hermes 2/3 family, various licences)
Anthropic Claude (host LLM via API + Claude Code)
OpenAI Codex CLI (Apache-2.0)
OpenCode
Model Context Protocol (MCP) spec + rmcp Rust SDK (MIT)

Operating systems & runtimes:

Linux kernel (GPL-2.0)
GNU userland / glibc / musl
Rust toolchain (Apache-2.0 / MIT)
Node.js, Bun, Deno (MIT)
Docker / OCI (Apache-2.0)
Homebrew (BSD-2-Clause)

Cryptography & integrity:

BLAKE3, ed25519-dalek, ring, rustls, zeroize

Rust async ecosystem & core libraries:

Tokio, Tower, Hyper, reqwest, Serde + serde_json / serde_yaml_ng / toml, Tonic + Prost, anyhow, thiserror, tracing, clap, schemars

Storage:

RocksDB + rust-rocksdb
Supabase (landing page auth + Postgres)

Offensive-security tooling invoked / integrated:

ProjectDiscovery: subfinder, httpx, katana, nuclei + nuclei-templates, chaos, dnsx, interactsh, notify
OWASP Amass (Apache-2.0)
ticarpi/jwt_tool (GPL-3.0, subprocess-only)
OJ/gobuster (Apache-2.0)
Patchright (headless browser automation)
Bearer (Elastic-2.0)
Trivy (Apache-2.0)
trufflehog (AGPL-3.0)
hashcat (MIT)
Hydra (AGPL-3.0)

Standards, taxonomies, reporting formats:

Web stack (legacy landing page + dashboard):

Vite, React, TypeScript, Tailwind CSS, shadcn/ui, Radix UI, lucide-react, Supabase JS SDK

Distribution & CI:

GitHub Actions, GitHub CLI, cargo-chef, cargo-deny, rustup

If we drew from a project that isn't credited here, please open an issue — we want this list to reflect reality.
