feat(multiturn): Sprint 18 — multi-turn jailbreak engine (Crescendo / Echo Chamber) (v1.8.0) by konjoinfinity · Pull Request #10 · konjoai/toki

konjoinfinity · 2026-06-19T12:20:59Z

Sprint 18 — Multi-Turn Jailbreak Engine (v1.8.0)

Motivation

Single-turn safety defenses do not transfer to multi-turn attacks. Crescendo (arXiv 2404.01833) reaches 98–100% ASR on frontier models by escalating a benign conversation across turns, each message referencing the model's prior replies. Echo Chamber (2601.05742), GRAF (2506.17881), and AutoAdv (2507.01020) confirm multi-turn is the dominant 2026 attack vector.

Every prior toki module (generator, mutator, indirect, agentic, evaluator) operated on a single prompt → single response. This was the largest blind spot in the coverage map and a prerequisite for the backlog P3-1 dual-agent red-team loop.

What's new — `toki.multiturn` (zero external deps)

MultiTurnStrategy enum · Turn / Conversation dataclasses (chat-format helpers + scored transcript)
MultiTurnConfig / MultiTurnResult — save() (timestamped, no overwrite) / load() / to_json()
Strategy base + CrescendoStrategy / EchoChamberStrategy — deterministic opener → escalation ladder → payload, exactly n_turns messages
MultiTurnRunner.run(model_fn) — drives a chat-style Callable[[list[dict]], str], scores each reply with the real RuleScorer, stops early on first compliance (Crescendo behaviour)
Conversational baselines: safe, unsafe, crescendo (capitulates after benign history builds) + CONV_BASELINES

Integration

toki.coverage — new "multiturn" category axis bucket (critical severity)
CLI: python -m toki multiturn --strategy crescendo|echo_chamber --model safe|unsafe|crescendo --goal --max-turns --seed --success-threshold --output-dir [--json]
toki.__init__ exports + __version__ → 1.8.0; pyproject.toml bumped

Live demo

The crescendo-vulnerable baseline holds for two benign turns (score 0.90) then capitulates on turn 2 (0.00 → JAILBROKEN) — exactly the failure mode single-turn testing misses.

Verification

675/675 Python tests passing (644 prior + 31 new)
New module 99% covered (only the abstract raise NotImplementedError)
ruff check / ruff format clean on new files; cargo build + cargo test green
PLAN.md + CHANGELOG.md updated

🤖 Generated with Claude Code

https://claude.ai/code/session_01WRE1YLhT6aNP4GZT8zbw6q

Generated by Claude Code

… Echo Chamber) (v1.8.0) Single-turn safety defenses do not transfer to multi-turn attacks. Crescendo (arXiv 2404.01833) reaches 98-100% ASR by escalating a benign conversation across turns. Every prior toki module operated on a single prompt → single response; this closes the largest blind spot in the coverage map and unblocks the P3-1 dual-agent loop. - toki.multiturn: Turn/Conversation/MultiTurnConfig/MultiTurnResult dataclasses, CrescendoStrategy + EchoChamberStrategy deterministic escalation planners, MultiTurnRunner driving a chat-style model_fn with per-turn RuleScorer scoring and early-exit on first compliance; safe/unsafe/crescendo conversational baselines + CONV_BASELINES registry - toki.coverage: new "multiturn" category axis bucket (critical severity) - CLI: python -m toki multiturn --strategy --model --goal --max-turns ... - toki.__init__ exports; version 1.7.0 → 1.8.0; pyproject bumped - 31 new tests (28 module + 3 CLI); 675/675 passing; new module 99% covered Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WRE1YLhT6aNP4GZT8zbw6q

wesleyscholl marked this pull request as ready for review June 19, 2026 16:57

wesleyscholl merged commit b792454 into main Jun 19, 2026
7 checks passed

wesleyscholl deleted the claude/konjo-toki-lkvusj branch June 19, 2026 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(multiturn): Sprint 18 — multi-turn jailbreak engine (Crescendo / Echo Chamber) (v1.8.0)#10

feat(multiturn): Sprint 18 — multi-turn jailbreak engine (Crescendo / Echo Chamber) (v1.8.0)#10
wesleyscholl merged 1 commit into
mainfrom
claude/konjo-toki-lkvusj

konjoinfinity commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

konjoinfinity commented Jun 19, 2026

Sprint 18 — Multi-Turn Jailbreak Engine (v1.8.0)

Motivation

What's new — toki.multiturn (zero external deps)

Integration

Live demo

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

What's new — `toki.multiturn` (zero external deps)