Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,55 @@ Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [1.8.0] — 2026-06-19

### Added — Phase 18 (Multi-Turn Jailbreak Engine — Crescendo / Echo Chamber)

**`toki.multiturn` — new module (zero external deps)**
- `MultiTurnStrategy` — StrEnum: `CRESCENDO` | `ECHO_CHAMBER`
- `Turn` — frozen dataclass: `index`, `role`, `content`, optional assistant `score`
- `Conversation` — ordered `Turn` list; `to_messages()` renders OpenAI-style
`{"role", "content"}` dicts, `transcript()` renders a scored, human-readable log
- `MultiTurnConfig` — `name`, `strategy`, `goal`, `max_turns`, `seed`,
`success_threshold` (default 0.5), `output_dir`
- `MultiTurnResult` — `turns`, `n_turns`, `success`, `success_turn`, `min_score`,
`final_score`, `transcript`; `to_json()`, `save()` (timestamped dir, no overwrite),
`load()` rehydrating typed `Turn`s
- `Strategy` base + `CrescendoStrategy` (arXiv 2404.01833) / `EchoChamberStrategy`
(arXiv 2601.05742) — deterministic opener → escalation ladder → payload planning
producing exactly `n_turns` user messages; `get_strategy()` / `STRATEGIES` registry
- `MultiTurnRunner.run(model_fn)` — drives a chat-style
`Callable[[list[dict[str, str]]], str]` through the planned escalation, scores
each assistant reply with the real `RuleScorer`, and stops early the first time a
reply drops below `success_threshold` (matching Crescendo's halt-on-compliance);
`run_multiturn(model_fn, config, save)` convenience wrapper
- Built-in conversational baselines `conv_baseline_safe`, `conv_baseline_unsafe`,
`conv_baseline_crescendo` (benign until ≥3 user turns of benign history, then
capitulates) + `CONV_BASELINES` registry

**`toki.coverage` (extended)**
- `CATEGORY_AXIS` and `_DEFAULT_SEVERITY` gain `"multiturn"` (critical severity);
`_category_for` routes `multi`/`turn` categories to the new bucket

**CLI**
- `python -m toki multiturn` — `--strategy crescendo|echo_chamber`,
`--model safe|unsafe|crescendo`, `--goal`, `--max-turns`, `--seed`,
`--success-threshold`, `--output-dir`, `--json`; prints outcome + scored transcript

**`toki.__init__`**
- New exports: `CONV_BASELINES`, `Conversation`, `CrescendoStrategy`,
`EchoChamberStrategy`, `MultiTurnConfig`, `MultiTurnResult`, `MultiTurnRunner`,
`MultiTurnStrategy`, `Strategy`, `Turn`, `get_strategy`, `run_multiturn`

**`pyproject.toml`**
- Version bumped to `1.8.0`

**Tests**
- 31 new tests: `test_multiturn.py` (28), `test_main.py` (3 new CLI tests)
- Total: 675/675 passing (644 prior + 31 new)

---

## [1.7.0] — 2026-06-14

### Added — Phase 17 (Safety-Subspace LoRA — SaLoRA / SPLoRA)
Expand Down
48 changes: 47 additions & 1 deletion PLAN.md
Original file line number Diff line number Diff line change
Expand Up @@ -527,6 +527,52 @@ P3-1 (dual-agent red-team loop) and P3-2 (compliance certification).

---

## Phase 18 — Multi-Turn Jailbreak Engine (Crescendo) (v1.8.0) [COMPLETE]

**Ship Gate:** 675 Python tests passing. Zero failures. Multi-turn escalation
verified end-to-end against safe / unsafe / crescendo-vulnerable conversational
baselines; deterministic per-seed planning; early-exit on first compliance.

### Motivation
Single-turn safety defenses do not transfer to multi-turn attacks. Crescendo
(arXiv 2404.01833) reaches 98–100% ASR on frontier models by escalating a
benign conversation across turns, each message referencing the model's prior
replies; Echo Chamber (arXiv 2601.05742), GRAF (2506.17881), and AutoAdv
(2507.01020) confirm multi-turn is the dominant 2026 vector. Every prior toki
module operated on a single prompt → single response — this was the largest
blind spot in the coverage map and a prerequisite for the P3-1 dual-agent loop.

### Deliverables
- [x] `toki.multiturn` — multi-turn jailbreak engine (zero external deps):
- `MultiTurnStrategy` — StrEnum: `CRESCENDO` | `ECHO_CHAMBER`
- `Turn` (frozen) — index, role, content, optional assistant `score`
- `Conversation` — turn list with `to_messages()` (OpenAI-style) + `transcript()`
- `MultiTurnConfig` — name, strategy, goal, max_turns, seed, success_threshold,
output_dir
- `MultiTurnResult` — turns, n_turns, success, success_turn, min_score,
final_score, transcript; `to_json()` / `save()` (timestamped, no overwrite)
/ `load()` rehydrating typed `Turn`s
- `Strategy` base + `CrescendoStrategy` / `EchoChamberStrategy` — deterministic
opener → escalation ladder → payload planning, exactly `n_turns` messages
- `MultiTurnRunner.run(model_fn)` — drives a chat-style
`Callable[[list[dict]], str]` through the planned escalation, scores each
reply with the real `RuleScorer`, stops early on first success (Crescendo
behaviour); `run_multiturn()` convenience wrapper
- Built-in conversational baselines: `conv_baseline_safe`, `conv_baseline_unsafe`,
`conv_baseline_crescendo` (benign early, capitulates after benign history
builds up) — `CONV_BASELINES` registry
- [x] `toki.coverage` — `CATEGORY_AXIS` + `_DEFAULT_SEVERITY` extended with
`"multiturn"` (critical); `_category_for` routes `multi`/`turn` categories
- [x] CLI: `python -m toki multiturn --strategy crescendo|echo_chamber
--model safe|unsafe|crescendo --goal --max-turns --seed
--success-threshold --output-dir [--json]`
- [x] `toki.__init__` exports all new public symbols; `__version__` → `1.8.0`
- [x] `pyproject.toml` version bumped to `1.8.0`
- [x] 31 new tests: `test_multiturn.py` (28) + `test_main.py` (3 CLI) — all passing
- [x] All 644 Phase 1–17 tests still passing (675 total)

---

## Future / Backlog

- 🟡 **P3-2** — Compliance certification report (OWASP Agentic Top 10 ASI01-ASI10
Expand All @@ -539,4 +585,4 @@ P3-1 (dual-agent red-team loop) and P3-2 (compliance certification).

---

*Last updated: 2026-06-14 — v1.7.0 shipped. Safety-subspace LoRA (SaLoRA/SPLoRA) complete.*
*Last updated: 2026-06-19 — v1.8.0 shipped. Multi-turn jailbreak engine (Crescendo / Echo Chamber) complete.*
2 changes: 1 addition & 1 deletion python/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "toki"
version = "1.7.0"
version = "1.8.0"
description = "Adversarial fine-tuning lab for small language models"
license = { text = "BUSL-1.1" }
requires-python = ">=3.9"
Expand Down
36 changes: 36 additions & 0 deletions python/tests/test_main.py
Original file line number Diff line number Diff line change
Expand Up @@ -430,3 +430,39 @@ def test_finetune_model_requires_hf(capsys):
with patch.dict(sys.modules, {"torch": None, "peft": None, "transformers": None}):
with pytest.raises((ImportError, SystemExit)):
main(["finetune", "--model", "gpt2"])


# ---------------------------------------------------------------------------
# multiturn CLI (Sprint 18)
# ---------------------------------------------------------------------------


def test_multiturn_command_jailbroken(tmp_path, capsys):
main([
"multiturn", "--model", "crescendo", "--strategy", "crescendo",
"--max-turns", "5", "--output-dir", str(tmp_path),
])
captured = capsys.readouterr()
assert "JAILBROKEN" in captured.out


def test_multiturn_command_safe_holds(tmp_path, capsys):
main([
"multiturn", "--model", "safe", "--max-turns", "4",
"--output-dir", str(tmp_path),
])
captured = capsys.readouterr()
assert "held" in captured.out


def test_multiturn_command_json(tmp_path, capsys):
import json as _json

main([
"multiturn", "--model", "unsafe", "--json",
"--output-dir", str(tmp_path),
])
captured = capsys.readouterr()
data = _json.loads(captured.out)
assert data["success"] is True
assert data["strategy"] == "crescendo"
Loading
Loading