JTuffy · Mar 19, 2026 · Mar 15, 2026 · Mar 16, 2026 · Mar 17, 2026
diff --git a/.agents/rules/amendments.md b/.agents/rules/amendments.md
@@ -35,3 +35,37 @@ Session friction identified during prune-repo + cleanup work:
   - **Logging config** (`vis/logging_config.py` is canonical; never configure in `page.py`)
   - **Schema field removal checklist** (grep callers, confirm never populated, no `list[dict]` placeholders)
 - `python.md` sync-guard test bullet was vague. Expanded with the concrete pattern: compare `model_fields` against the reactive registry at test time.
+
+## 2026-03-12 — Always use vis/plotting.py for paper figures
+
+`vis/plotting.py` is the single source of truth for all chart functions. When generating
+figures for papers, scripts, or exports, **always call functions from there** — never write
+custom matplotlib from scratch in agent_workspace scripts.
+
+Available functions to reach for first:
+- `plot_sweep_curve(SweepResult)` — 1D sweep line chart with tipping point annotation
+- `plot_mc_trajectory(MonteCarloResult)` — compliance over steps, mean ± SD
+- `plot_mc_violator_trajectory`, `plot_mc_audit_trajectory`, `plot_mc_payoff_comparison` (violator count, audit rate, and payoff comparison, each from a `MonteCarloResult`)
+
+If a needed figure type does not exist in `vis/plotting.py` (e.g. a 2D heatmap), **add it
+there** following the `create_figure()` style, then use it from both the UI and scripts.
+Do not create ad-hoc matplotlib code in agent_workspace when an equivalent function
+already exists or could be added once and shared.
+
+## 2026-03-14 — Reflect: plotting discipline and scripting infrastructure
+
+Three friction sources identified, all patched this session:
+
+1. **`project.md` Plots section was too sparse** — 2 lines with no function inventory.
+   Replaced with the full table of all 12 public functions and a mandatory "check before
+   writing any matplotlib" gate. The agent can no longer claim ignorance of what exists.
+
+2. **`researcher.md` step 4 had zero mention of `vis/plotting.py`** — meaning every
+   research visualisation session was allowed to invent ad-hoc matplotlib. Added an
+   `[!IMPORTANT]` callout before the visualise step enforcing the same gate.
+
+3. **No `/gen-figures` workflow existed** — figure generation for paper sections was
+   improvised each time. Created `.agents/workflows/gen-figures.md` with a step-by-step
+   thin-caller checklist, a copy-paste script template, and a `// turbo` run step.
+   Also created `scripts/README.md` as the cross-session script index so existing
+   scripts are discoverable rather than silently re-invented.
diff --git a/.agents/rules/project.md b/.agents/rules/project.md
@@ -100,7 +100,28 @@ All export functions return `bytes` for Solara's `FileDownload`. Key functions:
 
 ## Plots (`vis/plotting.py`)
 
-Accept typed result objects, return `matplotlib.Figure`, never import Solara. Use `fig_to_png(fig)` from `results.py` to convert to bytes for downloads. Standard figsize `(7, 4)`.
+**Before writing any matplotlib code**, check this inventory. If the function you need exists here, call it. If it doesn't, add it to `vis/plotting.py` following the `create_figure()` style, list it in this table, and then use it from both scripts and the UI.
+
+All functions accept typed inputs (result objects, DataFrames, or the values listed in the table below), return `matplotlib.Figure`, and never import Solara. Use `fig_to_png(fig)` from `results.py` to convert to bytes for downloads.
+
+| Function | Input | Use for |
+|---|---|---|
+| `plot_sweep_curve(result, metric, reference_lines)` | `SweepResult` | 1D sweep line chart with tipping point + optional scenario markers |
+| `plot_sweep_heatmap(grid, x_values, y_values, ...)` | 2D `list[list[float]]` | 2D compliance heatmap (joint sensitivity) |
+| `plot_mc_trajectory(result)` | `MonteCarloResult` | Compliance mean ± SD over steps |
+| `plot_mc_violator_trajectory(result)` | `MonteCarloResult` | Violator count mean ± SD over steps |
+| `plot_mc_audit_trajectory(result)` | `MonteCarloResult` | Audit rate band over steps |
+| `plot_mc_payoff_comparison(result)` | `MonteCarloResult` | Compliant vs. violating lab payoff bar chart |
+| `plot_compliance_distribution(df)` | agents DataFrame | Bar chart: Compliant / Uncaught / Caught-by-source |
+| `plot_audit_source_distribution(df)` | agents DataFrame | Bar chart: labs caught per AuditSource channel |
+| `plot_audit_targeting(rates, counts, ...)` | scalar rates | Compliant vs. non-compliant audit rate bar |
+| `plot_audit_coefficient_distribution(df)` | agents DataFrame | Histogram of per-lab audit coefficients |
+| `plot_time_series(data, label, color_key)` | `pd.Series` | Generic single-series step chart |
+| `plot_scatter(df, x_col, y_col, ...)` | DataFrame | Scatter with compliance coloring |
+
+All figures are created via `create_figure()` (standardized style, `Agg` backend). Never call `plt.figure()` or `plt.subplots()` in scripts.
+
+**Committed figure scripts** — see `scripts/README.md` for an index of existing scripts. Always check there before re-creating a script.
 
 ## Testing
 

diff --git a/.agents/workflows/gen-figures.md b/.agents/workflows/gen-figures.md
@@ -0,0 +1,112 @@
+---
+description: Generate one or more figures for the paper or a report — enforces the thin-caller pattern where all plot logic lives in vis/plotting.py.
+---
+
+# Gen-Figures Workflow
+
+Use this workflow whenever you need to produce `.png` figures for the paper, a report,
+or any committed output. Do **not** improvise — follow these steps in order.
+
+## Step 1 — Check `vis/plotting.py` first
+
+Open `project.md` and read the **Plots** section inventory table.  
+Find the function that matches the figure you need.
+
+- **Exists?** → go to Step 3.
+- **Doesn't exist?** → you must add it to `vis/plotting.py` first (Step 2), then proceed.
+
+**Never write raw `plt.figure()` or `plt.subplots()` in a script or agent_workspace file.**
+Use `create_figure()` from `vis/plotting.py` at minimum, and prefer a proper named function.
+
+## Step 2 — Add a missing function to `vis/plotting.py` (if needed)
+
+1. Follow the `create_figure()` style exactly — see existing functions for the pattern.
+2. Accept typed result objects (`SweepResult`, `MonteCarloResult`, `pd.DataFrame`) — no raw dicts.
+3. Return `matplotlib.Figure` (never call `plt.show()` or `plt.savefig()` inside the function).
+4. Add it to the inventory table in `project.md` → Plots section.
+5. Run `uv run ruff check . --fix && uv run mypy .` — fix any issues before proceeding.
+
+## Step 3 — Check `scripts/README.md` for an existing script
+
+Open `scripts/README.md`.  
+If a script already generates the figures you need (or close to it), **run that script** rather than writing a new one.
+
+```bash
+uv run python scripts/<existing_script>.py --out-dir agent_workspace/figures
+```
+
+If the existing script's parameters or scenarios need adjustment, edit it in place — don't create a duplicate.
+
+## Step 4 — Write a thin-caller script (if no existing script covers it)
+
+Create a new script in `scripts/` following the naming convention `gen_<section_or_topic>_figs.py`.
+
+The script must follow the **thin-caller pattern**:
+- All project imports come from `vis.plotting`, `services.*`, `schemas.*` (standard-library imports such as `argparse` and `pathlib` are fine)
+- No matplotlib setup — no `plt.figure()`, `plt.subplots()`, `matplotlib.use()`
+- Each figure: call the `vis/plotting.py` function → `fig.savefig(out_dir / "name.png", dpi=150, bbox_inches="tight")`
+- Accept `--out-dir` as a CLI argument (default: `agent_workspace/figures`)
+- Print progress lines so it's easy to monitor
+
+Minimal template:
+```python
+"""Generate <topic> figures for the paper.
+
+Thin caller only — all plot logic lives in vis/plotting.py.
+Output: agent_workspace/figures/<fig_name>.png
+
+Usage:
+    uv run python scripts/gen_<topic>_figs.py [--out-dir PATH]
+"""
+from __future__ import annotations
+import argparse
+from pathlib import Path
+
+
+def main(out_dir: Path) -> None:
+    out_dir.mkdir(parents=True, exist_ok=True)
+
+    from compute_permit_sim.services.config_manager import load_scenario
+    from compute_permit_sim.services.sweep import run_sweep
+    from compute_permit_sim.vis.plotting import plot_sweep_curve  # add as needed
+
+    base = load_scenario("basic/<scenario>.json")
+    result = run_sweep(base, "audit.base_prob", [...], n_runs=50)
+    fig = plot_sweep_curve(result)
+    fig.savefig(out_dir / "fig_<name>.png", dpi=150, bbox_inches="tight")
+    print("Done.")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--out-dir", type=Path, default=Path("agent_workspace/figures"))
+    args = parser.parse_args()
+    main(args.out_dir)
+```
+
+## Step 5 — Update `scripts/README.md`
+
+After writing or modifying a script, update the index in `scripts/README.md`:
+
+```
+| gen_<topic>_figs.py | Generates <figures> for Section X. Scenarios: <...>. |
+```
+
+## Step 6 — Run and verify
+
+// turbo
+```bash
+uv run python scripts/<script_name>.py --out-dir agent_workspace/figures
+```
+
+Check that:
+- All expected `.png` files are created in `out_dir`
+- No matplotlib warnings or errors in output
+- Figures look correct (open them and inspect)
+
+## Step 7 — Commit the script
+
+```bash
+git add scripts/<script_name>.py scripts/README.md
+git commit -m "scripts: add <topic> figure generator"
+```
diff --git a/pyproject.toml b/pyproject.toml
@@ -51,3 +51,4 @@ python_files = ["test_*.py"]
 ignore_missing_imports = true
 check_untyped_defs = true
 plugins = ["pydantic.mypy"]
+exclude = ["agent_workspace"]
diff --git a/scenarios/basic/scenario_2_strict.json b/scenarios/basic/scenario_2_strict.json
@@ -12,9 +12,10 @@
     },
     "lab": {
         "capability_value": 40.0,
-        "racing_factor": 2.0
+        "racing_factor": 2.0,
+        "audit_coefficient": 0.1
     },
-    "collateral_amount": 100.0,
+    "collateral_amount": 15.75,
     "market": {
         "fixed_price": 70.0
     }

diff --git a/scenarios/basic/scenario_3_smart.json b/scenarios/basic/scenario_3_smart.json
@@ -4,10 +4,13 @@
     "steps": 10,
     "n_agents": 20,
     "audit": {
-        "base_prob": 0.2,
+        "base_prob": 0.1,
         "monitoring_prob": 0.2,
         "signal_dependent": true
     },
+    "lab": {
+        "audit_coefficient": 0.5
+    },
     "collateral_amount": 15.75,
     "market": {
         "fixed_price": 2.0,

diff --git a/scenarios/batch_test.json b/scenarios/batch_test.json
@@ -3,21 +3,21 @@
   "description": "Demonstrates that feedback mechanisms (reputation, audit escalation) can drive compliance even under moderate enforcement.",
   "notes": "",
   "n_agents": 20,
-  "steps": 40,
+  "steps": 50,
   "flop_threshold": 1e25,
   "collateral_amount": 0.0,
   "audit": {
-    "base_prob": 0.3,
+    "base_prob": 0.20,
     "signal_dependent": false,
     "signal_exponent": 1.0,
     "false_positive_rate": 0.0,
     "false_negative_rate": 0.05,
-    "penalty_amount": 100.0,
+    "penalty_amount": 50.0,
     "backcheck_prob": 0.0,
     "whistleblower_prob": 0.0,
     "monitoring_prob": 0.0,
     "max_audits_per_step": null,
-    "audit_escalation": 1.5,
+    "audit_escalation": 0.5,
     "audit_decay_rate": 0.1
   },
   "market": {

diff --git a/src/compute_permit_sim/schemas/batch.py b/src/compute_permit_sim/schemas/batch.py
@@ -28,6 +28,10 @@ class BatchColumnNames:
     STEP = "step"
     PARAM_PATH = "param_path"
     PARAM_VALUE = "param_value"
+    PARAM_X_PATH = "param_x_path"
+    PARAM_X_VALUE = "param_x_value"
+    PARAM_Y_PATH = "param_y_path"
+    PARAM_Y_VALUE = "param_y_value"
     N_RUNS = "n_runs"
 
     # Compliance
@@ -215,3 +219,48 @@ def tipping_point(self, threshold: float = 0.95) -> float | None:
             if pt.result.avg_compliance.mean >= threshold:
                 return pt.param_value
         return None
+
+
+@dataclass(frozen=True)
+class GridSweepResult:
+    """Results of a 2D joint-sensitivity parameter sweep over a scenario.
+
+    Stores mean compliance at every (x, y) grid cell.
+
+    Attributes:
+        grid: ``grid[y_idx][x_idx]`` = mean compliance fraction (0–1)
+              over ``n_runs`` seeds at parameter values
+              ``(x_values[x_idx], y_values[y_idx])``.
+    """
+
+    scenario_name: str
+    param_x_path: str  # e.g. "audit.base_prob"
+    param_x_label: str  # human-readable, e.g. "Base Audit Probability"
+    param_y_path: str  # e.g. "collateral_amount"
+    param_y_label: str  # human-readable, e.g. "Collateral K (M$)"
+    config: ScenarioConfig
+    x_values: list[float]  # ordered x-axis values
+    y_values: list[float]  # ordered y-axis values
+    grid: list[list[float]]  # [y_idx][x_idx] = mean compliance in [0, 1]
+    n_runs: int
+    # Short unique identifier matching SimulationRun.sim_id convention
+    id: str = field(default_factory=lambda: str(uuid4())[:8])
+
+    def compliance_at(self, x: float, y: float) -> float | None:
+        """Return mean compliance for an exact (x, y) cell, or None if not found."""
+        try:
+            x_idx = self.x_values.index(x)
+            y_idx = self.y_values.index(y)
+        except ValueError:
+            return None
+        return self.grid[y_idx][x_idx]
+
+    @property
+    def compliance_min(self) -> float:
+        """Minimum mean compliance across all grid cells."""
+        return min(v for row in self.grid for v in row)
+
+    @property
+    def compliance_max(self) -> float:
+        """Maximum mean compliance across all grid cells."""
+        return max(v for row in self.grid for v in row)
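Reviewer note: a minimal, self-contained sketch of the row-major `grid[y_idx][x_idx]` layout and the exact-match lookup that `compliance_at` performs. `MiniGrid` is a hypothetical stand-in that copies only three fields and one method from the diff above; it is not the real `GridSweepResult`, and the grid values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniGrid:
    """Hypothetical stand-in for GridSweepResult (illustration only)."""

    x_values: list[float]
    y_values: list[float]
    grid: list[list[float]]  # grid[y_idx][x_idx], one row per y value

    def compliance_at(self, x: float, y: float) -> float | None:
        # list.index compares with ==, so only exact float matches are found.
        try:
            x_idx = self.x_values.index(x)
            y_idx = self.y_values.index(y)
        except ValueError:
            return None
        return self.grid[y_idx][x_idx]


# Made-up numbers: 3 x values and 2 y values give 2 rows of 3 cells.
demo = MiniGrid(
    x_values=[0.1, 0.2, 0.3],
    y_values=[0.0, 50.0],
    grid=[
        [0.10, 0.40, 0.70],  # y = 0.0
        [0.30, 0.80, 1.00],  # y = 50.0
    ],
)

hit = demo.compliance_at(0.2, 50.0)        # row 1, column 1
miss = demo.compliance_at(0.1 + 0.2, 0.0)  # 0.1 + 0.2 is not exactly 0.3
print(hit, miss)
```

The second lookup returns `None` because `0.1 + 0.2` evaluates to `0.30000000000000004`. Callers that derive axis values arithmetically would probably be safer passing elements taken directly from `x_values` and `y_values`.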
diff --git a/src/compute_permit_sim/schemas/defaults.py b/src/compute_permit_sim/schemas/defaults.py
@@ -77,7 +77,7 @@
 DEFAULT_SIGNAL_EXPONENT = 1.0
 #
 # Stage 2: AUDIT OUTCOME — given audit, does it find a violation?
-#   p_catch_if_audited = (1 - FNR) + FNR × backcheck_prob
+#   p_catch_if_audited = 1 - FNR × (1 - backcheck_prob) × (1 - p_w) × (1 - p_m)
 DEFAULT_AUDIT_FALSE_POS_RATE = 0.0  # alpha: P(false alarm | compliant firm audited)
 DEFAULT_AUDIT_FALSE_NEG_RATE = 0.40  # beta: 40% miss rate in Minimal env
 # Penalty structure:
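Reviewer note: the revised comment can be checked numerically with a short sketch. The function names below are hypothetical, and `p_w` and `p_m` are assumed here to be the whistleblower and monitoring probabilities, which the visible hunk does not define. With both set to zero, the new expression reduces algebraically to the old one, since `(1 - FNR) + FNR × b = 1 - FNR × (1 - b)`.

```python
def p_catch_if_audited(
    fnr: float,
    backcheck_prob: float = 0.0,
    p_w: float = 0.0,  # assumed: whistleblower probability
    p_m: float = 0.0,  # assumed: monitoring probability
) -> float:
    """New comment: 1 - FNR * (1 - backcheck_prob) * (1 - p_w) * (1 - p_m)."""
    # One reading of the formula: a violation escapes only if the audit
    # misses it and each of the other three channels also fails.
    p_escape = fnr * (1.0 - backcheck_prob) * (1.0 - p_w) * (1.0 - p_m)
    return 1.0 - p_escape


def old_p_catch(fnr: float, backcheck_prob: float) -> float:
    """Previous comment: (1 - FNR) + FNR * backcheck_prob."""
    return (1.0 - fnr) + fnr * backcheck_prob


fnr = 0.40  # DEFAULT_AUDIT_FALSE_NEG_RATE from the hunk above

baseline = p_catch_if_audited(fnr)                      # no extra channels
with_backcheck = p_catch_if_audited(fnr, 0.5)           # backcheck only
all_channels = p_catch_if_audited(fnr, 0.5, 0.1, 0.2)   # illustrative values

print(baseline, with_backcheck, old_p_catch(fnr, 0.5), all_channels)
```

The channel probabilities `0.5`, `0.1`, and `0.2` are illustrative inputs, not defaults taken from the repository.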

diff --git a/src/compute_permit_sim/schemas/sweep_params.py b/src/compute_permit_sim/schemas/sweep_params.py
@@ -170,6 +170,46 @@ class SweepParam:
         description="Upper bound of risk appetite multiplier (>1 = risk-seeking).",
         category="Agents",
     ),
+    SweepParam(
+        path="lab.capability_value",
+        label="Capability Race Premium V_b",
+        unit="M$",
+        default_min=0.0,
+        default_max=300.0,
+        default_step=20.0,
+        description="Strategic value of model capabilities from training (arms-race premium added to gain from cheating).",
+        category="Agents",
+    ),
+    SweepParam(
+        path="lab.racing_factor",
+        label="Racing Factor c_r",
+        unit="",
+        default_min=0.0,
+        default_max=5.0,
+        default_step=0.25,
+        description="Urgency multiplier on capability value; higher = stronger competitive pressure to cheat.",
+        category="Agents",
+    ),
+    SweepParam(
+        path="lab.reputation_escalation_factor",
+        label="Reputation Escalation Factor",
+        unit="",
+        default_min=0.0,
+        default_max=5.0,
+        default_step=0.25,
+        description="Per-violation multiplier on reputation cost: rep_t = base × (1+factor)^n_caught. 0 = no escalation.",
+        category="Agents",
+    ),
+    SweepParam(
+        path="lab.reputation_sensitivity",
+        label="Reputation Sensitivity R",
+        unit="M$",
+        default_min=0.0,
+        default_max=100.0,
+        default_step=5.0,
+        description="Base reputation cost per violation (M$). Compounds with reputation_escalation_factor.",
+        category="Agents",
+    ),
     # --- Dynamics ---
     SweepParam(
         path="audit.signal_exponent",
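Reviewer note: the escalation rule quoted in the `lab.reputation_escalation_factor` description, `rep_t = base × (1+factor)^n_caught`, can be tabulated with a small sketch. `reputation_cost` is a hypothetical helper written for this note, not a function from the repository, and the base and factor values are made up.

```python
def reputation_cost(base: float, factor: float, n_caught: int) -> float:
    """rep_t = base * (1 + factor) ** n_caught, as quoted in the description."""
    return base * (1.0 + factor) ** n_caught


# Illustrative inputs: base cost 10 M$, escalation factor 0.25.
escalating = [reputation_cost(10.0, 0.25, n) for n in range(4)]

# factor = 0 means no escalation: the cost stays at the base value.
flat = [reputation_cost(10.0, 0.0, n) for n in range(4)]

for n, (e, f) in enumerate(zip(escalating, flat)):
    print(f"n_caught={n}: escalating={e:.5f} flat={f:.5f}")
```

At the sweep's upper bound (`default_max=5.0`) the multiplier is 6 per violation, so the cost grows very quickly with `n_caught`.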