Merged
34 changes: 34 additions & 0 deletions .agents/rules/amendments.md
Original file line number Diff line number Diff line change
@@ -35,3 +35,37 @@ Session friction identified during prune-repo + cleanup work:
- **Logging config** (`vis/logging_config.py` is canonical; never configure in `page.py`)
- **Schema field removal checklist** (grep callers, confirm never populated, no `list[dict]` placeholders)
- `python.md` sync-guard test bullet was vague. Expanded with the concrete pattern: compare `model_fields` against the reactive registry at test time.

## 2026-03-12 — Always use vis/plotting.py for paper figures

`vis/plotting.py` is the single source of truth for all chart functions. When generating
figures for papers, scripts, or exports, **always call functions from there** — never write
custom matplotlib from scratch in agent_workspace scripts.

Available functions to reach for first:
- `plot_sweep_curve(SweepResult)` — 1D sweep line chart with tipping point annotation
- `plot_mc_trajectory(MonteCarloResult)` — compliance over steps, mean ± SD
- `plot_mc_violator_trajectory`, `plot_mc_audit_trajectory`, `plot_mc_payoff_comparison`

If a needed figure type does not exist in `vis/plotting.py` (e.g. a 2D heatmap), **add it
there** following the `create_figure()` style, then use it from both the UI and scripts.
Do not create ad-hoc matplotlib code in agent_workspace when an equivalent function
already exists or could be added once and shared.

## 2026-03-14 — Reflect: plotting discipline and scripting infrastructure

Three friction sources identified, all patched this session:

1. **`project.md` Plots section was too sparse** — 2 lines with no function inventory.
Replaced with the full table of all 12 public functions and a mandatory "check before
writing any matplotlib" gate. The agent cannot now claim ignorance of what exists.

2. **`researcher.md` step 4 had zero mention of `vis/plotting.py`** — meaning every
research visualisation session was allowed to invent ad-hoc matplotlib. Added an
`[!IMPORTANT]` callout before the visualise step enforcing the same gate.

3. **No `/gen-figures` workflow existed** — figure generation for paper sections was
improvised each time. Created `.agents/workflows/gen-figures.md` with a step-by-step
thin-caller checklist, a copy-paste script template, and a `// turbo` run step.
Also created `scripts/README.md` as the cross-session script index so existing
scripts are discoverable rather than silently re-invented.
23 changes: 22 additions & 1 deletion .agents/rules/project.md
@@ -100,7 +100,28 @@ All export functions return `bytes` for Solara's `FileDownload`. Key functions:

## Plots (`vis/plotting.py`)

Accept typed result objects, return `matplotlib.Figure`, never import Solara. Use `fig_to_png(fig)` from `results.py` to convert to bytes for downloads. Standard figsize `(7, 4)`.
**Before writing any matplotlib code**, check this inventory. If the function you need exists here, call it. If it doesn't exist, add it here following the `create_figure()` style — then use it from both scripts and the UI.

All functions accept typed result objects, return `matplotlib.Figure`, never import Solara. Use `fig_to_png(fig)` from `results.py` to convert to bytes for downloads.

| Function | Input | Use for |
|---|---|---|
| `plot_sweep_curve(result, metric, reference_lines)` | `SweepResult` | 1D sweep line chart with tipping point + optional scenario markers |
| `plot_sweep_heatmap(grid, x_values, y_values, ...)` | 2D `list[list[float]]` | 2D compliance heatmap (joint sensitivity) |
| `plot_mc_trajectory(result)` | `MonteCarloResult` | Compliance mean ± SD over steps |
| `plot_mc_violator_trajectory(result)` | `MonteCarloResult` | Violator count mean ± SD over steps |
| `plot_mc_audit_trajectory(result)` | `MonteCarloResult` | Audit rate band over steps |
| `plot_mc_payoff_comparison(result)` | `MonteCarloResult` | Compliant vs. violating lab payoff bar chart |
| `plot_compliance_distribution(df)` | agents DataFrame | Bar chart: Compliant / Uncaught / Caught-by-source |
| `plot_audit_source_distribution(df)` | agents DataFrame | Bar chart: labs caught per AuditSource channel |
| `plot_audit_targeting(rates, counts, ...)` | scalar rates | Compliant vs. non-compliant audit rate bar |
| `plot_audit_coefficient_distribution(df)` | agents DataFrame | Histogram of per-lab audit coefficients |
| `plot_time_series(data, label, color_key)` | `pd.Series` | Generic single-series step chart |
| `plot_scatter(df, x_col, y_col, ...)` | DataFrame | Scatter with compliance coloring |

All figures are created via `create_figure()` (standardized style, `Agg` backend). Never call `plt.figure()` or `plt.subplots()` in scripts.

**Committed figure scripts** — see `scripts/README.md` for an index of existing scripts. Always check there before re-creating a script.

## Testing

112 changes: 112 additions & 0 deletions .agents/workflows/gen-figures.md
@@ -0,0 +1,112 @@
---
description: Generate one or more figures for the paper or a report — enforces the thin-caller pattern where all plot logic lives in vis/plotting.py.
---

# Gen-Figures Workflow

Use this workflow whenever you need to produce `.png` figures for the paper, a report,
or any committed output. Do **not** improvise — follow these steps in order.

## Step 1 — Check `vis/plotting.py` first

Open `project.md` and read the **Plots** section inventory table.
Find the function that matches the figure you need.

- **Exists?** → go to Step 3.
- **Doesn't exist?** → you must add it to `vis/plotting.py` first (Step 2), then proceed.

**Never write raw `plt.figure()` or `plt.subplots()` in a script or agent_workspace file.**
Use `create_figure()` from `vis/plotting.py` at minimum, and prefer a proper named function.

## Step 2 — Add a missing function to `vis/plotting.py` (if needed)

1. Follow the `create_figure()` style exactly — see existing functions for the pattern.
2. Accept typed result objects (`SweepResult`, `MonteCarloResult`, `pd.DataFrame`) — no raw dicts.
3. Return `matplotlib.Figure` (never call `plt.show()` or `plt.savefig()` inside the function).
4. Add it to the inventory table in `project.md` → Plots section.
5. Run `uv run ruff check . --fix && uv run mypy .` — fix any issues before proceeding.

## Step 3 — Check `scripts/README.md` for an existing script

Open `scripts/README.md`.
If a script already generates the figures you need (or close to it), **run that script** rather than writing a new one.

```bash
uv run python scripts/<existing_script>.py --out-dir agent_workspace/figures
```

If the existing script's parameters or scenarios need adjustment, edit it in place — don't create a duplicate.

## Step 4 — Write a thin-caller script (if no existing script covers it)

Create a new script in `scripts/` following the naming convention `gen_<section_or_topic>_figs.py`.

The script must follow the **thin-caller pattern**:
- All imports from `vis.plotting`, `services.*`, `schemas.*`
- No matplotlib setup — no `plt.figure()`, `plt.subplots()`, `matplotlib.use()`
- Each figure: call the `vis/plotting.py` function → `fig.savefig(out_dir / "name.png", dpi=150, bbox_inches="tight")`
- Accept `--out-dir` as a CLI argument (default: `agent_workspace/figures`)
- Print progress lines so it's easy to monitor

Minimal template:
```python
"""Generate <topic> figures for the paper.

Thin caller only: all plot logic lives in vis/plotting.py.
Output: agent_workspace/figures/<fig_name>.png

Usage:
    uv run python scripts/gen_<topic>_figs.py [--out-dir PATH]
"""
from __future__ import annotations
import argparse
from pathlib import Path


def main(out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)

    from compute_permit_sim.services.config_manager import load_scenario
    from compute_permit_sim.services.sweep import run_sweep
    from compute_permit_sim.vis.plotting import plot_sweep_curve  # add as needed

    base = load_scenario("basic/<scenario>.json")
    result = run_sweep(base, "audit.base_prob", [...], n_runs=50)
    fig = plot_sweep_curve(result)
    fig.savefig(out_dir / "fig_<name>.png", dpi=150, bbox_inches="tight")
    print("Done.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--out-dir", type=Path, default=Path("agent_workspace/figures"))
    args = parser.parse_args()
    main(args.out_dir)
```
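
The thin-caller rules above could also be checked mechanically. The sketch below is not part of this PR: `find_forbidden_calls` and `FORBIDDEN_CALLS` are hypothetical names introduced here for illustration. It parses a script with the standard-library `ast` module and reports calls to `plt.figure`, `plt.subplots`, or `matplotlib.use`. It assumes pyplot is imported under the conventional `plt` alias, so other aliases would slip through.

```python
from __future__ import annotations

import ast

# Calls the workflow forbids in thin-caller scripts (hypothetical constant).
FORBIDDEN_CALLS = {"plt.figure", "plt.subplots", "matplotlib.use"}


def dotted_name(node: ast.AST) -> str | None:
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts: list[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_forbidden_calls(source: str) -> list[tuple[int, str]]:
    """List (line number, call name) for every forbidden call in the source."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FORBIDDEN_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

An empty result means the script contains none of the three forbidden calls. It does not prove the script is a thin caller in every other respect.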

## Step 5 — Update `scripts/README.md`

After writing or modifying a script, update the index in `scripts/README.md`:

```
| gen_<topic>_figs.py | Generates <figures> for Section X. Scenarios: <...>. |
```

## Step 6 — Run and verify

// turbo
```bash
uv run python scripts/<script_name>.py --out-dir agent_workspace/figures
```

Check that:
- All expected `.png` files are created in `out_dir`
- No matplotlib warnings or errors in output
- Figures look correct (open them and inspect)
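
The first check in the list above can be partly automated. This is a minimal sketch, not something the PR ships: `check_figures` is a hypothetical helper that confirms each expected file exists and starts with the standard 8-byte PNG signature. It cannot tell whether a figure looks correct, so visual inspection is still needed.

```python
from __future__ import annotations

from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def check_figures(out_dir: Path, expected: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every file looks fine."""
    problems: list[str] = []
    for name in expected:
        path = out_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        with path.open("rb") as fh:
            header = fh.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            problems.append(f"{name}: not a PNG file")
    return problems
```

A caller would pass the same `out_dir` used for `--out-dir` together with the file names the script is expected to write.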

## Step 7 — Commit the script

```bash
git add scripts/<script_name>.py scripts/README.md
git commit -m "scripts: add <topic> figure generator"
```
1 change: 1 addition & 0 deletions pyproject.toml
@@ -51,3 +51,4 @@ python_files = ["test_*.py"]
ignore_missing_imports = true
check_untyped_defs = true
plugins = ["pydantic.mypy"]
exclude = ["agent_workspace"]
5 changes: 3 additions & 2 deletions scenarios/basic/scenario_2_strict.json
@@ -12,9 +12,10 @@
},
"lab": {
"capability_value": 40.0,
- "racing_factor": 2.0
+ "racing_factor": 2.0,
+ "audit_coefficient": 0.1
},
- "collateral_amount": 100.0,
+ "collateral_amount": 15.75,
"market": {
"fixed_price": 70.0
}
5 changes: 4 additions & 1 deletion scenarios/basic/scenario_3_smart.json
@@ -4,10 +4,13 @@
"steps": 10,
"n_agents": 20,
"audit": {
- "base_prob": 0.2,
+ "base_prob": 0.1,
"monitoring_prob": 0.2,
"signal_dependent": true
},
+ "lab": {
+ "audit_coefficient": 0.5
+ },
"collateral_amount": 15.75,
"market": {
"fixed_price": 2.0,
8 changes: 4 additions & 4 deletions scenarios/batch_test.json
@@ -3,21 +3,21 @@
"description": "Demonstrates that feedback mechanisms (reputation, audit escalation) can drive compliance even under moderate enforcement.",
"notes": "",
"n_agents": 20,
- "steps": 40,
+ "steps": 50,
"flop_threshold": 1e25,
"collateral_amount": 0.0,
"audit": {
- "base_prob": 0.3,
+ "base_prob": 0.20,
"signal_dependent": false,
"signal_exponent": 1.0,
"false_positive_rate": 0.0,
"false_negative_rate": 0.05,
- "penalty_amount": 100.0,
+ "penalty_amount": 50.0,
"backcheck_prob": 0.0,
"whistleblower_prob": 0.0,
"monitoring_prob": 0.0,
"max_audits_per_step": null,
- "audit_escalation": 1.5,
+ "audit_escalation": 0.5,
"audit_decay_rate": 0.1
},
"market": {
49 changes: 49 additions & 0 deletions src/compute_permit_sim/schemas/batch.py
@@ -28,6 +28,10 @@ class BatchColumnNames:
    STEP = "step"
    PARAM_PATH = "param_path"
    PARAM_VALUE = "param_value"
    PARAM_X_PATH = "param_x_path"
    PARAM_X_VALUE = "param_x_value"
    PARAM_Y_PATH = "param_y_path"
    PARAM_Y_VALUE = "param_y_value"
    N_RUNS = "n_runs"

    # Compliance
@@ -215,3 +219,48 @@ def tipping_point(self, threshold: float = 0.95) -> float | None:
            if pt.result.avg_compliance.mean >= threshold:
                return pt.param_value
        return None


@dataclass(frozen=True)
class GridSweepResult:
    """Results of a 2D joint-sensitivity parameter sweep over a scenario.

    Stores mean compliance at every (x, y) grid cell.

    Attributes:
        grid: ``grid[y_idx][x_idx]`` = mean compliance fraction (0–1)
            over ``n_runs`` seeds at parameter values
            ``(x_values[x_idx], y_values[y_idx])``.
    """

    scenario_name: str
    param_x_path: str  # e.g. "audit.base_prob"
    param_x_label: str  # human-readable, e.g. "Base Audit Probability"
    param_y_path: str  # e.g. "collateral_amount"
    param_y_label: str  # human-readable, e.g. "Collateral K (M$)"
    config: ScenarioConfig
    x_values: list[float]  # ordered x-axis values
    y_values: list[float]  # ordered y-axis values
    grid: list[list[float]]  # [y_idx][x_idx] = mean compliance in [0, 1]
    n_runs: int
    # Short unique identifier matching SimulationRun.sim_id convention
    id: str = field(default_factory=lambda: str(uuid4())[:8])

    def compliance_at(self, x: float, y: float) -> float | None:
        """Return mean compliance for an exact (x, y) cell, or None if not found."""
        try:
            x_idx = self.x_values.index(x)
            y_idx = self.y_values.index(y)
        except ValueError:
            return None
        return self.grid[y_idx][x_idx]

    @property
    def compliance_min(self) -> float:
        """Minimum mean compliance across all grid cells."""
        return min(v for row in self.grid for v in row)

    @property
    def compliance_max(self) -> float:
        """Maximum mean compliance across all grid cells."""
        return max(v for row in self.grid for v in row)
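
The `grid[y_idx][x_idx]` layout and the exact-match lookup in `compliance_at` can be illustrated with plain lists. The sketch below is standalone and hypothetical: `build_grid`, `find_index`, `lookup`, and `toy_compliance` are names invented here, and `toy_compliance` is a stand-in formula, not the simulator. It shows one way a caller might do a tolerant lookup when an axis value was produced by floating-point arithmetic; the PR itself only offers exact matching.

```python
from __future__ import annotations

import math
from typing import Callable


def build_grid(
    x_values: list[float],
    y_values: list[float],
    cell_fn: Callable[[float, float], float],
) -> list[list[float]]:
    """Row-major grid: one row per y value, so grid[y_idx][x_idx] = cell_fn(x, y)."""
    return [[cell_fn(x, y) for x in x_values] for y in y_values]


def find_index(values: list[float], target: float) -> int | None:
    """Index of the first value that is numerically close to target, else None."""
    for i, v in enumerate(values):
        if math.isclose(v, target, rel_tol=1e-9, abs_tol=1e-12):
            return i
    return None


def lookup(grid, x_values, y_values, x: float, y: float) -> float | None:
    """Tolerant counterpart of an exact (x, y) cell lookup."""
    x_idx = find_index(x_values, x)
    y_idx = find_index(y_values, y)
    if x_idx is None or y_idx is None:
        return None
    return grid[y_idx][x_idx]


def toy_compliance(x: float, y: float) -> float:
    """Made-up stand-in for a simulated mean compliance, clipped to at most 1."""
    return min(1.0, 2 * x + y / 100)


x_values = [0.1, 0.2, 0.3]
y_values = [0.0, 50.0]
grid = build_grid(x_values, y_values, toy_compliance)

# 0.1 + 0.2 is not bit-identical to 0.3, so list.index() would raise ValueError,
# while the tolerant lookup still finds the cell.
value = lookup(grid, x_values, y_values, 0.1 + 0.2, 50.0)
```

Exact matching, as in the PR, is the stricter choice and is sufficient when callers pass values taken directly from `x_values` and `y_values`.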
2 changes: 1 addition & 1 deletion src/compute_permit_sim/schemas/defaults.py
@@ -77,7 +77,7 @@
DEFAULT_SIGNAL_EXPONENT = 1.0
#
# Stage 2: AUDIT OUTCOME — given audit, does it find a violation?
- # p_catch_if_audited = (1 - FNR) + FNR × backcheck_prob
+ # p_catch_if_audited = 1 - FNR × (1 - backcheck_prob) × (1 - p_w) × (1 - p_m)
DEFAULT_AUDIT_FALSE_POS_RATE = 0.0 # alpha: P(false alarm | compliant firm audited)
DEFAULT_AUDIT_FALSE_NEG_RATE = 0.40 # beta: 40% miss rate in Minimal env
# Penalty structure:
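
The two versions of the catch-probability comment can be compared numerically. The sketch below transcribes both formulas as written. It assumes that `p_w` and `p_m` stand for the whistleblower and monitoring probabilities (the diff does not define them), and the function names are hypothetical. Algebraically, the revised formula reduces to the old one when `p_w = p_m = 0`, since `1 - FNR × (1 - b) = (1 - FNR) + FNR × b`.

```python
import math


def p_catch_old(fnr: float, backcheck_prob: float) -> float:
    """Old comment: audit succeeds, or it misses and a backcheck catches it."""
    return (1 - fnr) + fnr * backcheck_prob


def p_catch_new(fnr: float, backcheck_prob: float, p_w: float = 0.0, p_m: float = 0.0) -> float:
    """Revised comment: a violation escapes only if every channel misses it.

    p_w and p_m are assumed here to be whistleblower and monitoring probabilities.
    """
    return 1 - fnr * (1 - backcheck_prob) * (1 - p_w) * (1 - p_m)


# With no whistleblower or monitoring channel the two formulas agree.
same = math.isclose(p_catch_old(0.4, 0.25), p_catch_new(0.4, 0.25))

# Adding a 50% whistleblower channel halves the remaining escape probability.
with_whistleblower = p_catch_new(0.4, 0.0, p_w=0.5)
```

Under that reading, each extra detection channel multiplies the escape probability by its own miss rate, which is why the revised expression is a product of `(1 - p)` terms.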
40 changes: 40 additions & 0 deletions src/compute_permit_sim/schemas/sweep_params.py
@@ -170,6 +170,46 @@ class SweepParam:
        description="Upper bound of risk appetite multiplier (>1 = risk-seeking).",
        category="Agents",
    ),
    SweepParam(
        path="lab.capability_value",
        label="Capability Race Premium V_b",
        unit="M$",
        default_min=0.0,
        default_max=300.0,
        default_step=20.0,
        description="Strategic value of model capabilities from training (arms-race premium added to gain from cheating).",
        category="Agents",
    ),
    SweepParam(
        path="lab.racing_factor",
        label="Racing Factor c_r",
        unit="",
        default_min=0.0,
        default_max=5.0,
        default_step=0.25,
        description="Urgency multiplier on capability value; higher = stronger competitive pressure to cheat.",
        category="Agents",
    ),
    SweepParam(
        path="lab.reputation_escalation_factor",
        label="Reputation Escalation Factor",
        unit="",
        default_min=0.0,
        default_max=5.0,
        default_step=0.25,
        description="Per-violation multiplier on reputation cost: rep_t = base × (1+factor)^n_caught. 0 = no escalation.",
        category="Agents",
    ),
    SweepParam(
        path="lab.reputation_sensitivity",
        label="Reputation Sensitivity R",
        unit="M$",
        default_min=0.0,
        default_max=100.0,
        default_step=5.0,
        description="Base reputation cost per violation (M$). Compounds with reputation_escalation_factor.",
        category="Agents",
    ),
    # --- Dynamics ---
    SweepParam(
        path="audit.signal_exponent",
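
The escalation formula quoted in the `lab.reputation_escalation_factor` description, `rep_t = base × (1+factor)^n_caught`, and the `default_min` / `default_max` / `default_step` triples can be worked through in a few lines. This is a sketch with hypothetical function names. It assumes both sweep endpoints are included, which the diff does not state.

```python
def reputation_cost(base: float, factor: float, n_caught: int) -> float:
    """rep_t = base * (1 + factor) ** n_caught, as given in the parameter description."""
    return base * (1 + factor) ** n_caught


def sweep_values(lo: float, hi: float, step: float) -> list[float]:
    """Evenly spaced values from lo to hi inclusive (inclusive endpoints are an assumption)."""
    n = round((hi - lo) / step)
    # Indexing by i avoids the drift that repeated addition of step would accumulate.
    return [round(lo + i * step, 10) for i in range(n + 1)]


# factor = 0 means no escalation: the cost stays at the base value.
flat = reputation_cost(10.0, 0.0, 3)

# factor = 0.25 after two caught violations: 10 * 1.25 ** 2.
escalated = reputation_cost(10.0, 0.25, 2)

# Default ranges taken from the SweepParam entries for the two escalation-related knobs.
factor_grid = sweep_values(0.0, 5.0, 0.25)
capability_grid = sweep_values(0.0, 300.0, 20.0)
```

At the upper default of `factor = 5.0` the cost grows sixfold per caught violation, so the top of that default range represents very steep escalation.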