Merged
94 changes: 42 additions & 52 deletions documents/TECHNICAL_DOCUMENTATION.md
@@ -68,14 +68,14 @@ graph TB
- `enforcement.py`: `Auditor` implements two-stage audit model (signal → audit → outcome)

**Infrastructure Layer** (`services/`): Mesa integration and simulation control.
- `model_wrapper.py`: `ComputePermitModel` orchestrates the 7-phase simulation loop (see below)
- `mesa_model.py`: `ComputePermitModel` orchestrates the simulation loop (see below)
- `config_manager.py`: Loads/saves JSON scenarios as validated `ScenarioConfig` objects
- `data_collect.py`: Reporter functions for Mesa DataCollector (compliance rate, price)
- `metrics.py`: Compliance and run-metrics calculations from agent snapshots

**Visualization Layer** (`vis/`): Interactive UI and state management.
- `state.py`: `SimulationManager` manages reactive state, bridges UI ↔ Model
- `solara_app.py`: Solara components (ConfigPanel, Dashboard, InspectorTab)
- `components.py`: Reusable UI widgets (scatter plots, range controls)
- `simulation.py`: `SimulationEngine` manages reactive state, bridges UI ↔ Model
- `page.py`: Solara entry point (ConfigPanel, Dashboard, InspectorTab)
- `components/`: Reusable UI widgets (scatter plots, range controls, cards)

**Configuration** (`schemas/`): Pydantic models (`AuditConfig`, `MarketConfig`, `LabConfig`, `ScenarioConfig`) used throughout for type-safe configuration.

@@ -92,19 +92,8 @@ FLOP scaling reference:
| Near-future frontier | 10²⁵ | ~$500M |
| Projected 2027 | 10²⁶ | ~$5B |

### Flexible Penalty Structure
Penalties follow the formula (inspired by EU AI Act Article 99):
```
penalty = max(penalty_fixed, penalty_percentage × firm_revenue)
if penalty_ceiling: penalty = min(penalty, penalty_ceiling)
```

- `penalty_fixed` (M$): Fixed floor, e.g. EU AI Act €35M
- `penalty_percentage`: Fraction of annual revenue, e.g. EU AI Act 7%
- `penalty_ceiling` (M$): Optional cap
- When both are 0 (default), falls back to flat `penalty_amount`
- `firm_revenue` (annual turnover) is distinct from `economic_value` (training run value)

### Penalty Structure
Penalties are defined simply via a flat per-firm amount (`penalty_amount`) set at instantiation.
Ref: Christoph (2026) §2.5 — P_eff = min(K + φ, L)

### Collateral / Staking Mechanism
@@ -127,24 +116,29 @@ The `Auditor` implements a realistic enforcement process:

| Stage | Method | Determines | Parameters |
|-------|--------|------------|------------|
| **1. Signal Generation** | `compute_signal_strength()` | How suspicious a firm appears | `used_compute`, `flop_threshold` |
| **2. Audit Occurrence** | `decide_audit()` | Whether audit is initiated | `base_prob` (π₀), `high_prob` (π₁) |
| **3. Audit Outcome** | `audit_finds_violation()` | Whether violation is detected | `false_positive_rate` (α), `false_negative_rate` (β) |
| **1a. Signal** | `compute_signal()` | Suspicion signal from excess FLOP | `excess_compute`, `flop_threshold`, `signal_exponent` |
| **1b. Audit Occurrence** | `compute_audit_probability()` | Whether audit is initiated | `base_prob` (π₀), `audit_coefficient` c(i), `signal_dependent` |
| **2. Audit Outcome** | `audit_detection_channel()` | Whether violation is detected | `false_negative_rate` (β), `backcheck_prob`, `whistleblower_prob`, `monitoring_prob`, `false_positive_rate` (α) |

**Signal Strength Formula** (for non-compliant firms):
```
signal = 0.5 + 0.5 × min(1, (used_compute - threshold) / threshold)
signal = min(1.0, (excess_compute / flop_threshold)^signal_exponent)
```
- At threshold: signal ≈ 0.5 (borderline suspicious)
- At 2× threshold: signal = 1.0 (very suspicious)
- Larger training runs are harder to hide
- At `signal_exponent=1.0` (linear), 50% excess → 0.5 signal.
- Higher exponents create a more convex, lenient regime for minor infractions.
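
A minimal sketch of the power-law signal, assuming the ratio is taken over excess FLOP above the threshold (which is what the "50% excess → 0.5 signal" bullet implies). The function name, signature, and the zero-signal guard for non-positive inputs are assumptions for illustration, not the repository's actual `compute_signal()`.

```python
def sketch_signal(excess_compute: float,
                  flop_threshold: float,
                  signal_exponent: float = 1.0) -> float:
    """Suspicion signal in [0, 1] from excess FLOP (illustrative only)."""
    # Assumption: no excess (or a degenerate threshold) produces no signal.
    if excess_compute <= 0 or flop_threshold <= 0:
        return 0.0
    ratio = excess_compute / flop_threshold
    # Cap at 1.0 so an excess of one full threshold or more saturates the signal.
    return min(1.0, ratio ** signal_exponent)


linear = sketch_signal(0.5e25, 1e25, signal_exponent=1.0)   # 50% excess, linear
convex = sketch_signal(0.5e25, 1e25, signal_exponent=2.0)   # 50% excess, convex
capped = sketch_signal(3e25, 1e25, signal_exponent=1.0)     # 300% excess
```

With an exponent of 2, the same 50% excess yields a quarter of the maximum signal instead of half, which is the sense in which higher exponents are lenient toward small infractions.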

**Effective Detection Probability**:
```
p_audit = base_prob + signal_strength × (high_prob - base_prob)
p_catch = p_audit × (1 - false_negative_rate)
p_eff = p_catch + (1 - p_catch) × backcheck_prob
# if signal_dependent=True:
p_audit = min(1.0, base_prob + c(i) × signal × (1.0 - base_prob))
# if signal_dependent=False:
p_audit = base_prob

miss = false_negative_rate × (1 - backcheck_prob) × (1 - p_w) × (1 - p_m)
p_catch = 1 - miss
p_eff = p_audit × p_catch
```
`base_prob` is a uniform floor applied equally to all firms (random audits). `c(i)` only scales the signal-dependent component, so firm-specific audit rate differences arise from violation visibility, not the random baseline. When `signal_dependent=False`, `c(i)` has no effect. `p_w` (whistleblower) and `p_m` (monitoring) are nested within `p_catch`: they provide additional detection when the direct audit pass and backcheck both miss.
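
The formulas above can be sketched as two small functions. This is an illustration under stated assumptions: the function names and default arguments are hypothetical and do not claim to match the `Auditor` methods; only the arithmetic follows the formulas in this section.

```python
def sketch_audit_probability(signal: float,
                             base_prob: float,
                             audit_coefficient: float = 1.0,
                             signal_dependent: bool = True) -> float:
    """Probability that an audit is initiated (illustrative only)."""
    if not signal_dependent:
        # Uniform random audits: c(i) has no effect.
        return base_prob
    # c(i) scales only the signal-driven boost above the base_prob floor.
    return min(1.0, base_prob + audit_coefficient * signal * (1.0 - base_prob))


def sketch_p_eff(p_audit: float,
                 false_negative_rate: float,
                 backcheck_prob: float = 0.0,
                 p_w: float = 0.0,
                 p_m: float = 0.0) -> float:
    """Effective detection probability p_eff = p_audit * p_catch."""
    # A violation escapes only if every nested channel misses it.
    miss = false_negative_rate * (1 - backcheck_prob) * (1 - p_w) * (1 - p_m)
    return p_audit * (1 - miss)


p_audit = sketch_audit_probability(signal=0.5, base_prob=0.3)      # 0.3 + 0.5 * 0.7
p_eff_plain = sketch_p_eff(p_audit, false_negative_rate=0.1)       # audit pass only
p_eff_monitored = sketch_p_eff(p_audit, false_negative_rate=0.1, p_m=1.0)
```

In this example `p_audit` is 0.65 and `p_eff_plain` is 0.585. With `p_m=1.0` the miss probability drops to zero, so `p_eff_monitored` equals `p_audit`: monitoring cannot raise detection above the probability that an audit happens at all.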

## Simulation Loop

@@ -165,32 +159,31 @@ sequenceDiagram
SM->>Model: step()

Note over Model: Phase 0: Collateral
Model->>Lab: Post collateral K (deduct from wealth)
Model->>Lab: Post collateral K

Note over Model: Phase 1: Trading
Model->>Lab: get_bid() for each agent
Lab-->>Model: bids
Model->>Market: allocate(bids)
Market-->>Model: (price, winners)
Model->>Lab: Update has_permit, deduct wealth
Market-->>Model: (price, allocations)
Model->>Lab: Update permits_held

Note over Model: Phase 2: Compliance Decision
Model->>Auditor: compute_signal_strength(planned_training_flops, flop_threshold)
Auditor-->>Model: expected_signal
Model->>Auditor: compute_effective_detection(signal)
Model->>Auditor: compute_signal(excess_compute, flop_threshold)
Auditor-->>Model: signal
Model->>Auditor: compute_detection_probability(excess, threshold, coeff, p_w, p_m)
Auditor-->>Model: p_eff
Model->>Lab: decide_compliance(price, penalty, p_eff)
Lab-->>Model: is_compliant

Note over Model: Phase 3–4: Signal, Audit & Enforcement
Model->>Auditor: generate_signal(used_compute, flop_threshold)
Auditor-->>Model: signal_strength
Model->>Auditor: decide_audit(signal_strength)
Auditor-->>Model: should_audit
Model->>Auditor: audit_finds_violation(is_compliant)
Auditor-->>Model: violation_found (uses FPR/FNR)
Model->>Auditor: compute_audit_probability(signal, audit_coefficient)
Auditor-->>Model: p_audit
Model->>Auditor: audit_detection_channel(is_compliant, p_w, p_m)
Auditor-->>Model: (caught, caught_backcheck)
Model->>Auditor: apply_penalty(violation_found, penalty_amount)
Auditor-->>Model: penalty
Model->>Lab: Apply penalties + seize collateral if caught
Model->>Lab: Refund collateral if not caught

Note over Model: Phase 5: Value Realization
Model->>Lab: Realize economic value
@@ -204,15 +197,15 @@ sequenceDiagram
SM-->>UI: Trigger re-render
```

**Phase 0 - Collateral**: Labs post refundable deposit K (deducted from wealth). Skipped when `collateral_amount = 0`.
**Phase 0 - Collateral**: Labs post refundable deposit K. Skipped when `collateral_amount = 0`.

**Phase 1 - Trading**: Agents submit bids → `Market.allocate()` → price discovery → permit allocation → wealth deduction
**Phase 1 - Trading**: Above-threshold labs submit bids → `Market.allocate()` → uniform-price auction → `permits_held` updated (supports multi-unit FLOP-denominated permits when `flops_per_permit` is set)

**Phase 2 - Compliance Decision**: Per-agent detection probability calculated using expected signal strength from `planned_training_flops` vs `flop_threshold` → agents without permits call `decide_compliance()` → deterrence condition: `p_eff × B_total >= gain` where `B_total = (penalty + collateral + reputation_sensitivity) × risk_profile`
**Phase 2 - Compliance Decision**: Per-agent detection probability calculated via `compute_detection_probability()` using expected signal from `planned_training_flops` vs `flop_threshold` → labs with unpermitted excess call `decide_compliance()` → deterrence condition: `p_eff × B_total >= gain` where `B_total = (penalty + collateral + reputation_sensitivity) × risk_profile`
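
A minimal sketch of the deterrence condition stated in Phase 2. The function name and signature are hypothetical, and `gain` is passed in directly because this section does not define how it is computed; the actual `decide_compliance(price, penalty, p_eff)` may derive it differently.

```python
def sketch_is_deterred(p_eff: float,
                       penalty: float,
                       collateral: float,
                       reputation_sensitivity: float,
                       risk_profile: float,
                       gain: float) -> bool:
    """True when the expected burden of cheating is at least the gain (M$)."""
    # B_total: everything the lab stands to lose if caught, scaled by risk attitude.
    b_total = (penalty + collateral + reputation_sensitivity) * risk_profile
    return p_eff * b_total >= gain


# Illustrative numbers only: B_total = (50 + 100 + 10) * 1.0 = 160 M$,
# so the expected burden is 0.585 * 160 = 93.6 M$.
deterred_small_gain = sketch_is_deterred(0.585, 50.0, 100.0, 10.0, 1.0, gain=70.0)
deterred_large_gain = sketch_is_deterred(0.585, 50.0, 100.0, 10.0, 1.0, gain=100.0)
```

Here a gain of 70 M$ is deterred while a gain of 100 M$ is not, since the expected burden sits between the two.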

**Phase 3–4 - Signal, Audit & Enforcement**: Actual training FLOP usage generates signal → signal strength determines audit probability (interpolates between `base_prob` and `high_prob`) → `audit_finds_violation()` uses FPR/FNR → penalties applied (flexible: `max(fixed, pct × revenue)` with optional ceiling) → collateral seized on violation → collateral refunded otherwise → `on_audit_failure()` escalates reputation sensitivity and audit coefficient
**Phase 3–4 - Signal, Audit & Enforcement**: Actual training FLOP excess generates signal → `compute_audit_probability()` applies `base_prob` as a uniform floor; when `signal_dependent=True`, `audit_coefficient` scales the signal boost above that floor → `audit_detection_channel()` uses FNR/backcheck for two-stage outcome, with whistleblower (`p_w`) and monitoring (`p_m`) as nested fallback channels within the audit event → flat `penalty_amount` applied if caught → collateral seized on violation → `on_audit_failure()` escalates reputation sensitivity and audit coefficient

**Phase 5 - Value Realization**: Agents who ran compute (legally or illegally) realize `economic_value`
**Phase 5 - Value Realization**: Labs that ran compute realize `economic_value`

**Phase 6 - Dynamic Factor Updates**: Audit coefficients decay toward base via `decay_audit_coefficient()` → racing factors updated via `update_racing_factor()` based on relative capability position

@@ -234,10 +227,10 @@ reputation_sensitivity_t = base × (1 + escalation_factor)^failed_audit_count
Failed audits increase a lab's audit coefficient (making future audits more likely). The excess decays exponentially toward the base each step:
```
on failure: audit_coefficient += audit_escalation
each step: excess = current - base; current = base + excess × decay_rate
each step: excess = current - base; current = base + excess × (1 - decay_rate)
```
- `audit_escalation = 0.0` (static) or e.g. `1.0` per failure
- `audit_decay_rate = 0.8` (20% decay per step toward base of 1.0)
- `audit_decay_rate = 0.2` (20% decay per step toward base of 1.0)
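
The escalation and decay rule can be sketched as follows, using the `(1 - decay_rate)` form in which `audit_decay_rate = 0.2` removes 20% of the excess per step. The function names and signatures are assumptions for illustration; the repository's `on_audit_failure()` and `decay_audit_coefficient()` may be structured differently.

```python
def sketch_escalate(current: float, audit_escalation: float) -> float:
    """Raise the audit coefficient after a failed audit."""
    return current + audit_escalation


def sketch_decay(current: float, base: float = 1.0, decay_rate: float = 0.2) -> float:
    """Shrink the excess above base by a fraction decay_rate each step."""
    excess = current - base
    return base + excess * (1.0 - decay_rate)


coeff = sketch_escalate(1.0, audit_escalation=1.0)   # one failure: 1.0 -> 2.0
trajectory = []
for _ in range(3):
    coeff = sketch_decay(coeff)                      # excess: 1.0 -> 0.8 -> 0.64 -> 0.512
    trajectory.append(coeff)
```

Because the excess is multiplied by a constant factor each step, the coefficient approaches the base geometrically and never drops below it.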

### Racing Factor Dynamics
Racing pressure adjusts based on relative capability position:
@@ -252,12 +245,9 @@ where `gap = cumulative_capability - mean_capability`.

All monetary values are in **millions of USD (M$)**. This includes:
- `economic_value`: Training run value (M$)
- `firm_revenue`: Annual revenue/turnover (M$)
- `penalty_amount`, `penalty_fixed`, `penalty_ceiling`: Penalties (M$)
- `penalty_amount`: Flat per-firm penalty (M$)
- `collateral_amount`: Refundable deposit (M$)
- `reputation_sensitivity`: Perceived brand/trust damage (M$)
- `wealth`, `step_profit`: Cumulative and per-step financials (M$)
- `cost`: Per-audit cost for regulator (M$)

Training run compute is measured in **FLOPs** (floating point operations).
- `flop_threshold`: Regulatory threshold (FLOP)
1 change: 1 addition & 0 deletions pyproject.toml
@@ -50,3 +50,4 @@ python_files = ["test_*.py"]
[tool.mypy]
ignore_missing_imports = true
check_untyped_defs = true
plugins = ["pydantic.mypy"]
7 changes: 0 additions & 7 deletions scenarios/scenario_1_lawless.json

This file was deleted.

10 changes: 10 additions & 0 deletions scenarios/scenario_1_minimal.json
@@ -0,0 +1,10 @@
{
"name": "Minimal Enforcement",
"description": "Scarce permits, weak random auditing, high false-negative rate. Expected penalty (~5 M$) is far below the gain from running without a permit, so most unpermitted labs cheat.",
"steps": 10,
"n_agents": 20,
"audit": {},
"market": {
"permit_cap": 5.0
}
}
@@ -1,17 +1,20 @@
{
"name": "Crisis World (Maximum Safety)",
Owner

Running this scenario, I noticed some steps went by without a formal audit. I think this is because 0.3 * 0.1 gives less than 1 expected audit per turn. Separate note on this.

Collaborator Author

Good catch!

"name": "Strict Enforcement",
"description": "Intrusive telemetry (100% monitoring), rigid enforcement (30% base audit), high collateral.",
"steps": 10,
"n_agents": 20,
"audit": {
"base_prob": 0.3,
"high_prob": 1.0,
"false_negative_rate": 0.1
"false_negative_rate": 0.1,
"monitoring_prob": 1.0,
"penalty_amount": 50.0,
"signal_dependent": true
},
"lab": {
"audit_coefficient": 0.1
"capability_value": 40.0,
"racing_factor": 2.0
},
"collateral_amount": 15.75,
"collateral_amount": 100.0,
"market": {
"fixed_price": 70.0
}
@@ -1,14 +1,16 @@
{
"name": "Maxwell World (Smart Enforcement)",
"name": "Smart Enforcement",
"description": "Smart targeting (monitoring=0.2), efficient pricing ($2M), moderate audit (20%).",
"steps": 10,
"n_agents": 20,
"audit": {
"base_prob": 0.2
"base_prob": 0.2,
"monitoring_prob": 0.2,
"signal_dependent": true
},
"collateral_amount": 15.75,
"market": {
"fixed_price": 2.0,
"token_cap": 20.0
"permit_cap": 20.0
}
}
31 changes: 31 additions & 0 deletions scenarios/scenario_4_dynamic.json
@@ -0,0 +1,31 @@
{
"name": "Dynamic Escalation (Time-Dependent)",
"description": "Demonstrates shifting deterrence via feedback loops: labs start non-compliant, get caught, and face escalating audit probabilities and reputation costs. Initial cheating gives way to compliance as enforcement ratchets up.",
"steps": 110,
"n_agents": 20,
"audit": {
"base_prob": 0.3,
"signal_dependent": false,
"false_negative_rate": 0.05,
"penalty_amount": 100.0,
"audit_escalation": 1.5,
"audit_decay_rate": 0.1
},
"lab": {
"compute_capacity_min": 1e26,
"compute_capacity_max": 1e26,
"economic_value_min": 40.0,
"economic_value_max": 60.0,
"capability_value": 0.0,
"racing_factor": 0.0,
"risk_profile_min": 1.0,
"risk_profile_max": 1.0,
"reputation_sensitivity": 10.0,
"reputation_escalation_factor": 1.0,
"audit_coefficient": 1.0
},
"market": {
"permit_cap": 5.0
},
"collateral_amount": 0.0
}