feat: Add formal policy verification (OpenShell Policy Prover) to Python SDK

### Problem Statement

Policy authoring today relies on manual review and runtime testing. There is no way to statically verify that a sandbox policy actually enforces what the author intended before deployment. This leaves gaps that runtime testing may miss — for example, an L4-only endpoint that bypasses HTTP method inspection, or a binary capability chain that enables data exfiltration through an otherwise locked-down sandbox.

### Proposed Design

Integrate the OpenShell Policy Prover (OPP) into the Python SDK as `openshell.prover`. OPP uses Z3 SMT solving to formally verify properties of sandbox policies by encoding policy rules, credential scopes, and binary capabilities as constraints and then checking reachability.

**Components:**
- `openshell.prover` Python subpackage under `python/openshell/prover/`
- 8 source modules: policy parser, Z3 constraint model, verification queries, binary registry, credential loader, report renderer, accepted risks filter, CLI
- Registry data files (binary capabilities, API endpoint descriptors, credential types) as package data
- `openshell policy prove` Rust CLI subcommand that delegates to the Python prover via subprocess
- `harden-sandbox-policy` agent skill that wraps the prover for interactive use

**Verification queries detect:**
- Data exfiltration paths (can data leave the sandbox?)
- Write bypass attacks (can agents modify despite read-only policies?)
- Privilege escalation via binary capability inheritance chains
- L7 enforcement gaps (L4-only endpoints bypassing HTTP inspection)
- Overpermissive method grants

**User-facing interface:**
- `openshell policy prove --policy <path> --credentials <path>` — primary CLI (Rust delegates to Python prover)
- `python -m openshell.prover.cli prove --policy <path> --credentials <path>` — direct Python CLI
- Python API: `from openshell.prover import ...` for programmatic use
- Agent skill: `harden-sandbox-policy` for interactive policy hardening workflow
- Output formats: human-readable console, compact CI format, Mermaid diagrams, HTML reports

**Dependencies:** `z3-solver>=4.12`, `pyyaml>=6.0`, `rich>=13.0` — added as an optional dependency group (`prover`) to avoid bloating the base SDK.

### Alternatives Considered

1. **Pure Rust with Z3 crate** — Z3 has Rust bindings but they're less mature than Python's. The prover logic is constraint-specification code where Python is more productive. Subprocess delegation from Rust CLI keeps the UX unified.

2. **Standalone package (separate pip install)** — More isolation but creates packaging/versioning overhead and the prover operates on the same policy YAML format the SDK already handles.

3. **OPA-based static analysis** — OPA is already used at runtime but its Rego-based approach is better suited for per-request evaluation than exhaustive reachability analysis across all policy+credential+binary combinations.

### Agent Investigation

- Explored the OpenShell repo structure: 12 Rust crates, Python SDK at `python/openshell/`, agent skills at `.agents/skills/`
- Confirmed `openshell-policy` crate handles YAML validation but has no formal verification
- Confirmed Rust CLI already uses `std::process::Command` for subprocess delegation (docker, ssh, git) — same pattern works for Python prover
- Confirmed `PolicyCommands` enum in `crates/openshell-cli/src/main.rs` is the right place to add `Prove` variant
- OPP prototype validated against production policies (NemoClaw) and surfaced real security findings

### Scope

- Python source modules under `python/openshell/prover/`
- Binary/API/credential registry data files
- Unit tests (`prover_test.py`)
- `pyproject.toml` optional dependency group (`prover`)
- `openshell policy prove` Rust CLI subcommand (subprocess delegation to Python prover)
- `harden-sandbox-policy` agent skill + `AGENTS.md` update

### Follow-up work

- [ ] Demo scenarios under `/examples/policy-prover/`
- [ ] Policy Advisor integration (verify proposed rules before approval)
- [ ] CI gate (run OPP against example policies in CI)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add formal policy verification (OpenShell Policy Prover) to Python SDK #699

Problem Statement

Proposed Design

Alternatives Considered

Agent Investigation

Scope

Follow-up work

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat: Add formal policy verification (OpenShell Policy Prover) to Python SDK #699

Description

Problem Statement

Proposed Design

Alternatives Considered

Agent Investigation

Scope

Follow-up work

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions