-
Notifications
You must be signed in to change notification settings - Fork 436
feat: Add formal policy verification (OpenShell Policy Prover) to Python SDK #699
Description
Problem Statement
Policy authoring today relies on manual review and runtime testing. There is no way to statically verify that a sandbox policy actually enforces what the author intended before deployment. This leaves gaps that runtime testing may miss — for example, an L4-only endpoint that bypasses HTTP method inspection, or a binary capability chain that enables data exfiltration through an otherwise locked-down sandbox.
Proposed Design
Integrate the OpenShell Policy Prover (OPP) into the Python SDK as openshell.prover. OPP uses Z3 SMT solving to formally verify properties of sandbox policies by encoding policy rules, credential scopes, and binary capabilities as constraints and then checking reachability.
Components:
openshell.proverPython subpackage underpython/openshell/prover/- 8 source modules: policy parser, Z3 constraint model, verification queries, binary registry, credential loader, report renderer, accepted risks filter, CLI
- Registry data files (binary capabilities, API endpoint descriptors, credential types) as package data
openshell policy proveRust CLI subcommand that delegates to the Python prover via subprocessharden-sandbox-policyagent skill that wraps the prover for interactive use
Verification queries detect:
- Data exfiltration paths (can data leave the sandbox?)
- Write bypass attacks (can agents modify despite read-only policies?)
- Privilege escalation via binary capability inheritance chains
- L7 enforcement gaps (L4-only endpoints bypassing HTTP inspection)
- Overpermissive method grants
User-facing interface:
openshell policy prove --policy <path> --credentials <path>— primary CLI (Rust delegates to Python prover)python -m openshell.prover.cli prove --policy <path> --credentials <path>— direct Python CLI- Python API:
from openshell.prover import ...for programmatic use - Agent skill:
harden-sandbox-policyfor interactive policy hardening workflow - Output formats: human-readable console, compact CI format, Mermaid diagrams, HTML reports
Dependencies: z3-solver>=4.12, pyyaml>=6.0, rich>=13.0 — added as an optional dependency group (prover) to avoid bloating the base SDK.
Alternatives Considered
-
Pure Rust with Z3 crate — Z3 has Rust bindings but they're less mature than Python's. The prover logic is constraint-specification code where Python is more productive. Subprocess delegation from Rust CLI keeps the UX unified.
-
Standalone package (separate pip install) — More isolation but creates packaging/versioning overhead and the prover operates on the same policy YAML format the SDK already handles.
-
OPA-based static analysis — OPA is already used at runtime but its Rego-based approach is better suited for per-request evaluation than exhaustive reachability analysis across all policy+credential+binary combinations.
Agent Investigation
- Explored the OpenShell repo structure: 12 Rust crates, Python SDK at
python/openshell/, agent skills at.agents/skills/ - Confirmed
openshell-policycrate handles YAML validation but has no formal verification - Confirmed Rust CLI already uses
std::process::Commandfor subprocess delegation (docker, ssh, git) — same pattern works for Python prover - Confirmed
PolicyCommandsenum incrates/openshell-cli/src/main.rsis the right place to addProvevariant - OPP prototype validated against production policies (NemoClaw) and surfaced real security findings
Scope
- Python source modules under
python/openshell/prover/ - Binary/API/credential registry data files
- Unit tests (
prover_test.py) pyproject.tomloptional dependency group (prover)openshell policy proveRust CLI subcommand (subprocess delegation to Python prover)harden-sandbox-policyagent skill +AGENTS.mdupdate
Follow-up work
- Demo scenarios under
/examples/policy-prover/ - Policy Advisor integration (verify proposed rules before approval)
- CI gate (run OPP against example policies in CI)