docs+tooling: external-audit follow-up — claim calibration + portable validation harness#50
Merged
Merged
Conversation
… validation harness Independent audit (calibration review) found the engineering sound; the residual gaps were claim wording and validation portability. This addresses both: - README: reword the ts_checker Components row from "state transition proofs" to "bounded risk-ceiling + declarative-invariant satisfiability checks (sound on its inputs)", linking THREAT_MODEL §8 — the lone residual overstatement. - README: add a "Guarantees today vs roadmap" table splitting the deterministic kernel floor from the cooperative user-space gate, with roadmap items tagged (per-action manifests JG #62, reproducible builds JG #46, OTLP JG #11). - run_professor_validation.sh: run on macOS stock bash 3.2 (indirect per-tier vars instead of `declare -A`), auto-export Homebrew Z3/libclang on Darwin, scope Tier 1 to portable crates (ts_checker, ts_wire) on non-Linux, fail-fast cd guard. Lifts the lowest-scored audit pillar (operational validation). - CHANGELOG: record the above. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up sweep per review: grep for overstated formal-method phrasing across README/THREAT_MODEL/SECURITY_ARCHITECTURE/CHANGELOG/ts_checker and reword each hit to an accurate description of the bounded SMT check. - README shippable-components: "Z3 totality audit" -> "Z3 risk-ceiling check (bounded SMT: r + w <= c)"; "Z3 per-agent invariant verification" -> "Z3 declarative-invariant satisfiability check". - ts_checker/src/lib.rs: verify_state_transition doc comment now states it checks the single inequality `current_risk + action_weight <= ceiling` with one SAT query (SAT=>ALLOW, else incl. Unknown=>DENY), not a "Deep Temporal State" proof. - cargo fmt: collapse the two `Sat => Ok(())` match arms (pre-existing fmt drift that was failing the Format check job). Verified: `grep` for the overstated phrases now returns zero hits; `cargo fmt --check` clean; `cargo test -p ts_checker` green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to the independent calibration audit (in
analysis/, not committed). The audit found the engineering sound and confirmed the kernel floor, tamper-evident ledger, fail-closed Z3, and SLSA/cosign provenance; the residual gaps were claim wording and validation portability. The claim-hygiene pass already absorbed most findings — this closes the rest.Tier 1 — quick wins
ts_checkerrow no longer says state transition proofs; now bounded risk-ceiling + declarative-invariant satisfiability checks (sound on its inputs), linking THREAT_MODEL §8. This was the lone surviving overstatement the audit flagged.scripts/run_professor_validation.sh): runs on macOS stock bash 3.2 (indirect per-tier vars instead ofdeclare -A), auto-exports Homebrew Z3/libclang on Darwin sots_checkerlinks, scopes Tier 1 to portable crates (ts_checker,ts_wire) on non-Linux, fail-fastcdguard. Lifts the lowest-scored pillar (operational validation, 2.5/5) — an evaluator on a Mac now gets a green Tier 1 instead of a link error.Tier 2 — messaging
Out of scope (filed)
Docs + one POSIX-portable shell script only; no daemon/runtime code paths touched.
🤖 Generated with Claude Code