release: v0.4.0 — Artifact-infrastructure track complete by ericckzhou · Pull Request #37 · ericckzhou/falsifyai

ericckzhou · 2026-05-24T12:49:11Z

Summary

Promotes `dev` to `main` for the v0.4.0 release.

Artifact-infrastructure track complete (3 of 3). v0.4.0 adds the final piece: persisted `cli_invocation` on `ReplayArtifact` — descriptive procedural provenance. The locked sequence `verify` → `export --bundle` → embedded CLI invocation is now closed.

After v0.4.0, the artifact answers four questions without external bookkeeping:

Question	Source
What happened	existing case results + verdict
How it was evaluated	existing materialized spec + invariants
What was exported	bundle manifest + `bundle_id`
What exact command produced it	new `cli_invocation`

That's a meaningful release boundary both architecturally (locked sequence closed) and externally (EU AI Act Annex IV §2(g) "the procedures used" is now answered by the artifact itself, not external bookkeeping).

Commits to promote (4)

Commit	Description
`799ba0c`	docs(readme): add case study 02 to case studies table
`e31b9dc`	feat(replay): persist cli_invocation on ReplayArtifact (PR-35)
`74cc30a`	docs: post-PR-35 audit — update stale 'next' / 'forthcoming' framing
`2aac982`	chore(release): v0.4.0

Version sources of truth (all aligned at 0.4.0)

`pyproject.toml` — `version = "0.4.0"`
`falsifyai/__init__.py` — `version = "0.4.0"`
`tests/unit/test_version.py` — asserts `"0.4.0"`
`CITATION.cff` — `version: 0.4.0`, `date-released: 2026-05-24`
`uv.lock` — editable-install entry at `0.4.0`
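A check like the one above can be scripted. The following is a minimal sketch, not the project's actual tooling: the regex patterns, the `find_versions` and `misaligned` helpers, and the sample file contents are all assumptions made for illustration, and `uv.lock` and the unit test are left out for brevity.

```python
from __future__ import annotations

import re

# Hypothetical patterns, one per version source. The real files may be laid
# out differently, so treat these regexes as placeholders to adjust.
VERSION_PATTERNS = {
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
    "falsifyai/__init__.py": r'^(?:__version__|version)\s*=\s*"([^"]+)"',
    "CITATION.cff": r'^version:\s*"?([^"\s]+)"?',
}


def find_versions(contents: dict[str, str]) -> dict[str, str | None]:
    """Map each file name to the first version string found in its text."""
    found: dict[str, str | None] = {}
    for name, text in contents.items():
        match = re.search(VERSION_PATTERNS[name], text, flags=re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


def misaligned(found: dict[str, str | None], expected: str) -> list[str]:
    """Return the files whose version is missing or differs from `expected`."""
    return sorted(name for name, version in found.items() if version != expected)


# Made-up file contents standing in for the real repository files.
sample = {
    "pyproject.toml": '[project]\nname = "falsifyai"\nversion = "0.4.0"\n',
    "falsifyai/__init__.py": '__version__ = "0.4.0"\n',
    "CITATION.cff": "cff-version: 1.2.0\nversion: 0.4.0\ndate-released: 2026-05-24\n",
}

if __name__ == "__main__":
    print(misaligned(find_versions(sample), expected="0.4.0"))
```

An empty list means every source agrees with the expected version; any file name in the result is a source that drifted.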

Release-state metadata audited (per RELEASE.md §7)

✅ README banner + "Status and roadmap" current-release entry
✅ CHANGELOG `[0.4.0]` section + link reference; `[Unreleased]` placeholder preserved
✅ docs/RELEASE.md illustrative tag commands + GitHub Release example + post-release dev-marker example + MINOR semver example
✅ docs/case-studies/01 pip install command
✅ docs/EVIDENCE.md §6 (artifact-infrastructure track complete)
✅ docs/COMPLIANCE.md §2 Annex IV table + the `vNEXT` placeholders missed in the v0.3.0 cycle

Test plan

541 tests pass on dev HEAD (`2aac982`)
Ruff + format clean
Version sources all report 0.4.0
`uv.lock` synced in PR #36, chore(release): v0.4.0 (no trailing fix needed)
After merge: `git checkout main && git pull` to sync local
Tag `v0.4.0` (`git tag -a v0.4.0 -m "Release 0.4.0" && git push origin v0.4.0`) → triggers PyPI publish workflow
GitHub Release page with body copied from `CHANGELOG.md` `[0.4.0]` section verbatim
Post-release dev sync: `git checkout dev && git reset --hard origin/main && git push --force-with-lease origin dev`

What's NOT in this release (deferred — waits for external pressure)

Cryptographic signing — reserved `attestations: []` and `signature_slots: []` slots in the bundle manifest. Implementation deferred until artifacts cross trust boundaries (e.g., regulatory submission).
`falsifyai import ` — bundle is forward-compatible; consumer surface not implemented until external pressure pulls.
Second case study formalization — case study 02's machine-reproducible specs exist; running them and bundling the resulting ReplayStore waits for an Anthropic API key.

Per the project's "wait for pressure" discipline, none of these is on a fixed schedule — each waits for external signal.

`799ba0c` docs(readme): add case study 02 to case studies table

Missed in the v0.3.0 release prep (PR-29 added the case study; PR-33 bumped the roadmap entry but not the case studies table). Ships with the next dev → main promotion.

`e31b9dc` feat(replay): persist cli_invocation on ReplayArtifact (PR-35)

Third and final item in the locked artifact-infrastructure track. Closes the sequence verify -> export --bundle -> embedded CLI invocation.

After PR-35, the artifact can answer four questions without external bookkeeping:
- what happened (existing case results + verdict)
- how it was evaluated (existing materialized spec + invariants)
- what was exported (existing bundle manifest + bundle_id)
- what exact command produced it (new cli_invocation)

That's procedural provenance: a real capability increase on top of v0.3.0, even though implementation size is small.

Semantic boundary (load-bearing): cli_invocation records *what command produced the artifact*, NOT a guarantee that re-running it will produce identical outputs. Replay determinism still lives in materialized_hash (preserved perturbation evidence) and the bundle's bundle_id (preserved manifest + file hashes). The bundle's auto-generated README carries an explicit disclaimer.

Capture contract (deliberately narrow):
- argv: normalized invocation tokens. argv[0] canonicalized to 'falsifyai' regardless of entry path (entry-point launcher, python -m, direct script). Subsequent tokens preserved verbatim.
- falsifyai_version: runtime package version at capture time.
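The capture contract above is small enough to sketch in a few lines. This is an illustration under stated assumptions rather than the project's code: the `CliInvocation` shape is inferred from the two fields named in the contract, and `capture_invocation`, the `run` subcommand token, and the example paths are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CliInvocation:
    """Illustrative stand-in for the persisted field; the real class may differ."""

    argv: tuple[str, ...]
    falsifyai_version: str


def capture_invocation(raw_argv: list[str], version: str) -> CliInvocation:
    """Canonicalize argv[0] to 'falsifyai' and keep later tokens verbatim.

    Nothing else is read here: no environment variables, working directory,
    hostname, or file contents, matching the narrow contract described above.
    """
    if not raw_argv:
        raise ValueError("argv must contain at least the program name")
    return CliInvocation(
        argv=("falsifyai", *raw_argv[1:]),
        falsifyai_version=version,
    )


if __name__ == "__main__":
    # Different entry paths collapse to the same normalized argv.
    via_launcher = capture_invocation(
        ["/usr/local/bin/falsifyai", "run", "specs/demo.yaml"], "0.4.0"
    )
    via_module = capture_invocation(
        ["/site-packages/falsifyai/__main__.py", "run", "specs/demo.yaml"], "0.4.0"
    )
    print(via_launcher == via_module)
```

Normalizing only `argv[0]` keeps the record independent of how the tool was launched while leaving the user-supplied tokens untouched.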
Explicitly NOT captured by design:
- environment variables (secret-leakage surface)
- API keys (no auth-bearing CLI flag today; future flags MUST redact)
- current working directory (specs are referenced by path in argv)
- hostname / username / machine identifiers (operator identity belongs at the commit / export layer, not the artifact)
- shell history / pre-shell-expansion argv (unavailable by construction)
- file contents (spec YAML lives in MaterializedSpec already)

Capture point:
- cmd_run only (single capture point at function entry, before any subcommand-internal mutation)
- Read-only consumer surfaces (replay, inspect, history, diff, verify, export) NEVER stamp invocation (preservation discipline)

Backward compatibility:
- Default value is None (explicit absence, not a sentinel)
- Deserializer treats missing JSON key as None
- Pre-PR-35 artifacts load cleanly; verify does NOT gain a 9th check

Bundle wire-up:
- PR-32 already added a conditional render path; PR-35 makes it light up
- Rendering improved: shlex.join argv + falsifyai_version + the determinism-vs-provenance disclaimer

Architectural assertion (Tier-4 discipline):
- test_cli_invocation_model_does_not_import_resolver: importing CliInvocation alone never loads verdict.resolver
- test_replay_models_module_does_not_import_resolver: broader assertion that the preservation layer stays independent of the interpretation layer (future field additions can't accidentally cross the boundary)

Folded co-edits (per plan §G):
- README compliance-routing callout above 'Status and roadmap' pointing procurement readers at docs/COMPLIANCE.md (candidate #5 fold)
- One-line note in 'What's in the evidence?' section about the new field
- Architectural-test-pattern-as-template observation lives in the walkthrough rather than a separate doc file (candidate #6 fold)

Docs:
- CHANGELOG [Unreleased]: PR-35 entry + capture contract summary
- docs/EVIDENCE.md §6: mark artifact-infrastructure track 3-of-3 shipped; note signing is the remaining locked deferral
- docs/COMPLIANCE.md §2 Annex IV mapping: new 'Command that produced this evidence' row mapping to cli_invocation.argv + cli_invocation.falsifyai_version

Tests: +24 unit (model+capture+architectural) + 5 integration = 29 new. Suite: 519 -> 541 passing. Ruff + format clean.

Out of scope (explicit non-goals):
- Cryptographic signing (signature_slots remain reserved)
- Bundle import / round-trip
- --json output mode
- Verify check for invocation presence (would break pre-PR-35 artifacts)
- Capture from non-cmd_run commands
- Capture from a hypothetical programmatic falsifyai.runtime.run() API

Version: this is a MINOR bump (new feature, new public dataclass, new persisted schema field). Release-prep PR after merge should target v0.4.0 ('Artifact-infrastructure track complete'), not v0.3.1.
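The backward-compatibility and rendering notes in this commit message can be illustrated with a small sketch. It assumes the artifact is a JSON object with an optional `cli_invocation` key. The `load_cli_invocation` and `render_invocation` helpers, the `verdict` key in the sample, and the output wording are hypothetical. Only `shlex.join` and the missing-key-means-None rule come from the text above.

```python
from __future__ import annotations

import json
import shlex

DISCLAIMER = (
    "This records what command produced the artifact. It does not guarantee "
    "that re-running the command yields identical outputs."
)


def load_cli_invocation(artifact_json: str) -> dict | None:
    """Return the stored invocation, or None when the key is absent.

    Artifacts written before the field existed have no such key, so they
    load as None instead of raising.
    """
    data = json.loads(artifact_json)
    return data.get("cli_invocation")


def render_invocation(invocation: dict | None) -> str:
    """Render the invocation for a README; empty string when nothing is stored."""
    if invocation is None:
        return ""  # conditional render path: older artifacts show nothing
    command = shlex.join(invocation["argv"])  # shell-quotes tokens as needed
    version = invocation["falsifyai_version"]
    return f"Command: {command}\nfalsifyai version: {version}\n{DISCLAIMER}"


if __name__ == "__main__":
    old_artifact = '{"verdict": "pass"}'
    new_artifact = json.dumps(
        {
            "verdict": "pass",
            "cli_invocation": {
                "argv": ["falsifyai", "run", "my spec.yaml"],
                "falsifyai_version": "0.4.0",
            },
        }
    )
    print(repr(render_invocation(load_cli_invocation(old_artifact))))
    print(render_invocation(load_cli_invocation(new_artifact)))
```

Using `shlex.join` rather than a plain space join means a token containing spaces is quoted, so the rendered command can be pasted back into a POSIX shell as the same argv.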

`74cc30a` docs: post-PR-35 audit — update stale 'next' / 'forthcoming' framing

Three stale references found by audit after the artifact-infrastructure track closed with PR-35:
- README.md 'Coming next' section: was 'persisted CLI-invocation field (next)'; the track is now complete. Replaced with 'track complete' framing + the 'driven by external pressure' next-pull dynamic (regulatory signing, second case study, falsifyai import, etc.).
- docs/case-studies/02 opening: claimed specs were 'planned follow-up work', but PR-30 actually shipped them. Updated to link the existing specs/ directory.
- docs/case-studies/README.md index row for CS-02: said 'CLI formalization forthcoming'; specs already exist. Updated tools-used label.

Release-state metadata (status banner, current-release entry, RELEASE.md illustrative examples) intentionally NOT changed here: those bump in the v0.4.0 release-prep cycle, not on dev between releases.

`2aac982` chore(release): v0.4.0

Artifact-infrastructure track complete (3 of 3 locked items shipped). v0.4.0 adds persisted cli_invocation on ReplayArtifact via PR-35.

Version bumps (sources of truth, all aligned):
  pyproject.toml 0.3.0 -> 0.4.0
  falsifyai/__init__.py 0.3.0 -> 0.4.0
  tests/unit/test_version.py rename + bump asserted value
  CITATION.cff version 0.3.0 -> 0.4.0
  uv.lock editable-install entry 0.3.0 -> 0.4.0 (synced in same commit, not as a trailing fix)

Release-state metadata (drift-prone files per RELEASE.md §7):
  README.md status banner - new wave: artifact-infrastructure track complete; cli_invocation + four-question artifact summary
  README.md "Status and roadmap" - 0.4.0 (current release) section promoted; 0.3.0 demoted to compact historical entry
  README.md "Coming next" - cleaned up the leftover diff-sharpening "Shipped in v0.3.0" bullet; reduced to a single "driven by external pressure" paragraph
  CHANGELOG.md - [Unreleased] placeholder preserved on top; PR-35 entries promoted into new [0.4.0] - 2026-05-24 section; [0.4.0] link reference added
  docs/RELEASE.md - MINOR example (0.2.0->0.3.0 -> 0.3.0->0.4.0); tag commands v0.4.0; GitHub Release example v0.4.0 + thematic name; post-release dev-marker example 0.4.0->0.5.0
  docs/case-studies/01-invisible-character-substitution.md - pip install command bumped to 0.4.0 (2 occurrences)

Bonus audit fixes - stale 'vNEXT' placeholders left over from v0.3.0 release-prep (caught by the comprehensive grep this cycle):
  docs/COMPLIANCE.md:138, 165 - 'shipped in vNEXT' -> 'shipped in v0.3.0' (these always meant v0.3.0; vNEXT was the pre-release placeholder I forgot to bump)
  docs/COMPLIANCE.md:12 - 'As of v0.3.0...' -> 'As of v0.4.0...'; also names the new procedural-provenance capability that v0.4.0 adds
  docs/COMPLIANCE.md:56 - cli_invocation row: 'since v0.3.0+PR-35' -> 'since v0.4.0' (since the PR ships in 0.4.0, not 0.3.0)
  docs/EVIDENCE.md:328 - '3-of-3 shipped as of v0.3.0 + PR-35' -> 'as of v0.4.0' (cleaner once PR-35 has a release version of its own)

Verified: 510 + 31 = 541 tests pass. Ruff + format clean.

Preserved as intentional historical references (NOT changed):
  CHANGELOG.md [0.3.0] entry body - historical, immutable
  CHANGELOG.md [0.3.0] link reference - always present
  README byte-identical-to-v0.2.0 fixture claim - load-bearing fixture
  README.md "0.3.0 - Artifact-infrastructure track (2 of 3)" roadmap entry - historical
  tests/integration/test_diff_end_to_end.py v0.2.0_baseline.txt fixture refs - load-bearing fixture
  tests/unit/test_bundle_writer.py _FIXED_VERSION = "0.3.0.test" - test sentinel
  tests/unit/test_cli_invocation_model.py falsifyai_version="0.3.0" test-data values - test fixtures, not release-state metadata
  .github/workflows/publish.yml GITHUB_REF format comment - illustrative

What v0.4.0 ships (since v0.3.0):
  PR-35 - feat(replay): persist cli_invocation on ReplayArtifact
  Post-PR-35 docs audit - update stale 'next' / 'forthcoming' framing
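The "comprehensive grep" that caught the leftover `vNEXT` placeholders could be approximated in Python as follows. This is a sketch under assumptions: the `stale_lines` and `scan_tree` helpers, the choice to scan only `.md` files, and the `docs` path in the usage line are illustrative and do not describe the author's actual command.

```python
from __future__ import annotations

import re
from pathlib import Path

# Placeholder token that should not survive a release-prep pass.
STALE = re.compile(r"\bvNEXT\b")


def stale_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line with a placeholder."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if STALE.search(line)
    ]


def scan_tree(root: str, suffixes: tuple[str, ...] = (".md",)) -> list[str]:
    """Walk `root` and report hits as 'path:line: text', grep-style."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in stale_lines(text):
                hits.append(f"{path}:{lineno}: {line}")
    return hits


if __name__ == "__main__":
    for hit in scan_tree("docs"):
        print(hit)
```

Run before tagging, a non-empty result flags files that still carry a pre-release placeholder.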

ericckzhou added 4 commits May 24, 2026 07:30

ericckzhou merged commit 5beeaea into main May 24, 2026
2 checks passed

ericckzhou mentioned this pull request May 24, 2026

release: docs + CS-02 bundle (post-v0.4.0) #42

Merged

ericckzhou merged 4 commits into main from dev
