release: v0.4.0 - Artifact-infrastructure track complete (#37)
Merged
Missed in the v0.3.0 release prep (PR-29 added the case study; PR-33 bumped the roadmap entry but not the case studies table). Ships with the next dev → main promotion.
Third and final item in the locked artifact-infrastructure track. Closes
the sequence verify -> export --bundle -> embedded CLI invocation. After
PR-35, the artifact can answer four questions without external bookkeeping:
- what happened (existing case results + verdict)
- how it was evaluated (existing materialized spec + invariants)
- what was exported (existing bundle manifest + bundle_id)
- what exact command produced it (new cli_invocation)
That's procedural provenance: a real capability increase on top of v0.3.0,
even though the implementation is small.
Semantic boundary (load-bearing):
cli_invocation records *what command produced the artifact*, NOT a
guarantee that re-running it will produce identical outputs. Replay
determinism still lives in materialized_hash (preserved perturbation
evidence) and the bundle's bundle_id (preserved manifest + file hashes).
The bundle's auto-generated README carries an explicit disclaimer.
Capture contract (deliberately narrow):
- argv: normalized invocation tokens. argv[0] canonicalized to
'falsifyai' regardless of entry path (entry-point launcher,
python -m, direct script). Subsequent tokens preserved verbatim.
- falsifyai_version: runtime package version at capture time.
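The capture contract above can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names `argv` and `falsifyai_version` come from the contract, while `capture_invocation`, `CANONICAL_PROGRAM`, and the exact dataclass shape are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence

# Assumed constant: the canonical program name the contract describes.
CANONICAL_PROGRAM = "falsifyai"


@dataclass(frozen=True)
class CliInvocation:
    """Hypothetical shape: only the two fields the capture contract names."""

    argv: tuple[str, ...]
    falsifyai_version: str


def capture_invocation(raw_argv: Sequence[str], version: str) -> CliInvocation:
    """Canonicalize argv[0] and keep every later token verbatim."""
    if not raw_argv:
        raise ValueError("argv must contain at least the program name")
    # argv[0] differs by entry path (launcher, python -m, direct script),
    # so it is replaced outright rather than parsed.
    return CliInvocation(
        argv=(CANONICAL_PROGRAM, *raw_argv[1:]),
        falsifyai_version=version,
    )


# Illustrative tokens only; the real CLI arguments may differ.
launcher = capture_invocation(["/usr/local/bin/falsifyai", "run", "spec.yaml"], "0.4.0")
module = capture_invocation(["/site-packages/falsifyai/__main__.py", "run", "spec.yaml"], "0.4.0")
```

Both entry paths produce the same record, which is the point of canonicalizing `argv[0]`: the artifact does not leak where the tool was installed or how it was launched.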
Explicitly NOT captured by design:
- environment variables (secret-leakage surface)
- API keys (no auth-bearing CLI flag today; future flags MUST redact)
- current working directory (specs are referenced by path in argv)
- hostname / username / machine identifiers (operator identity belongs
at the commit / export layer, not the artifact)
- shell history / pre-shell-expansion argv (unavailable by construction)
- file contents (spec YAML lives in MaterializedSpec already)
Capture point:
- cmd_run only (single capture point at function entry, before any
subcommand-internal mutation)
- Read-only consumer surfaces (replay, inspect, history, diff, verify,
export) NEVER stamp invocation (preservation discipline)
Backward compatibility:
- Default value is None (explicit absence, not a sentinel)
- Deserializer treats missing JSON key as None
- Pre-PR-35 artifacts load cleanly; verify does NOT gain a 9th check
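A rough sketch of the backward-compatible load path described above, assuming the artifact is plain JSON with an optional `cli_invocation` key. The function name `load_cli_invocation` and the surrounding keys in the usage lines are hypothetical; only the "missing key means None" behaviour is taken from the notes.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional


def load_cli_invocation(artifact_json: Mapping[str, Any]) -> Optional[dict]:
    """Return the invocation record, or None when the artifact has none."""
    # .get() makes a missing key and an explicit null behave the same way,
    # so artifacts written before the field existed still load.
    raw = artifact_json.get("cli_invocation")
    if raw is None:
        return None
    return {
        "argv": list(raw["argv"]),
        "falsifyai_version": str(raw["falsifyai_version"]),
    }


# Older artifact: the key is simply absent (other keys are placeholders).
old_artifact = {"verdict": "placeholder"}
new_artifact = {
    "verdict": "placeholder",
    "cli_invocation": {"argv": ["falsifyai", "run", "spec.yaml"], "falsifyai_version": "0.4.0"},
}
```

Using `None` rather than an empty record keeps "not captured" distinguishable from "captured with no arguments".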
Bundle wire-up:
- PR-32 already added a conditional render path; PR-35 makes it light up
- Rendering improved: shlex.join argv + falsifyai_version + the
determinism-vs-provenance disclaimer
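As an illustration of the rendering described above, the sketch below joins argv with the standard library's `shlex.join` and appends the version and a disclaimer. The function name `render_invocation` and the exact wording of each line are assumptions; the bundle's real README text is not shown in this PR.

```python
import shlex
from typing import Sequence


def render_invocation(argv: Sequence[str], falsifyai_version: str) -> str:
    """Render a copy-pasteable command line plus the provenance disclaimer."""
    # shlex.join quotes tokens that contain spaces or shell metacharacters,
    # so the rendered command survives a round trip through a POSIX shell.
    command = shlex.join(argv)
    return "\n".join(
        [
            f"Command: {command}",
            f"falsifyai version: {falsifyai_version}",
            "Note: this records the command that produced the artifact; "
            "it does not guarantee that re-running it yields identical outputs.",
        ]
    )


# Illustrative tokens only; the path with a space shows the quoting.
rendered = render_invocation(["falsifyai", "run", "my spec.yaml"], "0.4.0")
```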
Architectural assertion (Tier-4 discipline):
- test_cli_invocation_model_does_not_import_resolver: importing
CliInvocation alone never loads verdict.resolver
- test_replay_models_module_does_not_import_resolver: broader assertion
that the preservation layer stays independent of the interpretation
layer (future field additions can't accidentally cross the boundary)
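One way such an import-boundary assertion could be written is to import the module in a fresh interpreter and inspect `sys.modules`. This is a sketch of the pattern only: the helper name `import_loads_module` is made up, and because the project's real module paths are not given here, the usage lines use standard-library modules as stand-ins.

```python
import subprocess
import sys


def import_loads_module(target: str, forbidden: str) -> bool:
    """Import `target` in a fresh interpreter; report whether `forbidden` got loaded."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"print('loaded' if {forbidden!r} in sys.modules else 'clean')\n"
    )
    # A subprocess is used because the current process may already have the
    # forbidden module cached in sys.modules from unrelated imports.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # a failing import raises instead of looking "clean"
    )
    return result.stdout.strip() == "loaded"


# Stand-ins: `json` pulls in `json.decoder` but has no reason to load `wave`.
pulls_in_decoder = import_loads_module("json", "json.decoder")
pulls_in_wave = import_loads_module("json", "wave")
```

In a real test the target would be the preservation-layer module and the forbidden name the resolver module, with the assertion that the result is `False`.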
Folded co-edits (per plan §G):
- README compliance-routing callout above 'Status and roadmap' pointing
procurement readers at docs/COMPLIANCE.md (candidate #5 fold)
- One-line note in 'What's in the evidence?' section about the new field
- Architectural-test-pattern-as-template observation lives in the
walkthrough rather than a separate doc file (candidate #6 fold)
Docs:
- CHANGELOG [Unreleased]: PR-35 entry + capture contract summary
- docs/EVIDENCE.md §6: mark artifact-infrastructure track 3-of-3 shipped;
note signing is the remaining locked deferral
- docs/COMPLIANCE.md §2 Annex IV mapping: new 'Command that produced
this evidence' row mapping to cli_invocation.argv +
cli_invocation.falsifyai_version
Tests: +24 unit (model+capture+architectural) + 5 integration = 29 new.
Suite: 519 -> 541 passing. Ruff + format clean.
Out of scope (explicit non-goals):
- Cryptographic signing (signature_slots remain reserved)
- Bundle import / round-trip
- --json output mode
- Verify check for invocation presence (would break pre-PR-35 artifacts)
- Capture from non-cmd_run commands
- Capture from a hypothetical programmatic falsifyai.runtime.run() API
Version: this is a MINOR bump (new feature, new public dataclass, new
persisted schema field). Release-prep PR after merge should target v0.4.0
('Artifact-infrastructure track complete'), not v0.3.1.
Three stale references found by audit after the artifact-infrastructure track closed with PR-35:
- README.md 'Coming next' section: was 'persisted CLI-invocation field (next)'; the track is now complete. Replaced with 'track complete' framing + the 'driven by external pressure' next-pull dynamic (regulatory signing, second case study, falsifyai import, etc.).
- docs/case-studies/02 opening: claimed specs were 'planned follow-up work', but PR-30 actually shipped them. Updated to link the existing specs/ directory.
- docs/case-studies/README.md index row for CS-02: said 'CLI formalization forthcoming'; specs already exist. Updated tools-used label.
Release-state metadata (status banner, current-release entry, RELEASE.md illustrative examples) intentionally NOT changed here; those bump in the v0.4.0 release-prep cycle, not on dev between releases.
Artifact-infrastructure track complete (3 of 3 locked items shipped).
v0.4.0 adds persisted cli_invocation on ReplayArtifact via PR-35.
Version bumps (sources of truth, all aligned):
pyproject.toml 0.3.0 -> 0.4.0
falsifyai/__init__.py 0.3.0 -> 0.4.0
tests/unit/test_version.py rename + bump asserted value
CITATION.cff version 0.3.0 -> 0.4.0
uv.lock editable-install entry 0.3.0 -> 0.4.0
(synced in same commit, not as a trailing fix)
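A version-alignment check along these lines could catch drift between the sources of truth before tagging. This is only a sketch: the regex patterns assume conventional layouts for each file (`version = "..."`, `__version__ = "..."`, `version: ...`), and the helper names are hypothetical; the project's actual check lives in tests/unit/test_version.py and may work differently.

```python
from __future__ import annotations

import re

# Assumed line formats for each source of truth.
_PATTERNS = {
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
    "falsifyai/__init__.py": r'^__version__\s*=\s*"([^"]+)"',
    "CITATION.cff": r'^version:\s*"?([^"\s]+)"?',
}


def extract_versions(file_texts: dict[str, str]) -> dict[str, str | None]:
    """Pull the declared version out of each file's text (None if not found)."""
    found: dict[str, str | None] = {}
    for name, text in file_texts.items():
        match = re.search(_PATTERNS[name], text, flags=re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


def drifted_sources(file_texts: dict[str, str], expected: str) -> list[str]:
    """Names of files whose declared version differs from `expected`."""
    versions = extract_versions(file_texts)
    return sorted(name for name, version in versions.items() if version != expected)


# Inline stand-ins for file contents; __init__.py is deliberately stale.
sample = {
    "pyproject.toml": 'name = "falsifyai"\nversion = "0.4.0"\n',
    "falsifyai/__init__.py": '__version__ = "0.3.0"\n',
    "CITATION.cff": "cff-version: 1.2.0\nversion: 0.4.0\n",
}
stale = drifted_sources(sample, "0.4.0")
```

Anchoring each pattern at the start of a line keeps `cff-version:` in CITATION.cff from being mistaken for the package version.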
Release-state metadata (drift-prone files per RELEASE.md §7):
README.md status banner - new wave: artifact-infrastructure
track complete; cli_invocation +
four-question artifact summary
README.md "Status and roadmap" - 0.4.0 (current release) section
promoted; 0.3.0 demoted to compact
historical entry
README.md "Coming next" - cleaned up the leftover diff-sharpening
"Shipped in v0.3.0" bullet; reduced to
a single "driven by external pressure"
paragraph
CHANGELOG.md - [Unreleased] placeholder preserved on
top; PR-35 entries promoted into new
[0.4.0] - 2026-05-24 section;
[0.4.0] link reference added
docs/RELEASE.md - MINOR example (0.2.0->0.3.0 ->
0.3.0->0.4.0); tag commands v0.4.0;
GitHub Release example v0.4.0 +
thematic name; post-release dev-marker
example 0.4.0->0.5.0
docs/case-studies/01-invisible-character-substitution.md
- pip install command bumped to 0.4.0
(2 occurrences)
Bonus audit fixes - stale 'vNEXT' placeholders left over from v0.3.0
release-prep (caught by the comprehensive grep this cycle):
docs/COMPLIANCE.md:138, 165 - 'shipped in vNEXT' -> 'shipped in v0.3.0'
(these always meant v0.3.0; vNEXT was the
pre-release placeholder I forgot to bump)
docs/COMPLIANCE.md:12 - 'As of v0.3.0...' -> 'As of v0.4.0...';
also names the new procedural-provenance
capability that v0.4.0 adds
docs/COMPLIANCE.md:56 - cli_invocation row: 'since v0.3.0+PR-35' ->
'since v0.4.0' (since the PR ships in 0.4.0,
not 0.3.0)
docs/EVIDENCE.md:328 - '3-of-3 shipped as of v0.3.0 + PR-35' ->
'as of v0.4.0' (cleaner once PR-35 has a
release version of its own)
Verified: 510 + 31 = 541 tests pass. Ruff + format clean.
Preserved as intentional historical references (NOT changed):
CHANGELOG.md [0.3.0] entry body - historical, immutable
CHANGELOG.md [0.3.0] link reference - always present
README byte-identical-to-v0.2.0 fixture claim - load-bearing fixture
README.md "0.3.0 - Artifact-infrastructure track
(2 of 3)" roadmap entry - historical
tests/integration/test_diff_end_to_end.py
v0.2.0_baseline.txt fixture refs - load-bearing fixture
tests/unit/test_bundle_writer.py
_FIXED_VERSION = "0.3.0.test" - test sentinel
tests/unit/test_cli_invocation_model.py
falsifyai_version="0.3.0" test-data values - test fixtures, not
release-state metadata
.github/workflows/publish.yml GITHUB_REF
format comment - illustrative
What v0.4.0 ships (since v0.3.0):
PR-35 - feat(replay): persist cli_invocation on ReplayArtifact
Post-PR-35 docs audit - update stale 'next' / 'forthcoming' framing
Summary
Promotes `dev` to `main` for the v0.4.0 release.
Artifact-infrastructure track complete (3 of 3). v0.4.0 adds the final piece: persisted `cli_invocation` on `ReplayArtifact` — descriptive procedural provenance. The locked sequence `verify` → `export --bundle` → embedded CLI invocation is now closed.
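After v0.4.0, the artifact answers four questions without external bookkeeping:
- what happened (case results + verdict)
- how it was evaluated (materialized spec + invariants)
- what was exported (bundle manifest + bundle_id)
- what exact command produced it (`cli_invocation`)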
That's a meaningful release boundary both architecturally (locked sequence closed) and externally (EU AI Act Annex IV §2(g) "the procedures used" is now answered by the artifact itself, not external bookkeeping).
Commits to promote (4)
Version sources of truth (all aligned at 0.4.0)
Release-state metadata audited (per RELEASE.md §7)
Test plan
What's NOT in this release (deferred — waits for external pressure)
Per the project's "wait for pressure" discipline, none of these is on a fixed schedule — each waits for external signal.