feat: regenerate expected heredocs via CANON_REGENERATE_EXPECTED: https://github.com/lutaml/canon/issues/146 #147

Open
opoudjis wants to merge 1 commit into main from fix/regenerate-expected

Closes #146.

Summary

Adds an opt-in mode, CANON_REGENERATE_EXPECTED=true, that rewrites the source heredoc backing a failing be_*_equivalent_to assertion with the pretty-printed received value. The mode is OFF by default; passing assertions are never touched, and negated matchers (.not_to) are never rewritten.

Workflow:

CANON_REGENERATE_EXPECTED=true bundle exec rspec
git diff
git commit -am "rebaseline fixtures after upstream X change"

What's new

v1 supports:

  • <<~XML / <<-XML / <<XML heredoc assigned to a local variable in the same it block.
  • Multiple sequential assignments to the same variable (most-recent-before-expect-line semantics) — covers the common metanorma pattern of reassigning output between format-specific expects.
  • Inline heredoc passed directly to the matcher.
  • Substitution chains on the actual side (expect(strip_guid(actual).gsub(...))).

v1 skips, with a [canon:rebaseline] skipped_<reason> warning:

  • Heredoc with #{} interpolation (v2 plans token-preserving merge).
  • Expected from a method-call result (load_fixture(...)).
  • Expected from let / shared_context in a different file.
  • Inline string literal expected.

Line-shift handling. Multiple rewrites within one process correctly handle line numbering shifts between the in-memory source (which Ruby's caller_locations reports against) and the on-disk file (which the rebaseliner has changed). A per-file cumulative shift tracker translates each subsequent caller line into its current file position. Tested via the multi-assignment fixture.
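
As a rough illustration of the line-shift idea, here is a minimal sketch of a per-file cumulative tracker. The class name LineShiftTracker and its methods are hypothetical and not taken from this PR; the sketch also simplifies by anchoring each rewrite at a single line.

```ruby
# Hypothetical sketch; canon's actual tracker may be structured differently.
class LineShiftTracker
  def initialize
    # file path => list of [original_line, delta] pairs
    @shifts = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record that a rewrite anchored at `original_line` (numbered as in the
  # source Ruby loaded into memory) changed the file length by `delta` lines.
  def record(path, original_line, delta)
    @shifts[path] << [original_line, delta]
  end

  # Translate a line number reported by caller_locations into the position
  # of that line in the file as it currently exists on disk.
  def current_line(path, original_line)
    offset = @shifts[path].sum { |line, delta| line < original_line ? delta : 0 }
    original_line + offset
  end
end

tracker = LineShiftTracker.new
tracker.record("spec/a_spec.rb", 10, 3)   # heredoc near line 10 grew by 3 lines
tracker.record("spec/a_spec.rb", 40, -2)  # heredoc near line 40 shrank by 2 lines

tracker.current_line("spec/a_spec.rb", 5)   # => 5  (before both rewrites)
tracker.current_line("spec/a_spec.rb", 25)  # => 28 (after the first rewrite)
tracker.current_line("spec/a_spec.rb", 60)  # => 61 (after both rewrites)
```

Only rewrites that sit above the queried line contribute to the offset, which is why a later expect in the same file still resolves to the right place after an earlier heredoc has grown or shrunk.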

Implementation

New module Canon::Rebaseliner under lib/canon/rebaseliner/:

  • AtomicWriter — tempfile + rename, cross-device safe.
  • Logger — single-line stderr writes with [canon:rebaseline] prefix.
  • HeredocSpec — struct describing a heredoc's byte range and style.
  • HeredocRewriter — re-indents <<~ bodies, writes verbatim for <<- / <<.
  • HeredocLocator — Prism-based AST walk; "most-recent assignment" semantics for local-var references.
  • CallSiteResolver — Prism parse, finds the matcher call and its enclosing it block.
  • Rebaseliner (orchestrator) — enabled?, rewrite!, line-shift tracker.
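
To make the "tempfile + rename" approach concrete, the following is a minimal sketch of an atomic write. The method name atomic_write is hypothetical, and canon's AtomicWriter may handle more cases; the one design point shown is that the temp file is created in the target's own directory, so the final rename never crosses a filesystem boundary.

```ruby
require "tempfile"

# Hypothetical helper, not canon's actual AtomicWriter.
def atomic_write(path, content)
  dir = File.dirname(path)
  # A temp file next to the target keeps the rename on one filesystem,
  # where it replaces the target in a single step.
  temp = Tempfile.create([".rebaseline", File.extname(path)], dir)
  begin
    temp.write(content)
    temp.flush
    temp.fsync
    temp.close
    # Preserve the permissions of the file being replaced, if it exists.
    File.chmod(File.stat(path).mode & 0o7777, temp.path) if File.exist?(path)
    File.rename(temp.path, path)
  rescue StandardError
    # Clean up the temp file so a failed rewrite leaves no debris behind.
    temp.close unless temp.closed?
    File.unlink(temp.path) if File.exist?(temp.path)
    raise
  end
end
```

A reader of the spec file therefore sees either the old content or the new content, never a half-written file.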

Matcher hook in lib/canon/rspec_matchers.rb. Env-var schema registered in lib/canon/config/env_schema.rb for --env-help discoverability. caller_locations is only captured when the env var is set, so passing assertions pay no overhead.

prism added as a runtime dependency (a default gem bundled with Ruby 3.3+; installed as a regular gem on 2.7-3.2).

Tests

10 new fixture cases under spec/fixtures/rebaseliner/, including:

  • Squiggly heredoc rewrite + idempotent re-run.
  • Multi-assignment-in-same-it-block rewrite.
  • Inline heredoc rewrite.
  • Interpolation skip (file unchanged).
  • Inline-string skip (file unchanged).
  • Negated matcher no-op (file unchanged).

Plus unit tests on Rebaseliner.enabled? env-var truthiness.
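
For orientation, an env-var truthiness check of this kind could look like the sketch below. The module name RebaselinerSketch and the accepted spellings other than "true" are assumptions; this PR only documents CANON_REGENERATE_EXPECTED=true.

```ruby
# Hypothetical sketch; the real Rebaseliner.enabled? may accept other values.
module RebaselinerSketch
  # Assumed set of truthy spellings. Only "true" is confirmed by the PR text.
  TRUTHY = %w[1 true yes on].freeze

  # `env` defaults to the process environment; a plain Hash can be passed
  # in tests to avoid mutating ENV.
  def self.enabled?(env = ENV)
    TRUTHY.include?(env.fetch("CANON_REGENERATE_EXPECTED", "").strip.downcase)
  end
end

RebaselinerSketch.enabled?("CANON_REGENERATE_EXPECTED" => "true")   # => true
RebaselinerSketch.enabled?("CANON_REGENERATE_EXPECTED" => "false")  # => false
RebaselinerSketch.enabled?({})                                      # => false
```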

Full canon suite: 2222 examples, 0 failures, 1 pending (no regressions).

Internal documentation

docs/features/regenerate-expected.adoc covers architecture, supported forms, line-shift handling, limitations, and v2 roadmap. README updated to link.

v2 (separate PR)

Per discussion on #146:

  • Interpolation support via token preservation (re-anchor #{var} fragments in the prettyprinted actual).
  • JSON / YAML format prettyprinter wiring.
  • canon regenerate SPEC_GLOB Thor subcommand.
  • File-locking for parallel_rspec safety.
  • Optional rubocop -A post-rewrite formatter.

Test plan

  • Canon's own suite passes (2222/0/1 pending).
  • All 10 rebaseliner fixture cases pass.
  • End-to-end against metanorma-iso pinned to this branch (left as a downstream test by the requester — canon-internal coverage is comprehensive).

🤖 Generated with Claude Code

Add a new opt-in mode that rewrites the source heredoc backing a
failing `be_*_equivalent_to` assertion with the prettyprinted received
value, so fixture rebaselining is `bundle exec rspec` + `git diff` +
commit instead of hundreds of manual copy-paste cycles.

Surface: env var `CANON_REGENERATE_EXPECTED=true`. Default OFF. When
set, `SerializationMatcher#matches?` on failure:

1. Captures `caller_locations` (only when env var is set, so passing
   assertions pay no overhead).
2. Parses the caller spec file with Prism, locates the matcher call
   at the failing line, identifies the `expected` argument and the
   enclosing `it` block.
3. Resolves the `expected` to a heredoc: inline literal, or local var
   tracked back to its most-recent assignment within the same block.
4. Pretty-prints the received value with the appropriate
   `Canon::PrettyPrinter` (XML or HTML; both are public APIs already
   used by `CANON_<FORMAT>_DIFF_SHOW_PRETTYPRINT_RECEIVED`).
5. Atomically rewrites the heredoc body in place.
6. Returns the assertion as passing so CI does not fail mid-rebaseline.
7. Logs `[canon:rebaseline] rewritten <file>:<line>` (or
   `skipped_<reason>` for cases that can't be safely rewritten).

Negated matchers (`.not_to`) are never rewritten (via a separate
`does_not_match?` that bypasses the rebaseliner hook). Multiple
rewrites within the same process correctly handle line-number shifts
between the in-memory source (what Ruby reports via `caller_locations`)
and the on-disk file (which has changed under us); a per-file
cumulative line-shift tracker translates each subsequent caller line
into its current file position.
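
The split between `matches?` and `does_not_match?` can be pictured with the plain-Ruby sketch below. `SketchMatcher` and its `rebaseline:` callable are hypothetical stand-ins, not canon's `SerializationMatcher`; the sketch also assumes that a skipped rewrite leaves the assertion failing, which the text above does not state. RSpec calls `does_not_match?` for `.not_to` when a matcher defines it, so the hook in `matches?` is never reached for negated expectations.

```ruby
# Hypothetical stand-in for the matcher hook; names are illustrative only.
class SketchMatcher
  def initialize(expected, rebaseline: ->(_actual, _locations) { false })
    @expected = expected
    @rebaseline = rebaseline
  end

  # Called for expect(...).to
  def matches?(actual)
    return true if actual == @expected
    return false unless ENV["CANON_REGENERATE_EXPECTED"] == "true"

    # Only a failing assertion in rebaseline mode pays for the stack capture.
    # The hook returns true when it rewrote the heredoc (assumed contract).
    @rebaseline.call(actual, caller_locations)
  end

  # Called for expect(...).not_to; deliberately bypasses the rebaseline hook.
  def does_not_match?(actual)
    actual != @expected
  end
end
```

Keeping the negated path separate means a `.not_to` expectation can never trigger a file rewrite, whatever the env var says.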

v1 supports:

- `<<~XML` / `<<-XML` / `<<XML` heredoc assigned to a local variable
  in the same `it` block, including the metanorma-iso pattern of
  multiple sequential assignments to the same variable.
- Inline heredoc passed directly to the matcher.
- Substitution chains on the actual side (the matcher receives the
  post-substitution value, so idempotency holds next run).

v1 skips with a warning:

- Heredoc with `#{}` interpolation (v2 plans token-preserving merge).
- Expected value from a method call.
- Expected value from `let`/`shared_context` in a different file.
- Inline string literal expected (no heredoc).

Architecture (lib/canon/rebaseliner/):

- AtomicWriter — tempfile + rename, cross-device safe.
- Logger — single-line stderr writes with `[canon:rebaseline]` prefix.
- HeredocSpec — struct describing a heredoc's byte range and style.
- HeredocRewriter — re-indents `<<~` bodies, writes verbatim for
  `<<-`/`<<`.
- HeredocLocator — Prism-based AST walk, "most-recent assignment"
  semantics for local-var references.
- CallSiteResolver — Prism parse of the spec file, finds the matcher
  call and its enclosing `it` block.
- Rebaseliner (orchestrator) — `enabled?`, `rewrite!`, line-shift
  tracker.
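
The difference in how the heredoc styles are written back follows from Ruby's own semantics: `<<~` strips common leading whitespace from the body, so indenting the new body is harmless, whereas `<<-` and `<<` keep leading whitespace as part of the string, so any added indentation would change the expected value. A minimal sketch, assuming a hypothetical `heredoc_body` helper that is not canon's actual `HeredocRewriter`:

```ruby
# Hypothetical helper; canon's HeredocRewriter may differ.
# `style` is the heredoc opener ("<<~", "<<-" or "<<") and `indent` is the
# column at which body lines should start in the spec file.
def heredoc_body(pretty, style:, indent:)
  # Non-squiggly heredocs keep leading whitespace as content: write verbatim.
  return pretty unless style == "<<~"

  pretty.each_line.map do |line|
    # Leave blank lines untouched so no trailing whitespace is introduced.
    line.strip.empty? ? line : (" " * indent) + line
  end.join
end

heredoc_body("<a>\n  <b/>\n</a>\n", style: "<<~", indent: 4)
# => "    <a>\n      <b/>\n    </a>\n"
heredoc_body("<a>\n  <b/>\n</a>\n", style: "<<-", indent: 4)
# => "<a>\n  <b/>\n</a>\n"
```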

Matcher hook in `lib/canon/rspec_matchers.rb`. Env-var schema
registered in `lib/canon/config/env_schema.rb` for `--env-help`
discoverability. `prism` added as a runtime dependency (a default gem
in Ruby 3.3+; a regular gem on 2.7-3.2, matching canon's gemspec floor).

Tests: 10 fixture cases covering squiggly/dash/inline heredocs,
multi-assignment within `it` block, interpolation skip, inline-string
skip, method-call skip, negated-matcher no-op, and post-rewrite
idempotency (re-run with env var OFF still passes). Full suite green:
2222 examples, 0 failures.

Documentation: `docs/features/regenerate-expected.adoc` (internal,
architecture + supported forms + roadmap); README link.

v2 (separate PR per the discussion on #146): interpolation support
via token preservation, JSON/YAML prettyprinter wiring, `canon
regenerate SPEC_GLOB` Thor subcommand, parallel-rspec safety,
optional `rubocop -A` post-rewrite formatter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Feature: CANON_REGENERATE_EXPECTED mode to rewrite heredoc expectations from actual values