perf(vcr-ra): scope #468 redundant memory-base materialization (spike) #469

Merged: avrabe merged 1 commit into main from vcr-ra/468-base-remat-spike on Jun 24, 2026
@avrabe (Contributor) commented Jun 24, 2026

Scoping spike for #468 (epic #242, north-star #390). No codegen change — frozen-safe. The byte-changing CSE is the explicitly-separate next gated step.

The pattern

A straight-line run of stores to constant addresses re-materializes the loop-invariant linear-memory base on every store, in the optimized (non-relocatable) path:

movw ip,#<lo>; movt ip,#<hi>; add ip,ip,rN; str rV,[ip]   ; per store

Load-bearing finding: the two paths diverge, and one already does the right thing

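| fixture | optimized path (base materializations) | relocatable path (base materializations) |
|---|---|---|
| redundant_base_materialization (7 stores) | 7 (6 redundant) | 0 |
| control_step | 2 | 0 |
| flight_seam | 21 | 0 |
| flight_seam_flat | 42 | 0 |
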
  • Relocatable path (select_with_stack; the frozen-gate path): base is a relocated symbol → pinned in a stable reg (fp) once, every store [fp,#off]. This is the target shape that #468 ("perf: consecutive const-address stores re-materialize the linear-memory base every time") wants.
  • Optimized path (optimizer_bridge.rs): base is an absolute constant → re-materialized into ip/r12 per store. ip is encoder scratch (can't persist across the encoder's indexed-load expansion), so the path has no persistent base register today.
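
The difference between the two shapes can be sketched with a toy instruction model. This is an illustration under assumptions, not the code in optimizer_bridge.rs: the `Inst` enum, the function names, and the `r11` register used in the usage note are all hypothetical, and the per-store `add ip,ip,rN; str rV,[ip]` sequence is simplified to a single store with an immediate offset.

```rust
/// Toy instruction model for illustration only; not the project's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// Stands for a `movw rd,#lo; movt rd,#hi` pair that loads a 32-bit constant.
    MovImm32 { rd: &'static str, imm: u32 },
    /// Stands for `str rt,[rn,#off]`.
    Str { rt: &'static str, rn: &'static str, off: u32 },
}

/// Optimized-path shape today: the base is rebuilt in `ip` before every store.
fn emit_per_store(base: u32, stores: &[(&'static str, u32)]) -> Vec<Inst> {
    let mut out = Vec::new();
    for &(rt, off) in stores {
        out.push(Inst::MovImm32 { rd: "ip", imm: base });
        out.push(Inst::Str { rt, rn: "ip", off });
    }
    out
}

/// Target shape: build the base once in a reserved register, then use [base,#off].
fn emit_hoisted(
    base: u32,
    base_reg: &'static str,
    stores: &[(&'static str, u32)],
) -> Vec<Inst> {
    let mut out = Vec::new();
    if !stores.is_empty() {
        out.push(Inst::MovImm32 { rd: base_reg, imm: base });
    }
    for &(rt, off) in stores {
        out.push(Inst::Str { rt, rn: base_reg, off });
    }
    out
}

/// Counts base materializations (movw/movt pairs) in a sequence.
fn materializations(insts: &[Inst]) -> usize {
    insts
        .iter()
        .filter(|i| matches!(i, Inst::MovImm32 { .. }))
        .count()
}
```

For a 7-store run, `emit_per_store` yields 7 materializations, while `emit_hoisted` with a reserved register such as `r11` yields 1, which matches the 7 vs. "pinned once" contrast in the table above.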

⇒ the fix is allocator-aware (reserve a callee-saved base reg, hoist movw/movt once per const-base store-run, mirror the relocatable [base,#off] shape) — not a post-pass peephole. On the 7-store fixture the 6 redundant materializations cost ~48 B + ~12 cyc; the waste scales with store-run length (42 in flight_seam_flat).
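
The ~48 B + ~12 cyc figure can be reproduced with a small back-of-the-envelope calculation. The sketch below assumes a single store-run (so exactly one materialization is needed), the 4-byte Thumb-2 encodings of MOVW and MOVT, and one cycle per instruction; the cycle figure is an assumption chosen to match the estimate above, not a measurement. The function and type names are hypothetical.

```rust
/// Thumb-2 MOVW and MOVT are each a 32-bit (4-byte) encoding, so a pair is 8 bytes.
const MOVW_MOVT_BYTES: u32 = 8;
/// Assumption: one cycle each for MOVW and MOVT.
const MOVW_MOVT_CYCLES: u32 = 2;

/// Estimated waste for one store-run.
#[derive(Debug, PartialEq)]
struct Waste {
    redundant: u32,
    bytes: u32,
    cycles: u32,
}

/// Counts base materializations in an assembly listing by counting
/// `movt ip` occurrences (one per movw/movt pair targeting ip).
fn count_base_materializations(asm: &str) -> u32 {
    asm.matches("movt ip").count() as u32
}

/// One materialization per store-run is needed; the rest are redundant.
fn redundant_cost(materializations: u32) -> Waste {
    let redundant = materializations.saturating_sub(1);
    Waste {
        redundant,
        bytes: redundant * MOVW_MOVT_BYTES,
        cycles: redundant * MOVW_MOVT_CYCLES,
    }
}
```

Feeding the 7-store fixture's count into `redundant_cost` gives 6 redundant pairs, 48 bytes, and 12 cycles. The flight_seam and flight_seam_flat counts may span several store-runs, so the single-run assumption would need adjusting there.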

Frozen-safe

This commit touches no codegen; the frozen byte gate (--relocatable, count already 0) is green on the branch. The note records the contingent frozen-safety constraint the eventual CSE PR must hold (confine to the optimized path; that path's result evidence is the out-of-CI differential, not a cargo gate).

Next gated step (separate PR)

SYNTH_BASE_CSE flag-off → unicorn differential (this fixture is the non-vacuity case) → on-target cycle gate → default-on flip + re-freeze any optimized-path goldens touched.
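
The first two gates can be sketched as follows. This assumes SYNTH_BASE_CSE is read as an environment variable and that the differential compares final memory contents from a flag-off run and a flag-on run; both are assumptions for illustration, and the accepted flag values, function names, and the (address, value) trace format are hypothetical. The emulator run itself is out of scope here.

```rust
use std::collections::BTreeMap;

/// Hypothetical flag parsing: anything other than an explicit opt-in is off.
fn base_cse_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1") | Some("true") | Some("on"))
}

/// Reads the flag from the environment (assumes SYNTH_BASE_CSE is an env var).
fn base_cse_enabled_from_env() -> bool {
    base_cse_enabled(std::env::var("SYNTH_BASE_CSE").ok().as_deref())
}

/// Collapses a store trace of (address, value) pairs into final memory
/// contents; a later store to the same address overwrites an earlier one.
fn final_memory(trace: &[(u32, u32)]) -> BTreeMap<u32, u32> {
    trace.iter().copied().collect()
}

/// Differential check: both builds must leave memory in the same state,
/// even though their instruction bytes differ.
fn differential_passes(flag_off: &[(u32, u32)], flag_on: &[(u32, u32)]) -> bool {
    final_memory(flag_off) == final_memory(flag_on)
}
```

The check is only meaningful on a fixture where the flag actually changes the emitted code, which is the role the 7-store fixture plays as the non-vacuity case.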

Refs #468, #242, #390

🤖 Generated with Claude Code

… (spike)

Scoping spike for #468 (epic #242, north-star #390): a straight-line run of
stores to constant addresses re-materializes the loop-invariant linear-memory
base on every store in the optimized (non-relocatable) path.

Empirical finding (the load-bearing result): the two codegen paths diverge and
the relocatable path ALREADY implements the target shape — it pins the base in a
stable register (fp) once and addresses each store [fp,#off] (0 redundant
materializations). The optimized path re-materializes the base into ip/r12 per
store; ip is encoder scratch, so it cannot persist the base — the path has no
persistent base register today. Fix is therefore allocator-aware (reserve a
callee-saved base reg in optimizer_bridge.rs, hoist movw/movt once per const-base
store-run), NOT a post-pass peephole.

Blast radius (optimized path, cortex-m4): 7 base-materializations on the 7-store
fixture (6 redundant ~ 48 B + ~12 cyc), 21 on flight_seam, 42 on flight_seam_flat.

No codegen change — frozen-safe by construction: the frozen byte gate compiles
--relocatable, where the count is already 0, so the eventual optimized-path CSE
touches bytes the goldens never cover. Frozen gate verified green on this branch.

The byte-changing CSE (SYNTH_BASE_CSE flag-off -> on-target cycle gate ->
default-on flip) is the explicitly-separate next gated step, same protocol as the
cmp->select / local-promotion / immediate-shift levers.

Adds the generic neutral-address fixture + a scoping note recording the
measurement, root cause, the two-path finding, and the fix shape.

Refs #468, #242, #390

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@avrabe avrabe merged commit 7b22b38 into main Jun 24, 2026
11 checks passed
@avrabe avrabe deleted the vcr-ra/468-base-remat-spike branch June 24, 2026 18:19