perf(vcr-ra): scope #468 redundant memory-base materialization (spike) #469
Merged
… (spike)

Scoping spike for #468 (epic #242, north-star #390): a straight-line run of stores to constant addresses re-materializes the loop-invariant linear-memory base on every store in the optimized (non-relocatable) path.

Empirical finding (the load-bearing result): the two codegen paths diverge, and the relocatable path ALREADY implements the target shape. It pins the base in a stable register (fp) once and addresses each store as [fp,#off] (0 redundant materializations). The optimized path re-materializes the base into ip/r12 per store; ip is encoder scratch, so it cannot persist the base, and the path has no persistent base register today.

The fix is therefore allocator-aware (reserve a callee-saved base reg in optimizer_bridge.rs, hoist movw/movt once per const-base store-run), NOT a post-pass peephole.

Blast radius (optimized path, cortex-m4): 7 base materializations on the 7-store fixture (6 redundant, ~48 B + ~12 cyc), 21 on flight_seam, 42 on flight_seam_flat.

No codegen change, frozen-safe by construction: the frozen byte gate compiles --relocatable, where the count is already 0, so the eventual optimized-path CSE touches bytes the goldens never cover. Frozen gate verified green on this branch.

The byte-changing CSE (SYNTH_BASE_CSE flag-off -> on-target cycle gate -> default-on flip) is the explicitly separate next gated step, following the same protocol as the cmp->select / local-promotion / immediate-shift levers.

Adds the generic neutral-address fixture plus a scoping note recording the measurement, root cause, the two-path finding, and the fix shape.

Refs #468, #242, #390
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
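The blast-radius figures for the 7-store fixture follow from simple per-run arithmetic. The sketch below assumes each base materialization is a movw/movt pair (two 32-bit Thumb-2 instructions, 8 B in total, roughly 1 cycle each on Cortex-M4) and that exactly one materialization per const-base store-run is actually needed; the constant and function names are illustrative, not taken from the codebase.

```rust
// Assumed cost of one base materialization: a movw + movt pair, each a
// 32-bit Thumb-2 instruction (4 B) taking ~1 cycle on Cortex-M4.
const BYTES_PER_MATERIALIZATION: u32 = 8;
const CYCLES_PER_MATERIALIZATION: u32 = 2;

/// Materializations a single const-base store-run of `stores` stores pays
/// beyond the one it needs (the optimized path pays one per store today).
fn redundant_materializations(stores: u32) -> u32 {
    stores.saturating_sub(1)
}

/// Approximate (bytes, cycles) wasted on one store-run under the assumed costs.
fn waste(stores: u32) -> (u32, u32) {
    let redundant = redundant_materializations(stores);
    (
        redundant * BYTES_PER_MATERIALIZATION,
        redundant * CYCLES_PER_MATERIALIZATION,
    )
}

fn main() {
    // The 7-store fixture: 7 materializations, 6 of them redundant.
    let (bytes, cycles) = waste(7);
    // prints: redundant = 6, ~48 B, ~12 cyc
    println!(
        "redundant = {}, ~{} B, ~{} cyc",
        redundant_materializations(7),
        bytes,
        cycles
    );
}
```

Under those assumptions, 6 × 8 B = 48 B and 6 × 2 cyc = 12 cyc, matching the fixture figures. The flight_seam counts (21 and 42) are totals, so their redundant share depends on how the stores split into runs.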
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Scoping spike for #468 (epic #242, north-star #390). No codegen change — frozen-safe. The byte-changing CSE is the explicitly-separate next gated step.
The pattern
A straight-line run of stores to constant addresses re-materializes the loop-invariant linear-memory base on every store in the optimized (non-relocatable) path.
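A hedged model of the two emission shapes makes the pattern concrete. It is not compiler output: ConstStore, emit_per_store, emit_hoisted, the base address 0x2000_0000, and the choice of r4 as the reserved callee-saved register are all hypothetical. The per-store shape rebuilds the base in ip before each store; the hoisted shape, which mirrors the relocatable path's [base,#off] addressing, materializes it once per store-run.

```rust
/// One store to a constant offset from the linear-memory base (hypothetical IR).
struct ConstStore {
    offset: u16,       // byte offset from the base
    src: &'static str, // source register, e.g. "r0"
}

/// Optimized-path shape described in this PR: the base is rebuilt in ip (r12)
/// before every store, because ip is encoder scratch and cannot hold it.
fn emit_per_store(base: u32, run: &[ConstStore]) -> Vec<String> {
    let mut out = Vec::new();
    for s in run {
        out.push(format!("movw ip, #{:#06x}", base & 0xffff));
        out.push(format!("movt ip, #{:#06x}", base >> 16));
        out.push(format!("str {}, [ip, #{}]", s.src, s.offset));
    }
    out
}

/// Target shape: materialize the base once per store-run into a reserved
/// callee-saved register (r4, an arbitrary choice; a real allocator would also
/// save/restore it) and address every store as [base, #off].
fn emit_hoisted(base: u32, run: &[ConstStore]) -> Vec<String> {
    let mut out = vec![
        format!("movw r4, #{:#06x}", base & 0xffff),
        format!("movt r4, #{:#06x}", base >> 16),
    ];
    for s in run {
        out.push(format!("str {}, [r4, #{}]", s.src, s.offset));
    }
    out
}

/// Count base materializations (one movt per movw/movt pair).
fn count_materializations(code: &[String]) -> usize {
    code.iter().filter(|insn| insn.starts_with("movt")).count()
}

fn main() {
    let base = 0x2000_0000; // hypothetical linear-memory base address
    let run: Vec<ConstStore> = (0..7u16)
        .map(|i| ConstStore { offset: i * 4, src: "r0" })
        .collect();
    let per_store = emit_per_store(base, &run);
    let hoisted = emit_hoisted(base, &run);
    for insn in &hoisted {
        println!("{insn}");
    }
    println!("per-store materializations: {}", count_materializations(&per_store));
    println!("hoisted materializations:   {}", count_materializations(&hoisted));
}
```

On a 7-store run the per-store shape emits seven movw/movt pairs and the hoisted shape emits one; that 7-versus-1 gap is where the fixture's 6 redundant materializations come from.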
Load-bearing finding: the two paths diverge, and one already does the right thing
- Relocatable path (select_with_stack; the frozen-gate path): the base is a relocated symbol → pinned in a stable register (fp) once, and every store is addressed as [fp,#off]. This is the target shape #468 wants.
- Optimized path (optimizer_bridge.rs): the base is an absolute constant → re-materialized into ip/r12 per store. ip is encoder scratch (it can't persist across the encoder's indexed-load expansion), so the path has no persistent base register today.

⇒ The fix is allocator-aware (reserve a callee-saved base register, hoist movw/movt once per const-base store-run, mirror the relocatable [base,#off] shape), not a post-pass peephole. On the 7-store fixture the 6 redundant materializations cost ~48 B + ~12 cyc; the waste scales with store-run length (42 materializations in flight_seam_flat).

Frozen-safe
This commit touches no codegen; the frozen byte gate (--relocatable, count already 0) is green on the branch. The note records the contingent frozen-safety constraint the eventual CSE PR must hold: confine the change to the optimized path, whose evidence is the out-of-CI differential rather than a cargo gate.

Next gated step (separate PR)
SYNTH_BASE_CSE flag-off → unicorn differential (this fixture is the non-vacuity case) → on-target cycle gate → default-on flip + re-freeze any optimized-path goldens touched.

Refs #468, #242, #390
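The rollout order and the frozen-safety constraint can be captured as a small predicate. This is a sketch under stated assumptions: the Stage enum, its next() ordering, and base_cse_applies are hypothetical names, and opted_in is an assumed boolean view of SYNTH_BASE_CSE (how the flag is actually read is not specified here). The one property taken directly from the note is that the CSE never fires on the relocatable path, so the frozen byte gate cannot observe it.

```rust
/// Hypothetical stages of the gated rollout, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    FlagOff,             // SYNTH_BASE_CSE lands, default off
    UnicornDifferential, // out-of-CI differential; the fixture is the non-vacuity case
    OnTargetCycleGate,   // cycle measurement on hardware
    DefaultOn,           // default flipped, touched optimized-path goldens re-frozen
}

impl Stage {
    /// The next gate in the sequence, or None after the default-on flip.
    fn next(self) -> Option<Stage> {
        match self {
            Stage::FlagOff => Some(Stage::UnicornDifferential),
            Stage::UnicornDifferential => Some(Stage::OnTargetCycleGate),
            Stage::OnTargetCycleGate => Some(Stage::DefaultOn),
            Stage::DefaultOn => None,
        }
    }
}

/// Whether the base-register CSE runs for a given compilation (assumed policy).
fn base_cse_applies(relocatable: bool, stage: Stage, opted_in: bool) -> bool {
    if relocatable {
        // Frozen-safety: the frozen byte gate compiles --relocatable, where the
        // base is already pinned in fp, so the CSE must never touch this path.
        return false;
    }
    // Before the default-on flip, only an explicit opt-in enables the CSE.
    stage == Stage::DefaultOn || opted_in
}

fn main() {
    let mut stage = Some(Stage::FlagOff);
    while let Some(s) = stage {
        println!(
            "{:?}: optimized by default = {}, relocatable even if opted in = {}",
            s,
            base_cse_applies(false, s, false),
            base_cse_applies(true, s, true)
        );
        stage = s.next();
    }
}
```

Keeping the relocatable check ahead of any stage logic means no later gate flip can widen the CSE onto the path the frozen goldens cover.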
🤖 Generated with Claude Code