perf(vcr-ra): scope VCR-RA-002 leaf prologue shrink (spike) #471
Merged
Scoping spike for the 6-register leaf prologue (#428, epic #242, north-star #390): `push {r4-r8,lr}` / `pop {…,pc}` is the largest remaining cycle residual (~12 cyc) on the dissolved hot path. No codegen change: frozen-safe (frozen gate verified green).

Falsifies the naive hypothesis and corrects the layer (both empirically):

1. Promoting locals into caller-saved homes is necessary-but-insufficient: the operand-stack TEMPS still occupy r4-r8 and keep the push alive. With 1 param the free caller-saved set is exactly {r1,r2,r3}; if locals claim them, temps are forced onto callee-saved (net-neutral or worse). VCR-RA-002 is a JOINT locals+temps coloring problem: both prefer caller-saved, spill to callee-saved only under combined pressure, then shrink drops the push.

2. The change belongs in `reallocate_function` (liveness.rs), NOT `compute_local_promotion`. The non-relocatable path (the host-link / native-pointer consumer's path) re-colors every segment over POOL=[r0..r8] AFTER selection, washing out any caller-saved reg the selector picks (verified: a selector-level reg choice is a no-op on that path; LOCAL_PROMOTE's structural reg-vs-frame change survives, a reg *choice* does not). The reallocator must be leaf-aware: bias POOL toward caller-saved for leaf segments so the body touches no callee-saved unless pressure forces it; shrink then removes the dead push.

#193/#220-class: land flag-off + unicorn-differential + on-target gate.

Adds the generic neutral-value fixture (1 param + 3 promotable i32 locals: the non-vacuity case where locals consume all free caller-saved regs, exposing the temps interaction) + a scoping note.

Adjacent finding filed separately (#470): the optimized/non-relocatable path emits this EXPORTED leaf clobbering callee-saved r4-r8 with `bx lr` and no push (independent of promotion + realloc/shrink; the --relocatable build is correct). This is a potential AAPCS violation and is differential-blind. If real it outranks this lever.

Refs #428, #242, #390, #470
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
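
The joint locals+temps pressure described in finding 1 can be illustrated with a small model. This is a minimal sketch, not synth's allocator: the function names `assign` and `callee_saved_used` are hypothetical, and it pessimistically assumes every local and temp is live at the same time. It only shows why filling {r1,r2,r3} with locals pushes any further value onto r4-r8.

```rust
// Hypothetical model for illustration only; not the code in liveness.rs.
const CALLER_SAVED: [u8; 4] = [0, 1, 2, 3]; // r0-r3
const CALLEE_SAVED: [u8; 5] = [4, 5, 6, 7, 8]; // r4-r8

/// Assign `n_values` simultaneously-live values (locals + temps together),
/// preferring caller-saved registers. Params occupy r0..r(n_params-1).
/// Returns `None` when the pool is exhausted (a real allocator would spill).
fn assign(n_params: usize, n_values: usize) -> Option<Vec<u8>> {
    let free = CALLER_SAVED
        .iter()
        .skip(n_params) // registers already holding params are not free
        .chain(CALLEE_SAVED.iter())
        .copied();
    let regs: Vec<u8> = free.take(n_values).collect();
    if regs.len() == n_values {
        Some(regs)
    } else {
        None
    }
}

/// Callee-saved registers the body ends up touching; if this is empty,
/// a shrink pass has nothing left to save for the leaf.
fn callee_saved_used(regs: &[u8]) -> Vec<u8> {
    regs.iter()
        .copied()
        .filter(|r| CALLEE_SAVED.contains(r))
        .collect()
}
```

Under these assumptions, `assign(1, 3)` fits entirely in r1-r3, while `assign(1, 5)` (3 locals plus 2 temps) lands two values on r4 and r5, which is the case where the push stays alive.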
Codecov Report: All modified and coverable lines are covered by tests.
avrabe added a commit that referenced this pull request on Jun 24, 2026
… design (analysis) (#473)

#470 asked whether the optimized (non-relocatable) path emitting an exported leaf that clobbers callee-saved r4-r8 with `bx lr` and no push is an AAPCS violation.

Determination: by-design, not a bug. synth has two ABI models selected by --relocatable:

- The optimized path is the self-contained bare-metal model (non-AAPCS by design, documented in arm_backend.rs: "does not preserve caller-saved registers across calls"; ir_to_arm also emits no callee-saved push and its temp pool prefers r4-r8).
- --relocatable is the AAPCS/host-link path (select_with_stack, fp-relative, proper preservation).

The self-contained model is self-consistent (verified with intra_module_callee_saved.wat): a call-containing function is declined by the optimizer to the direct selector, which holds across-call values on the FRAME (caller `a`: str [sp] before `bl`, ldr [sp] after, never relying on the callee preserving r4-r8) and pushes/pops {r4-r8,lr} for ITS caller. So an optimized leaf's callee-saved clobber corrupts nothing intra-module; the only exposure is an external AAPCS caller, for which --relocatable is the required documented path. No fix warranted.

Also resolves the VCR-RA-002 (#471) baseline worry: the measured `push {r4-r8,lr}` is on the call-containing/relocatable path, NOT optimized leaves, so #470 does not reset VCR-RA-002's baseline.

Adds the generic evidence fixture + the ABI-model note.

Refs #470, #428, #242
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
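
The difference between the two ABI models, as seen by an external AAPCS caller, can be sketched as a simple set check. This is an illustrative assumption-laden model, not anything in synth: the function `unpreserved` is hypothetical, and it only considers r4-r8, the registers discussed above.

```rust
/// Hypothetical check: which callee-saved registers (r4-r8 only, as
/// discussed above) does a function clobber without saving them first?
/// An external AAPCS caller would see these values change across the call.
fn unpreserved(clobbered: &[u8], saved: &[u8]) -> Vec<u8> {
    clobbered
        .iter()
        .copied()
        .filter(|r| (4..=8).contains(r) && !saved.contains(r))
        .collect()
}
```

Under this model, an optimized leaf that clobbers r4-r8 and returns with `bx lr` reports all five registers as unpreserved, while a body that pushes and pops the same set reports none. That matches the determination above: the clobber only matters to a caller that expects AAPCS preservation.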
Scoping spike for the 6-register leaf prologue (#428, epic #242, north-star #390): the largest remaining cycle residual (~12 cyc, `push {r4-r8,lr}`/`pop`). No codegen change: frozen-safe (frozen gate green). The byte-changing allocator change is the separate next gated step.

Two empirical findings (the value of the spike)

1. Pool change is necessary-but-insufficient: it's a JOINT locals+temps problem.

Promoting locals into caller-saved r1/r2/r3 works in isolation (disasm: locals moved r4,r5,r6 → r1,r2,r3) but the push does not shrink: operand-stack temps still occupy r4-r7. With 1 param the free caller-saved set is exactly {r1,r2,r3}; if locals claim them, temps are forced onto callee-saved (net-neutral or worse). Dropping the push needs locals and temps to prefer caller-saved, spilling to callee-saved only under combined pressure; then `shrink_callee_saved_saves` fires.

2. Wrong layer: it must live in `reallocate_function`, not `compute_local_promotion`.

The non-relocatable path (host-link / native-pointer consumer) runs `select_with_stack` then `reallocate_function` + shrink, and the reallocator re-colors every segment over `POOL=[r0..r8]`, washing out any caller-saved reg the selector picked. Verified: `SYNTH_NO_LOCAL_PROMOTE` on/off changes bytes (structural reg-vs-frame) but a selector-level reg choice is a no-op on that path. ⇒ make the reallocator leaf-aware (bias `POOL` toward caller-saved for leaf segments); shrink then drops the dead push. #193/#220-class → flag-off + unicorn differential + on-target gate.

Adjacent finding (filed #470)

The optimized/non-relocatable path emits this exported leaf clobbering callee-saved r4-r8 with `bx lr` and no push (independent of promotion + realloc/shrink; the `--relocatable` build is correct). Potential AAPCS violation, differential-blind. If real it outranks this lever and changes its baseline.

Fixture

`leaf_caller_saved.wat`: 1 param + 3 promotable i32 locals, the non-vacuity case (locals consume all free caller-saved regs, exposing the temps interaction). Neutral values.

Refs #428, #242, #390, #470
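
The "washing out" effect in the second finding can be shown with a toy re-coloring pass. This is a sketch under stated assumptions, not the behavior of `reallocate_function`: the function `recolor` is hypothetical, it colors values in order with no liveness analysis, and the pool orderings used with it are illustrative guesses, not synth's actual `POOL` preference order.

```rust
/// Hypothetical re-coloring pass: hands out registers from `pool` in order.
/// The selector's own picks are deliberately ignored; only the number of
/// values matters. This is what makes a selector-level choice a no-op.
fn recolor(selector_choice: &[u8], pool: &[u8]) -> Vec<u8> {
    pool.iter()
        .copied()
        .take(selector_choice.len())
        .collect()
}
```

With the same pool, `recolor(&[1, 2, 3], ..)` and `recolor(&[4, 5, 6], ..)` return the same registers, so whatever the selector chose is lost. Only changing the pool passed to the re-coloring pass (for example, a caller-saved-first order for leaf segments) changes the outcome, which is why the bias has to live in the reallocator.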
🤖 Generated with Claude Code