From 3ed03e1a77a2f50e30f2e499061f137849802d37 Mon Sep 17 00:00:00 2001
From: Ralf Anton Beier
Date: Wed, 24 Jun 2026 20:37:27 +0200
Subject: [PATCH] =?UTF-8?q?docs(vcr-ra):=20resolve=20#470=20=E2=80=94=20op?=
 =?UTF-8?q?timized-path=20ABI=20model=20is=20non-AAPCS=20by=20design=20(an?=
 =?UTF-8?q?alysis)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

#470 asked whether the optimized (non-relocatable) path emitting an
exported leaf that clobbers callee-saved r4-r8 with `bx lr` and no push
is an AAPCS violation.

Determination: by-design, not a bug. synth has two ABI models selected
by --relocatable: the optimized path is the self-contained bare-metal
model (non-AAPCS by design — documented in arm_backend.rs: "does not
preserve caller-saved registers across calls"; ir_to_arm also emits no
callee-saved push and its temp pool prefers r4-r8), and --relocatable is
the AAPCS/host-link path (select_with_stack, fp-relative, proper
preservation).

The self-contained model is self-consistent (verified —
intra_module_callee_saved.wat): a call-containing function is declined
by the optimizer to the direct selector, which holds across-call values
on the FRAME (caller `a`: str [sp] before `bl`, ldr [sp] after — never
relies on the callee preserving r4-r8) and pushes/pops {r4-r8,lr} for
ITS caller. So an optimized leaf's callee-saved clobber corrupts nothing
intra-module; the only exposure is an external AAPCS caller, for which
--relocatable is the required documented path. No fix warranted.

Also resolves the VCR-RA-002 (#471) baseline worry: the measured
`push {r4-r8,lr}` is on the call-containing/relocatable path, NOT
optimized leaves — so #470 does not reset VCR-RA-002's baseline.

Adds the generic evidence fixture + the ABI-model note.

Refs #470, #428, #242

Co-Authored-By: Claude Opus 4.8
---
 scripts/repro/intra_module_callee_saved.wat | 23 +++++++++
 scripts/repro/optimized_path_abi_model.md   | 57 +++++++++++++++++++++
 2 files changed, 80 insertions(+)
 create mode 100644 scripts/repro/intra_module_callee_saved.wat
 create mode 100644 scripts/repro/optimized_path_abi_model.md

diff --git a/scripts/repro/intra_module_callee_saved.wat b/scripts/repro/intra_module_callee_saved.wat
new file mode 100644
index 0000000..298df34
--- /dev/null
+++ b/scripts/repro/intra_module_callee_saved.wat
@@ -0,0 +1,23 @@
+;; ABI-model evidence (#470, epic #242): intra-module callee-saved self-consistency.
+;;
+;; The optimized (non-relocatable) path is non-AAPCS BY DESIGN (arm_backend.rs:
+;; "does not preserve caller-saved registers across calls"). `ir_to_arm` also
+;; emits no callee-saved `push` and its temp pool prefers r4..r8, so an optimized
+;; leaf clobbers callee-saved registers. This fixture shows that is HARMLESS
+;; intra-module: the caller `a` holds its across-call value on the FRAME, not in a
+;; callee-saved register, so the leaf's clobber corrupts nothing.
+;;
+;;   $b — leaf: uses temps that land in r4..r8 (ir_to_arm CANDIDATES), no push.
+;;   a  — call-containing -> declined to the direct selector -> spills $x to
+;;        [sp] before `bl $b` and reloads after; pushes/pops {r4-r8,lr} for ITS
+;;        caller. So `a` does not rely on $b preserving r4..r8.
+;;
+;; See optimized_path_abi_model.md. Generic — neutral values, tied to nothing real.
+(module
+  (memory 1)
+  (func $b (param $p i32) (result i32)
+    (i32.add (i32.mul (local.get $p) (i32.const 3)) (i32.const 7)))
+  (func (export "a") (param $p i32) (result i32)
+    (local $x i32)
+    (local.set $x (i32.add (local.get $p) (i32.const 100)))
+    (i32.add (call $b (local.get $p)) (local.get $x))))
diff --git a/scripts/repro/optimized_path_abi_model.md b/scripts/repro/optimized_path_abi_model.md
new file mode 100644
index 0000000..5ed1ca3
--- /dev/null
+++ b/scripts/repro/optimized_path_abi_model.md
@@ -0,0 +1,57 @@
+# Optimized-path ABI model — and the #470 determination
+
+**Issue:** #470 (resolved here) · **Epic:** #242 (VCR-*) · context: VCR-RA-002 (#428)
+**Status:** ANALYSIS / determination — no codegen change, frozen-safe.
+
+## Question
+
+#470 asked: the optimized (non-relocatable) path emits an **exported** leaf that
+clobbers callee-saved r4-r8 with `bx lr` and **no push** — is that an AAPCS
+violation?
+
+## Determination: by-design, not a bug
+
+synth has **two ABI models**, selected by `--relocatable`:
+
+| | optimized (default, non-reloc) | `--relocatable` |
+|---|---|---|
+| codegen | `OptimizerBridge::ir_to_arm` | `select_with_stack` (direct selector) |
+| linmem base | absolute const (0x20000100) | `fp`-relative (arrives at runtime) |
+| AAPCS | **no** — by design | **yes** |
+| use | self-contained bare-metal image (synth controls all callers) | host-link ET_REL / external AAPCS callers |
+
+This is documented in `arm_backend.rs`: the optimized path *"does not preserve
+caller-saved registers across calls — both wrong for a host-linked object, where
+the linmem base arrives via `fp` at runtime and callees follow AAPCS."* The
+callee-saved clobber is the same fact one level deeper: `ir_to_arm`'s
+prologue/epilogue logic only frames on a spill (never emits a callee-saved
+`push`), and its temp pool prefers r4-r8 (`CANDIDATES = [R4..R8]`), so optimized
+leaves use and clobber callee-saved registers.
+
+## Why the self-contained model is self-consistent (verified)
+
+`intra_module_callee_saved.wat` is the evidence. A call-containing function is
+**declined by the optimizer** (it only optimizes leaves) and falls back to the
+**direct selector**, which:
+
+- holds values live across a call on the **frame**, not in registers — caller
+  `a` does `str.w r3,[sp]` before `bl $b` and `ldr.w r5,[sp]` after, so it never
+  relies on `$b` preserving r4-r8;
+- `push {r4-r8,lr}` / `pop {…,pc}` to preserve callee-saved for **its** caller.
+
+So an optimized leaf (`$b`: clobbers r4,r5,r6, no push) corrupts nothing — every
+caller in a self-contained image spills across calls. Confirmed on
+`-b arm --target cortex-m4` (non-relocatable).
+
+The only exposure is an **external AAPCS caller** holding live r4-r8 across the
+boundary — exactly the host-link case, for which `--relocatable` is the required,
+documented path. No fix to the optimized path is warranted.
+
+## Consequence for VCR-RA-002 (#428)
+
+This also resolves the baseline worry flagged in the VCR-RA-002 spike (#471): the
+`push {r4-r8,lr}` measured on-target lives on the **call-containing / relocatable**
+path (direct selector), **not** optimized leaves (which have no push by design).
+So #470 does **not** reset VCR-RA-002's baseline — the prologue-shrink lever
+targets the direct-selector / relocatable prologue, which is correct to shrink
+when the body provably uses no callee-saved register.
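
The self-consistency argument in the note can be illustrated with a toy register-file model. This is a minimal sketch under stated assumptions, not synth's codegen: the function names, the dictionary-based register file, and the exact temp assignment inside the leaf are hypothetical. Only the `str [sp]` / `ldr [sp]` spill pattern in the caller and the r4, r5, r6 clobber in the leaf are taken from the note.

```python
# Toy model of the two calling styles discussed in the note.
# Hypothetical names throughout; registers are just dictionary entries.

def leaf_b(regs):
    """Optimized-path style leaf: computes p*3 + 7 with temps in r4..r6
    and returns without saving or restoring them (no push/pop)."""
    regs["r4"] = regs["r0"]               # temp: p
    regs["r5"] = 3                        # temp: constant
    regs["r6"] = regs["r4"] * regs["r5"]  # temp: p * 3
    regs["r0"] = regs["r6"] + 7           # result in r0; r4..r6 stay clobbered


def caller_a_self_contained(p):
    """Direct-selector style caller: the across-call value lives on the frame."""
    regs = {"r0": p, "r3": 0, "r4": 0, "r5": 0, "r6": 0}
    frame = [0]                           # one stack slot at [sp]
    regs["r3"] = p + 100                  # x = p + 100
    frame[0] = regs["r3"]                 # str r3, [sp]  (spill before the call)
    leaf_b(regs)                          # bl $b
    regs["r5"] = frame[0]                 # ldr r5, [sp]  (reload after the call)
    return regs["r0"] + regs["r5"]


def caller_external_aapcs(p):
    """AAPCS-style caller: keeps x in r4 and trusts the callee to preserve it."""
    regs = {"r0": p, "r3": 0, "r4": 0, "r5": 0, "r6": 0}
    regs["r4"] = p + 100                  # x held in a callee-saved register
    leaf_b(regs)                          # leaf overwrites r4
    return regs["r0"] + regs["r4"]        # reads the clobbered r4
```

With `p = 5`, the frame-spilling caller returns 127 (22 + 105), while the register-holding caller returns 27 because r4 was overwritten by the leaf. The second case is the external-AAPCS exposure that `--relocatable` is the documented path for.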