v1.1.15 — inliner: acyclic CF (#226) + multi-site small leaves & whole-function DCE (#228)#229
Merged
Merged
Conversation
Faithful local repro of gale #219's C<->Rust decide seam: `decide` returns {action, new_count} packed into an i64; the shim unpacks them. After `--passes inline` the `run` body contains the pack (extend_i32_u/shl/or) → i64 carrier local → unpack (and/shr_u/wrap_i64) round-trip on one value — exactly the residue to dissolve. Verified: optimizes + validates today; inline leaves the round-trip intact (confirmed via `loom optimize --passes inline` + wasm-tools print). The seam-SROA pass (next) must forward the scalar components and drop the carrier, yielding `(action & 0xff) + new_count` — each rewrite Z3-proven. Refs #219
gale attached the authentic post-loom-inline dissolved z_impl_k_sem_give on the issue. Decoded + verified: sha256 f81da42d… (matches), 5254 B, valid wasm. It contains the u64 pack/unpack round-trip to dissolve — pack (extend_i32_u/shl/or) at body lines 59-63, unpack (and/shr_u/wrap_i64) at 90/104, with a dead i64 carrier. Grounds the seam-SROA pass against the real body, not just the synthetic fixture. Design confirmed on #219 (proof-carrying: Z3 obligation per ISLE rewrite; ISLE-term route preferred). Implementation (mem2reg + pack/unpack forwarding + DCE) follows. Refs #219
…tend_i32_u) Lays the ISLE-layer foundation for the #219 pack/unpack forwarding rules: - types.isle: I32WrapI64 / I64ExtendI32U ValueData variants. - constructors.isle: decl + extern constructor + extern extractor for each. - loom-shared/src/lib.rs: i32_wrap_i64_extract / i64_extend_i32_u_extract (mirror iadd32_extract, single-field). - build.rs: register seam_sroa.isle AND add the two extractors to the fix_extractor_ownership list (cranelift-isle 0.132.1 emits extractor args as owned `Value`; the post-processor rewrites them to `&Value` — without this the generated trie move-then-borrows arg0 and loom-shared won't compile). - seam_sroa.isle: the unconditional, Z3-trivial rule `wrap_i64(extend_i32_u(x)) → x`. loom-shared builds clean. IMPORTANT (next step): the generated ISLE `simplify` is NOT in loom's live rewrite path — build.rs documents this, and the active rewriters are the Rust `rewrite_pure_impl` / `rewrite_with_dataflow` (loom-shared/src/lib.rs), validated by the Z3 translation validator per pass. So this rule does not fire yet; the LIVE seam-SROA rewrite must be added as a `rewrite_pure_impl` match arm (ValueData::I32WrapI64{val} where val is I64ExtendI32U{inner} → inner), with the Z3 verifier discharging the obligation. ISLE terms retained as proof-layer documentation of the identity. Refs #219
The actual rewrite (the ISLE rule committed earlier is in loom's dead simplify path; the live rewriters are the Rust rewrite_pure_impl/rewrite_with_dataflow, validated by the Z3 translation validator per pass). Adds rewrite_pure_impl match arms for the two #219 conversion terms: - I32WrapI64: recurse operand; if it is I64ExtendI32U(inner) → inner (unconditionally sound: extract(31,0, zero_ext(32,x)) = x — the u64-ABI round-trip a packed scalar pays at the dissolved decide seam); also folds wrap_i64(i64.const N) → i32.const (low 32). - I64ExtendI32U: recurse operand; folds extend_i32_u(i32.const N) → i64.const (zero-extended). Test: test_seam_sroa_wrap_extend_identity — wrap_i64(extend_i32_u(local.get 0)) dissolves to local.get 0. Full `cargo test --release --lib --test-threads=4` green, 0 failures. This is the foundation case. The sem repro's full dissolution additionally needs the carrier mem2reg + the masked and/shr_u forwarding rules (steps 2-3), since its unpack uses and/shr_u (not wrap) on the packed u64. Refs #219
…lves) Dissolves the u64-pack unpack arithmetic, in rewrite_pure_impl (Z3-validated per pass; pure-integer so the verifier actually verifies — no skip). Each rule is unconditionally sound with only constant side-conditions (no range analysis): I64And (the `& mask` unpack): - (shl Z k) & M → 0 when M's set bits are all below k (M>>k==0, 0<k<64) - (or A B) & M → survivor & M when one OR operand is such a cleared shift I64ShrU (the `>> k` unpack): - (shr_u (shl Z k) k) → Z & (u64::MAX>>k) (shl-then-shr-same masks low 64-k) - (shr_u (or P Q) k) → (P>>k)|(Q>>k) when an operand is (shl _ k) (logical shift distributes over OR; targeted to the pack shape — no general bloat; recursion collapses the shl side via the rule above) Together these reduce a forwarded pack `(or (shl (extend_u hi) 32) lo)`: & 0xff → lo & 0xff (low field) >> 32 → (extend_u hi) & 0xffffffff (high field; lo const shifts out) Tests: test_seam_sroa_mask_clears_shifted_pack, test_seam_sroa_shr_extracts_ high_field (+ the step-1 wrap/extend test). Full `cargo test --release --lib --test-threads=4` green (399/28/17, 0 failures). These fire once inlining delivers pack/unpack adjacency (the inline-modelability extension for control-flow + trap-call decides is the remaining prerequisite, tracked separately — higher blast radius, needs gale silicon validation). Refs #219
…ier) Design for the #219 inline prerequisite: wire in precise br_table/br_if/br symbolic execution (the deferred ExecutionState/merge_states scaffolding) so the verifier can soundly prove inline-equivalence of br_table-dispatching decides. Approach: state-merging-with-ITE via a label/continuation stack (NOT path forking), exact for the acyclic single-exit case. Grounded in Alive2 (PLDI'21, the production SMT bounded-TV precedent), Kuznetsov state-merging (PLDI'12), WASP (ECOOP'22 wasm br_table rules), Boogie block-predicate WP. Includes the soundness argument, the explicit pitfalls (selector constraint, fall-through join, ⊥-not- havoc for trap arms, branch arity, over-approx-and-skip), 2-phase staging (sem_give integer first, then memory for mutex/pipe), and the adversarial verify-or-revert + silicon test gate. For gale review before implementation. Refs #219
…tection) First component of the PR-C precise-CF inline-verifier extension (design approved by gale on #219, 2026-06-17). `is_noreturn_callee(func)`: conservative-sound — true iff the body has no Return and no Br/BrIf/BrTable anywhere (recursing) and ends in Unreachable. Admits exactly the core::panicking helpers (panic_fmt → unreachable; panic_const_add_overflow → call panic_fmt; unreachable) and rejects any normal-returning or branching function. The by-body inline modeler (PR-C M3, next) will use this to encode a `call` to such a callee as ⊥ "diverge if reached" — never havoc — so a trapping branch constrains only reachability, never the post-state (Alive2 `ub` discipline; avoids the #155/#159 havoc-vs-concrete false-positive surface). Test: test_is_noreturn_callee (bare trap / trap-after-call / normal-return / no-trailing-trap / branch-escape). #[allow(dead_code)] until the M3 gate wires it. Refs #219
The soundness-critical core of the inline-modelability extension: a precise symbolic executor for structured, acyclic wasm control flow (block / if / br / br_if / br_table) by state-merging-with-ITE over a label/join stack — never path-forking. For a bounded acyclic region this is EXACT (a passing SMT check is a real proof). Precedent: Alive2 (PLDI'21), Kuznetsov state-merging (PLDI'12), WASP br_table rules (ECOOP'22). See docs/design/pr-c-precise-cf-verification.md. Built as a NEW ISOLATED function, not by mutating the shared encoders (encode_block_body / encode_function_to_smt_impl_inner stay untouched -> zero regression risk to existing passes). encode_simple_instruction's catch-all HAVOCS (pushes a fresh BV without popping operands), so delegating an unmodeled op would desync the stack; instead an allowlist gate (is_acyclic_modelable_body) front-loads soundness and the executor runs only on admitted functions. Over-approximate-and-skip everything else: loops (back-edge), calls, memory, unknown opcodes, multi-result/param blocks. br_table default guard is UNSIGNED bvuge(sel, n) — a signed >= would leave selectors in [2^31, 2^32) in a partition gap. All #[cfg(feature="verification")] + #[allow(dead_code)] until M3 wires it into inline_functions' verify-or-revert. Tests (5, green): br_table switch proves equivalent to the if/else chain; identity; detects wrong-default and swapped-arm counterexamples; gate rejects loops/memory, admits the switch. Also folds in cargo-fmt drift in loom-shared from #219 SROA @4356441. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e acyclic executor Extends exec_acyclic to model a `call F` BY BODY (the inline-soundness principle from #155 extended to control flow): pop args, recursively encode the callee through exec_acyclic with params bound to the args, declared locals zero-init, globals SHARED with the caller (callee global writes propagate back); push the merged single result and narrow the caller path to the callee's returning paths. A call to a no-return callee (is_noreturn_callee: panic -> unreachable) sets reachable=false — the path DIVERGES (drops out of the result ITE), NEVER havoc (Alive2 ub discipline; havoc reopens the #155/#159 false-positive surface). Threads VerificationSignatureContext (function_bodies, the existing #151 callee lookup) + a recursion depth into exec_acyclic / encode_acyclic_function / is_acyclic_modelable_body. MAX_ACYCLIC_CALL_DEPTH=8 bounds by-body inlining and makes a cyclic/recursive call graph fail safely (reject, never infinite-unroll). The gate now admits Call(g) iff g is a local function that is no-return or recursively acyclic-modelable; imports/call_indirect/over-deep stay rejected. exec_acyclic remains the soundness enforcer (Err, never havoc, on anything unmodelable). Still #[cfg(verification)] + #[allow(dead_code)] until M3 wires it into inline_functions' verify-or-revert. Tests (7 total, green): + call-by-body of a br_table callee proves equivalent to its inlined switch (the seam capability); + no-return call diverges not havoc. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the equivalence harness over the precise acyclic CF executor: encodes both functions via encode_acyclic_function (deterministic input names unify in Z3) and checks a == b has no counterexample. Mirrors verify_function_equivalence's soundness boundaries (no width-matching the final equality; Unknown -> Err). Returns Err whenever either side is not acyclic-modelable, a width mismatches, or the encoding fails — so the caller (inline_functions' verify-or-revert, wired in M3.2) FALLS BACK to the existing encoder. This only ADDS proving power for the br_table-callee inline case; it never removes a check. Void functions defer to the fallback (no side-effect equivalence modeled yet). Still #[allow(dead_code)] until M3.2 wires it into inline_functions. Tests (9 total, green): + test_verify_acyclic_equivalence_inline — a caller that calls a br_table decide verifies equivalent to the inlined switch, a wrong-default body does NOT verify, and a loop function Errs (caller falls back). Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…liner TranslationValidator::verify() now tries verify_acyclic_equivalence for the inline pass before the existing encoder. The main encoder models br_table approximately and a br_table callee is not straight-line-by-body-modelable, so it cannot prove `call F` (br_table callee) equal to its inlined body → the inline reverts. The acyclic executor is EXACT for acyclic CF + by-body calls, so it can. PURELY ADDITIVE + scoped to pass_name == "inline_functions": ONLY a precise Ok(true) short-circuits to accept; Ok(false)/Err/panic FALL THROUGH to the existing verify_function_equivalence_with_context unchanged (acyclic call wrapped in its own catch_unwind). This can only ADD accepts (the br_table-callee inlines), never weaken an existing check; every non-inline pass is byte-identical. This also makes the M1a/M1b/M3.1 executor chain live (the #[allow(dead_code)] markers are now harmless no-ops). No br_table callee is admitted as an inline CANDIDATE yet (function_inline_modelable is still straight-line — M3.3), so this is a behavioral no-op on the current corpus: full `cargo test --release --lib` is 408 passed / 0 failed / 2 ignored, unchanged. clippy + fmt clean. The capability is proven by the 9 acyclic unit tests; M3.3 enables it end-to-end on the seam. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+ Unreachable trap function_inline_modelable now admits a callee under EITHER regime A (straight-line incl. memory — the existing by-body modeler, #155/#157/#161) OR regime B (acyclic control flow + pure-integer ops + by-body calls to no-return/recursively-modelable local callees — the precise acyclic executor, PR-C). Mirrors verify::is_acyclic_modelable_body so the candidate gate admits exactly what the acyclic verifier can prove (admitting an unprovable callee → caller reverts, the #159 collateral problem). Threads module context (functions + num_imported_funcs) into the gate + a recursion-depth bound; a br_table+memory callee passes neither regime → stays opaque until Phase 2. Also handles Unreachable as a trap (⊥) in exec_acyclic and both gates: the path diverges (reachable=false, drops out of the result ITE), never havoc — the seam decides end their overflow arms in `call panic_const; unreachable`, so this is required to model them at all. Full `cargo test --release --lib` = 409 passed / 0 failed / 2 ignored (+1: test_acyclic_unreachable_diverges). fmt + clippy clean. SCOPE NOTE (found by inspecting the real repro-219/sem.loom.wasm): decide (gale_k_sem_give_decide) IS regime-B-eligible (acyclic br_table + i64 pack + no-return panic call, no memory), so it is now an inline candidate. BUT its caller z_impl_k_sem_give READS MEMORY (i32.load offset=8/12 for decide's args) and calls imports, so proving decide's inline INTO z_impl needs the acyclic executor to model z_impl's memory = PHASE 2. So sem_give's seam will not dissolve until Phase 2 (memory Array threaded through exec_acyclic). M3.1-M3.3 are correct + necessary infra; Phase 2 is the next real enabler for sem_give. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ecutor
The acyclic executor now models linear memory (loads/stores), required because
real seam callers (z_impl_k_sem_give) read memory for the decide's args.
AcyclicState gains `memory: Array`. Memory is MUTABLE branch state, so it is
ITE-merged at every join exactly like locals/globals: BranchEntry/MergedState
carry the Array, acyclic_merge_entries ITE-merges it (Bool::ite over the Array
sort), acyclic_push_branch records it, acyclic_apply_join sets it, and the If arm
saves/restores it for the else branch. exec_acyclic handles I32/I64 load/store +
partial-width load8/16/store8/16 EXPLICITLY via the shared encode_i32_load_from /
encode_i64_load_from / encode_*_store_into / encode_partial_* helpers (the same
ones the main encoder + by-body modeler use -> bit-identical memory expressions);
they are NEVER delegated to encode_simple_instruction (which havocs). memory is
threaded through encode_acyclic_call (shared + propagated like globals) and
initialized in encode_acyclic_function as Array::new_const("memory", ...) -- same
name as the main encoder so both encodings unify.
Gates: new is_acyclic_memory_op (verify.rs) admitted by is_acyclic_modelable_body;
lib-side is_acyclic_memory_modelable_instr admitted by the regime-B inline gate.
Call gate stays STRICT (no impure-call havoc yet -- that is Phase 2b, the
soundness-critical part). So sem_give's z_impl (which also calls k_spin_lock import
+ z_unpend) is not yet fully modelable; 2a is the contained, testable foundation.
Full `cargo test --release --lib` = 410 passed / 0 failed / 2 ignored. New test
test_acyclic_memory_merge: store-on-each-branch then load proves equivalent to the
direct if/else (validates the per-join memory ITE-merge feeds the loaded value).
test_acyclic_modelable_gate updated (memory now admitted). fmt + clippy clean.
Refs #219
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The acyclic executor now models impure calls (imports + non-acyclic-modelable
local callees) by HAVOC, so memory-bearing seam callers like z_impl_k_sem_give
(which call k_spin_lock + z_unpend) become verifier-modelable end-to-end.
exec_acyclic's Call arm is now 3-tier (mirrors the main encoder): no-return ->
diverge; acyclic single-result local -> by-body (encode_acyclic_call); else
(import / multi-result / too-deep / loop-containing) -> acyclic_havoc_call: pop
args, push fresh `call_{g}_result_{i}`, havoc all globals + memory with a SHARED
per-encode counter. The counter is threaded (&mut u64) through exec_acyclic +
encode_acyclic_call and reset per encode_acyclic_function -- so the Nth impure
call gets the SAME havoc id in the original and optimized encodings -> they unify
in Z3. Havoc is conservative-sound: it never forges a value equal to a concrete
one, so a havoc-vs-concrete mismatch refuses to prove (the #155/#159 guard).
is_acyclic_modelable_body is now purely STRUCTURAL (drops ctx/depth params -- any
direct call is modelable since havoc covers the opaque case; clippy
only_used_in_recursion). The lib candidate gate is unchanged (decide already
admitted via its no-return panic call).
Full `cargo test --release --lib` = 413 passed / 0 failed / 2 ignored. New tests:
test_acyclic_havoc_call_unifies (havoc id parity); test_acyclic_havoc_not_concrete
(havoc'd result NOT provably equal to a concrete value -- soundness);
test_acyclic_havoc_memory (load after impure call reads havoc'd memory, not the
initial -- soundness). fmt + clippy clean.
Next: e2e dissolve on repro-219/sem.loom.wasm (decide's inline into z_impl should
now be PROVEN by the acyclic path -> no `call $gale_k_sem_give_decide`).
Refs #219
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ction DCE The inline_functions candidate predicate `(call_count==1 || size<10) && size<limit` made MULTI_CALL_SITE_LIMIT dead: a multi-call-site callee only passed when size<10, so nothing in [10,50) ever inlined (gale's 23-insn gust `mix` seam). Govern purely by the site-count-dependent budget (`size < limit`); the per-inline Z3 verify-or-revert stays the correctness backstop. Companion: multi-site inlining duplicates the body and orphans the original, but `loom optimize` never ran whole-function DCE (its `dce` = eliminate_dead_code is intra-function only). Expose eliminate_dead_functions as a CLI `dce-functions` pass after `inline` (default-on). It keeps exports, the start function, and element-table (indirect-call) targets live, conservative on parse failure (#196-safe). Test: test_inline_multisite_small_leaf_228 (11-insn 2-site leaf inlines at both sites). e2e: --passes inline,dce-functions removes the orphan + validates. 414 lib tests green. Refs #228 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ves & whole-function DCE (#228) Found by gale's gust codegen bench. Both fixes are Z3-gated (verify-or-revert). - #226: precise acyclic-CF symbolic executor (PR-C) + Regime B candidate gate — inlines acyclic-CF callees with by-body / divergent (no-return) calls (gale's mix: bounds-check if + panic + load). Was straight-line-only before. - #228: candidate predicate governed by the site-count budget (size < limit) so multi-site leaves in [10,50) inline; new default-on CLI dce-functions pass (#196-safe whole-function DCE) removes the orphans multi-site inlining creates. The #219 seam-teardown (carrier forwarding) is intentionally NOT here — frozen on feat/219-seam-sroa until the G474RE silicon dual-flash confirms it. Refs #226 #228 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two Z3-gated inliner improvements found by gale's
gustcodegen bench. The #219 seam-teardown (carrier forwarding) is intentionally excluded — it stays frozen onfeat/219-seam-sroauntil the G474RE silicon dual-flash confirms it.#226 — acyclic control-flow inlining
v1.1.14's inline-modelability gate was straight-line only, so a
#[inline]leaf whose body isif oob { call panic_bounds_check; unreachable }+ a load + arithmetic (gale'smix) stayed an opaquecall. This brings the precise acyclic-CF symbolic executor (PR-C) + the Regime B candidate gate to a release: acyclic CF + by-body calls + divergent (no-return) calls are now admitted and proven by the translation validator. Verified: a mix-shaped repro inlines (call $mix→ 0, validates).#228 — multi-site small leaves + whole-function DCE
(call_count == 1 || size < 10) && size < limitmadeMULTI_CALL_SITE_LIMIT(50) dead — a multi-site callee only passed whensize < 10, so nothing in[10,50)ever inlined (gale's 23-insn 2-sitegust_mix). Nowsize < limit(site-count budget governs; Z3 verify-or-revert stays the correctness gate).loom optimizehad no module-level DCE. New default-ondce-functionsCLI pass exposeseliminate_dead_functions(keeps exports / start / element-table targets live; v1.1.11 element remap scrambles the function-pointer table — valid-but-wrong code, falcon SIL gate fails (0.13m → 593.8m) #196-safe).Validation
test_inline_multisite_small_leaf_228); clippy clean.loom optimize --passes inline,dce-functionson an 11-insn 2-site leaf → inlined at both sites + orphan removed + validates. mix-shaped repro still inlines.Refs #226 #228