diff --git a/CHANGELOG.md b/CHANGELOG.md index df0e1c0..b0850d6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,38 @@ All notable changes to LOOM will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.1.15] - 2026-06-22 + +Inliner release: the dissolve pipeline now inlines two real seam shapes it +previously left as opaque `call`s, both gated by the Z3 translation validator +(verify-or-revert). Found by gale's `gust` codegen bench. The #219 seam-teardown +(carrier forwarding) is intentionally NOT in this release — it stays on its branch +until the G474RE silicon dual-flash confirms it. + +### Added + +- **Acyclic control-flow inlining (#226).** The inline-modelability gate was + straight-line only, so a `#[inline]` leaf whose body is `if oob { call + panic_bounds_check; unreachable }` + a load + arithmetic (gale's `mix`) stayed + an opaque `call`. A precise acyclic-CF symbolic executor (PR-C) + a matching + candidate gate (Regime B) now admit acyclic CF + by-body calls + divergent + (no-return) calls, all proven by the validator. `mix` now inlines. +- **Whole-function dead-code elimination in `loom optimize` (#228).** New + `dce-functions` CLI pass (default-on, after `inline`) exposes + `eliminate_dead_functions`: multi-site inlining duplicates a body and orphans + the original, which the intra-function `dce` could never remove. Keeps exports, + the start function, and element-table (indirect-call) targets live — + conservative on a parse failure (#196-safe). + +### Fixed + +- **Small multi-call-site leaves are inlined again (#228).** The candidate + predicate `(call_count == 1 || size < 10) && size < limit` made + `MULTI_CALL_SITE_LIMIT` (50) dead code — a multi-site callee only passed when + `size < 10`, so nothing in `[10, 50)` ever inlined (gale's 23-instruction, + 2-site `gust_mix`). Now governed purely by the site-count-dependent budget + (`size < limit`); the per-inline Z3 verify-or-revert stays the correctness gate. + ## [1.1.14] - 2026-06-15 Correctness + release-engineering release. One optimizer correctness fix (#220); diff --git a/Cargo.toml b/Cargo.toml index f6eb812..f7dbbf1 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -9,7 +9,7 @@ members = [ ] [workspace.package] -version = "1.1.14" +version = "1.1.15" authors = ["PulseEngine "] edition = "2024" license = "Apache-2.0" diff --git a/docs/design/pr-c-precise-cf-verification.md b/docs/design/pr-c-precise-cf-verification.md new file mode 100644 index 0000000..cf54553 --- /dev/null +++ b/docs/design/pr-c-precise-cf-verification.md @@ -0,0 +1,168 @@ +# PR-C: Precise acyclic control-flow symbolic execution in the Z3 verifier + +Status: **design, for review** (gale #219). Prerequisite for inlining the 5 +`br_table`-dispatching decides (sem_give 860, mutex_unlock 472, pipe_write, +pipe_read, msgq_put) and for removing the existing `BrIf`/`BrTable` skips in +LICM/CSE/code_folding/coalesce_locals. + +## Problem + +loom's live verifier (`verify.rs`) does **not** soundly model multi-way control +flow. In both `encode_function_to_smt_impl_inner` and `encode_block_body`: + +- `BrIf(_depth)` pops the condition and **continues** ("a more precise encoding + would fork paths here"). +- `BrTable { .. }` pops the index and **`break`s** ("treat as terminating for + now"). +- `Br(_depth)` / `Return` `break`; branch **depth is ignored** (no label stack). + +So a function whose result depends on a `br_table` dispatch is modeled as if the +dispatch didn't happen. Reusing this to prove an inline equivalent would let Z3 +pick any arm for any selector → a **false** equivalence → an unsound inline that +loom's structural CI cannot catch (the #196/#220 failure class). This is why +LICM/CSE/etc. currently **skip** `BrIf`/`BrTable` functions entirely. + +The precise machinery already exists as **dead scaffolding**, flagged in-source +as the deferred upgrade: `struct ExecutionState { stack, locals, globals, +path_condition: Bool, reachable }`, `merge_states(cond, t, f)`, and +`BlockResult.branch_depth`. `merge_bv(cond, a, b) = cond.ite(a, b)` is live (used +for `If`). PR-C wires the scaffolding in. + +## Chosen approach (with precedent) + +**State-merging-with-ITE driven by an explicit label/continuation stack — never +path-forking.** For a bounded, acyclic (loops=0 on the critical path), +single-normal-exit region this is **exact** (a passing SMT check is a real +proof, not a bounded approximation), and it produces **one** verification +condition instead of N per-arm obligations. + +Precedent (all verified against primary sources): + +- **Alive2** (Lopes, Lee, Hur, Liu, Regehr, *Bounded Translation Validation for + LLVM*, PLDI 2021) — the closest production precedent. "We do **not** fork + expressions across paths in the CFG"; phi/branch merges become "a single SMT + expression per register"; the final state is "a linear chain of `ite` + expressions"; an explicit **`ub`** flag makes ill-defined paths refine rather + than vanish. `switch` is the N-way generalization of its 2-way branch merge. +- **Kuznetsov, Kinder, Bucur, Candea**, *Efficient State Merging in Symbolic + Execution*, PLDI 2012 — the exact merge rule `(ℓ, pc'∨pc'', λv. ite(pc', + s'[v], s''[v]))`; "does not over-approximate, no false positives." +- **WASP** (Marques et al., *Concolic Execution for WebAssembly*, ECOOP 2022) — + the literal wasm rules: `br_table` arm `k≤n` → `Cont(br j_{k+1})` with path + condition `ŝ = k`; default `k≥n` → `Cont(br jn)` with `ŝ ≥ n`. The selector + constraint is conjoined onto every arm; the target is resolved by label index. +- **Barnett & Leino**, *Weakest-Precondition of Unstructured Programs*, PASTE + 2005 — per-block predicates keep the VC **linear** in paths (a wide `br_table` + doesn't blow up); acyclic ⇒ topological, no recursion. +- **WebAssembly Core Spec** + Haas et al. (PLDI 2017) — wasm is structured with + a label stack; `br N` targets the N-th enclosing label (0 = innermost); + `br_table` takes the default when the operand is out of bounds. + +Per-path forking is also sound but is the wrong fit: exponential over a wide +`br_table`, N separate obligations. + +## The model + +Carry a single symbolic `ExecutionState`: +`{ stack: Vec, locals: Vec, globals: Vec, memory: Array, + path_condition: Bool, trapped: Bool }` (`trapped` = Alive2's `ub`/⊥). + +A **label/continuation stack**: each entry records `{ arity, join: JoinAccumulator }`. +`block`/`if`/`loop` push a label; the entry's `join` accumulates the merged state +of every branch that targets it. + +- `block tf … end`, `if … end`: push a label whose continuation is the code + **after** `end` (forward join). +- `br N`: resolve the N-th label from the top; **merge** the current state into + that label's `join` under `path_condition` (carry `arity` result values); mark + the current path terminated. +- `br_if N`: split — branch path `pc ∧ (cond≠0)` merges into the target join; + fall-through path `pc ∧ (cond=0)` continues. +- `br_table j0..jn-1 (default jd)` (per WASP): for each arm `i`, merge under + `pc ∧ (sel = i)` into label `ji`'s join; default under `pc ∧ (sel ≥ n)` into + `jd`'s join. Guards **partition** the selector domain (totality from the + default complement). +- At each `end` / the function exit (the arms' post-dominator), the label's + `join` is the **ITE-merge** of all incoming branches: `path_condition` = + disjunction; each value = `ite(guard, …)` (reusing `merge_bv` / `merge_states`). +- `unreachable`, and a `Call` to a no-return callee (`is_noreturn_callee`: no + `Return`, no `Br*` to the function label, every path ends in `Unreachable`), + set `trapped` (⊥) — **never havoc**. They constrain reachability only; on a + trapped path the return value is don't-care and the path drops out of the + result ITE. + +**Crucially**: this *one* executor models **both** the original `call decide` +(by-body) and the inlined body, so they yield the **same** merged expression → +equivalence is provable (the #155 by-body principle, extended to CF). + +## Soundness — and the pitfalls we explicitly guard + +1. **Exact for acyclic** — no fixpoint, no loop invariants; finite single pass; + merge only at structured-block ends + the single exit (post-dominators). +2. **Selector constraint preserved** — every `br_table` arm carries `sel = i`, + default carries `sel ≥ n`; guards partition the domain. (Avoids the + "terminate" false-equivalence.) +3. **Fall-through resolved via the label stack** to the correct join (a `br` to + an outer label skips intervening joins). +4. **⊥, not havoc, for trap/unreachable/no-return arms** — they refine, never + forge a matching value (the #155/#159 + Alive2 `ub` discipline). +5. **Branch arity recorded** — `br N` carries the label's result values; the + merged stack shape stays correct. +6. **Over-approximate-and-skip** anything unmodeled (LOOM's "skip rather than + risk"): any instruction/shape outside the precise model → the callee is not + by-body-modelable → opaque fallback (no inline), never a false ≡. + +## Scope / staging + +- **Loops stay out** — still unrolled (`MAX_LOOP_UNROLL`) / k-induction / skipped + as today. PR-C is acyclic-only. A back-edge ⇒ not by-body-modelable. +- **Phase 1 — integer acyclic CF (no memory):** activate `ExecutionState` + + label stack + `merge_states`; implement `Br`/`BrIf`/`BrTable`/`Block`/`If` + precisely in `encode_block_body`; add `is_noreturn_callee` + diverge-on-trap; + extend `is_inline_modelable_instr`/`callee_inlinable_by_body` to admit acyclic + CF + no-return calls; route the by-body modeler through the precise executor. + **Unblocks sem_give (860)** — pure integer + br_table, no memory. +- **Phase 2 — thread `memory: Array` through the CF executor** (today + `encode_block_body` has no memory param). Unblocks **mutex_unlock (472)** + + **pipe_write/read** (sret stores through the shadow frame). + +## Test strategy (the soundness gate) + +- **Adversarial verify-or-revert** (the hard gate): deliberately-wrong inlined + bodies MUST produce a Z3 counterexample → revert — wrong arm selected, dropped + selector guard, dropped trap branch, mis-merged join, wrong branch depth. +- **Differential**: correct inline of a `br_table` fixture proves; a near-miss + reverts. Fixtures ordered per gale: msgq_put (trap-only) → sem_give + (`block`+`br_table`+`panic→unreachable`) → mutex_unlock (most blocks, sret). +- **Unit**: `is_noreturn_callee`, the label-stack resolution, `merge_states`. +- **Regression**: full `cargo test` + dogfooding (loom optimizing itself) + unchanged; re-validate the LICM/CSE/etc. paths once their `br_table` skips are + removed (a bonus coverage win, but it widens what they verify). +- **e2e + silicon**: `loom optimize` on `repro-219/sem.loom.wasm` dissolves (no + `call $..._decide`, no i64 pack/unpack in `z_impl_k_sem_give`); then gale's + G474RE re-flash (sem_give 860→, mutex_unlock 472→) is the kill-criterion. + +## Risk + +Substantial, soundness-critical (touches the core verifier). De-risked by: the +acyclic restriction (the only source of unsoundness/incompleteness — loops — is +excluded), the existing `ExecutionState`/`merge_states` scaffolding, the +Alive2/WASP precedent, and the adversarial-revert + silicon gates. Staged so +Phase 1 (sem_give) lands and validates before Phase 2 (memory). WIP branch, no +merge until adversarial tests pass + gale silicon-validates. + +## References + +- Lopes, Lee, Hur, Liu, Regehr. *Alive2: Bounded Translation Validation for + LLVM.* PLDI 2021. https://users.cs.utah.edu/~regehr/alive2-pldi21.pdf +- Kuznetsov, Kinder, Bucur, Candea. *Efficient State Merging in Symbolic + Execution.* PLDI 2012. https://dslab.epfl.ch/pubs/stateMerging.pdf +- Avgerinos, Rebert, Cha, Brumley. *Enhancing Symbolic Execution with + Veritesting.* ICSE 2014. https://softsec.kaist.ac.kr/~sangkilc/papers/avgerinos-icse14.pdf +- Marques, Fragoso Santos, Santos, Adão. *Concolic Execution for WebAssembly + (WASP).* ECOOP 2022. https://drops.dagstuhl.de/opus/volltexte/2022/16239/pdf/LIPIcs-ECOOP-2022-11.pdf +- Barnett & Leino. *Weakest-Precondition of Unstructured Programs.* PASTE 2005. + https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/krml157.pdf +- Haas et al. *Bringing the Web up to Speed with WebAssembly.* PLDI 2017. +- Van Hattum et al. *Lightweight, Modular Verification for Wasm-to-Native + Instruction Selection (Crocus/veri-isle).* ASPLOS 2024. diff --git a/loom-cli/src/main.rs b/loom-cli/src/main.rs index 8f18e6f..f68cb04 100644 --- a/loom-cli/src/main.rs +++ b/loom-cli/src/main.rs @@ -502,6 +502,27 @@ fn optimize_command( track_pass("inline", before, after); } + // loom#228: whole-function (module-level) dead-function elimination. + // Multi-site inlining duplicates a callee's body into each caller and + // leaves the ORIGINAL orphaned; the body-level `dce` (eliminate_dead_code) + // is intra-function and never removes a whole function, so the orphan + // survived every `loom optimize` run and grew the module. Run it right + // after `inline` so the orphan is gone before downstream passes touch it. + // eliminate_dead_functions keeps exports, the start function, and every + // element-segment (indirect-call table) target live — conservative on a + // parse failure — so it can never drop a reachable function (#196). + if should_run("dce-functions") { + println!(" Running: dce-functions"); + let before = count_instructions(&module); + let removed = loom_core::fused_optimizer::eliminate_dead_functions(&mut module) + .context("Whole-function DCE failed")?; + if removed > 0 { + println!(" removed {removed} dead function(s)"); + } + let after = count_instructions(&module); + track_pass("dce-functions", before, after); + } + if should_run("precompute") { println!(" Running: precompute"); let before = count_instructions(&module); diff --git a/loom-core/src/lib.rs b/loom-core/src/lib.rs index 68dddbf..ca3d399 100644 --- a/loom-core/src/lib.rs +++ b/loom-core/src/lib.rs @@ -13447,25 +13447,30 @@ pub mod optimize { .functions .get((*func_idx - num_imported_funcs) as usize) { - if !function_inline_modelable(callee) { + if !function_inline_modelable(callee, &module.functions, num_imported_funcs) { continue; } } let size = function_sizes.get(func_idx).copied().unwrap_or(0); - // Heuristic: inline if: - // 1. Single call site — profitable: inlining removes the call - // overhead and opens cross-function optimization (the gale - // flight_control seam, #155). A single-call-site callee is - // not duplicated, so a generous size budget is justified. - // 2. Small function (< 10 instructions) — cheap even when - // called from multiple sites. + // Heuristic: inline a modelable callee whose size is under the + // site-count-dependent budget. + // - Single call site — generous budget (not duplicated): inlining + // removes the call and opens cross-function optimization (the + // gale flight_control seam, #155). + // - Multiple call sites — a tighter budget bounds the per-copy + // duplication cost, but the leaf STILL inlines (the gale gust + // `mix` seam, #228: a 23-insn 2-site leaf). // - // SIZE_LIMIT stays well under the Z3 verify cap - // (LOOM_Z3_MAX_INSTRUCTIONS, default 2000) so the inlined - // function is still VERIFIED, never silently skipped — the - // translation validator remains the correctness gate, and the - // #147 fixpoint guard bounds total expansion. + // loom#228: the old guard `(call_count == 1 || size < 10)` made + // MULTI_CALL_SITE_LIMIT dead — a multi-site callee only passed + // when `size < 10`, so nothing in [10, 50) ever inlined however + // small the budget said it could. Governing purely by the budget + // restores the intended behavior. Both limits stay well under the + // Z3 verify cap (LOOM_Z3_MAX_INSTRUCTIONS, default 2000) so every + // inlined body is still VERIFIED (the translation validator is the + // correctness gate, never bypassed); the #147 fixpoint guard + // bounds total expansion. const SINGLE_CALL_SITE_LIMIT: usize = 200; const MULTI_CALL_SITE_LIMIT: usize = 50; let limit = if call_count == 1 { @@ -13473,7 +13478,7 @@ pub mod optimize { } else { MULTI_CALL_SITE_LIMIT }; - if (call_count == 1 || size < 10) && size < limit { + if size < limit { inline_candidates.push(*func_idx); } } @@ -13551,100 +13556,289 @@ pub mod optimize { Ok(()) } - /// loom#159: is this callee FULLY by-body-inline-modelable — i.e. will the - /// translation validator be able to PROVE its inline? - /// - /// A callee is only a sound inline candidate if the verifier can model its - /// body by-body (loom#151/#155/#157). If it contains ANY op the verifier - /// can't model — partial-width loads/stores (`*.load8/16`, `*.store8/16/32`, - /// e.g. `filter_step`'s `i32.load16_s`), floats, globals, `memory.*` bulk - /// ops, calls (non-leaf), or control flow — then inlining it is unprovable: - /// the original models the call as a havoc while the optimized has concrete - /// effects, so a downstream read diverges → FALSE counterexample → the whole - /// caller reverts, dragging down any *other* inline in it too (the v1.1.7 - /// regression: a partial-width `filter_step` killed the `controller_step` - /// reader-inline). Excluding such callees keeps them opaque (havoc) calls, - /// which is exactly what a following modelable reader's inline needs. - /// - /// This MUST stay in lock-step with `verify::is_inline_modelable_instr` - /// (the verifier's by-body whitelist). Straight-line only: any control flow - /// or call disqualifies the callee. - fn function_inline_modelable(func: &super::Function) -> bool { - func.instructions.iter().all(|i| { - matches!( - i, - Instruction::I32Const(_) - | Instruction::I64Const(_) - | Instruction::LocalGet(_) - | Instruction::LocalSet(_) - | Instruction::LocalTee(_) - | Instruction::Drop - | Instruction::Select - | Instruction::I32Load { .. } - | Instruction::I64Load { .. } - | Instruction::I32Store { .. } - | Instruction::I64Store { .. } - | Instruction::I32Load8S { .. } - | Instruction::I32Load8U { .. } - | Instruction::I32Load16S { .. } - | Instruction::I32Load16U { .. } - | Instruction::I32Store8 { .. } - | Instruction::I32Store16 { .. } - | Instruction::I32Add - | Instruction::I32Sub - | Instruction::I32Mul - | Instruction::I32And - | Instruction::I32Or - | Instruction::I32Xor - | Instruction::I32Shl - | Instruction::I32ShrS - | Instruction::I32ShrU - | Instruction::I64Add - | Instruction::I64Sub - | Instruction::I64Mul - | Instruction::I64And - | Instruction::I64Or - | Instruction::I64Xor - | Instruction::I64Shl - | Instruction::I64ShrS - | Instruction::I64ShrU - | Instruction::I32DivS - | Instruction::I32DivU - | Instruction::I32RemS - | Instruction::I32RemU - | Instruction::I64DivS - | Instruction::I64DivU - | Instruction::I64RemS - | Instruction::I64RemU - | Instruction::I32Eq - | Instruction::I32Ne - | Instruction::I32LtS - | Instruction::I32LtU - | Instruction::I32GtS - | Instruction::I32GtU - | Instruction::I32LeS - | Instruction::I32LeU - | Instruction::I32GeS - | Instruction::I32GeU - | Instruction::I32Eqz - | Instruction::I64Eq - | Instruction::I64Ne - | Instruction::I64LtS - | Instruction::I64LtU - | Instruction::I64GtS - | Instruction::I64GtU - | Instruction::I64LeS - | Instruction::I64LeU - | Instruction::I64GeS - | Instruction::I64GeU - | Instruction::I64Eqz - | Instruction::I64ExtendI32S - | Instruction::I64ExtendI32U - | Instruction::I32WrapI64 - ) + /// REGIME A whitelist: per-instruction straight-line ops the by-body + /// modeler (`verify::encode_inlinable_callee_result`) proves faithfully, + /// INCLUDING full/partial-width memory loads/stores (loom#151/#155/#157/#161). + /// No control flow, no calls. MUST stay in lock-step with + /// `verify::is_inline_modelable_instr`. Used by `function_inline_modelable`. + fn is_straightline_modelable_instr(i: &Instruction) -> bool { + matches!( + i, + Instruction::I32Const(_) + | Instruction::I64Const(_) + | Instruction::LocalGet(_) + | Instruction::LocalSet(_) + | Instruction::LocalTee(_) + | Instruction::Drop + | Instruction::Select + | Instruction::I32Load { .. } + | Instruction::I64Load { .. } + | Instruction::I32Store { .. } + | Instruction::I64Store { .. } + | Instruction::I32Load8S { .. } + | Instruction::I32Load8U { .. } + | Instruction::I32Load16S { .. } + | Instruction::I32Load16U { .. } + | Instruction::I32Store8 { .. } + | Instruction::I32Store16 { .. } + | Instruction::I32Add + | Instruction::I32Sub + | Instruction::I32Mul + | Instruction::I32And + | Instruction::I32Or + | Instruction::I32Xor + | Instruction::I32Shl + | Instruction::I32ShrS + | Instruction::I32ShrU + | Instruction::I64Add + | Instruction::I64Sub + | Instruction::I64Mul + | Instruction::I64And + | Instruction::I64Or + | Instruction::I64Xor + | Instruction::I64Shl + | Instruction::I64ShrS + | Instruction::I64ShrU + | Instruction::I32DivS + | Instruction::I32DivU + | Instruction::I32RemS + | Instruction::I32RemU + | Instruction::I64DivS + | Instruction::I64DivU + | Instruction::I64RemS + | Instruction::I64RemU + | Instruction::I32Eq + | Instruction::I32Ne + | Instruction::I32LtS + | Instruction::I32LtU + | Instruction::I32GtS + | Instruction::I32GtU + | Instruction::I32LeS + | Instruction::I32LeU + | Instruction::I32GeS + | Instruction::I32GeU + | Instruction::I32Eqz + | Instruction::I64Eq + | Instruction::I64Ne + | Instruction::I64LtS + | Instruction::I64LtU + | Instruction::I64GtS + | Instruction::I64GtU + | Instruction::I64LeS + | Instruction::I64LeU + | Instruction::I64GeS + | Instruction::I64GeU + | Instruction::I64Eqz + | Instruction::I64ExtendI32S + | Instruction::I64ExtendI32U + | Instruction::I32WrapI64 + ) + } + + /// REGIME B per-instruction whitelist: pure-integer ops the precise acyclic + /// executor (`verify::exec_acyclic`) models — NO memory (Phase 1), NO + /// floats/unknown. MUST stay in lock-step with + /// `verify::is_acyclic_simple_modelable`. + fn is_acyclic_int_modelable_instr(i: &Instruction) -> bool { + matches!( + i, + Instruction::I32Const(_) + | Instruction::I64Const(_) + | Instruction::LocalGet(_) + | Instruction::LocalSet(_) + | Instruction::LocalTee(_) + | Instruction::GlobalGet(_) + | Instruction::GlobalSet(_) + | Instruction::Drop + | Instruction::Nop + | Instruction::Select + | Instruction::I32Add + | Instruction::I32Sub + | Instruction::I32Mul + | Instruction::I32And + | Instruction::I32Or + | Instruction::I32Xor + | Instruction::I32Shl + | Instruction::I32ShrS + | Instruction::I32ShrU + | Instruction::I32DivS + | Instruction::I32DivU + | Instruction::I32RemS + | Instruction::I32RemU + | Instruction::I64Add + | Instruction::I64Sub + | Instruction::I64Mul + | Instruction::I64And + | Instruction::I64Or + | Instruction::I64Xor + | Instruction::I64Shl + | Instruction::I64ShrS + | Instruction::I64ShrU + | Instruction::I64DivS + | Instruction::I64DivU + | Instruction::I64RemS + | Instruction::I64RemU + | Instruction::I32Eq + | Instruction::I32Ne + | Instruction::I32LtS + | Instruction::I32LtU + | Instruction::I32GtS + | Instruction::I32GtU + | Instruction::I32LeS + | Instruction::I32LeU + | Instruction::I32GeS + | Instruction::I32GeU + | Instruction::I32Eqz + | Instruction::I64Eq + | Instruction::I64Ne + | Instruction::I64LtS + | Instruction::I64LtU + | Instruction::I64GtS + | Instruction::I64GtU + | Instruction::I64LeS + | Instruction::I64LeU + | Instruction::I64GeS + | Instruction::I64GeU + | Instruction::I64Eqz + | Instruction::I64ExtendI32S + | Instruction::I64ExtendI32U + | Instruction::I32WrapI64 + ) + } + + /// PR-C Phase 2: memory loads/stores the precise acyclic executor models + /// (mirrors `verify::is_acyclic_memory_op`). Admitted by the regime-B inline + /// gate so memory-bearing acyclic callees are inline candidates. + fn is_acyclic_memory_modelable_instr(i: &Instruction) -> bool { + matches!( + i, + Instruction::I32Load { .. } + | Instruction::I64Load { .. } + | Instruction::I32Store { .. } + | Instruction::I64Store { .. } + | Instruction::I32Load8S { .. } + | Instruction::I32Load8U { .. } + | Instruction::I32Load16S { .. } + | Instruction::I32Load16U { .. } + | Instruction::I32Store8 { .. } + | Instruction::I32Store16 { .. } + ) + } + + /// loom-side mirror of `verify::is_noreturn_callee` (that one is + /// `#[cfg(feature = "verification")]`-gated, unavailable here). True if every + /// path traps: no `Return`/`Br*` anywhere (recursing into nested CF) and the + /// body ends in `Unreachable`. Admits exactly the `panic*` helpers. + fn inline_callee_is_noreturn(func: &super::Function) -> bool { + fn has_branch_or_return(instrs: &[Instruction]) -> bool { + instrs.iter().any(|i| match i { + Instruction::Return + | Instruction::Br(_) + | Instruction::BrIf(_) + | Instruction::BrTable { .. } => true, + Instruction::Block { body, .. } | Instruction::Loop { body, .. } => { + has_branch_or_return(body) + } + Instruction::If { + then_body, + else_body, + .. + } => has_branch_or_return(then_body) || has_branch_or_return(else_body), + _ => false, + }) + } + !has_branch_or_return(&func.instructions) + && matches!(func.instructions.last(), Some(Instruction::Unreachable)) + } + + /// Max by-body call-recursion depth for the regime-B check (mirrors + /// `verify::MAX_ACYCLIC_CALL_DEPTH`); bounds cyclic call graphs. + const INLINE_ACYCLIC_MAX_DEPTH: usize = 8; + + /// REGIME B: acyclic control flow (block/if/br/br_if/br_table) + pure-integer + /// ops + by-body calls (to no-return or recursively-modelable LOCAL callees). + /// NO memory (Phase 1). Mirrors `verify::is_acyclic_modelable_body` so the + /// inline candidate gate admits exactly what the acyclic verifier can prove. + fn inline_modelable_acyclic_body( + body: &[Instruction], + all_functions: &[super::Function], + num_imported_funcs: u32, + depth: usize, + ) -> bool { + body.iter().all(|i| match i { + Instruction::Block { body, .. } => { + inline_modelable_acyclic_body(body, all_functions, num_imported_funcs, depth) + } + Instruction::If { + then_body, + else_body, + .. + } => { + inline_modelable_acyclic_body(then_body, all_functions, num_imported_funcs, depth) + && inline_modelable_acyclic_body( + else_body, + all_functions, + num_imported_funcs, + depth, + ) + } + Instruction::Br(_) + | Instruction::BrIf(_) + | Instruction::BrTable { .. } + | Instruction::Return + // Unreachable is a trap (⊥): the acyclic executor diverges the path. + | Instruction::Unreachable => true, + // Back-edges and indirect calls are out of Phase 1 scope. + Instruction::Loop { .. } | Instruction::CallIndirect { .. } => false, + Instruction::Call(g) => { + if depth >= INLINE_ACYCLIC_MAX_DEPTH || *g < num_imported_funcs { + return false; // too deep, or an import (opaque) + } + match all_functions.get((*g - num_imported_funcs) as usize) { + Some(callee) => { + inline_callee_is_noreturn(callee) + || (callee.signature.results.len() <= 1 + && inline_modelable_acyclic_body( + &callee.instructions, + all_functions, + num_imported_funcs, + depth + 1, + )) + } + None => false, + } + } + other => { + is_acyclic_int_modelable_instr(other) || is_acyclic_memory_modelable_instr(other) + } }) } + /// loom#159: is this callee FULLY inline-modelable — i.e. will the + /// translation validator be able to PROVE its inline? Admits EITHER regime A + /// (straight-line incl. memory — by-body modeler) OR regime B (acyclic CF + + /// pure-int + by-body calls — the precise acyclic executor, PR-C #219). A + /// callee outside both stays an opaque call: admitting an unprovable one + /// makes the original (havoc) and optimized (concrete) encodings diverge → + /// FALSE counterexample → the whole caller reverts, dragging down any OTHER + /// inline in it (the v1.1.7 #159 regression). A br_table+memory callee + /// (mutex_unlock, pipe) passes NEITHER regime → stays opaque until Phase 2. + fn function_inline_modelable( + func: &super::Function, + all_functions: &[super::Function], + num_imported_funcs: u32, + ) -> bool { + func.instructions + .iter() + .all(is_straightline_modelable_instr) + || inline_modelable_acyclic_body( + &func.instructions, + all_functions, + num_imported_funcs, + 0, + ) + } + /// Inline function calls in a block of instructions fn inline_calls_in_block( instructions: &[Instruction], @@ -17746,6 +17940,118 @@ mod tests { assert_eq!(instrs[1], Instruction::I32TruncF32S); } + #[test] + fn test_seam_sroa_wrap_extend_identity() { + // #219 seam-SROA: wrap_i64(extend_i32_u(x)) → x. Zero-extending an i32 to + // i64 then wrapping back to i32 is the identity — the u64-ABI round-trip + // a packed scalar pays at the dissolved decide seam. Must dissolve to the + // bare operand (no wrap/extend left). + use loom_isle::{i32_wrap_i64, i64_extend_i32_u, local_get, rewrite_pure}; + let x = local_get(0); // an i32 value + let round_trip = i32_wrap_i64(i64_extend_i32_u(x)); + let simplified = rewrite_pure(round_trip); + + let instrs = terms::terms_to_instructions(&[simplified]).unwrap(); + assert_eq!( + instrs, + vec![Instruction::LocalGet(0)], + "#219: wrap_i64(extend_i32_u(x)) must dissolve to x" + ); + } + + #[test] + fn test_seam_sroa_mask_clears_shifted_pack() { + // #219 seam-SROA: (or (shl (extend_u x) 32) Y) & 0xff → Y & 0xff. + // The high-shifted field of a u64 pack cannot survive a low-byte unpack + // mask, so the whole high half drops out. This is the `& mask` half of + // dissolving the sem decide's pack/unpack round-trip. + use loom_isle::{ + Imm64, i64_extend_i32_u, iand64, iconst64, ior64, ishl64, local_get, rewrite_pure, + }; + let high = ishl64(i64_extend_i32_u(local_get(0)), iconst64(Imm64(32))); + let low = i64_extend_i32_u(local_get(1)); // the surviving low field + let pack = ior64(high, low); + let masked = iand64(pack, iconst64(Imm64(0xff))); + let simplified = rewrite_pure(masked); + + // Expect (extend_u(local.get 1)) & 0xff — the shifted high half is gone. + let want = terms::terms_to_instructions(&[iand64( + i64_extend_i32_u(local_get(1)), + iconst64(Imm64(0xff)), + )]) + .unwrap(); + let got = terms::terms_to_instructions(&[simplified]).unwrap(); + assert_eq!( + got, want, + "#219: (or (shl _ 32) Y) & 0xff must dissolve to Y & 0xff" + ); + } + + #[test] + fn test_seam_sroa_shr_extracts_high_field() { + // #219 seam-SROA: (shr_u (or (shl (extend_u x) 32) const) 32) extracts the + // HIGH field of a u64 pack → (extend_u x) & 0xffffffff. The low (const) + // field shifts out (logical shift distributes over OR; shl-then-shr-same + // masks to the low 64-k bits). Mirrors the sem decide whose low field is + // a constant 0/1. + use loom_isle::{ + Imm64, i64_extend_i32_u, iand64, iconst64, ior64, ishl64, ishru64, local_get, + rewrite_pure, + }; + let high = ishl64(i64_extend_i32_u(local_get(0)), iconst64(Imm64(32))); + let pack = ior64(high, iconst64(Imm64(1))); // low field = const (like local 3 = 0/1) + let unpacked = ishru64(pack, iconst64(Imm64(32))); + let simplified = rewrite_pure(unpacked); + + let want = terms::terms_to_instructions(&[iand64( + i64_extend_i32_u(local_get(0)), + iconst64(Imm64(0xffff_ffff)), + )]) + .unwrap(); + let got = terms::terms_to_instructions(&[simplified]).unwrap(); + assert_eq!( + got, want, + "#219: (shr_u (or (shl (extend_u x) 32) const) 32) must extract (extend_u x) & 0xffffffff" + ); + } + + #[cfg(feature = "verification")] + #[test] + fn test_is_noreturn_callee() { + // PR-C (#219): the divergent-call classifier. No-return = no Return, no + // branch anywhere, body ends in Unreachable (the core::panicking shape). + let cases: &[(&str, bool)] = &[ + // bare trap + ("(module (func (export \"f\") unreachable))", true), + // call (to an import) then trap — the panic_const_add_overflow shape + ( + "(module (import \"env\" \"p\" (func)) (func (export \"f\") call 0 unreachable))", + true, + ), + // normal return — must NOT be no-return + ( + "(module (func (export \"f\") (result i32) i32.const 0))", + false, + ), + // ends in a value, not unreachable + ("(module (func (export \"f\") nop))", false), + // contains a branch (could escape to the fn label) — conservatively false + ( + "(module (func (export \"f\") (block br 0) unreachable))", + false, + ), + ]; + for (wat, expected) in cases { + let module = parse::parse_wat(wat).expect("parse"); + let f = &module.functions[0]; + assert_eq!( + crate::verify::is_noreturn_callee(f), + *expected, + "#219 is_noreturn_callee mismatch for: {wat}" + ); + } + } + #[test] fn test_conversion_trunc_sat_folds_nan_to_zero() { // i32.trunc_sat_f32_s of NaN → i32.const 0 (saturating: NaN→0) @@ -18664,6 +18970,55 @@ mod tests { wasmparser::validate(&wasm_bytes).expect("output validates"); } + // loom#228: a modelable leaf whose size is in [10, 50) called from MULTIPLE + // sites was never inlined. The old candidate guard `(call_count == 1 || + // size < 10) && size < limit` made MULTI_CALL_SITE_LIMIT (=50) dead: a + // multi-site callee only passed when `size < 10`, so nothing in [10, 50) + // ever inlined. Budget-governed selection (`size < limit`) inlines it at + // every site; the Z3 gate stays the correctness backstop. + #[test] + fn test_inline_multisite_small_leaf_228() { + // 11-instruction straight-line modelable leaf (tee+select+arith), no + // calls, size in [10, 50), called from two sites. + let wat = r#"(module + (func $leaf (param i32 i32) (result i32) + local.get 0 local.get 1 i32.add local.tee 0 + i32.const 1 i32.add + local.get 0 i32.const 2 i32.shl + i32.const 0 select) + (func $a (export "a") (param i32) (result i32) + local.get 0 i32.const 7 call $leaf) + (func $b (export "b") (param i32) (result i32) + local.get 0 i32.const 9 call $leaf) + )"#; + let mut module = parse::parse_wat(wat).expect("parse"); + // leaf is flat (no nested blocks) → instruction count == body length. + let leaf_size = module.functions[0].instructions.len(); + assert!( + (10..50).contains(&leaf_size), + "leaf must be in the previously-dead [10,50) band (got {leaf_size})" + ); + + optimize::inline_functions(&mut module).expect("inline must not panic"); + + // Both call sites must be gone (the leaf body is inlined into $a and $b; + // the orphaned $leaf itself is removed by the CLI dce-functions pass, not + // by inline_functions — so it may still be present here, callless). + let total_calls: usize = module + .functions + .iter() + .flat_map(|f| f.instructions.iter()) + .filter(|i| matches!(i, Instruction::Call(_))) + .count(); + assert_eq!( + total_calls, 0, + "#228: the multi-site [10,50) leaf must inline at BOTH sites" + ); + + let wasm_bytes = encode::encode_wasm(&module).expect("encode"); + wasmparser::validate(&wasm_bytes).expect("output validates"); + } + #[cfg(feature = "verification")] #[test] fn test_inline_verifier_proves_correct_and_rejects_wrong_i64_inline() { diff --git a/loom-core/src/verify.rs b/loom-core/src/verify.rs index 8471cc1..4548eb6 100644 --- a/loom-core/src/verify.rs +++ b/loom-core/src/verify.rs @@ -209,6 +209,51 @@ fn callee_inlinable_by_body(func: &Function) -> bool { func.instructions.iter().all(is_inline_modelable_instr) } +/// PR-C (#219): true if `func` provably never returns to its caller — every +/// path traps. Conservative-sound definition: the body contains NO `Return` +/// and NO branch (`Br`/`BrIf`/`BrTable`) anywhere (recursing into nested +/// control flow), and its final top-level instruction is `Unreachable`. With +/// no `return` and no branch, the only way to leave the function is to fall off +/// the end — which `Unreachable` makes a trap. This admits exactly the +/// `core::panicking` helpers here (`panic_fmt → unreachable`, +/// `panic_const_add_overflow → call panic_fmt; unreachable`) and rejects any +/// normal-returning or branching function. +/// +/// Used by the by-body inline modeler to encode a `call` to such a callee as +/// "diverge if reached" (⊥) rather than havoc: control never returns, so the +/// call constrains only the reachability of the trapping branch, never the +/// post-state (Alive2's `ub` discipline). Modeling it as a fresh symbolic +/// (havoc) would reopen the #155/#159 havoc-vs-concrete false-positive surface. +#[cfg(feature = "verification")] +#[allow(dead_code)] // wired into the by-body inline gate in PR-C M3 (#219) +pub(crate) fn is_noreturn_callee(func: &Function) -> bool { + !body_has_branch_or_return(&func.instructions) + && matches!(func.instructions.last(), Some(Instruction::Unreachable)) +} + +/// Recursively detect any `Return` or branch (`Br`/`BrIf`/`BrTable`) in a body, +/// descending into Block/Loop/If. Used by `is_noreturn_callee` — a branch could +/// transfer control to the function-level label (a return-via-br), so any +/// branch conservatively disqualifies the "every path traps" classification. +#[cfg(feature = "verification")] +fn body_has_branch_or_return(instrs: &[Instruction]) -> bool { + instrs.iter().any(|i| match i { + Instruction::Return + | Instruction::Br(_) + | Instruction::BrIf(_) + | Instruction::BrTable { .. } => true, + Instruction::Block { body, .. } | Instruction::Loop { body, .. } => { + body_has_branch_or_return(body) + } + Instruction::If { + then_body, + else_body, + .. + } => body_has_branch_or_return(then_body) || body_has_branch_or_return(else_body), + _ => false, + }) +} + /// loom#151/#155: whitelist of instructions the by-body inline modeler can /// encode faithfully and **identically to the main encoder**. The callee must /// be straight-line (no Block/Loop/If), leaf (no calls), single-result, and @@ -2964,6 +3009,31 @@ impl TranslationValidator { return Ok(()); } + // PR-C (#219) M3.2: precise acyclic-CF fast-path for the inliner. The + // main encoder models br_table approximately and a br_table callee is + // not straight-line-by-body-modelable, so it cannot prove `call F` + // (br_table callee) equal to its inlined body → the inline reverts. The + // acyclic executor is EXACT for acyclic CF + by-body calls. Try it here. + // + // PURELY ADDITIVE + scoped to the inliner: ONLY a precise Ok(true) (a + // real proof) short-circuits to accept; Ok(false)/Err/panic FALL THROUGH + // to the existing encoder unchanged. So this can only ADD accepts (the + // br_table-callee inlines), never weaken an existing check, and every + // non-inline pass is byte-identical. (catch_unwind: the acyclic encoder + // builds Z3 exprs and could panic on a malformed input.) + if self.pass_name == "inline_functions" { + let original = self.original.clone(); + let sig_ctx = self.sig_ctx.clone(); + let optimized_clone = optimized.clone(); + let acyclic = std::panic::catch_unwind(std::panic::AssertUnwindSafe(move || { + verify_acyclic_equivalence(&original, &optimized_clone, &sig_ctx) + })); + if let Ok(Ok(true)) = acyclic { + return Ok(()); + } + // else: fall through to the existing encoder (no behavior change). + } + // Use catch_unwind to handle Z3 internal panics gracefully. (The // i64 inline hang in loom#147 turned out NOT to be a verify hang — // verify returns fast; it was a livelock in the inline-pass @@ -7169,6 +7239,890 @@ fn encode_simple_instruction( Ok(()) } +// =========================================================================== +// PR-C (gale #219): precise acyclic control-flow symbolic execution. +// +// An ISOLATED executor that models structured, acyclic wasm control flow +// (block / if / br / br_if / br_table) by *state merging with ITE* driven by an +// explicit label/join stack — never path-forking. For a bounded acyclic region +// this is EXACT (a passing SMT check is a real proof, not a bounded +// approximation), and it produces ONE verification condition instead of N +// per-arm obligations. Precedent: Alive2 (PLDI'21, "we do not fork expressions +// across paths"; final state a linear chain of ITEs), Kuznetsov state-merging +// (PLDI'12), WASP br_table rules (ECOOP'22). See +// docs/design/pr-c-precise-cf-verification.md. +// +// Phase 1 = integer only (no memory): every non-control instruction must be in +// the `is_acyclic_simple_modelable` allowlist (the ops `encode_simple_instruction` +// models PRECISELY) and is delegated there. Loop (back-edge), Call/CallIndirect, +// memory ops, unknown opcodes, and multi-result/param blocks are REJECTED — the +// `is_acyclic_modelable_body` gate fails the whole function and the caller falls +// back (over-approximate-and-skip), NEVER a false equivalence. We must reject +// rather than delegate-and-hope because `encode_simple_instruction`'s catch-all +// HAVOCS (pushes a fresh BV without popping operands) → it would desync the +// stack for an unmodeled op. +// +// Currently `#[allow(dead_code)]`: built + tested in isolation (Z3 equivalence +// over hand-written br_table fixtures, incl. the 0x8000_0000-selector default +// case) BEFORE being wired into the live inline-verify path (PR-C M3). +// =========================================================================== + +/// Symbolic state threaded along the *current* straight-line path. +#[cfg(feature = "verification")] +#[allow(dead_code)] +struct AcyclicState { + stack: Vec, + locals: Vec, + globals: Vec, + /// Linear memory as a Z3 `Array[BV32 → BV8]` (PR-C Phase 2). Loads/stores use + /// the SAME shared helpers (`encode_i32_load_from` etc.) the main encoder + /// uses, so a `call F` modeled by-body and the inlined body produce + /// bit-identical memory expressions. Memory is MUTABLE branch state, so it is + /// ITE-merged at every join exactly like locals/globals. + memory: Array, + /// Path condition guarding the current straight-line path. + path: Bool, + /// False once this path has branched / returned (subsequent straight-line + /// instructions in the same sequence are dead and must not be executed). + reachable: bool, +} + +/// One incoming branch to a label: its path guard plus the result/locals/globals/ +/// memory snapshot at the branch point. +#[cfg(feature = "verification")] +type BranchEntry = (Bool, Vec, Vec, Vec, Array); + +/// The ITE-merged successor of a join: result values, locals, globals, memory, +/// and the merged path condition. +#[cfg(feature = "verification")] +type MergedState = (Vec, Vec, Vec, Array, Bool); + +#[cfg(feature = "verification")] +#[allow(dead_code)] +struct LabelJoin { + /// Number of result values the label carries (0 or 1 in Phase 1). + arity: usize, + entries: Vec, +} + +/// Result arity of a structured block type, or `Err` if it has parameters or +/// more than one result (rejected in Phase 1 — over-approximate-and-skip). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_block_arity(block_type: &BlockType) -> Result { + match block_type { + BlockType::Empty => Ok(0), + BlockType::Value(_) => Ok(1), + BlockType::Func { params, results } => { + if !params.is_empty() || results.len() > 1 { + Err(anyhow!( + "acyclic executor: block with params/multi-result not modeled" + )) + } else { + Ok(results.len()) + } + } + } +} + +/// Allowlist of non-control instructions `encode_simple_instruction` models +/// PRECISELY (pure integer ops + locals/globals, no memory, no havoc). Anything +/// outside this set is rejected by `is_acyclic_modelable_body`. +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn is_acyclic_simple_modelable(instr: &Instruction) -> bool { + matches!( + instr, + Instruction::I32Const(_) + | Instruction::I64Const(_) + | Instruction::F32Const(_) + | Instruction::F64Const(_) + | Instruction::I32Add + | Instruction::I32Sub + | Instruction::I32Mul + | Instruction::I32And + | Instruction::I32Or + | Instruction::I32Xor + | Instruction::I32Shl + | Instruction::I32ShrS + | Instruction::I32ShrU + | Instruction::I32DivS + | Instruction::I32DivU + | Instruction::I32RemS + | Instruction::I32RemU + | Instruction::I32Eq + | Instruction::I32Ne + | Instruction::I32LtS + | Instruction::I32LtU + | Instruction::I32GtS + | Instruction::I32GtU + | Instruction::I32LeS + | Instruction::I32LeU + | Instruction::I32GeS + | Instruction::I32GeU + | Instruction::I32Eqz + | Instruction::I64Add + | Instruction::I64Sub + | Instruction::I64Mul + | Instruction::I64And + | Instruction::I64Or + | Instruction::I64Xor + | Instruction::I64Shl + | Instruction::I64ShrS + | Instruction::I64ShrU + | Instruction::I64DivS + | Instruction::I64DivU + | Instruction::I64RemS + | Instruction::I64RemU + | Instruction::I64Eq + | Instruction::I64Ne + | Instruction::I64LtS + | Instruction::I64LtU + | Instruction::I64GtS + | Instruction::I64GtU + | Instruction::I64LeS + | Instruction::I64LeU + | Instruction::I64GeS + | Instruction::I64GeU + | Instruction::I64Eqz + | Instruction::I32WrapI64 + | Instruction::I64ExtendI32S + | Instruction::I64ExtendI32U + | Instruction::LocalGet(_) + | Instruction::LocalSet(_) + | Instruction::LocalTee(_) + | Instruction::GlobalGet(_) + | Instruction::GlobalSet(_) + | Instruction::Drop + | Instruction::Nop + | Instruction::Select + ) +} + +/// PR-C Phase 2: memory loads/stores `exec_acyclic` models explicitly (via the +/// shared `encode_*_load_from`/`*_store_into` helpers). Kept SEPARATE from +/// `is_acyclic_simple_modelable` because memory ops must NOT be delegated to +/// `encode_simple_instruction` (it havocs them, desyncing the stack). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn is_acyclic_memory_op(instr: &Instruction) -> bool { + matches!( + instr, + Instruction::I32Load { .. } + | Instruction::I64Load { .. } + | Instruction::I32Store { .. } + | Instruction::I64Store { .. } + | Instruction::I32Load8S { .. } + | Instruction::I32Load8U { .. } + | Instruction::I32Load16S { .. } + | Instruction::I32Load16U { .. } + | Instruction::I32Store8 { .. } + | Instruction::I32Store16 { .. } + ) +} + +/// Max by-body call-recursion depth the acyclic executor will model. The seam +/// decides call only `panic*` (no-return → diverge, no further recursion), so +/// depth 2 suffices; 8 gives headroom and bounds any cyclic/recursive call +/// graph (which is rejected, not unrolled). +#[cfg(feature = "verification")] +const MAX_ACYCLIC_CALL_DEPTH: usize = 8; + +/// Whole-function gate: true iff every instruction (recursing into block/if +/// bodies) is structurally modelable by `exec_acyclic`. Back-edges (Loop), +/// call_indirect, unknown opcodes, and unmodeled block shapes (params/multi- +/// result) fail. As of Phase 2b the gate is purely STRUCTURAL — it needs no +/// module context, because ANY direct call is modelable (the executor diverges +/// no-return callees, models acyclic locals by-body, and HAVOCS the rest — +/// havoc is conservative-sound), and memory ops are handled explicitly. +/// +/// A fast pre-check for the M3 routing decision; `exec_acyclic` is the actual +/// soundness enforcer (it returns `Err` on anything it cannot model, never havoc +/// a value-producing op silently). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn is_acyclic_modelable_body(body: &[Instruction]) -> bool { + body.iter().all(|instr| match instr { + // Back-edges and indirect calls are out of scope. + Instruction::Loop { .. } | Instruction::CallIndirect { .. } => false, + // ANY direct call is modelable: by-body (acyclic local), ⊥-diverge + // (no-return), or HAVOC (import / loop-containing / multi-result). The + // executor's Call arm makes the choice; havoc never forges a false ≡. + Instruction::Call(_) => true, + Instruction::Block { block_type, body } => { + acyclic_block_arity(block_type).is_ok() && is_acyclic_modelable_body(body) + } + Instruction::If { + block_type, + then_body, + else_body, + } => { + acyclic_block_arity(block_type).is_ok() + && is_acyclic_modelable_body(then_body) + && is_acyclic_modelable_body(else_body) + } + Instruction::Br(_) + | Instruction::BrIf(_) + | Instruction::BrTable { .. } + | Instruction::Return + // Unreachable is a trap (⊥): the executor diverges the path, never havoc. + | Instruction::Unreachable => true, + other => is_acyclic_simple_modelable(other) || is_acyclic_memory_op(other), + }) +} + +/// Resolve a relative branch depth to an absolute index in the label stack +/// (0 = innermost = top). `labels[0]` is the implicit function label. +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_label_index(labels: &[LabelJoin], depth: u32) -> Result { + let depth = depth as usize; + if depth >= labels.len() { + return Err(anyhow!( + "acyclic executor: branch depth {} out of range", + depth + )); + } + Ok(labels.len() - 1 - depth) +} + +/// Record the current state as an incoming branch to `labels[target]`, carrying +/// that label's arity result values from the top of the stack. +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_push_branch( + labels: &mut [LabelJoin], + target: usize, + guard: Bool, + state: &AcyclicState, +) -> Result<()> { + let arity = labels[target].arity; + if state.stack.len() < arity { + return Err(anyhow!("acyclic executor: stack underflow at branch")); + } + let results: Vec = state.stack[state.stack.len() - arity..].to_vec(); + labels[target].entries.push(( + guard, + results, + state.locals.clone(), + state.globals.clone(), + state.memory.clone(), + )); + Ok(()) +} + +/// ITE-merge a label's incoming branches into one successor. Guards are pairwise +/// disjoint (if: cond vs ¬cond; br_table: sel==i vs sel≥n unsigned), so the +/// fold `ite(g0, v0, ite(g1, v1, … v_last))` selects the live branch under the +/// merged path (= disjunction of guards). Returns `(results, locals, globals, +/// path)`, or `None` if no branch reaches the label (unreachable join). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_merge_entries(entries: &[BranchEntry]) -> Option { + let (_, last_res, last_loc, last_glob, last_mem) = entries.last()?; + let mut res = last_res.clone(); + let mut loc = last_loc.clone(); + let mut glob = last_glob.clone(); + let mut mem = last_mem.clone(); + let mut guards: Vec = vec![entries.last().unwrap().0.clone()]; + // Fold earlier entries outward: each takes precedence under its own guard. + for (g, r, l, gl, m) in entries[..entries.len() - 1].iter().rev() { + res = r + .iter() + .zip(res.iter()) + .map(|(t, f)| merge_bv(g, t, f)) + .collect(); + loc = l + .iter() + .zip(loc.iter()) + .map(|(t, f)| merge_bv(g, t, f)) + .collect(); + glob = gl + .iter() + .zip(glob.iter()) + .map(|(t, f)| merge_bv(g, t, f)) + .collect(); + // ITE-merge the memory Array (Bool::ite is generic over the Z3 sort). + mem = g.ite(m, &mem); + guards.push(g.clone()); + } + let path = Bool::or(&guards); + Some((res, loc, glob, mem, path)) +} + +/// Apply a popped label's join to the state: the post-block state is the +/// ITE-merge of all branches that targeted it, with the block's result values +/// pushed on top of `base_stack` (the operand stack below the block entry). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_apply_join( + state: &mut AcyclicState, + base_stack: Vec, + join: &LabelJoin, +) -> Result<()> { + match acyclic_merge_entries(&join.entries) { + None => { + // No branch reaches this join — code after the block is unreachable. + state.reachable = false; + Ok(()) + } + Some((results, locals, globals, memory, path)) => { + let mut stack = base_stack; + stack.extend(results); + state.stack = stack; + state.locals = locals; + state.globals = globals; + state.memory = memory; + state.path = path; + state.reachable = true; + Ok(()) + } + } +} + +/// Execute a structured, acyclic instruction sequence, merging branch states at +/// label joins. Precondition: the enclosing function passed +/// `is_acyclic_modelable_body`. `labels[0]` is the function label. +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn exec_acyclic( + body: &[Instruction], + state: &mut AcyclicState, + labels: &mut Vec, + ctx: &VerificationSignatureContext, + depth: usize, + // Monotonic per-ENCODE counter for impure-call havoc names. Threaded (not in + // AcyclicState, which is saved/restored at branches) so the Nth impure call + // gets the same id in the original and optimized encodings → they unify. + havoc_counter: &mut u64, +) -> Result<()> { + for instr in body { + if !state.reachable { + // Dead code after an unconditional branch / return. + break; + } + match instr { + Instruction::Block { block_type, body } => { + let arity = acyclic_block_arity(block_type)?; + let base_stack = state.stack.clone(); + labels.push(LabelJoin { + arity, + entries: Vec::new(), + }); + exec_acyclic(body, state, labels, ctx, depth, havoc_counter)?; + // Fall-through off the end == implicit `br 0` to this block. + if state.reachable { + let target = labels.len() - 1; + acyclic_push_branch(labels, target, state.path.clone(), state)?; + } + let join = labels.pop().unwrap(); + acyclic_apply_join(state, base_stack, &join)?; + } + Instruction::If { + block_type, + then_body, + else_body, + } => { + let arity = acyclic_block_arity(block_type)?; + let cond = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: If underflow"))?; + let cond_nz = cond.eq(BV::from_i64(0, 32)).not(); + let base_stack = state.stack.clone(); + let saved_locals = state.locals.clone(); + let saved_globals = state.globals.clone(); + let saved_memory = state.memory.clone(); + let saved_path = state.path.clone(); + labels.push(LabelJoin { + arity, + entries: Vec::new(), + }); + let target = labels.len() - 1; + // then-branch under path ∧ cond≠0 + state.path = Bool::and(&[saved_path.clone(), cond_nz.clone()]); + state.reachable = true; + exec_acyclic(then_body, state, labels, ctx, depth, havoc_counter)?; + if state.reachable { + acyclic_push_branch(labels, target, state.path.clone(), state)?; + } + // else-branch under path ∧ cond=0 (restore entry state) + state.stack = base_stack.clone(); + state.locals = saved_locals; + state.globals = saved_globals; + state.memory = saved_memory; + state.path = Bool::and(&[saved_path, cond_nz.not()]); + state.reachable = true; + exec_acyclic(else_body, state, labels, ctx, depth, havoc_counter)?; + if state.reachable { + acyclic_push_branch(labels, target, state.path.clone(), state)?; + } + let join = labels.pop().unwrap(); + acyclic_apply_join(state, base_stack, &join)?; + } + Instruction::Br(depth) => { + let target = acyclic_label_index(labels, *depth)?; + acyclic_push_branch(labels, target, state.path.clone(), state)?; + state.reachable = false; + } + Instruction::BrIf(depth) => { + let cond = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: BrIf underflow"))?; + let cond_nz = cond.eq(BV::from_i64(0, 32)).not(); + let target = acyclic_label_index(labels, *depth)?; + let branch_guard = Bool::and(&[state.path.clone(), cond_nz.clone()]); + acyclic_push_branch(labels, target, branch_guard, state)?; + // Fall-through path narrows to cond=0. + state.path = Bool::and(&[state.path.clone(), cond_nz.not()]); + } + Instruction::BrTable { targets, default } => { + let sel = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: BrTable underflow"))?; + let n = targets.len() as u64; + // Arm i is taken iff sel == i. + for (i, d) in targets.iter().enumerate() { + let target = acyclic_label_index(labels, *d)?; + let guard = + Bool::and(&[state.path.clone(), sel.eq(BV::from_u64(i as u64, 32))]); + acyclic_push_branch(labels, target, guard, state)?; + } + // Default is taken iff sel >=u n (UNSIGNED — gale must-fix: a + // signed >= would leave selectors in [2^31, 2^32) in a gap + // between the arm guards and the default, letting Z3 pick any + // arm → a false equivalence). + let target = acyclic_label_index(labels, *default)?; + let guard = Bool::and(&[state.path.clone(), sel.bvuge(BV::from_u64(n, 32))]); + acyclic_push_branch(labels, target, guard, state)?; + state.reachable = false; + } + Instruction::Return => { + // Branch to the implicit function label. + acyclic_push_branch(labels, 0, state.path.clone(), state)?; + state.reachable = false; + } + Instruction::Unreachable => { + // Trap: this path diverges (⊥) — it drops out of the result + // ITE and contributes to no join, exactly like a no-return + // call. NEVER havoc (Alive2 ub discipline). + state.reachable = false; + } + Instruction::Call(g) => { + let callee_opt = ctx + .function_bodies + .get(*g as usize) + .and_then(|o| o.as_deref()); + match callee_opt { + // No-return callee (panic→unreachable): ⊥-diverge — the path + // traps and drops out of the result ITE, NEVER havoc (Alive2 + // `ub` discipline; havoc reopens #155/#159). + Some(callee) if is_noreturn_callee(callee) => { + state.reachable = false; + } + // Acyclic, single-result local callee: model BY BODY (the + // same executor models the original `call F` and the inlined + // body identically → the inline is provable). Its OWN impure + // calls get havoc'd via the shared counter. + Some(callee) + if depth + 1 < MAX_ACYCLIC_CALL_DEPTH + && callee.signature.results.len() <= 1 + && is_acyclic_modelable_body(&callee.instructions) => + { + encode_acyclic_call(callee, state, ctx, depth, havoc_counter)?; + } + // Impure / opaque call (import, multi-result, too-deep, or a + // local callee with a loop/unmodelable shape): HAVOC. Sound + // and conservative — corresponding calls in the original and + // optimized encodings get the SAME havoc id (shared counter) + // so they unify; only a havoc-vs-CONCRETE mismatch (e.g. + // original havocs `call F` but optimized inlines F's effect) + // refuses to prove → revert (the #155/#159 guard). + _ => { + acyclic_havoc_call(*g, state, ctx, havoc_counter)?; + } + } + } + // PR-C Phase 2: memory loads/stores via the SAME shared helpers the + // main encoder uses (so a by-body call and its inlined body produce + // bit-identical memory expressions). Handled explicitly — NEVER + // delegated to encode_simple_instruction, which havocs memory ops. + Instruction::I32Load { offset, .. } => { + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I32Load underflow"))?; + state + .stack + .push(encode_i32_load_from(&state.memory, &addr, *offset)?); + } + Instruction::I64Load { offset, .. } => { + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I64Load underflow"))?; + state + .stack + .push(encode_i64_load_from(&state.memory, &addr, *offset)?); + } + Instruction::I32Store { offset, .. } => { + let value = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I32Store value underflow"))?; + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I32Store addr underflow"))?; + state.memory = encode_i32_store_into(&state.memory, &addr, &value, *offset); + } + Instruction::I64Store { offset, .. } => { + let value = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I64Store value underflow"))?; + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: I64Store addr underflow"))?; + state.memory = encode_i64_store_into(&state.memory, &addr, &value, *offset); + } + // Partial-width (8/16-bit) loads/stores (loom#161 helpers). + Instruction::I32Load8S { offset, .. } + | Instruction::I32Load8U { offset, .. } + | Instruction::I32Load16S { offset, .. } + | Instruction::I32Load16U { offset, .. } => { + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: partial load underflow"))?; + let (bytes, signed) = match instr { + Instruction::I32Load8S { .. } => (1, true), + Instruction::I32Load8U { .. } => (1, false), + Instruction::I32Load16S { .. } => (2, true), + _ => (2, false), + }; + state.stack.push(encode_partial_load_from( + &state.memory, + &addr, + *offset, + bytes, + signed, + 32, + )?); + } + Instruction::I32Store8 { offset, .. } | Instruction::I32Store16 { offset, .. } => { + let value = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: partial store value underflow"))?; + let addr = state + .stack + .pop() + .ok_or_else(|| anyhow!("acyclic executor: partial store addr underflow"))?; + let bytes = if matches!(instr, Instruction::I32Store8 { .. }) { + 1 + } else { + 2 + }; + state.memory = + encode_partial_store_into(&state.memory, &addr, &value, *offset, bytes); + } + other => { + // Defense in depth: never delegate an unmodeled op (the + // delegate's catch-all havocs and would desync the stack). + if !is_acyclic_simple_modelable(other) { + return Err(anyhow!( + "acyclic executor: instruction not modelable: {:?}", + other + )); + } + encode_simple_instruction( + other, + &mut state.stack, + &mut state.locals, + &mut state.globals, + )?; + } + } + } + Ok(()) +} + +/// Model an opaque / impure `call g` by HAVOC (PR-C Phase 2b): pop the args, +/// push a fresh symbolic per result, and havoc all globals + memory. Naming is +/// deterministic in the per-encode `havoc_counter` so the Nth impure call gets +/// the SAME names in the original and optimized encodings → they unify in Z3 +/// (the original's `call F` and the optimized's `call F` cancel). Mirrors the +/// main encoder's conservative Call tier. Used for imports and local callees the +/// acyclic executor cannot model by-body (loops, multi-result, too-deep). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn acyclic_havoc_call( + g: u32, + state: &mut AcyclicState, + ctx: &VerificationSignatureContext, + havoc_counter: &mut u64, +) -> Result<()> { + let sig = ctx + .get_function_signature(g) + .ok_or_else(|| anyhow!("acyclic executor: no signature for call {}", g))?; + let nparams = sig.params.len(); + if state.stack.len() < nparams { + return Err(anyhow!("acyclic executor: havoc-call arg underflow")); + } + // Discard the args (the opaque result is independent of them, matching the + // main encoder's conservative tier). + state.stack.truncate(state.stack.len() - nparams); + // Fresh symbolic result(s). + for (i, rt) in sig.results.iter().enumerate() { + state.stack.push(BV::new_const( + format!("call_{}_result_{}", g, i), + bv_width_for_value_type(*rt), + )); + } + // Havoc globals + memory with the shared deterministic id. + let id = *havoc_counter; + *havoc_counter += 1; + for (idx, global) in state.globals.iter_mut().enumerate() { + *global = BV::new_const( + format!("global_{}_after_call_{}_{}", idx, g, id), + global.get_size(), + ); + } + state.memory = Array::new_const( + format!("mem_after_call_{}_{}", g, id), + &Sort::bitvector(32), + &Sort::bitvector(8), + ); + Ok(()) +} + +/// Model a `call` to `callee` BY BODY: pop the args, recursively encode the +/// callee through `exec_acyclic` (params bound to the args, declared locals +/// zero-init, globals SHARED with the caller so the callee's global writes +/// propagate back), and push the merged single result. Sets `reachable=false` +/// if every callee path traps (no normal exit). The SAME executor models both +/// the original `call F` (here) and the inlined body, so they yield the same +/// expression → the inline is provable. +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn encode_acyclic_call( + callee: &Function, + state: &mut AcyclicState, + ctx: &VerificationSignatureContext, + depth: usize, + havoc_counter: &mut u64, +) -> Result<()> { + if depth + 1 >= MAX_ACYCLIC_CALL_DEPTH { + return Err(anyhow!("acyclic executor: call recursion too deep")); + } + let arity = callee.signature.results.len(); + if arity > 1 { + return Err(anyhow!("acyclic executor: multi-result callee")); + } + let nparams = callee.signature.params.len(); + if state.stack.len() < nparams { + return Err(anyhow!("acyclic executor: call arg underflow")); + } + // Pop args (last param on top of stack); bind to the callee's param locals. + let mut callee_locals: Vec = state.stack.split_off(state.stack.len() - nparams); + for (count, lt) in callee.locals.iter() { + let width = bv_width_for_value_type(*lt); + for _ in 0..*count { + callee_locals.push(BV::from_u64(0, width)); + } + } + let mut sub = AcyclicState { + stack: Vec::new(), + locals: callee_locals, + // SHARED globals + memory — the callee may read/write module globals and + // linear memory; the merged result threads the updated state back to the + // caller. + globals: state.globals.clone(), + memory: state.memory.clone(), + // Inherit the caller's path so callee branch guards conjoin it. + path: state.path.clone(), + reachable: true, + }; + let mut sub_labels = vec![LabelJoin { + arity, + entries: Vec::new(), + }]; + exec_acyclic( + &callee.instructions, + &mut sub, + &mut sub_labels, + ctx, + depth + 1, + havoc_counter, + )?; + if sub.reachable { + acyclic_push_branch(&mut sub_labels, 0, sub.path.clone(), &sub)?; + } + let join = sub_labels.pop().unwrap(); + match acyclic_merge_entries(&join.entries) { + Some((results, _locals, globals, memory, path)) => { + // Propagate callee global + memory writes; narrow the caller path to + // the callee's returning paths. + state.globals = globals; + state.memory = memory; + state.path = path; + if arity == 1 { + state.stack.push(results[0].clone()); + } + } + None => { + // Callee never returns normally (all paths trap) → caller diverges. + state.reachable = false; + } + } + Ok(()) +} + +/// Encode a function via the precise acyclic CF executor. Mirrors +/// `encode_function_to_smt`'s deterministic input naming (`param{i}`, +/// `global{i}`) so two encodings UNIFY in Z3 and can be compared directly. +/// Returns `Err` if the function is not acyclic-modelable (caller falls back). +#[cfg(feature = "verification")] +#[allow(dead_code)] +fn encode_acyclic_function( + func: &Function, + ctx: &VerificationSignatureContext, +) -> Result> { + if !is_acyclic_modelable_body(&func.instructions) { + return Err(anyhow!("function not acyclic-modelable")); + } + let arity = func.signature.results.len(); + if arity > 1 { + return Err(anyhow!("acyclic executor: multi-result not modeled")); + } + + let mut locals: Vec = Vec::new(); + for (idx, pt) in func.signature.params.iter().enumerate() { + locals.push(BV::new_const( + format!("param{}", idx), + bv_width_for_value_type(*pt), + )); + } + for (count, lt) in func.locals.iter() { + let width = bv_width_for_value_type(*lt); + for _ in 0..*count { + locals.push(BV::from_u64(0, width)); + } + } + let mut globals: Vec = Vec::new(); + for i in 0..16 { + globals.push(BV::new_const(format!("global{}", i), 32)); + } + // Symbolic initial memory — same name as the main encoder so two + // encode_acyclic_function calls (original + optimized) unify in Z3. + let memory = Array::new_const("memory", &Sort::bitvector(32), &Sort::bitvector(8)); + + let mut state = AcyclicState { + stack: Vec::new(), + locals, + globals, + memory, + path: Bool::from_bool(true), + reachable: true, + }; + let mut labels = vec![LabelJoin { + arity, + entries: Vec::new(), + }]; + // Per-encode impure-call havoc counter (reset to 0 each encode so the Nth + // impure call gets the same id in the original and optimized encodings). + let mut havoc_counter: u64 = 0; + exec_acyclic( + &func.instructions, + &mut state, + &mut labels, + ctx, + 0, + &mut havoc_counter, + )?; + // Fall-through off the function body == implicit return. + if state.reachable { + acyclic_push_branch(&mut labels, 0, state.path.clone(), &state)?; + } + let func_join = labels.pop().unwrap(); + + if arity == 0 { + return Ok(None); + } + match acyclic_merge_entries(&func_join.entries) { + Some((results, _, _, _, _)) => Ok(Some(results[0].clone())), + None => Err(anyhow!("acyclic executor: function has no reachable exit")), + } +} + +/// PR-C (#219) M3: precise FAST PATH for proving `original ≡ optimized` when +/// BOTH functions are acyclic-CF-modelable (integer, block/if/br/br_if/br_table, +/// by-body calls, no loops/memory). Encodes both via the precise acyclic +/// executor (deterministic input names → they unify in Z3) and checks semantic +/// equivalence (`a == b` has no counterexample). This is what lets the verified +/// inliner prove a `call F` (with a `br_table` callee) equal to its inlined +/// body — the case the straight-line by-body modeler (`callee_inlinable_by_body`) +/// cannot reach. +/// +/// Returns `Err` if either function is not acyclic-modelable, if a result width +/// mismatches, or if the encoding fails — the caller then FALLS BACK to the +/// existing encoder-based `verify_function_equivalence` (no behavior change for +/// non-acyclic functions; this only ADDS proving power, never removes a check). +#[cfg(feature = "verification")] +#[allow(dead_code)] // wired into inline_functions' verify-or-revert in M3 step 2 +pub(crate) fn verify_acyclic_equivalence( + original: &Function, + optimized: &Function, + ctx: &VerificationSignatureContext, +) -> Result { + if original.signature.params != optimized.signature.params + || original.signature.results != optimized.signature.results + { + return Err(anyhow!("acyclic verify: signature changed")); + } + // Only handle the precise acyclic case; anything else → caller falls back. + if !is_acyclic_modelable_body(&original.instructions) + || !is_acyclic_modelable_body(&optimized.instructions) + { + return Err(anyhow!("acyclic verify: not acyclic-modelable")); + } + + let cfg = z3_config_with_timeout(); + with_z3_config(&cfg, || { + let solver = Solver::new(); + let orig_result = encode_acyclic_function(original, ctx); + let opt_result = encode_acyclic_function(optimized, ctx); + match (&orig_result, &opt_result) { + (Ok(Some(orig)), Ok(Some(opt))) => { + // loom#145 soundness boundary: never width-match the final + // equality — a width mismatch is a model artifact we cannot + // soundly resolve, so bail to "not proven" and let the caller + // fall back / revert. + if orig.get_size() != opt.get_size() { + return Err(anyhow!("acyclic verify: result width mismatch")); + } + solver.assert(orig.eq(opt).not()); + match solver.check() { + SatResult::Unsat => Ok(true), + SatResult::Sat => Ok(false), + SatResult::Unknown => Err(anyhow!("acyclic verify: solver unknown (timeout)")), + } + } + // Both void: no return value to compare. The acyclic executor does + // not yet model side-effect (global/memory) equivalence for void + // functions, so defer to the existing verifier — signal fall-back. + (Ok(None), Ok(None)) => Err(anyhow!("acyclic verify: void — defer to fallback")), + (Err(e), _) => Err(anyhow!("acyclic verify: encode original: {}", e)), + (_, Err(e)) => Err(anyhow!("acyclic verify: encode optimized: {}", e)), + _ => Err(anyhow!("acyclic verify: return-type mismatch")), + } + }) +} + /// Verify that block stack properties are preserved across optimization /// /// Uses Z3 to formally verify that a block transformation preserves: @@ -7397,6 +8351,412 @@ mod tests { use super::*; use crate::parse; + // ---- PR-C (gale #219) M1a: precise acyclic CF executor, tested in isolation ---- + + /// Encode both functions via the precise acyclic executor (deterministic + /// input names → they unify in Z3) and check semantic equivalence. + /// `Ok(true)` = proven equivalent, `Ok(false)` = counterexample found. + #[cfg(feature = "verification")] + fn acyclic_equiv(wat_a: &str, wat_b: &str) -> Result { + acyclic_equiv_at(wat_a, 0, wat_b, 0) + } + + /// As `acyclic_equiv` but comparing a chosen function index in each module + /// (so a caller that `call`s a sibling can be compared to its inlined form). + #[cfg(feature = "verification")] + fn acyclic_equiv_at(wat_a: &str, ia: usize, wat_b: &str, ib: usize) -> Result { + let ma = parse::parse_wat(wat_a).unwrap(); + let mb = parse::parse_wat(wat_b).unwrap(); + let ctx_a = VerificationSignatureContext::from_module(&ma); + let ctx_b = VerificationSignatureContext::from_module(&mb); + let fa = &ma.functions[ia]; + let fb = &mb.functions[ib]; + let cfg = z3_config_with_timeout(); + with_z3_config(&cfg, || { + let solver = Solver::new(); + let ra = encode_acyclic_function(fa, &ctx_a)?; + let rb = encode_acyclic_function(fb, &ctx_b)?; + match (ra, rb) { + (Some(a), Some(b)) => { + if a.get_size() != b.get_size() { + return Ok(false); + } + solver.assert(a.eq(&b).not()); + match solver.check() { + SatResult::Unsat => Ok(true), + SatResult::Sat => Ok(false), + SatResult::Unknown => Err(anyhow!("solver unknown")), + } + } + _ => Err(anyhow!("void/encode mismatch")), + } + }) + } + + /// 3-way `br_table` switch (0→10, 1→20, default→30). + #[cfg(feature = "verification")] + const SWITCH_10_20_30: &str = r#" + (module (func (param i32) (result i32) + (block (block (block + local.get 0 + br_table 0 1 2) + i32.const 10 + return) + i32.const 20 + return) + i32.const 30 + return))"#; + + /// Same dispatch expressed as an if/else chain — semantically identical. + #[cfg(feature = "verification")] + const IFELSE_10_20_30: &str = r#" + (module (func (param i32) (result i32) + local.get 0 + i32.eqz + if (result i32) + i32.const 10 + else + local.get 0 + i32.const 1 + i32.eq + if (result i32) + i32.const 20 + else + i32.const 30 + end + end))"#; + + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_brtable_equiv_ifelse() { + // The precise executor must model a br_table switch identically to the + // equivalent if/else chain (independent encodings → real proof). + assert!( + acyclic_equiv(SWITCH_10_20_30, IFELSE_10_20_30).unwrap(), + "br_table switch must prove equivalent to the if/else chain" + ); + } + + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_brtable_identity() { + // A switch is trivially equivalent to itself (sanity: the executor is + // deterministic and the merge is stable). + assert!(acyclic_equiv(SWITCH_10_20_30, SWITCH_10_20_30).unwrap()); + } + + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_brtable_detects_difference() { + // A switch whose default returns 99 instead of 30 must NOT prove + // equivalent — the executor must distinguish, not over-approve. + let switch_wrong_default = r#" + (module (func (param i32) (result i32) + (block (block (block + local.get 0 + br_table 0 1 2) + i32.const 10 + return) + i32.const 20 + return) + i32.const 99 + return))"#; + assert!( + !acyclic_equiv(SWITCH_10_20_30, switch_wrong_default).unwrap(), + "differing default arm must yield a counterexample (sel >= 2)" + ); + } + + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_brtable_detects_swapped_arms() { + // Swapping arm 0 and arm 1 results must be caught (sel==0 / sel==1). + let swapped = r#" + (module (func (param i32) (result i32) + (block (block (block + local.get 0 + br_table 0 1 2) + i32.const 20 + return) + i32.const 10 + return) + i32.const 30 + return))"#; + assert!( + !acyclic_equiv(SWITCH_10_20_30, swapped).unwrap(), + "swapped arm values must yield a counterexample" + ); + } + + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_modelable_gate() { + // The whole-function gate must reject loops (back-edges). + let with_loop = + parse::parse_wat("(module (func (param i32) (result i32) (loop local.get 0)))") + .unwrap(); + assert!( + !is_acyclic_modelable_body(&with_loop.functions[0].instructions), + "loops (back-edges) must be rejected" + ); + + // PR-C Phase 2: memory ops are now ADMITTED (the executor threads a + // memory Array). (Phase 1 rejected them.) + let with_memory = parse::parse_wat( + "(module (memory 1) (func (param i32) (result i32) local.get 0 i32.load))", + ) + .unwrap(); + assert!( + is_acyclic_modelable_body(&with_memory.functions[0].instructions), + "memory ops are modelable in Phase 2" + ); + + // A pure-integer br_table switch must be admitted. + let switch = parse::parse_wat(SWITCH_10_20_30).unwrap(); + assert!( + is_acyclic_modelable_body(&switch.functions[0].instructions), + "pure-integer acyclic br_table function must be admitted" + ); + } + + /// M1b: a caller that `call`s a br_table callee must prove equivalent to the + /// same logic with the callee inlined — i.e. the executor models `call F` + /// by-body identically to the inlined body (the inline-soundness principle + /// extended to control flow). func 0 = decide (switch), func 1 = z (calls it). + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_call_by_body_equiv_inlined() { + let caller_with_call = r#" + (module + (func (param i32) (result i32) + (block (block (block + local.get 0 + br_table 0 1 2) + i32.const 10 + return) + i32.const 20 + return) + i32.const 30 + return) + (func (param i32) (result i32) + local.get 0 + call 0))"#; + // z (func 1) calls decide (func 0); compare to the inlined switch. + assert!( + acyclic_equiv_at(caller_with_call, 1, SWITCH_10_20_30, 0).unwrap(), + "call-by-body must prove equivalent to the inlined callee body" + ); + } + + /// M1b: a call to a no-return callee (panic→unreachable) must be modeled as + /// ⊥ (diverge), never havoc. A function that returns 7 on one path and traps + /// on the other proves equivalent to one that simply returns 7 (the trapping + /// path drops out of the result; both sides agree on the returning path). + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_noreturn_call_diverges() { + // func 0 = panic (no params, unreachable); func 1 = g: if arg!=0 return 7 + // else call panic (diverge). func 1 should equal "always return 7". + let with_trap = r#" + (module + (func unreachable) + (func (param i32) (result i32) + local.get 0 + if (result i32) + i32.const 7 + else + call 0 + i32.const 7 + end))"#; + let always7 = r#" + (module (func (param i32) (result i32) i32.const 7))"#; + assert!( + acyclic_equiv_at(with_trap, 1, always7, 0).unwrap(), + "no-return call path must diverge (drop out), leaving only the return-7 path" + ); + } + + /// A direct `unreachable` (trap) on one path must diverge — same ⊥ discipline + /// as a no-return call. The seam decides end their overflow arms in + /// `unreachable`, so the executor must model it (not reject the function). + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_unreachable_diverges() { + let with_trap = r#" + (module (func (param i32) (result i32) + local.get 0 + if (result i32) + i32.const 7 + else + unreachable + end))"#; + let always7 = r#"(module (func (param i32) (result i32) i32.const 7))"#; + assert!( + acyclic_equiv(with_trap, always7).unwrap(), + "the unreachable (trap) path must diverge, leaving only the return-7 path" + ); + } + + /// PR-C Phase 2: a store on each branch followed by a load must MERGE memory + /// correctly (the load sees ite(cond, mem_then, mem_else)). Equivalent to the + /// direct if/else — proves the per-join memory ITE-merge is wired right. + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_memory_merge() { + let with_mem = r#" + (module (memory 1) + (func (param i32) (result i32) + local.get 0 + if + i32.const 100 + i32.const 5 + i32.store + else + i32.const 100 + i32.const 6 + i32.store + end + i32.const 100 + i32.load))"#; + let direct = r#" + (module (memory 1) + (func (param i32) (result i32) + local.get 0 + if (result i32) + i32.const 5 + else + i32.const 6 + end))"#; + assert!( + acyclic_equiv(with_mem, direct).unwrap(), + "store-on-each-branch then load must merge memory (ite) → loaded value matches" + ); + } + + /// PR-C Phase 2b: a call to an impure import is HAVOC'd. Two encodings of the + /// same impure call must UNIFY (same per-encode havoc id) → equivalent. (If + /// the counter weren't deterministic per-encode, the two `call_0_result_0` + /// names would differ and this would spuriously fail.) + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_havoc_call_unifies() { + let calls_import = r#" + (module + (import "env" "f" (func $f (param i32) (result i32))) + (func (param i32) (result i32) + local.get 0 + call $f))"#; + assert!( + acyclic_equiv(calls_import, calls_import).unwrap(), + "two encodings of the same impure call must unify (havoc id parity)" + ); + } + + /// PR-C Phase 2b SOUNDNESS: a havoc'd call result must NOT be provably equal + /// to a concrete value — `call $f(x)` (opaque) is not `x`. This is the guard + /// against a havoc-vs-concrete false equivalence (#155/#159 surface). + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_havoc_not_concrete() { + let calls_import = r#" + (module + (import "env" "f" (func $f (param i32) (result i32))) + (func (param i32) (result i32) + local.get 0 + call $f))"#; + let identity = r#" + (module + (import "env" "f" (func $f (param i32) (result i32))) + (func (param i32) (result i32) + local.get 0))"#; + assert!( + !acyclic_equiv(calls_import, identity).unwrap(), + "a havoc'd call result must NOT prove equal to a concrete value" + ); + } + + /// PR-C Phase 2b SOUNDNESS: an impure call havocs memory, so a load AFTER the + /// call reads the havoc'd memory — NOT provably equal to loading the initial + /// memory. Guards against the havoc failing to invalidate memory. + #[test] + #[cfg(feature = "verification")] + fn test_acyclic_havoc_memory() { + let load_after_call = r#" + (module + (import "env" "g" (func $g)) + (memory 1) + (func (result i32) + call $g + i32.const 0 + i32.load))"#; + let load_initial = r#" + (module + (import "env" "g" (func $g)) + (memory 1) + (func (result i32) + i32.const 0 + i32.load))"#; + assert!( + !acyclic_equiv(load_after_call, load_initial).unwrap(), + "a load after an impure call must read havoc'd memory, not the initial" + ); + } + + /// M3: the public verify_acyclic_equivalence fast-path proves a caller that + /// `call`s a br_table decide equivalent to the inlined switch, and rejects a + /// non-equivalent optimized body — the exact inline verify-or-revert pattern. + #[test] + #[cfg(feature = "verification")] + fn test_verify_acyclic_equivalence_inline() { + let module_wat = r#" + (module + (func (param i32) (result i32) + (block (block (block + local.get 0 + br_table 0 1 2) + i32.const 10 + return) + i32.const 20 + return) + i32.const 30 + return) + (func (param i32) (result i32) + local.get 0 + call 0))"#; + let m = parse::parse_wat(module_wat).unwrap(); + let ctx = VerificationSignatureContext::from_module(&m); + let z = &m.functions[1]; // caller: local.get 0; call decide + + // Equivalent: the inlined switch. + let inlined = parse::parse_wat(SWITCH_10_20_30).unwrap(); + assert!( + verify_acyclic_equivalence(z, &inlined.functions[0], &ctx).unwrap(), + "z (calls decide) must verify ≡ the inlined switch" + ); + + // Non-equivalent: a switch with a wrong default (99) must NOT verify. + let wrong = parse::parse_wat( + r#"(module (func (param i32) (result i32) + (block (block (block local.get 0 br_table 0 1 2) + i32.const 10 return) i32.const 20 return) i32.const 99 return))"#, + ) + .unwrap(); + assert!( + !verify_acyclic_equivalence(z, &wrong.functions[0], &ctx).unwrap(), + "a wrong-default body must NOT verify equivalent" + ); + + // A function with a loop is not acyclic-modelable → Err (caller falls back). + let loopy = parse::parse_wat( + "(module (func (param i32) (result i32) (loop (result i32) local.get 0)))", + ) + .unwrap(); + assert!( + verify_acyclic_equivalence(&loopy.functions[0], &loopy.functions[0], &ctx).is_err(), + "non-acyclic functions must Err so the caller falls back" + ); + } + #[test] fn test_verify_constant_folding() { // Original: (i32.add (i32.const 2) (i32.const 3)) diff --git a/loom-core/tests/fixtures/repro-219/README.md b/loom-core/tests/fixtures/repro-219/README.md new file mode 100644 index 0000000..7e090fd --- /dev/null +++ b/loom-core/tests/fixtures/repro-219/README.md @@ -0,0 +1,30 @@ +# #219 seam-SROA repro + +`sem.loom.wasm` — gale's authentic post-loom-inline dissolved `z_impl_k_sem_give` +(the C↔Rust k_sem_give decide seam). Provided on issue #219; **sha256 +`f81da42d…`, 5254 B** (verified on decode). From +`gale-smart-data/benches/engine_control/silicon/wasm-testbed/repro-loom-seam-sroa/`. + +It still contains the u64 pack/unpack round-trip the seam-SROA pass must dissolve: + +- **pack** (decide builds the u64): `i64.extend_i32_u; i64.shl; i64.or` +- **unpack** (shim tears it back to scalars): `i64.and`, `i64.shr_u`, `i32.wrap_i64` +- a dead `i64` carrier local in between + +## Kill-criterion (gale, on-silicon G474RE) + +- structural: dissolved body has **no i64 pack/unpack**, carrier local gone, + ARM body 83 → ~55–60 insns. +- silicon: sem 860 → toward 471 (LLVM-LTO); mutex 472. gale re-flashes. + +## Plan (design confirmed on the issue — proof-carrying) + +1. wasm-local mem2reg: promote the single-assignment non-escaping i64 carrier so + the pack expression reaches the unpack sites. +2. SROA / pack-unpack algebraic forwarding via ISLE rewrites (preferred: add + `i64.extend_i32_u` / `i32.wrap_i64` ISLE terms), each with a **Z3 proof** + discharged before the rule ships: `(extend_u(a) | (extend_u(b)<>k → b`, `wrap_i64(extend_i32_u(x)) → x`. Then DCE drops the carrier+pack. +3. const dedup/hoist. + +See also the synthetic `../seam_sroa_decide.wat`. diff --git a/loom-core/tests/fixtures/repro-219/sem.loom.wasm b/loom-core/tests/fixtures/repro-219/sem.loom.wasm new file mode 100644 index 0000000..04328ba Binary files /dev/null and b/loom-core/tests/fixtures/repro-219/sem.loom.wasm differ diff --git a/loom-core/tests/fixtures/seam_sroa_decide.wat b/loom-core/tests/fixtures/seam_sroa_decide.wat new file mode 100644 index 0000000..da582f5 --- /dev/null +++ b/loom-core/tests/fixtures/seam_sroa_decide.wat @@ -0,0 +1,51 @@ +;; #219 — seam SROA / scalar-forwarding fixture. +;; +;; Models the gale C<->Rust "decide" seam: `decide` returns two i32 scalars +;; {action, new_count} PACKED into a single i64 (the u64-return ABI), and the +;; shim immediately UNPACKS them back to scalars. After `--passes inline`, +;; `decide` is inlined into `shim`, so on one value we get: +;; +;; pack: i64.extend_i32_u ; i64.shl ; i64.or (build the u64) +;; carry: local.set $t / local.get $t (the dead i64 carrier) +;; unpack: i64.and ; i64.shr_u ; i32.wrap_i64 (tear it back apart) +;; +;; The u64 is constructed from two i32s and immediately decomposed — textbook +;; SROA. The kill-criterion (gale #219): after the seam-SROA pass the dissolved +;; body has NO i64 pack/unpack and the i64 carrier local is gone; the result is +;; equivalent to operating on `action`/`new_count` directly. +;; +;; Layout: action in bits [0..8), new_count in bits [8..40). +(module + ;; decide: pack {action, new_count} -> u64 + ;; t = (extend_u(new_count) << 8) | (extend_u(action) & 0xff) + (func $decide (param $action i32) (param $new_count i32) (result i64) + local.get $new_count + i64.extend_i32_u + i64.const 8 + i64.shl + local.get $action + i64.extend_i32_u + i64.const 0xff + i64.and + i64.or) + + ;; shim: call decide, then unpack both scalars back and combine. + ;; Post-inline this is the pure pack/unpack round-trip on $t. + ;; Semantically equivalent to: (action & 0xff) + new_count + (func (export "run") (param $action i32) (param $new_count i32) (result i32) + (local $t i64) + local.get $action + local.get $new_count + call $decide + local.set $t + ;; unpack action = (t & 0xff) + local.get $t + i64.const 0xff + i64.and + i32.wrap_i64 + ;; unpack new_count = (t >> 8) + local.get $t + i64.const 8 + i64.shr_u + i32.wrap_i64 + i32.add)) diff --git a/loom-shared/build.rs b/loom-shared/build.rs index 8380229..adfd7e7 100644 --- a/loom-shared/build.rs +++ b/loom-shared/build.rs @@ -29,6 +29,7 @@ fn main() { PathBuf::from("isle/types.isle"), PathBuf::from("isle/constructors.isle"), PathBuf::from("isle/rules/constant_folding.isle"), + PathBuf::from("isle/rules/seam_sroa.isle"), PathBuf::from("isle/rules/default.isle"), ]; @@ -100,6 +101,10 @@ fn fix_extractor_ownership(code: &str) -> String { "iadd64_extract", "isub64_extract", "imul64_extract", + // #219 seam-SROA — unary conversion extractors (cranelift-isle emits + // these with owned `Value`; rewrite to `&Value` like the rest). + "i32_wrap_i64_extract", + "i64_extend_i32_u_extract", ]; for extractor in &extractors { diff --git a/loom-shared/isle/constructors.isle b/loom-shared/isle/constructors.isle index 3c82f6a..fb14ca6 100644 --- a/loom-shared/isle/constructors.isle +++ b/loom-shared/isle/constructors.isle @@ -137,6 +137,22 @@ (decl ipopcnt64 (Value) Value) (extern constructor ipopcnt64 ipopcnt64) +;; ============================================================================ +;; Integer width conversions (#219 seam-SROA) +;; ============================================================================ +;; Both constructor and extractor are declared so the pack/unpack forwarding +;; rules can match these terms on a rule LHS and rebuild them on the RHS. +;; NOTE: the *_extract fns must also be listed in build.rs `fix_extractor_ +;; ownership` so the generated trait takes &Value (cranelift-isle emits owned). + +(decl i32_wrap_i64 (Value) Value) +(extern constructor i32_wrap_i64 i32_wrap_i64) +(extern extractor i32_wrap_i64 i32_wrap_i64_extract) + +(decl i64_extend_i32_u (Value) Value) +(extern constructor i64_extend_i32_u i64_extend_i32_u) +(extern extractor i64_extend_i32_u i64_extend_i32_u_extract) + ;; ============================================================================ ;; Other Constructors ;; ============================================================================ diff --git a/loom-shared/isle/rules/seam_sroa.isle b/loom-shared/isle/rules/seam_sroa.isle new file mode 100644 index 0000000..cd827bc --- /dev/null +++ b/loom-shared/isle/rules/seam_sroa.isle @@ -0,0 +1,18 @@ +;; LOOM - #219 seam-SROA rewrite rules (proof-carrying) +;; +;; Dissolves the C<->Rust decide seam's u64 pack/unpack residue. Each rule +;; carries a Z3 soundness obligation discharged before it ships (the pass +;; output is also translation-validated). The bit-precise masked-forward rules +;; (`(extend_u(a) | (extend_u(b)< a`, `>>k -> b`) land in a later +;; step once the carrier mem2reg brings the pack expression to the unpack site; +;; this file starts with the unconditional identity that needs no side +;; conditions. + +;; wrap_i64(extend_i32_u(x)) = x +;; +;; Soundness: extend_i32_u zero-extends an i32 to i64 (Z3: zero_ext(32, x)); +;; wrap_i64 takes the low 32 bits (Z3: extract(31, 0, .)). For all i32 x, +;; extract(31,0, zero_ext(32, x)) = x. Unconditionally sound — no escape or +;; mask side conditions. Removes the round-trip a packed scalar pays when its +;; carrier is widened to the u64 ABI and immediately narrowed back. +(rule 100 (simplify (i32_wrap_i64 (i64_extend_i32_u x))) x) diff --git a/loom-shared/isle/types.isle b/loom-shared/isle/types.isle index f5cf1c3..35f3c23 100644 --- a/loom-shared/isle/types.isle +++ b/loom-shared/isle/types.isle @@ -115,6 +115,12 @@ (I64Ctz (val Value)) (I64Popcnt (val Value)) + ;; Integer width conversions (#219 seam-SROA — needed so the pack/unpack + ;; forwarding rules can match on these terms). Field name `val` mirrors the + ;; Rust ValueData variants in loom-shared/src/lib.rs. + (I32WrapI64 (val Value)) + (I64ExtendI32U (val Value)) + ;; Select instruction (Select (cond Value) (true_val Value) (false_val Value)) diff --git a/loom-shared/src/lib.rs b/loom-shared/src/lib.rs index b1d70bc..2cce36e 100644 --- a/loom-shared/src/lib.rs +++ b/loom-shared/src/lib.rs @@ -2132,6 +2132,23 @@ pub fn i64_extend_i32_u(val: Value) -> Value { Value(Box::new(ValueData::I64ExtendI32U { val })) } +/// Extractor for i32.wrap_i64 — used by ISLE rules to pattern-match the term +/// on a rule LHS (#219 seam-SROA). Returns the inner operand. +pub fn i32_wrap_i64_extract(val: &Value) -> Option { + match val.0.as_ref() { + ValueData::I32WrapI64 { val } => Some(val.clone()), + _ => None, + } +} + +/// Extractor for i64.extend_i32_u — ISLE rule LHS matching (#219 seam-SROA). +pub fn i64_extend_i32_u_extract(val: &Value) -> Option { + match val.0.as_ref() { + ValueData::I64ExtendI32U { val } => Some(val.clone()), + _ => None, + } +} + // ============================================================================ // Float-to-Integer Truncation Constructors (trapping) // ============================================================================ @@ -3344,6 +3361,51 @@ fn are_values_equal(lhs: &Value, rhs: &Value) -> bool { } } +/// #219 seam-SROA helper: true if `op` is `(i64.shl _ (i64.const k))` with +/// `0 < k < 64` and every set bit of `mask` is below k (`(mask as u64) >> k == +/// 0`). In that case `(op & mask) == 0` for ALL shift inputs — the shifted +/// value only occupies bits [k,64), which the mask zeroes. k is restricted to +/// (0,64) so the wasm shift-amount-mod-64 wrap can't change the effective +/// shift; k>=64 is left to the verifier rather than reasoned about here. +fn i64_shl_cleared_by_mask(op: &Value, mask: &Imm64) -> bool { + if let ValueData::I64Shl { rhs, .. } = op.data() { + if let ValueData::I64Const { val: k } = rhs.data() { + let k = k.value(); + if k > 0 && k < 64 { + return (mask.value() as u64) >> (k as u64) == 0; + } + } + } + false +} + +/// #219 seam-SROA helper: true if `shl_amt` is the constant `k` and `0 < k < +/// 64` — i.e. an `(i64.shl _ k)` whose amount equals the enclosing `shr_u`'s. +/// `(Z << k) >> k == Z & (u64::MAX >> k)` exactly when k is in (0,64) (no wasm +/// shift-mod-64 wrap). +fn i64_shr_undoes_shl(shl_amt: &Value, shr_amt: &Imm64) -> bool { + if let ValueData::I64Const { val: k2 } = shl_amt.data() { + let k = shr_amt.value(); + return k == k2.value() && k > 0 && k < 64; + } + false +} + +/// #219 seam-SROA helper: true if `op` is `(i64.shl _ (i64.const amt))` with +/// `0 < amt < 64`. Used to target the shr_u-over-or distribution at the pack +/// shape (`(or (shl Z k) B) >> k`) so it only fires where it dissolves. +fn i64_is_shl_by(op: &Value, amt: i64) -> bool { + if amt <= 0 || amt >= 64 { + return false; + } + if let ValueData::I64Shl { rhs, .. } = op.data() { + if let ValueData::I64Const { val: k } = rhs.data() { + return k.value() == amt; + } + } + false +} + /// Stateless simplification (expression-level only) fn rewrite_pure_impl(val: Value) -> Value { match val.data() { @@ -3713,6 +3775,37 @@ fn rewrite_pure_impl(val: Value) -> Value { // Algebraic: x & -1 = x (all bits set) (_, ValueData::I64Const { val }) if val.value() == -1 => lhs_simplified, (ValueData::I64Const { val }, _) if val.value() == -1 => rhs_simplified, + + // #219 seam-SROA: (shl Z k) & M → 0 when M's set bits are all + // below k — the left shift zeroes bits [0,k), the mask zeroes + // bits [k,64), so nothing survives. Unconditionally sound for + // any Z (Z3: (Z<> k == 0). Dissolves the + // high half of a u64 pack under a low-byte unpack mask. + (ValueData::I64Shl { .. }, ValueData::I64Const { val: m }) + if i64_shl_cleared_by_mask(&lhs_simplified, m) => + { + iconst64(Imm64(0)) + } + (ValueData::I64Const { val: m }, ValueData::I64Shl { .. }) + if i64_shl_cleared_by_mask(&rhs_simplified, m) => + { + iconst64(Imm64(0)) + } + // #219 seam-SROA: (or A B) & M → (survivor & M) when one OR + // operand is a left shift the mask clears. Recurse so the + // survivor (and a both-shifted case) simplifies further. + (ValueData::I64Or { lhs: a, rhs: b }, ValueData::I64Const { val: m }) + if i64_shl_cleared_by_mask(a, m) || i64_shl_cleared_by_mask(b, m) => + { + let survivor = if i64_shl_cleared_by_mask(a, m) { b } else { a }; + rewrite_pure(iand64(survivor.clone(), iconst64(*m))) + } + (ValueData::I64Const { val: m }, ValueData::I64Or { lhs: a, rhs: b }) + if i64_shl_cleared_by_mask(a, m) || i64_shl_cleared_by_mask(b, m) => + { + let survivor = if i64_shl_cleared_by_mask(a, m) { b } else { a }; + rewrite_pure(iand64(survivor.clone(), iconst64(*m))) + } _ => iand64(lhs_simplified, rhs_simplified), } } @@ -3808,6 +3901,31 @@ fn rewrite_pure_impl(val: Value) -> Value { } // Algebraic: x >> 0 = x (_, ValueData::I64Const { val }) if (val.value() & 0x3F) == 0 => lhs_simplified, + + // #219 seam-SROA: (shr_u (shl Z k) k) → Z & (low 64-k bits). + // Shifting left by k then logically right by the same k clears + // the high k bits and keeps the low 64-k. Unconditionally sound + // for 0>k == Z & (2^(64-k)-1)). Dissolves the + // high half of a u64 pack under a >>k unpack. + (ValueData::I64Shl { lhs: z, rhs: shamt }, ValueData::I64Const { val: k }) + if i64_shr_undoes_shl(shamt, k) => + { + let mask = (u64::MAX >> (k.value() as u64)) as i64; + rewrite_pure(iand64(z.clone(), iconst64(Imm64(mask)))) + } + // #219 seam-SROA: (shr_u (or P Q) k) → (or (P>>k) (Q>>k)) when an + // OR operand is (shl _ k) — logical right shift distributes over + // bitwise OR (sound), and recursing lets the shl side collapse + // via the rule above. Targeted to the pack shape (matching shl + // present) so it never bloats unrelated `(or _ _) >> k`. + (ValueData::I64Or { lhs: p, rhs: q }, ValueData::I64Const { val: k }) + if i64_is_shl_by(p, k.value()) || i64_is_shl_by(q, k.value()) => + { + rewrite_pure(ior64( + ishru64(p.clone(), iconst64(*k)), + ishru64(q.clone(), iconst64(*k)), + )) + } _ => ishru64(lhs_simplified, rhs_simplified), } } @@ -4345,6 +4463,30 @@ fn rewrite_pure_impl(val: Value) -> Value { } } + // Integer width conversions (#219 seam-SROA). The live rewrite for the + // decide seam's u64 pack/unpack round-trip. Z3-validated per pass. + ValueData::I32WrapI64 { val } => { + let val_simplified = rewrite_pure(val.clone()); + match val_simplified.data() { + // #219 seam-SROA: wrap_i64(extend_i32_u(x)) = x. zero-extending an + // i32 to i64 then taking the low 32 bits is the identity on i32 — + // unconditionally sound (Z3: extract(31,0, zero_ext(32,x)) = x). + ValueData::I64ExtendI32U { val: inner } => inner.clone(), + // Constant folding: wrap_i64(i64.const N) → i32.const (low 32 bits). + ValueData::I64Const { val: v } => iconst32(Imm32(v.value() as i32)), + _ => i32_wrap_i64(val_simplified), + } + } + + ValueData::I64ExtendI32U { val } => { + let val_simplified = rewrite_pure(val.clone()); + match val_simplified.data() { + // Constant folding: extend_i32_u(i32.const N) → i64.const (zero-extended). + ValueData::I32Const { val: v } => iconst64(Imm64((v.value() as u32 as u64) as i64)), + _ => i64_extend_i32_u(val_simplified), + } + } + // Select instruction optimization ValueData::Select { cond,