diff --git a/CHANGELOG.md b/CHANGELOG.md
index df0e1c0..b0850d6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,38 @@ All notable changes to LOOM will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.15] - 2026-06-22
+
+Inliner release: the dissolve pipeline now inlines two real seam shapes it
+previously left as opaque `call`s, both gated by the Z3 translation validator
+(verify-or-revert). Found by gale's `gust` codegen bench. The #219 seam-teardown
+(carrier forwarding) is intentionally NOT in this release — it stays on its branch
+until the G474RE silicon dual-flash confirms it.
+
+### Added
+
+- **Acyclic control-flow inlining (#226).** The inline-modelability gate was
+  straight-line only, so a `#[inline]` leaf whose body is `if oob { call
+  panic_bounds_check; unreachable }` + a load + arithmetic (gale's `mix`) stayed
+  an opaque `call`. A precise acyclic-CF symbolic executor (PR-C) + a matching
+  candidate gate (Regime B) now admit acyclic CF + by-body calls + divergent
+  (no-return) calls, all proven by the validator. `mix` now inlines.
+- **Whole-function dead-code elimination in `loom optimize` (#228).** New
+  `dce-functions` CLI pass (default-on, after `inline`) exposes
+  `eliminate_dead_functions`: multi-site inlining duplicates a body and orphans
+  the original, which the intra-function `dce` could never remove. Keeps exports,
+  the start function, and element-table (indirect-call) targets live —
+  conservative on a parse failure (#196-safe).
+
+### Fixed
+
+- **Small multi-call-site leaves are inlined again (#228).** The candidate
+  predicate `(call_count == 1 || size < 10) && size < limit` made
+  `MULTI_CALL_SITE_LIMIT` (50) dead code — a multi-site callee only passed when
+  `size < 10`, so nothing in `[10, 50)` ever inlined (gale's 23-instruction,
+  2-site `gust_mix`). Now governed purely by the site-count-dependent budget
+  (`size < limit`); the per-inline Z3 verify-or-revert stays the correctness gate.
+
 ## [1.1.14] - 2026-06-15
 
 Correctness + release-engineering release. One optimizer correctness fix (#220);
diff --git a/Cargo.toml b/Cargo.toml
index f6eb812..f7dbbf1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -9,7 +9,7 @@ members = [
 ]
 
 [workspace.package]
-version = "1.1.14"
+version = "1.1.15"
 authors = ["PulseEngine <https://github.com/pulseengine>"]
 edition = "2024"
 license = "Apache-2.0"
diff --git a/docs/design/pr-c-precise-cf-verification.md b/docs/design/pr-c-precise-cf-verification.md
new file mode 100644
index 0000000..cf54553
--- /dev/null
+++ b/docs/design/pr-c-precise-cf-verification.md
@@ -0,0 +1,168 @@
+# PR-C: Precise acyclic control-flow symbolic execution in the Z3 verifier
+
+Status: **design, for review** (gale #219). Prerequisite for inlining the 5
+`br_table`-dispatching decides (sem_give 860, mutex_unlock 472, pipe_write,
+pipe_read, msgq_put) and for removing the existing `BrIf`/`BrTable` skips in
+LICM/CSE/code_folding/coalesce_locals.
+
+## Problem
+
+loom's live verifier (`verify.rs`) does **not** soundly model multi-way control
+flow. In both `encode_function_to_smt_impl_inner` and `encode_block_body`:
+
+- `BrIf(_depth)` pops the condition and **continues** ("a more precise encoding
+  would fork paths here").
+- `BrTable { .. }` pops the index and **`break`s** ("treat as terminating for
+  now").
+- `Br(_depth)` / `Return` `break`; branch **depth is ignored** (no label stack).
+
+So a function whose result depends on a `br_table` dispatch is modeled as if the
+dispatch didn't happen. Reusing this to prove an inline equivalent would let Z3
+pick any arm for any selector → a **false** equivalence → an unsound inline that
+loom's structural CI cannot catch (the #196/#220 failure class). This is why
+LICM/CSE/etc. currently **skip** `BrIf`/`BrTable` functions entirely.
+
+The precise machinery already exists as **dead scaffolding**, flagged in-source
+as the deferred upgrade: `struct ExecutionState { stack, locals, globals,
+path_condition: Bool, reachable }`, `merge_states(cond, t, f)`, and
+`BlockResult.branch_depth`. `merge_bv(cond, a, b) = cond.ite(a, b)` is live (used
+for `If`). PR-C wires the scaffolding in.
+
+## Chosen approach (with precedent)
+
+**State-merging-with-ITE driven by an explicit label/continuation stack — never
+path-forking.** For a bounded, acyclic (loops=0 on the critical path),
+single-normal-exit region this is **exact** (a passing SMT check is a real
+proof, not a bounded approximation), and it produces **one** verification
+condition instead of N per-arm obligations.
+
+Precedent (all verified against primary sources):
+
+- **Alive2** (Lopes, Lee, Hur, Liu, Regehr, *Bounded Translation Validation for
+  LLVM*, PLDI 2021) — the closest production precedent. "We do **not** fork
+  expressions across paths in the CFG"; phi/branch merges become "a single SMT
+  expression per register"; the final state is "a linear chain of `ite`
+  expressions"; an explicit **`ub`** flag makes ill-defined paths refine rather
+  than vanish. `switch` is the N-way generalization of its 2-way branch merge.
+- **Kuznetsov, Kinder, Bucur, Candea**, *Efficient State Merging in Symbolic
+  Execution*, PLDI 2012 — the exact merge rule `(ℓ, pc'∨pc'', λv. ite(pc',
+  s'[v], s''[v]))`; "does not over-approximate, no false positives."
+- **WASP** (Marques et al., *Concolic Execution for WebAssembly*, ECOOP 2022) —
+  the literal wasm rules: `br_table` arm `k≤n` → `Cont(br j_{k+1})` with path
+  condition `ŝ = k`; default `k≥n` → `Cont(br jn)` with `ŝ ≥ n`. The selector
+  constraint is conjoined onto every arm; the target is resolved by label index.
+- **Barnett & Leino**, *Weakest-Precondition of Unstructured Programs*, PASTE
+  2005 — per-block predicates keep the VC **linear** in paths (a wide `br_table`
+  doesn't blow up); acyclic ⇒ topological, no recursion.
+- **WebAssembly Core Spec** + Haas et al. (PLDI 2017) — wasm is structured with
+  a label stack; `br N` targets the N-th enclosing label (0 = innermost);
+  `br_table` takes the default when the operand is out of bounds.
+
+Per-path forking is also sound but is the wrong fit: exponential over a wide
+`br_table`, N separate obligations.
+
+## The model
+
+Carry a single symbolic `ExecutionState`:
+`{ stack: Vec<BV>, locals: Vec<BV>, globals: Vec<BV>, memory: Array,
+   path_condition: Bool, trapped: Bool }` (`trapped` = Alive2's `ub`/⊥).
+
+A **label/continuation stack**: each entry records `{ arity, join: JoinAccumulator }`.
+`block`/`if`/`loop` push a label; the entry's `join` accumulates the merged state
+of every branch that targets it.
+
+- `block tf … end`, `if … end`: push a label whose continuation is the code
+  **after** `end` (forward join).
+- `br N`: resolve the N-th label from the top; **merge** the current state into
+  that label's `join` under `path_condition` (carry `arity` result values); mark
+  the current path terminated.
+- `br_if N`: split — branch path `pc ∧ (cond≠0)` merges into the target join;
+  fall-through path `pc ∧ (cond=0)` continues.
+- `br_table j0..jn-1 (default jd)` (per WASP): for each arm `i`, merge under
+  `pc ∧ (sel = i)` into label `ji`'s join; default under `pc ∧ (sel ≥ n)` into
+  `jd`'s join. Guards **partition** the selector domain (totality from the
+  default complement).
+- At each `end` / the function exit (the arms' post-dominator), the label's
+  `join` is the **ITE-merge** of all incoming branches: `path_condition` =
+  disjunction; each value = `ite(guard, …)` (reusing `merge_bv` / `merge_states`).
+- `unreachable`, and a `Call` to a no-return callee (`is_noreturn_callee`: no
+  `Return`, no `Br*` to the function label, every path ends in `Unreachable`),
+  set `trapped` (⊥) — **never havoc**. They constrain reachability only; on a
+  trapped path the return value is don't-care and the path drops out of the
+  result ITE.
+
+**Crucially**: this *one* executor models **both** the original `call decide`
+(by-body) and the inlined body, so they yield the **same** merged expression →
+equivalence is provable (the #155 by-body principle, extended to CF).
+
+## Soundness — and the pitfalls we explicitly guard
+
+1. **Exact for acyclic** — no fixpoint, no loop invariants; finite single pass;
+   merge only at structured-block ends + the single exit (post-dominators).
+2. **Selector constraint preserved** — every `br_table` arm carries `sel = i`,
+   default carries `sel ≥ n`; guards partition the domain. (Avoids the
+   "terminate" false-equivalence.)
+3. **Fall-through resolved via the label stack** to the correct join (a `br` to
+   an outer label skips intervening joins).
+4. **⊥, not havoc, for trap/unreachable/no-return arms** — they refine, never
+   forge a matching value (the #155/#159 + Alive2 `ub` discipline).
+5. **Branch arity recorded** — `br N` carries the label's result values; the
+   merged stack shape stays correct.
+6. **Over-approximate-and-skip** anything unmodeled (LOOM's "skip rather than
+   risk"): any instruction/shape outside the precise model → the callee is not
+   by-body-modelable → opaque fallback (no inline), never a false ≡.
+
+## Scope / staging
+
+- **Loops stay out** — still unrolled (`MAX_LOOP_UNROLL`) / k-induction / skipped
+  as today. PR-C is acyclic-only. A back-edge ⇒ not by-body-modelable.
+- **Phase 1 — integer acyclic CF (no memory):** activate `ExecutionState` +
+  label stack + `merge_states`; implement `Br`/`BrIf`/`BrTable`/`Block`/`If`
+  precisely in `encode_block_body`; add `is_noreturn_callee` + diverge-on-trap;
+  extend `is_inline_modelable_instr`/`callee_inlinable_by_body` to admit acyclic
+  CF + no-return calls; route the by-body modeler through the precise executor.
+  **Unblocks sem_give (860)** — pure integer + br_table, no memory.
+- **Phase 2 — thread `memory: Array` through the CF executor** (today
+  `encode_block_body` has no memory param). Unblocks **mutex_unlock (472)** +
+  **pipe_write/read** (sret stores through the shadow frame).
+
+## Test strategy (the soundness gate)
+
+- **Adversarial verify-or-revert** (the hard gate): deliberately-wrong inlined
+  bodies MUST produce a Z3 counterexample → revert — wrong arm selected, dropped
+  selector guard, dropped trap branch, mis-merged join, wrong branch depth.
+- **Differential**: correct inline of a `br_table` fixture proves; a near-miss
+  reverts. Fixtures ordered per gale: msgq_put (trap-only) → sem_give
+  (`block`+`br_table`+`panic→unreachable`) → mutex_unlock (most blocks, sret).
+- **Unit**: `is_noreturn_callee`, the label-stack resolution, `merge_states`.
+- **Regression**: full `cargo test` + dogfooding (loom optimizing itself)
+  unchanged; re-validate the LICM/CSE/etc. paths once their `br_table` skips are
+  removed (a bonus coverage win, but it widens what they verify).
+- **e2e + silicon**: `loom optimize` on `repro-219/sem.loom.wasm` dissolves (no
+  `call $..._decide`, no i64 pack/unpack in `z_impl_k_sem_give`); then gale's
+  G474RE re-flash (sem_give 860→, mutex_unlock 472→) is the kill-criterion.
+
+## Risk
+
+Substantial, soundness-critical (touches the core verifier). De-risked by: the
+acyclic restriction (the only source of unsoundness/incompleteness — loops — is
+excluded), the existing `ExecutionState`/`merge_states` scaffolding, the
+Alive2/WASP precedent, and the adversarial-revert + silicon gates. Staged so
+Phase 1 (sem_give) lands and validates before Phase 2 (memory). WIP branch, no
+merge until adversarial tests pass + gale silicon-validates.
+
+## References
+
+- Lopes, Lee, Hur, Liu, Regehr. *Alive2: Bounded Translation Validation for
+  LLVM.* PLDI 2021. https://users.cs.utah.edu/~regehr/alive2-pldi21.pdf
+- Kuznetsov, Kinder, Bucur, Candea. *Efficient State Merging in Symbolic
+  Execution.* PLDI 2012. https://dslab.epfl.ch/pubs/stateMerging.pdf
+- Avgerinos, Rebert, Cha, Brumley. *Enhancing Symbolic Execution with
+  Veritesting.* ICSE 2014. https://softsec.kaist.ac.kr/~sangkilc/papers/avgerinos-icse14.pdf
+- Marques, Fragoso Santos, Santos, Adão. *Concolic Execution for WebAssembly
+  (WASP).* ECOOP 2022. https://drops.dagstuhl.de/opus/volltexte/2022/16239/pdf/LIPIcs-ECOOP-2022-11.pdf
+- Barnett & Leino. *Weakest-Precondition of Unstructured Programs.* PASTE 2005.
+  https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/krml157.pdf
+- Haas et al. *Bringing the Web up to Speed with WebAssembly.* PLDI 2017.
+- Van Hattum et al. *Lightweight, Modular Verification for Wasm-to-Native
+  Instruction Selection (Crocus/veri-isle).* ASPLOS 2024.
diff --git a/loom-cli/src/main.rs b/loom-cli/src/main.rs
index 8f18e6f..f68cb04 100644
--- a/loom-cli/src/main.rs
+++ b/loom-cli/src/main.rs
@@ -502,6 +502,27 @@ fn optimize_command(
         track_pass("inline", before, after);
     }
 
+    // loom#228: whole-function (module-level) dead-function elimination.
+    // Multi-site inlining duplicates a callee's body into each caller and
+    // leaves the ORIGINAL orphaned; the body-level `dce` (eliminate_dead_code)
+    // is intra-function and never removes a whole function, so the orphan
+    // survived every `loom optimize` run and grew the module. Run it right
+    // after `inline` so the orphan is gone before downstream passes touch it.
+    // eliminate_dead_functions keeps exports, the start function, and every
+    // element-segment (indirect-call table) target live — conservative on a
+    // parse failure — so it can never drop a reachable function (#196).
+    if should_run("dce-functions") {
+        println!("  Running: dce-functions");
+        let before = count_instructions(&module);
+        let removed = loom_core::fused_optimizer::eliminate_dead_functions(&mut module)
+            .context("Whole-function DCE failed")?;
+        if removed > 0 {
+            println!("    removed {removed} dead function(s)");
+        }
+        let after = count_instructions(&module);
+        track_pass("dce-functions", before, after);
+    }
+
     if should_run("precompute") {
         println!("  Running: precompute");
         let before = count_instructions(&module);
diff --git a/loom-core/src/lib.rs b/loom-core/src/lib.rs
index 68dddbf..ca3d399 100644
--- a/loom-core/src/lib.rs
+++ b/loom-core/src/lib.rs
@@ -13447,25 +13447,30 @@ pub mod optimize {
                     .functions
                     .get((*func_idx - num_imported_funcs) as usize)
                 {
-                    if !function_inline_modelable(callee) {
+                    if !function_inline_modelable(callee, &module.functions, num_imported_funcs) {
                         continue;
                     }
                 }
                 let size = function_sizes.get(func_idx).copied().unwrap_or(0);
 
-                // Heuristic: inline if:
-                // 1. Single call site — profitable: inlining removes the call
-                //    overhead and opens cross-function optimization (the gale
-                //    flight_control seam, #155). A single-call-site callee is
-                //    not duplicated, so a generous size budget is justified.
-                // 2. Small function (< 10 instructions) — cheap even when
-                //    called from multiple sites.
+                // Heuristic: inline a modelable callee whose size is under the
+                // site-count-dependent budget.
+                // - Single call site — generous budget (not duplicated): inlining
+                //   removes the call and opens cross-function optimization (the
+                //   gale flight_control seam, #155).
+                // - Multiple call sites — a tighter budget bounds the per-copy
+                //   duplication cost, but the leaf STILL inlines (the gale gust
+                //   `mix` seam, #228: a 23-insn 2-site leaf).
                 //
-                // SIZE_LIMIT stays well under the Z3 verify cap
-                // (LOOM_Z3_MAX_INSTRUCTIONS, default 2000) so the inlined
-                // function is still VERIFIED, never silently skipped — the
-                // translation validator remains the correctness gate, and the
-                // #147 fixpoint guard bounds total expansion.
+                // loom#228: the old guard `(call_count == 1 || size < 10)` made
+                // MULTI_CALL_SITE_LIMIT dead — a multi-site callee only passed
+                // when `size < 10`, so nothing in [10, 50) ever inlined however
+                // small the budget said it could. Governing purely by the budget
+                // restores the intended behavior. Both limits stay well under the
+                // Z3 verify cap (LOOM_Z3_MAX_INSTRUCTIONS, default 2000) so every
+                // inlined body is still VERIFIED (the translation validator is the
+                // correctness gate, never bypassed); the #147 fixpoint guard
+                // bounds total expansion.
                 const SINGLE_CALL_SITE_LIMIT: usize = 200;
                 const MULTI_CALL_SITE_LIMIT: usize = 50;
                 let limit = if call_count == 1 {
@@ -13473,7 +13478,7 @@ pub mod optimize {
                 } else {
                     MULTI_CALL_SITE_LIMIT
                 };
-                if (call_count == 1 || size < 10) && size < limit {
+                if size < limit {
                     inline_candidates.push(*func_idx);
                 }
             }
@@ -13551,100 +13556,289 @@ pub mod optimize {
         Ok(())
     }
 
-    /// loom#159: is this callee FULLY by-body-inline-modelable — i.e. will the
-    /// translation validator be able to PROVE its inline?
-    ///
-    /// A callee is only a sound inline candidate if the verifier can model its
-    /// body by-body (loom#151/#155/#157). If it contains ANY op the verifier
-    /// can't model — partial-width loads/stores (`*.load8/16`, `*.store8/16/32`,
-    /// e.g. `filter_step`'s `i32.load16_s`), floats, globals, `memory.*` bulk
-    /// ops, calls (non-leaf), or control flow — then inlining it is unprovable:
-    /// the original models the call as a havoc while the optimized has concrete
-    /// effects, so a downstream read diverges → FALSE counterexample → the whole
-    /// caller reverts, dragging down any *other* inline in it too (the v1.1.7
-    /// regression: a partial-width `filter_step` killed the `controller_step`
-    /// reader-inline). Excluding such callees keeps them opaque (havoc) calls,
-    /// which is exactly what a following modelable reader's inline needs.
-    ///
-    /// This MUST stay in lock-step with `verify::is_inline_modelable_instr`
-    /// (the verifier's by-body whitelist). Straight-line only: any control flow
-    /// or call disqualifies the callee.
-    fn function_inline_modelable(func: &super::Function) -> bool {
-        func.instructions.iter().all(|i| {
-            matches!(
-                i,
-                Instruction::I32Const(_)
-                    | Instruction::I64Const(_)
-                    | Instruction::LocalGet(_)
-                    | Instruction::LocalSet(_)
-                    | Instruction::LocalTee(_)
-                    | Instruction::Drop
-                    | Instruction::Select
-                    | Instruction::I32Load { .. }
-                    | Instruction::I64Load { .. }
-                    | Instruction::I32Store { .. }
-                    | Instruction::I64Store { .. }
-                    | Instruction::I32Load8S { .. }
-                    | Instruction::I32Load8U { .. }
-                    | Instruction::I32Load16S { .. }
-                    | Instruction::I32Load16U { .. }
-                    | Instruction::I32Store8 { .. }
-                    | Instruction::I32Store16 { .. }
-                    | Instruction::I32Add
-                    | Instruction::I32Sub
-                    | Instruction::I32Mul
-                    | Instruction::I32And
-                    | Instruction::I32Or
-                    | Instruction::I32Xor
-                    | Instruction::I32Shl
-                    | Instruction::I32ShrS
-                    | Instruction::I32ShrU
-                    | Instruction::I64Add
-                    | Instruction::I64Sub
-                    | Instruction::I64Mul
-                    | Instruction::I64And
-                    | Instruction::I64Or
-                    | Instruction::I64Xor
-                    | Instruction::I64Shl
-                    | Instruction::I64ShrS
-                    | Instruction::I64ShrU
-                    | Instruction::I32DivS
-                    | Instruction::I32DivU
-                    | Instruction::I32RemS
-                    | Instruction::I32RemU
-                    | Instruction::I64DivS
-                    | Instruction::I64DivU
-                    | Instruction::I64RemS
-                    | Instruction::I64RemU
-                    | Instruction::I32Eq
-                    | Instruction::I32Ne
-                    | Instruction::I32LtS
-                    | Instruction::I32LtU
-                    | Instruction::I32GtS
-                    | Instruction::I32GtU
-                    | Instruction::I32LeS
-                    | Instruction::I32LeU
-                    | Instruction::I32GeS
-                    | Instruction::I32GeU
-                    | Instruction::I32Eqz
-                    | Instruction::I64Eq
-                    | Instruction::I64Ne
-                    | Instruction::I64LtS
-                    | Instruction::I64LtU
-                    | Instruction::I64GtS
-                    | Instruction::I64GtU
-                    | Instruction::I64LeS
-                    | Instruction::I64LeU
-                    | Instruction::I64GeS
-                    | Instruction::I64GeU
-                    | Instruction::I64Eqz
-                    | Instruction::I64ExtendI32S
-                    | Instruction::I64ExtendI32U
-                    | Instruction::I32WrapI64
-            )
+    /// REGIME A whitelist: per-instruction straight-line ops the by-body
+    /// modeler (`verify::encode_inlinable_callee_result`) proves faithfully,
+    /// INCLUDING full/partial-width memory loads/stores (loom#151/#155/#157/#161).
+    /// No control flow, no calls. MUST stay in lock-step with
+    /// `verify::is_inline_modelable_instr`. Used by `function_inline_modelable`.
+    fn is_straightline_modelable_instr(i: &Instruction) -> bool {
+        matches!(
+            i,
+            Instruction::I32Const(_)
+                | Instruction::I64Const(_)
+                | Instruction::LocalGet(_)
+                | Instruction::LocalSet(_)
+                | Instruction::LocalTee(_)
+                | Instruction::Drop
+                | Instruction::Select
+                | Instruction::I32Load { .. }
+                | Instruction::I64Load { .. }
+                | Instruction::I32Store { .. }
+                | Instruction::I64Store { .. }
+                | Instruction::I32Load8S { .. }
+                | Instruction::I32Load8U { .. }
+                | Instruction::I32Load16S { .. }
+                | Instruction::I32Load16U { .. }
+                | Instruction::I32Store8 { .. }
+                | Instruction::I32Store16 { .. }
+                | Instruction::I32Add
+                | Instruction::I32Sub
+                | Instruction::I32Mul
+                | Instruction::I32And
+                | Instruction::I32Or
+                | Instruction::I32Xor
+                | Instruction::I32Shl
+                | Instruction::I32ShrS
+                | Instruction::I32ShrU
+                | Instruction::I64Add
+                | Instruction::I64Sub
+                | Instruction::I64Mul
+                | Instruction::I64And
+                | Instruction::I64Or
+                | Instruction::I64Xor
+                | Instruction::I64Shl
+                | Instruction::I64ShrS
+                | Instruction::I64ShrU
+                | Instruction::I32DivS
+                | Instruction::I32DivU
+                | Instruction::I32RemS
+                | Instruction::I32RemU
+                | Instruction::I64DivS
+                | Instruction::I64DivU
+                | Instruction::I64RemS
+                | Instruction::I64RemU
+                | Instruction::I32Eq
+                | Instruction::I32Ne
+                | Instruction::I32LtS
+                | Instruction::I32LtU
+                | Instruction::I32GtS
+                | Instruction::I32GtU
+                | Instruction::I32LeS
+                | Instruction::I32LeU
+                | Instruction::I32GeS
+                | Instruction::I32GeU
+                | Instruction::I32Eqz
+                | Instruction::I64Eq
+                | Instruction::I64Ne
+                | Instruction::I64LtS
+                | Instruction::I64LtU
+                | Instruction::I64GtS
+                | Instruction::I64GtU
+                | Instruction::I64LeS
+                | Instruction::I64LeU
+                | Instruction::I64GeS
+                | Instruction::I64GeU
+                | Instruction::I64Eqz
+                | Instruction::I64ExtendI32S
+                | Instruction::I64ExtendI32U
+                | Instruction::I32WrapI64
+        )
+    }
+
+    /// REGIME B per-instruction whitelist: pure-integer ops the precise acyclic
+    /// executor (`verify::exec_acyclic`) models — NO memory (Phase 1), NO
+    /// floats/unknown. MUST stay in lock-step with
+    /// `verify::is_acyclic_simple_modelable`.
+    fn is_acyclic_int_modelable_instr(i: &Instruction) -> bool {
+        matches!(
+            i,
+            Instruction::I32Const(_)
+                | Instruction::I64Const(_)
+                | Instruction::LocalGet(_)
+                | Instruction::LocalSet(_)
+                | Instruction::LocalTee(_)
+                | Instruction::GlobalGet(_)
+                | Instruction::GlobalSet(_)
+                | Instruction::Drop
+                | Instruction::Nop
+                | Instruction::Select
+                | Instruction::I32Add
+                | Instruction::I32Sub
+                | Instruction::I32Mul
+                | Instruction::I32And
+                | Instruction::I32Or
+                | Instruction::I32Xor
+                | Instruction::I32Shl
+                | Instruction::I32ShrS
+                | Instruction::I32ShrU
+                | Instruction::I32DivS
+                | Instruction::I32DivU
+                | Instruction::I32RemS
+                | Instruction::I32RemU
+                | Instruction::I64Add
+                | Instruction::I64Sub
+                | Instruction::I64Mul
+                | Instruction::I64And
+                | Instruction::I64Or
+                | Instruction::I64Xor
+                | Instruction::I64Shl
+                | Instruction::I64ShrS
+                | Instruction::I64ShrU
+                | Instruction::I64DivS
+                | Instruction::I64DivU
+                | Instruction::I64RemS
+                | Instruction::I64RemU
+                | Instruction::I32Eq
+                | Instruction::I32Ne
+                | Instruction::I32LtS
+                | Instruction::I32LtU
+                | Instruction::I32GtS
+                | Instruction::I32GtU
+                | Instruction::I32LeS
+                | Instruction::I32LeU
+                | Instruction::I32GeS
+                | Instruction::I32GeU
+                | Instruction::I32Eqz
+                | Instruction::I64Eq
+                | Instruction::I64Ne
+                | Instruction::I64LtS
+                | Instruction::I64LtU
+                | Instruction::I64GtS
+                | Instruction::I64GtU
+                | Instruction::I64LeS
+                | Instruction::I64LeU
+                | Instruction::I64GeS
+                | Instruction::I64GeU
+                | Instruction::I64Eqz
+                | Instruction::I64ExtendI32S
+                | Instruction::I64ExtendI32U
+                | Instruction::I32WrapI64
+        )
+    }
+
+    /// PR-C Phase 2: memory loads/stores the precise acyclic executor models
+    /// (mirrors `verify::is_acyclic_memory_op`). Admitted by the regime-B inline
+    /// gate so memory-bearing acyclic callees are inline candidates.
+    fn is_acyclic_memory_modelable_instr(i: &Instruction) -> bool {
+        matches!(
+            i,
+            Instruction::I32Load { .. }
+                | Instruction::I64Load { .. }
+                | Instruction::I32Store { .. }
+                | Instruction::I64Store { .. }
+                | Instruction::I32Load8S { .. }
+                | Instruction::I32Load8U { .. }
+                | Instruction::I32Load16S { .. }
+                | Instruction::I32Load16U { .. }
+                | Instruction::I32Store8 { .. }
+                | Instruction::I32Store16 { .. }
+        )
+    }
+
+    /// loom-side mirror of `verify::is_noreturn_callee` (that one is
+    /// `#[cfg(feature = "verification")]`-gated, unavailable here). True if every
+    /// path traps: no `Return`/`Br*` anywhere (recursing into nested CF) and the
+    /// body ends in `Unreachable`. Admits exactly the `panic*` helpers.
+    fn inline_callee_is_noreturn(func: &super::Function) -> bool {
+        fn has_branch_or_return(instrs: &[Instruction]) -> bool {
+            instrs.iter().any(|i| match i {
+                Instruction::Return
+                | Instruction::Br(_)
+                | Instruction::BrIf(_)
+                | Instruction::BrTable { .. } => true,
+                Instruction::Block { body, .. } | Instruction::Loop { body, .. } => {
+                    has_branch_or_return(body)
+                }
+                Instruction::If {
+                    then_body,
+                    else_body,
+                    ..
+                } => has_branch_or_return(then_body) || has_branch_or_return(else_body),
+                _ => false,
+            })
+        }
+        !has_branch_or_return(&func.instructions)
+            && matches!(func.instructions.last(), Some(Instruction::Unreachable))
+    }
+
+    /// Max by-body call-recursion depth for the regime-B check (mirrors
+    /// `verify::MAX_ACYCLIC_CALL_DEPTH`); bounds cyclic call graphs.
+    const INLINE_ACYCLIC_MAX_DEPTH: usize = 8;
+
+    /// REGIME B: acyclic control flow (block/if/br/br_if/br_table) + pure-integer
+    /// ops + by-body calls (to no-return or recursively-modelable LOCAL callees).
+    /// NO memory (Phase 1). Mirrors `verify::is_acyclic_modelable_body` so the
+    /// inline candidate gate admits exactly what the acyclic verifier can prove.
+    fn inline_modelable_acyclic_body(
+        body: &[Instruction],
+        all_functions: &[super::Function],
+        num_imported_funcs: u32,
+        depth: usize,
+    ) -> bool {
+        body.iter().all(|i| match i {
+            Instruction::Block { body, .. } => {
+                inline_modelable_acyclic_body(body, all_functions, num_imported_funcs, depth)
+            }
+            Instruction::If {
+                then_body,
+                else_body,
+                ..
+            } => {
+                inline_modelable_acyclic_body(then_body, all_functions, num_imported_funcs, depth)
+                    && inline_modelable_acyclic_body(
+                        else_body,
+                        all_functions,
+                        num_imported_funcs,
+                        depth,
+                    )
+            }
+            Instruction::Br(_)
+            | Instruction::BrIf(_)
+            | Instruction::BrTable { .. }
+            | Instruction::Return
+            // Unreachable is a trap (⊥): the acyclic executor diverges the path.
+            | Instruction::Unreachable => true,
+            // Back-edges and indirect calls are out of Phase 1 scope.
+            Instruction::Loop { .. } | Instruction::CallIndirect { .. } => false,
+            Instruction::Call(g) => {
+                if depth >= INLINE_ACYCLIC_MAX_DEPTH || *g < num_imported_funcs {
+                    return false; // too deep, or an import (opaque)
+                }
+                match all_functions.get((*g - num_imported_funcs) as usize) {
+                    Some(callee) => {
+                        inline_callee_is_noreturn(callee)
+                            || (callee.signature.results.len() <= 1
+                                && inline_modelable_acyclic_body(
+                                    &callee.instructions,
+                                    all_functions,
+                                    num_imported_funcs,
+                                    depth + 1,
+                                ))
+                    }
+                    None => false,
+                }
+            }
+            other => {
+                is_acyclic_int_modelable_instr(other) || is_acyclic_memory_modelable_instr(other)
+            }
         })
     }
 
+    /// loom#159: is this callee FULLY inline-modelable — i.e. will the
+    /// translation validator be able to PROVE its inline? Admits EITHER regime A
+    /// (straight-line incl. memory — by-body modeler) OR regime B (acyclic CF +
+    /// pure-int + by-body calls — the precise acyclic executor, PR-C #219). A
+    /// callee outside both stays an opaque call: admitting an unprovable one
+    /// makes the original (havoc) and optimized (concrete) encodings diverge →
+    /// FALSE counterexample → the whole caller reverts, dragging down any OTHER
+    /// inline in it (the v1.1.7 #159 regression). A br_table+memory callee
+    /// (mutex_unlock, pipe) passes NEITHER regime → stays opaque until Phase 2.
+    fn function_inline_modelable(
+        func: &super::Function,
+        all_functions: &[super::Function],
+        num_imported_funcs: u32,
+    ) -> bool {
+        func.instructions
+            .iter()
+            .all(is_straightline_modelable_instr)
+            || inline_modelable_acyclic_body(
+                &func.instructions,
+                all_functions,
+                num_imported_funcs,
+                0,
+            )
+    }
+
     /// Inline function calls in a block of instructions
     fn inline_calls_in_block(
         instructions: &[Instruction],
@@ -17746,6 +17940,118 @@ mod tests {
         assert_eq!(instrs[1], Instruction::I32TruncF32S);
     }
 
+    #[test]
+    fn test_seam_sroa_wrap_extend_identity() {
+        // #219 seam-SROA: wrap_i64(extend_i32_u(x)) → x. Zero-extending an i32 to
+        // i64 then wrapping back to i32 is the identity — the u64-ABI round-trip
+        // a packed scalar pays at the dissolved decide seam. Must dissolve to the
+        // bare operand (no wrap/extend left).
+        use loom_isle::{i32_wrap_i64, i64_extend_i32_u, local_get, rewrite_pure};
+        let x = local_get(0); // an i32 value
+        let round_trip = i32_wrap_i64(i64_extend_i32_u(x));
+        let simplified = rewrite_pure(round_trip);
+
+        let instrs = terms::terms_to_instructions(&[simplified]).unwrap();
+        assert_eq!(
+            instrs,
+            vec![Instruction::LocalGet(0)],
+            "#219: wrap_i64(extend_i32_u(x)) must dissolve to x"
+        );
+    }
+
+    #[test]
+    fn test_seam_sroa_mask_clears_shifted_pack() {
+        // #219 seam-SROA: (or (shl (extend_u x) 32) Y) & 0xff → Y & 0xff.
+        // The high-shifted field of a u64 pack cannot survive a low-byte unpack
+        // mask, so the whole high half drops out. This is the `& mask` half of
+        // dissolving the sem decide's pack/unpack round-trip.
+        use loom_isle::{
+            Imm64, i64_extend_i32_u, iand64, iconst64, ior64, ishl64, local_get, rewrite_pure,
+        };
+        let high = ishl64(i64_extend_i32_u(local_get(0)), iconst64(Imm64(32)));
+        let low = i64_extend_i32_u(local_get(1)); // the surviving low field
+        let pack = ior64(high, low);
+        let masked = iand64(pack, iconst64(Imm64(0xff)));
+        let simplified = rewrite_pure(masked);
+
+        // Expect (extend_u(local.get 1)) & 0xff — the shifted high half is gone.
+        let want = terms::terms_to_instructions(&[iand64(
+            i64_extend_i32_u(local_get(1)),
+            iconst64(Imm64(0xff)),
+        )])
+        .unwrap();
+        let got = terms::terms_to_instructions(&[simplified]).unwrap();
+        assert_eq!(
+            got, want,
+            "#219: (or (shl _ 32) Y) & 0xff must dissolve to Y & 0xff"
+        );
+    }
+
+    #[test]
+    fn test_seam_sroa_shr_extracts_high_field() {
+        // #219 seam-SROA: (shr_u (or (shl (extend_u x) 32) const) 32) extracts the
+        // HIGH field of a u64 pack → (extend_u x) & 0xffffffff. The low (const)
+        // field shifts out (logical shift distributes over OR; shl-then-shr-same
+        // masks to the low 64-k bits). Mirrors the sem decide whose low field is
+        // a constant 0/1.
+        use loom_isle::{
+            Imm64, i64_extend_i32_u, iand64, iconst64, ior64, ishl64, ishru64, local_get,
+            rewrite_pure,
+        };
+        let high = ishl64(i64_extend_i32_u(local_get(0)), iconst64(Imm64(32)));
+        let pack = ior64(high, iconst64(Imm64(1))); // low field = const (like local 3 = 0/1)
+        let unpacked = ishru64(pack, iconst64(Imm64(32)));
+        let simplified = rewrite_pure(unpacked);
+
+        let want = terms::terms_to_instructions(&[iand64(
+            i64_extend_i32_u(local_get(0)),
+            iconst64(Imm64(0xffff_ffff)),
+        )])
+        .unwrap();
+        let got = terms::terms_to_instructions(&[simplified]).unwrap();
+        assert_eq!(
+            got, want,
+            "#219: (shr_u (or (shl (extend_u x) 32) const) 32) must extract (extend_u x) & 0xffffffff"
+        );
+    }
+
+    #[cfg(feature = "verification")]
+    #[test]
+    fn test_is_noreturn_callee() {
+        // PR-C (#219): the divergent-call classifier. No-return = no Return, no
+        // branch anywhere, body ends in Unreachable (the core::panicking shape).
+        let cases: &[(&str, bool)] = &[
+            // bare trap
+            ("(module (func (export \"f\") unreachable))", true),
+            // call (to an import) then trap — the panic_const_add_overflow shape
+            (
+                "(module (import \"env\" \"p\" (func)) (func (export \"f\") call 0 unreachable))",
+                true,
+            ),
+            // normal return — must NOT be no-return
+            (
+                "(module (func (export \"f\") (result i32) i32.const 0))",
+                false,
+            ),
+            // ends in a value, not unreachable
+            ("(module (func (export \"f\") nop))", false),
+            // contains a branch (could escape to the fn label) — conservatively false
+            (
+                "(module (func (export \"f\") (block br 0) unreachable))",
+                false,
+            ),
+        ];
+        for (wat, expected) in cases {
+            let module = parse::parse_wat(wat).expect("parse");
+            let f = &module.functions[0];
+            assert_eq!(
+                crate::verify::is_noreturn_callee(f),
+                *expected,
+                "#219 is_noreturn_callee mismatch for: {wat}"
+            );
+        }
+    }
+
     #[test]
     fn test_conversion_trunc_sat_folds_nan_to_zero() {
         // i32.trunc_sat_f32_s of NaN → i32.const 0 (saturating: NaN→0)
@@ -18664,6 +18970,55 @@ mod tests {
         wasmparser::validate(&wasm_bytes).expect("output validates");
     }
 
+    // loom#228: a modelable leaf whose size is in [10, 50) called from MULTIPLE
+    // sites was never inlined. The old candidate guard `(call_count == 1 ||
+    // size < 10) && size < limit` made MULTI_CALL_SITE_LIMIT (=50) dead: a
+    // multi-site callee only passed when `size < 10`, so nothing in [10, 50)
+    // ever inlined. Budget-governed selection (`size < limit`) inlines it at
+    // every site; the Z3 gate stays the correctness backstop.
+    #[test]
+    fn test_inline_multisite_small_leaf_228() {
+        // 11-instruction straight-line modelable leaf (tee+select+arith), no
+        // calls, size in [10, 50), called from two sites.
+        let wat = r#"(module
+            (func $leaf (param i32 i32) (result i32)
+                local.get 0  local.get 1  i32.add  local.tee 0
+                i32.const 1  i32.add
+                local.get 0  i32.const 2  i32.shl
+                i32.const 0  select)
+            (func $a (export "a") (param i32) (result i32)
+                local.get 0  i32.const 7  call $leaf)
+            (func $b (export "b") (param i32) (result i32)
+                local.get 0  i32.const 9  call $leaf)
+        )"#;
+        let mut module = parse::parse_wat(wat).expect("parse");
+        // leaf is flat (no nested blocks) → instruction count == body length.
+        let leaf_size = module.functions[0].instructions.len();
+        assert!(
+            (10..50).contains(&leaf_size),
+            "leaf must be in the previously-dead [10,50) band (got {leaf_size})"
+        );
+
+        optimize::inline_functions(&mut module).expect("inline must not panic");
+
+        // Both call sites must be gone (the leaf body is inlined into $a and $b;
+        // the orphaned $leaf itself is removed by the CLI dce-functions pass, not
+        // by inline_functions — so it may still be present here, callless).
+        let total_calls: usize = module
+            .functions
+            .iter()
+            .flat_map(|f| f.instructions.iter())
+            .filter(|i| matches!(i, Instruction::Call(_)))
+            .count();
+        assert_eq!(
+            total_calls, 0,
+            "#228: the multi-site [10,50) leaf must inline at BOTH sites"
+        );
+
+        let wasm_bytes = encode::encode_wasm(&module).expect("encode");
+        wasmparser::validate(&wasm_bytes).expect("output validates");
+    }
+
     #[cfg(feature = "verification")]
     #[test]
     fn test_inline_verifier_proves_correct_and_rejects_wrong_i64_inline() {
diff --git a/loom-core/src/verify.rs b/loom-core/src/verify.rs
index 8471cc1..4548eb6 100644
--- a/loom-core/src/verify.rs
+++ b/loom-core/src/verify.rs
@@ -209,6 +209,51 @@ fn callee_inlinable_by_body(func: &Function) -> bool {
     func.instructions.iter().all(is_inline_modelable_instr)
 }
 
+/// PR-C (#219): true if `func` provably never returns to its caller — every
+/// path traps. Conservative-sound definition: the body contains NO `Return`
+/// and NO branch (`Br`/`BrIf`/`BrTable`) anywhere (recursing into nested
+/// control flow), and its final top-level instruction is `Unreachable`. With
+/// no `return` and no branch, the only way to leave the function is to fall off
+/// the end — which `Unreachable` makes a trap. This admits exactly the
+/// `core::panicking` helpers here (`panic_fmt → unreachable`,
+/// `panic_const_add_overflow → call panic_fmt; unreachable`) and rejects any
+/// normal-returning or branching function.
+///
+/// Used by the by-body inline modeler to encode a `call` to such a callee as
+/// "diverge if reached" (⊥) rather than havoc: control never returns, so the
+/// call constrains only the reachability of the trapping branch, never the
+/// post-state (Alive2's `ub` discipline). Modeling it as a fresh symbolic
+/// (havoc) would reopen the #155/#159 havoc-vs-concrete false-positive surface.
+#[cfg(feature = "verification")]
+#[allow(dead_code)] // wired into the by-body inline gate in PR-C M3 (#219)
+pub(crate) fn is_noreturn_callee(func: &Function) -> bool {
+    !body_has_branch_or_return(&func.instructions)
+        && matches!(func.instructions.last(), Some(Instruction::Unreachable))
+}
+
+/// Recursively detect any `Return` or branch (`Br`/`BrIf`/`BrTable`) in a body,
+/// descending into Block/Loop/If. Used by `is_noreturn_callee` — a branch could
+/// transfer control to the function-level label (a return-via-br), so any
+/// branch conservatively disqualifies the "every path traps" classification.
+#[cfg(feature = "verification")]
+fn body_has_branch_or_return(instrs: &[Instruction]) -> bool {
+    instrs.iter().any(|i| match i {
+        Instruction::Return
+        | Instruction::Br(_)
+        | Instruction::BrIf(_)
+        | Instruction::BrTable { .. } => true,
+        Instruction::Block { body, .. } | Instruction::Loop { body, .. } => {
+            body_has_branch_or_return(body)
+        }
+        Instruction::If {
+            then_body,
+            else_body,
+            ..
+        } => body_has_branch_or_return(then_body) || body_has_branch_or_return(else_body),
+        _ => false,
+    })
+}
+
 /// loom#151/#155: whitelist of instructions the by-body inline modeler can
 /// encode faithfully and **identically to the main encoder**. The callee must
 /// be straight-line (no Block/Loop/If), leaf (no calls), single-result, and
@@ -2964,6 +3009,31 @@ impl TranslationValidator {
             return Ok(());
         }
 
+        // PR-C (#219) M3.2: precise acyclic-CF fast-path for the inliner. The
+        // main encoder models br_table approximately and a br_table callee is
+        // not straight-line-by-body-modelable, so it cannot prove `call F`
+        // (br_table callee) equal to its inlined body → the inline reverts. The
+        // acyclic executor is EXACT for acyclic CF + by-body calls. Try it here.
+        //
+        // PURELY ADDITIVE + scoped to the inliner: ONLY a precise Ok(true) (a
+        // real proof) short-circuits to accept; Ok(false)/Err/panic FALL THROUGH
+        // to the existing encoder unchanged. So this can only ADD accepts (the
+        // br_table-callee inlines), never weaken an existing check, and every
+        // non-inline pass is byte-identical. (catch_unwind: the acyclic encoder
+        // builds Z3 exprs and could panic on a malformed input.)
+        if self.pass_name == "inline_functions" {
+            let original = self.original.clone();
+            let sig_ctx = self.sig_ctx.clone();
+            let optimized_clone = optimized.clone();
+            let acyclic = std::panic::catch_unwind(std::panic::AssertUnwindSafe(move || {
+                verify_acyclic_equivalence(&original, &optimized_clone, &sig_ctx)
+            }));
+            if let Ok(Ok(true)) = acyclic {
+                return Ok(());
+            }
+            // else: fall through to the existing encoder (no behavior change).
+        }
+
         // Use catch_unwind to handle Z3 internal panics gracefully. (The
         // i64 inline hang in loom#147 turned out NOT to be a verify hang —
         // verify returns fast; it was a livelock in the inline-pass
@@ -7169,6 +7239,890 @@ fn encode_simple_instruction(
     Ok(())
 }
 
+// ===========================================================================
+// PR-C (gale #219): precise acyclic control-flow symbolic execution.
+//
+// An ISOLATED executor that models structured, acyclic wasm control flow
+// (block / if / br / br_if / br_table) by *state merging with ITE* driven by an
+// explicit label/join stack — never path-forking. For a bounded acyclic region
+// this is EXACT (a passing SMT check is a real proof, not a bounded
+// approximation), and it produces ONE verification condition instead of N
+// per-arm obligations. Precedent: Alive2 (PLDI'21, "we do not fork expressions
+// across paths"; final state a linear chain of ITEs), Kuznetsov state-merging
+// (PLDI'12), WASP br_table rules (ECOOP'22). See
+// docs/design/pr-c-precise-cf-verification.md.
+//
+// Phase 1 = integer only (no memory): every non-control instruction must be in
+// the `is_acyclic_simple_modelable` allowlist (the ops `encode_simple_instruction`
+// models PRECISELY) and is delegated there. Loop (back-edge), Call/CallIndirect,
+// memory ops, unknown opcodes, and multi-result/param blocks are REJECTED — the
+// `is_acyclic_modelable_body` gate fails the whole function and the caller falls
+// back (over-approximate-and-skip), NEVER a false equivalence. We must reject
+// rather than delegate-and-hope because `encode_simple_instruction`'s catch-all
+// HAVOCS (pushes a fresh BV without popping operands) → it would desync the
+// stack for an unmodeled op.
+//
+// Currently `#[allow(dead_code)]`: built + tested in isolation (Z3 equivalence
+// over hand-written br_table fixtures, incl. the 0x8000_0000-selector default
+// case) BEFORE being wired into the live inline-verify path (PR-C M3).
+// ===========================================================================
+
+/// Symbolic state threaded along the *current* straight-line path.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+struct AcyclicState {
+    stack: Vec<BV>,
+    locals: Vec<BV>,
+    globals: Vec<BV>,
+    /// Linear memory as a Z3 `Array[BV32 → BV8]` (PR-C Phase 2). Loads/stores use
+    /// the SAME shared helpers (`encode_i32_load_from` etc.) the main encoder
+    /// uses, so a `call F` modeled by-body and the inlined body produce
+    /// bit-identical memory expressions. Memory is MUTABLE branch state, so it is
+    /// ITE-merged at every join exactly like locals/globals.
+    memory: Array,
+    /// Path condition guarding the current straight-line path.
+    path: Bool,
+    /// False once this path has branched / returned (subsequent straight-line
+    /// instructions in the same sequence are dead and must not be executed).
+    reachable: bool,
+}
+
+/// One incoming branch to a label: its path guard plus the result/locals/globals/
+/// memory snapshot at the branch point.
+#[cfg(feature = "verification")]
+type BranchEntry = (Bool, Vec<BV>, Vec<BV>, Vec<BV>, Array);
+
+/// The ITE-merged successor of a join: result values, locals, globals, memory,
+/// and the merged path condition.
+#[cfg(feature = "verification")]
+type MergedState = (Vec<BV>, Vec<BV>, Vec<BV>, Array, Bool);
+
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+struct LabelJoin {
+    /// Number of result values the label carries (0 or 1 in Phase 1).
+    arity: usize,
+    entries: Vec<BranchEntry>,
+}
+
+/// Result arity of a structured block type, or `Err` if it has parameters or
+/// more than one result (rejected in Phase 1 — over-approximate-and-skip).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_block_arity(block_type: &BlockType) -> Result<usize> {
+    match block_type {
+        BlockType::Empty => Ok(0),
+        BlockType::Value(_) => Ok(1),
+        BlockType::Func { params, results } => {
+            if !params.is_empty() || results.len() > 1 {
+                Err(anyhow!(
+                    "acyclic executor: block with params/multi-result not modeled"
+                ))
+            } else {
+                Ok(results.len())
+            }
+        }
+    }
+}
+
+/// Allowlist of non-control instructions `encode_simple_instruction` models
+/// PRECISELY (pure integer ops + locals/globals, no memory, no havoc). Anything
+/// outside this set is rejected by `is_acyclic_modelable_body`.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn is_acyclic_simple_modelable(instr: &Instruction) -> bool {
+    matches!(
+        instr,
+        Instruction::I32Const(_)
+            | Instruction::I64Const(_)
+            | Instruction::F32Const(_)
+            | Instruction::F64Const(_)
+            | Instruction::I32Add
+            | Instruction::I32Sub
+            | Instruction::I32Mul
+            | Instruction::I32And
+            | Instruction::I32Or
+            | Instruction::I32Xor
+            | Instruction::I32Shl
+            | Instruction::I32ShrS
+            | Instruction::I32ShrU
+            | Instruction::I32DivS
+            | Instruction::I32DivU
+            | Instruction::I32RemS
+            | Instruction::I32RemU
+            | Instruction::I32Eq
+            | Instruction::I32Ne
+            | Instruction::I32LtS
+            | Instruction::I32LtU
+            | Instruction::I32GtS
+            | Instruction::I32GtU
+            | Instruction::I32LeS
+            | Instruction::I32LeU
+            | Instruction::I32GeS
+            | Instruction::I32GeU
+            | Instruction::I32Eqz
+            | Instruction::I64Add
+            | Instruction::I64Sub
+            | Instruction::I64Mul
+            | Instruction::I64And
+            | Instruction::I64Or
+            | Instruction::I64Xor
+            | Instruction::I64Shl
+            | Instruction::I64ShrS
+            | Instruction::I64ShrU
+            | Instruction::I64DivS
+            | Instruction::I64DivU
+            | Instruction::I64RemS
+            | Instruction::I64RemU
+            | Instruction::I64Eq
+            | Instruction::I64Ne
+            | Instruction::I64LtS
+            | Instruction::I64LtU
+            | Instruction::I64GtS
+            | Instruction::I64GtU
+            | Instruction::I64LeS
+            | Instruction::I64LeU
+            | Instruction::I64GeS
+            | Instruction::I64GeU
+            | Instruction::I64Eqz
+            | Instruction::I32WrapI64
+            | Instruction::I64ExtendI32S
+            | Instruction::I64ExtendI32U
+            | Instruction::LocalGet(_)
+            | Instruction::LocalSet(_)
+            | Instruction::LocalTee(_)
+            | Instruction::GlobalGet(_)
+            | Instruction::GlobalSet(_)
+            | Instruction::Drop
+            | Instruction::Nop
+            | Instruction::Select
+    )
+}
+
+/// PR-C Phase 2: memory loads/stores `exec_acyclic` models explicitly (via the
+/// shared `encode_*_load_from`/`*_store_into` helpers). Kept SEPARATE from
+/// `is_acyclic_simple_modelable` because memory ops must NOT be delegated to
+/// `encode_simple_instruction` (it havocs them, desyncing the stack).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn is_acyclic_memory_op(instr: &Instruction) -> bool {
+    matches!(
+        instr,
+        Instruction::I32Load { .. }
+            | Instruction::I64Load { .. }
+            | Instruction::I32Store { .. }
+            | Instruction::I64Store { .. }
+            | Instruction::I32Load8S { .. }
+            | Instruction::I32Load8U { .. }
+            | Instruction::I32Load16S { .. }
+            | Instruction::I32Load16U { .. }
+            | Instruction::I32Store8 { .. }
+            | Instruction::I32Store16 { .. }
+    )
+}
+
+/// Max by-body call-recursion depth the acyclic executor will model. The seam
+/// decides call only `panic*` (no-return → diverge, no further recursion), so
+/// depth 2 suffices; 8 gives headroom and bounds any cyclic/recursive call
+/// graph (which is rejected, not unrolled).
+#[cfg(feature = "verification")]
+const MAX_ACYCLIC_CALL_DEPTH: usize = 8;
+
+/// Whole-function gate: true iff every instruction (recursing into block/if
+/// bodies) is structurally modelable by `exec_acyclic`. Back-edges (Loop),
+/// call_indirect, unknown opcodes, and unmodeled block shapes (params/multi-
+/// result) fail. As of Phase 2b the gate is purely STRUCTURAL — it needs no
+/// module context, because ANY direct call is modelable (the executor diverges
+/// no-return callees, models acyclic locals by-body, and HAVOCS the rest —
+/// havoc is conservative-sound), and memory ops are handled explicitly.
+///
+/// A fast pre-check for the M3 routing decision; `exec_acyclic` is the actual
+/// soundness enforcer (it returns `Err` on anything it cannot model, never havoc
+/// a value-producing op silently).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn is_acyclic_modelable_body(body: &[Instruction]) -> bool {
+    body.iter().all(|instr| match instr {
+        // Back-edges and indirect calls are out of scope.
+        Instruction::Loop { .. } | Instruction::CallIndirect { .. } => false,
+        // ANY direct call is modelable: by-body (acyclic local), ⊥-diverge
+        // (no-return), or HAVOC (import / loop-containing / multi-result). The
+        // executor's Call arm makes the choice; havoc never forges a false ≡.
+        Instruction::Call(_) => true,
+        Instruction::Block { block_type, body } => {
+            acyclic_block_arity(block_type).is_ok() && is_acyclic_modelable_body(body)
+        }
+        Instruction::If {
+            block_type,
+            then_body,
+            else_body,
+        } => {
+            acyclic_block_arity(block_type).is_ok()
+                && is_acyclic_modelable_body(then_body)
+                && is_acyclic_modelable_body(else_body)
+        }
+        Instruction::Br(_)
+        | Instruction::BrIf(_)
+        | Instruction::BrTable { .. }
+        | Instruction::Return
+        // Unreachable is a trap (⊥): the executor diverges the path, never havoc.
+        | Instruction::Unreachable => true,
+        other => is_acyclic_simple_modelable(other) || is_acyclic_memory_op(other),
+    })
+}
+
+/// Resolve a relative branch depth to an absolute index in the label stack
+/// (0 = innermost = top). `labels[0]` is the implicit function label.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_label_index(labels: &[LabelJoin], depth: u32) -> Result<usize> {
+    let depth = depth as usize;
+    if depth >= labels.len() {
+        return Err(anyhow!(
+            "acyclic executor: branch depth {} out of range",
+            depth
+        ));
+    }
+    Ok(labels.len() - 1 - depth)
+}
+
+/// Record the current state as an incoming branch to `labels[target]`, carrying
+/// that label's arity result values from the top of the stack.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_push_branch(
+    labels: &mut [LabelJoin],
+    target: usize,
+    guard: Bool,
+    state: &AcyclicState,
+) -> Result<()> {
+    let arity = labels[target].arity;
+    if state.stack.len() < arity {
+        return Err(anyhow!("acyclic executor: stack underflow at branch"));
+    }
+    let results: Vec<BV> = state.stack[state.stack.len() - arity..].to_vec();
+    labels[target].entries.push((
+        guard,
+        results,
+        state.locals.clone(),
+        state.globals.clone(),
+        state.memory.clone(),
+    ));
+    Ok(())
+}
+
+/// ITE-merge a label's incoming branches into one successor. Guards are pairwise
+/// disjoint (if: cond vs ¬cond; br_table: sel==i vs sel≥n unsigned), so the
+/// fold `ite(g0, v0, ite(g1, v1, … v_last))` selects the live branch under the
+/// merged path (= disjunction of guards). Returns `(results, locals, globals,
+/// path)`, or `None` if no branch reaches the label (unreachable join).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_merge_entries(entries: &[BranchEntry]) -> Option<MergedState> {
+    let (_, last_res, last_loc, last_glob, last_mem) = entries.last()?;
+    let mut res = last_res.clone();
+    let mut loc = last_loc.clone();
+    let mut glob = last_glob.clone();
+    let mut mem = last_mem.clone();
+    let mut guards: Vec<Bool> = vec![entries.last().unwrap().0.clone()];
+    // Fold earlier entries outward: each takes precedence under its own guard.
+    for (g, r, l, gl, m) in entries[..entries.len() - 1].iter().rev() {
+        res = r
+            .iter()
+            .zip(res.iter())
+            .map(|(t, f)| merge_bv(g, t, f))
+            .collect();
+        loc = l
+            .iter()
+            .zip(loc.iter())
+            .map(|(t, f)| merge_bv(g, t, f))
+            .collect();
+        glob = gl
+            .iter()
+            .zip(glob.iter())
+            .map(|(t, f)| merge_bv(g, t, f))
+            .collect();
+        // ITE-merge the memory Array (Bool::ite is generic over the Z3 sort).
+        mem = g.ite(m, &mem);
+        guards.push(g.clone());
+    }
+    let path = Bool::or(&guards);
+    Some((res, loc, glob, mem, path))
+}
+
+/// Apply a popped label's join to the state: the post-block state is the
+/// ITE-merge of all branches that targeted it, with the block's result values
+/// pushed on top of `base_stack` (the operand stack below the block entry).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_apply_join(
+    state: &mut AcyclicState,
+    base_stack: Vec<BV>,
+    join: &LabelJoin,
+) -> Result<()> {
+    match acyclic_merge_entries(&join.entries) {
+        None => {
+            // No branch reaches this join — code after the block is unreachable.
+            state.reachable = false;
+            Ok(())
+        }
+        Some((results, locals, globals, memory, path)) => {
+            let mut stack = base_stack;
+            stack.extend(results);
+            state.stack = stack;
+            state.locals = locals;
+            state.globals = globals;
+            state.memory = memory;
+            state.path = path;
+            state.reachable = true;
+            Ok(())
+        }
+    }
+}
+
+/// Execute a structured, acyclic instruction sequence, merging branch states at
+/// label joins. Precondition: the enclosing function passed
+/// `is_acyclic_modelable_body`. `labels[0]` is the function label.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn exec_acyclic(
+    body: &[Instruction],
+    state: &mut AcyclicState,
+    labels: &mut Vec<LabelJoin>,
+    ctx: &VerificationSignatureContext,
+    depth: usize,
+    // Monotonic per-ENCODE counter for impure-call havoc names. Threaded (not in
+    // AcyclicState, which is saved/restored at branches) so the Nth impure call
+    // gets the same id in the original and optimized encodings → they unify.
+    havoc_counter: &mut u64,
+) -> Result<()> {
+    for instr in body {
+        if !state.reachable {
+            // Dead code after an unconditional branch / return.
+            break;
+        }
+        match instr {
+            Instruction::Block { block_type, body } => {
+                let arity = acyclic_block_arity(block_type)?;
+                let base_stack = state.stack.clone();
+                labels.push(LabelJoin {
+                    arity,
+                    entries: Vec::new(),
+                });
+                exec_acyclic(body, state, labels, ctx, depth, havoc_counter)?;
+                // Fall-through off the end == implicit `br 0` to this block.
+                if state.reachable {
+                    let target = labels.len() - 1;
+                    acyclic_push_branch(labels, target, state.path.clone(), state)?;
+                }
+                let join = labels.pop().unwrap();
+                acyclic_apply_join(state, base_stack, &join)?;
+            }
+            Instruction::If {
+                block_type,
+                then_body,
+                else_body,
+            } => {
+                let arity = acyclic_block_arity(block_type)?;
+                let cond = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: If underflow"))?;
+                let cond_nz = cond.eq(BV::from_i64(0, 32)).not();
+                let base_stack = state.stack.clone();
+                let saved_locals = state.locals.clone();
+                let saved_globals = state.globals.clone();
+                let saved_memory = state.memory.clone();
+                let saved_path = state.path.clone();
+                labels.push(LabelJoin {
+                    arity,
+                    entries: Vec::new(),
+                });
+                let target = labels.len() - 1;
+                // then-branch under path ∧ cond≠0
+                state.path = Bool::and(&[saved_path.clone(), cond_nz.clone()]);
+                state.reachable = true;
+                exec_acyclic(then_body, state, labels, ctx, depth, havoc_counter)?;
+                if state.reachable {
+                    acyclic_push_branch(labels, target, state.path.clone(), state)?;
+                }
+                // else-branch under path ∧ cond=0 (restore entry state)
+                state.stack = base_stack.clone();
+                state.locals = saved_locals;
+                state.globals = saved_globals;
+                state.memory = saved_memory;
+                state.path = Bool::and(&[saved_path, cond_nz.not()]);
+                state.reachable = true;
+                exec_acyclic(else_body, state, labels, ctx, depth, havoc_counter)?;
+                if state.reachable {
+                    acyclic_push_branch(labels, target, state.path.clone(), state)?;
+                }
+                let join = labels.pop().unwrap();
+                acyclic_apply_join(state, base_stack, &join)?;
+            }
+            Instruction::Br(depth) => {
+                let target = acyclic_label_index(labels, *depth)?;
+                acyclic_push_branch(labels, target, state.path.clone(), state)?;
+                state.reachable = false;
+            }
+            Instruction::BrIf(depth) => {
+                let cond = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: BrIf underflow"))?;
+                let cond_nz = cond.eq(BV::from_i64(0, 32)).not();
+                let target = acyclic_label_index(labels, *depth)?;
+                let branch_guard = Bool::and(&[state.path.clone(), cond_nz.clone()]);
+                acyclic_push_branch(labels, target, branch_guard, state)?;
+                // Fall-through path narrows to cond=0.
+                state.path = Bool::and(&[state.path.clone(), cond_nz.not()]);
+            }
+            Instruction::BrTable { targets, default } => {
+                let sel = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: BrTable underflow"))?;
+                let n = targets.len() as u64;
+                // Arm i is taken iff sel == i.
+                for (i, d) in targets.iter().enumerate() {
+                    let target = acyclic_label_index(labels, *d)?;
+                    let guard =
+                        Bool::and(&[state.path.clone(), sel.eq(BV::from_u64(i as u64, 32))]);
+                    acyclic_push_branch(labels, target, guard, state)?;
+                }
+                // Default is taken iff sel >=u n (UNSIGNED — gale must-fix: a
+                // signed >= would leave selectors in [2^31, 2^32) in a gap
+                // between the arm guards and the default, letting Z3 pick any
+                // arm → a false equivalence).
+                let target = acyclic_label_index(labels, *default)?;
+                let guard = Bool::and(&[state.path.clone(), sel.bvuge(BV::from_u64(n, 32))]);
+                acyclic_push_branch(labels, target, guard, state)?;
+                state.reachable = false;
+            }
+            Instruction::Return => {
+                // Branch to the implicit function label.
+                acyclic_push_branch(labels, 0, state.path.clone(), state)?;
+                state.reachable = false;
+            }
+            Instruction::Unreachable => {
+                // Trap: this path diverges (⊥) — it drops out of the result
+                // ITE and contributes to no join, exactly like a no-return
+                // call. NEVER havoc (Alive2 ub discipline).
+                state.reachable = false;
+            }
+            Instruction::Call(g) => {
+                let callee_opt = ctx
+                    .function_bodies
+                    .get(*g as usize)
+                    .and_then(|o| o.as_deref());
+                match callee_opt {
+                    // No-return callee (panic→unreachable): ⊥-diverge — the path
+                    // traps and drops out of the result ITE, NEVER havoc (Alive2
+                    // `ub` discipline; havoc reopens #155/#159).
+                    Some(callee) if is_noreturn_callee(callee) => {
+                        state.reachable = false;
+                    }
+                    // Acyclic, single-result local callee: model BY BODY (the
+                    // same executor models the original `call F` and the inlined
+                    // body identically → the inline is provable). Its OWN impure
+                    // calls get havoc'd via the shared counter.
+                    Some(callee)
+                        if depth + 1 < MAX_ACYCLIC_CALL_DEPTH
+                            && callee.signature.results.len() <= 1
+                            && is_acyclic_modelable_body(&callee.instructions) =>
+                    {
+                        encode_acyclic_call(callee, state, ctx, depth, havoc_counter)?;
+                    }
+                    // Impure / opaque call (import, multi-result, too-deep, or a
+                    // local callee with a loop/unmodelable shape): HAVOC. Sound
+                    // and conservative — corresponding calls in the original and
+                    // optimized encodings get the SAME havoc id (shared counter)
+                    // so they unify; only a havoc-vs-CONCRETE mismatch (e.g.
+                    // original havocs `call F` but optimized inlines F's effect)
+                    // refuses to prove → revert (the #155/#159 guard).
+                    _ => {
+                        acyclic_havoc_call(*g, state, ctx, havoc_counter)?;
+                    }
+                }
+            }
+            // PR-C Phase 2: memory loads/stores via the SAME shared helpers the
+            // main encoder uses (so a by-body call and its inlined body produce
+            // bit-identical memory expressions). Handled explicitly — NEVER
+            // delegated to encode_simple_instruction, which havocs memory ops.
+            Instruction::I32Load { offset, .. } => {
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I32Load underflow"))?;
+                state
+                    .stack
+                    .push(encode_i32_load_from(&state.memory, &addr, *offset)?);
+            }
+            Instruction::I64Load { offset, .. } => {
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I64Load underflow"))?;
+                state
+                    .stack
+                    .push(encode_i64_load_from(&state.memory, &addr, *offset)?);
+            }
+            Instruction::I32Store { offset, .. } => {
+                let value = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I32Store value underflow"))?;
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I32Store addr underflow"))?;
+                state.memory = encode_i32_store_into(&state.memory, &addr, &value, *offset);
+            }
+            Instruction::I64Store { offset, .. } => {
+                let value = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I64Store value underflow"))?;
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: I64Store addr underflow"))?;
+                state.memory = encode_i64_store_into(&state.memory, &addr, &value, *offset);
+            }
+            // Partial-width (8/16-bit) loads/stores (loom#161 helpers).
+            Instruction::I32Load8S { offset, .. }
+            | Instruction::I32Load8U { offset, .. }
+            | Instruction::I32Load16S { offset, .. }
+            | Instruction::I32Load16U { offset, .. } => {
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: partial load underflow"))?;
+                let (bytes, signed) = match instr {
+                    Instruction::I32Load8S { .. } => (1, true),
+                    Instruction::I32Load8U { .. } => (1, false),
+                    Instruction::I32Load16S { .. } => (2, true),
+                    _ => (2, false),
+                };
+                state.stack.push(encode_partial_load_from(
+                    &state.memory,
+                    &addr,
+                    *offset,
+                    bytes,
+                    signed,
+                    32,
+                )?);
+            }
+            Instruction::I32Store8 { offset, .. } | Instruction::I32Store16 { offset, .. } => {
+                let value = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: partial store value underflow"))?;
+                let addr = state
+                    .stack
+                    .pop()
+                    .ok_or_else(|| anyhow!("acyclic executor: partial store addr underflow"))?;
+                let bytes = if matches!(instr, Instruction::I32Store8 { .. }) {
+                    1
+                } else {
+                    2
+                };
+                state.memory =
+                    encode_partial_store_into(&state.memory, &addr, &value, *offset, bytes);
+            }
+            other => {
+                // Defense in depth: never delegate an unmodeled op (the
+                // delegate's catch-all havocs and would desync the stack).
+                if !is_acyclic_simple_modelable(other) {
+                    return Err(anyhow!(
+                        "acyclic executor: instruction not modelable: {:?}",
+                        other
+                    ));
+                }
+                encode_simple_instruction(
+                    other,
+                    &mut state.stack,
+                    &mut state.locals,
+                    &mut state.globals,
+                )?;
+            }
+        }
+    }
+    Ok(())
+}
+
+/// Model an opaque / impure `call g` by HAVOC (PR-C Phase 2b): pop the args,
+/// push a fresh symbolic per result, and havoc all globals + memory. Naming is
+/// deterministic in the per-encode `havoc_counter` so the Nth impure call gets
+/// the SAME names in the original and optimized encodings → they unify in Z3
+/// (the original's `call F` and the optimized's `call F` cancel). Mirrors the
+/// main encoder's conservative Call tier. Used for imports and local callees the
+/// acyclic executor cannot model by-body (loops, multi-result, too-deep).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn acyclic_havoc_call(
+    g: u32,
+    state: &mut AcyclicState,
+    ctx: &VerificationSignatureContext,
+    havoc_counter: &mut u64,
+) -> Result<()> {
+    let sig = ctx
+        .get_function_signature(g)
+        .ok_or_else(|| anyhow!("acyclic executor: no signature for call {}", g))?;
+    let nparams = sig.params.len();
+    if state.stack.len() < nparams {
+        return Err(anyhow!("acyclic executor: havoc-call arg underflow"));
+    }
+    // Discard the args (the opaque result is independent of them, matching the
+    // main encoder's conservative tier).
+    state.stack.truncate(state.stack.len() - nparams);
+    // Fresh symbolic result(s).
+    for (i, rt) in sig.results.iter().enumerate() {
+        state.stack.push(BV::new_const(
+            format!("call_{}_result_{}", g, i),
+            bv_width_for_value_type(*rt),
+        ));
+    }
+    // Havoc globals + memory with the shared deterministic id.
+    let id = *havoc_counter;
+    *havoc_counter += 1;
+    for (idx, global) in state.globals.iter_mut().enumerate() {
+        *global = BV::new_const(
+            format!("global_{}_after_call_{}_{}", idx, g, id),
+            global.get_size(),
+        );
+    }
+    state.memory = Array::new_const(
+        format!("mem_after_call_{}_{}", g, id),
+        &Sort::bitvector(32),
+        &Sort::bitvector(8),
+    );
+    Ok(())
+}
+
+/// Model a `call` to `callee` BY BODY: pop the args, recursively encode the
+/// callee through `exec_acyclic` (params bound to the args, declared locals
+/// zero-init, globals SHARED with the caller so the callee's global writes
+/// propagate back), and push the merged single result. Sets `reachable=false`
+/// if every callee path traps (no normal exit). The SAME executor models both
+/// the original `call F` (here) and the inlined body, so they yield the same
+/// expression → the inline is provable.
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn encode_acyclic_call(
+    callee: &Function,
+    state: &mut AcyclicState,
+    ctx: &VerificationSignatureContext,
+    depth: usize,
+    havoc_counter: &mut u64,
+) -> Result<()> {
+    if depth + 1 >= MAX_ACYCLIC_CALL_DEPTH {
+        return Err(anyhow!("acyclic executor: call recursion too deep"));
+    }
+    let arity = callee.signature.results.len();
+    if arity > 1 {
+        return Err(anyhow!("acyclic executor: multi-result callee"));
+    }
+    let nparams = callee.signature.params.len();
+    if state.stack.len() < nparams {
+        return Err(anyhow!("acyclic executor: call arg underflow"));
+    }
+    // Pop args (last param on top of stack); bind to the callee's param locals.
+    let mut callee_locals: Vec<BV> = state.stack.split_off(state.stack.len() - nparams);
+    for (count, lt) in callee.locals.iter() {
+        let width = bv_width_for_value_type(*lt);
+        for _ in 0..*count {
+            callee_locals.push(BV::from_u64(0, width));
+        }
+    }
+    let mut sub = AcyclicState {
+        stack: Vec::new(),
+        locals: callee_locals,
+        // SHARED globals + memory — the callee may read/write module globals and
+        // linear memory; the merged result threads the updated state back to the
+        // caller.
+        globals: state.globals.clone(),
+        memory: state.memory.clone(),
+        // Inherit the caller's path so callee branch guards conjoin it.
+        path: state.path.clone(),
+        reachable: true,
+    };
+    let mut sub_labels = vec![LabelJoin {
+        arity,
+        entries: Vec::new(),
+    }];
+    exec_acyclic(
+        &callee.instructions,
+        &mut sub,
+        &mut sub_labels,
+        ctx,
+        depth + 1,
+        havoc_counter,
+    )?;
+    if sub.reachable {
+        acyclic_push_branch(&mut sub_labels, 0, sub.path.clone(), &sub)?;
+    }
+    let join = sub_labels.pop().unwrap();
+    match acyclic_merge_entries(&join.entries) {
+        Some((results, _locals, globals, memory, path)) => {
+            // Propagate callee global + memory writes; narrow the caller path to
+            // the callee's returning paths.
+            state.globals = globals;
+            state.memory = memory;
+            state.path = path;
+            if arity == 1 {
+                state.stack.push(results[0].clone());
+            }
+        }
+        None => {
+            // Callee never returns normally (all paths trap) → caller diverges.
+            state.reachable = false;
+        }
+    }
+    Ok(())
+}
+
+/// Encode a function via the precise acyclic CF executor. Mirrors
+/// `encode_function_to_smt`'s deterministic input naming (`param{i}`,
+/// `global{i}`) so two encodings UNIFY in Z3 and can be compared directly.
+/// Returns `Err` if the function is not acyclic-modelable (caller falls back).
+#[cfg(feature = "verification")]
+#[allow(dead_code)]
+fn encode_acyclic_function(
+    func: &Function,
+    ctx: &VerificationSignatureContext,
+) -> Result<Option<BV>> {
+    if !is_acyclic_modelable_body(&func.instructions) {
+        return Err(anyhow!("function not acyclic-modelable"));
+    }
+    let arity = func.signature.results.len();
+    if arity > 1 {
+        return Err(anyhow!("acyclic executor: multi-result not modeled"));
+    }
+
+    let mut locals: Vec<BV> = Vec::new();
+    for (idx, pt) in func.signature.params.iter().enumerate() {
+        locals.push(BV::new_const(
+            format!("param{}", idx),
+            bv_width_for_value_type(*pt),
+        ));
+    }
+    for (count, lt) in func.locals.iter() {
+        let width = bv_width_for_value_type(*lt);
+        for _ in 0..*count {
+            locals.push(BV::from_u64(0, width));
+        }
+    }
+    let mut globals: Vec<BV> = Vec::new();
+    for i in 0..16 {
+        globals.push(BV::new_const(format!("global{}", i), 32));
+    }
+    // Symbolic initial memory — same name as the main encoder so two
+    // encode_acyclic_function calls (original + optimized) unify in Z3.
+    let memory = Array::new_const("memory", &Sort::bitvector(32), &Sort::bitvector(8));
+
+    let mut state = AcyclicState {
+        stack: Vec::new(),
+        locals,
+        globals,
+        memory,
+        path: Bool::from_bool(true),
+        reachable: true,
+    };
+    let mut labels = vec![LabelJoin {
+        arity,
+        entries: Vec::new(),
+    }];
+    // Per-encode impure-call havoc counter (reset to 0 each encode so the Nth
+    // impure call gets the same id in the original and optimized encodings).
+    let mut havoc_counter: u64 = 0;
+    exec_acyclic(
+        &func.instructions,
+        &mut state,
+        &mut labels,
+        ctx,
+        0,
+        &mut havoc_counter,
+    )?;
+    // Fall-through off the function body == implicit return.
+    if state.reachable {
+        acyclic_push_branch(&mut labels, 0, state.path.clone(), &state)?;
+    }
+    let func_join = labels.pop().unwrap();
+
+    if arity == 0 {
+        return Ok(None);
+    }
+    match acyclic_merge_entries(&func_join.entries) {
+        Some((results, _, _, _, _)) => Ok(Some(results[0].clone())),
+        None => Err(anyhow!("acyclic executor: function has no reachable exit")),
+    }
+}
+
+/// PR-C (#219) M3: precise FAST PATH for proving `original ≡ optimized` when
+/// BOTH functions are acyclic-CF-modelable (integer, block/if/br/br_if/br_table,
+/// by-body calls, no loops/memory). Encodes both via the precise acyclic
+/// executor (deterministic input names → they unify in Z3) and checks semantic
+/// equivalence (`a == b` has no counterexample). This is what lets the verified
+/// inliner prove a `call F` (with a `br_table` callee) equal to its inlined
+/// body — the case the straight-line by-body modeler (`callee_inlinable_by_body`)
+/// cannot reach.
+///
+/// Returns `Err` if either function is not acyclic-modelable, if a result width
+/// mismatches, or if the encoding fails — the caller then FALLS BACK to the
+/// existing encoder-based `verify_function_equivalence` (no behavior change for
+/// non-acyclic functions; this only ADDS proving power, never removes a check).
+#[cfg(feature = "verification")]
+#[allow(dead_code)] // wired into inline_functions' verify-or-revert in M3 step 2
+pub(crate) fn verify_acyclic_equivalence(
+    original: &Function,
+    optimized: &Function,
+    ctx: &VerificationSignatureContext,
+) -> Result<bool> {
+    if original.signature.params != optimized.signature.params
+        || original.signature.results != optimized.signature.results
+    {
+        return Err(anyhow!("acyclic verify: signature changed"));
+    }
+    // Only handle the precise acyclic case; anything else → caller falls back.
+    if !is_acyclic_modelable_body(&original.instructions)
+        || !is_acyclic_modelable_body(&optimized.instructions)
+    {
+        return Err(anyhow!("acyclic verify: not acyclic-modelable"));
+    }
+
+    let cfg = z3_config_with_timeout();
+    with_z3_config(&cfg, || {
+        let solver = Solver::new();
+        let orig_result = encode_acyclic_function(original, ctx);
+        let opt_result = encode_acyclic_function(optimized, ctx);
+        match (&orig_result, &opt_result) {
+            (Ok(Some(orig)), Ok(Some(opt))) => {
+                // loom#145 soundness boundary: never width-match the final
+                // equality — a width mismatch is a model artifact we cannot
+                // soundly resolve, so bail to "not proven" and let the caller
+                // fall back / revert.
+                if orig.get_size() != opt.get_size() {
+                    return Err(anyhow!("acyclic verify: result width mismatch"));
+                }
+                solver.assert(orig.eq(opt).not());
+                match solver.check() {
+                    SatResult::Unsat => Ok(true),
+                    SatResult::Sat => Ok(false),
+                    SatResult::Unknown => Err(anyhow!("acyclic verify: solver unknown (timeout)")),
+                }
+            }
+            // Both void: no return value to compare. The acyclic executor does
+            // not yet model side-effect (global/memory) equivalence for void
+            // functions, so defer to the existing verifier — signal fall-back.
+            (Ok(None), Ok(None)) => Err(anyhow!("acyclic verify: void — defer to fallback")),
+            (Err(e), _) => Err(anyhow!("acyclic verify: encode original: {}", e)),
+            (_, Err(e)) => Err(anyhow!("acyclic verify: encode optimized: {}", e)),
+            _ => Err(anyhow!("acyclic verify: return-type mismatch")),
+        }
+    })
+}
+
 /// Verify that block stack properties are preserved across optimization
 ///
 /// Uses Z3 to formally verify that a block transformation preserves:
@@ -7397,6 +8351,412 @@ mod tests {
     use super::*;
     use crate::parse;
 
+    // ---- PR-C (gale #219) M1a: precise acyclic CF executor, tested in isolation ----
+
+    /// Encode both functions via the precise acyclic executor (deterministic
+    /// input names → they unify in Z3) and check semantic equivalence.
+    /// `Ok(true)` = proven equivalent, `Ok(false)` = counterexample found.
+    #[cfg(feature = "verification")]
+    fn acyclic_equiv(wat_a: &str, wat_b: &str) -> Result<bool> {
+        acyclic_equiv_at(wat_a, 0, wat_b, 0)
+    }
+
+    /// As `acyclic_equiv` but comparing a chosen function index in each module
+    /// (so a caller that `call`s a sibling can be compared to its inlined form).
+    #[cfg(feature = "verification")]
+    fn acyclic_equiv_at(wat_a: &str, ia: usize, wat_b: &str, ib: usize) -> Result<bool> {
+        let ma = parse::parse_wat(wat_a).unwrap();
+        let mb = parse::parse_wat(wat_b).unwrap();
+        let ctx_a = VerificationSignatureContext::from_module(&ma);
+        let ctx_b = VerificationSignatureContext::from_module(&mb);
+        let fa = &ma.functions[ia];
+        let fb = &mb.functions[ib];
+        let cfg = z3_config_with_timeout();
+        with_z3_config(&cfg, || {
+            let solver = Solver::new();
+            let ra = encode_acyclic_function(fa, &ctx_a)?;
+            let rb = encode_acyclic_function(fb, &ctx_b)?;
+            match (ra, rb) {
+                (Some(a), Some(b)) => {
+                    if a.get_size() != b.get_size() {
+                        return Ok(false);
+                    }
+                    solver.assert(a.eq(&b).not());
+                    match solver.check() {
+                        SatResult::Unsat => Ok(true),
+                        SatResult::Sat => Ok(false),
+                        SatResult::Unknown => Err(anyhow!("solver unknown")),
+                    }
+                }
+                _ => Err(anyhow!("void/encode mismatch")),
+            }
+        })
+    }
+
+    /// 3-way `br_table` switch (0→10, 1→20, default→30).
+    #[cfg(feature = "verification")]
+    const SWITCH_10_20_30: &str = r#"
+        (module (func (param i32) (result i32)
+            (block (block (block
+                local.get 0
+                br_table 0 1 2)
+              i32.const 10
+              return)
+              i32.const 20
+              return)
+            i32.const 30
+            return))"#;
+
+    /// Same dispatch expressed as an if/else chain — semantically identical.
+    #[cfg(feature = "verification")]
+    const IFELSE_10_20_30: &str = r#"
+        (module (func (param i32) (result i32)
+            local.get 0
+            i32.eqz
+            if (result i32)
+                i32.const 10
+            else
+                local.get 0
+                i32.const 1
+                i32.eq
+                if (result i32)
+                    i32.const 20
+                else
+                    i32.const 30
+                end
+            end))"#;
+
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_brtable_equiv_ifelse() {
+        // The precise executor must model a br_table switch identically to the
+        // equivalent if/else chain (independent encodings → real proof).
+        assert!(
+            acyclic_equiv(SWITCH_10_20_30, IFELSE_10_20_30).unwrap(),
+            "br_table switch must prove equivalent to the if/else chain"
+        );
+    }
+
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_brtable_identity() {
+        // A switch is trivially equivalent to itself (sanity: the executor is
+        // deterministic and the merge is stable).
+        assert!(acyclic_equiv(SWITCH_10_20_30, SWITCH_10_20_30).unwrap());
+    }
+
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_brtable_detects_difference() {
+        // A switch whose default returns 99 instead of 30 must NOT prove
+        // equivalent — the executor must distinguish, not over-approve.
+        let switch_wrong_default = r#"
+            (module (func (param i32) (result i32)
+                (block (block (block
+                    local.get 0
+                    br_table 0 1 2)
+                  i32.const 10
+                  return)
+                  i32.const 20
+                  return)
+                i32.const 99
+                return))"#;
+        assert!(
+            !acyclic_equiv(SWITCH_10_20_30, switch_wrong_default).unwrap(),
+            "differing default arm must yield a counterexample (sel >= 2)"
+        );
+    }
+
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_brtable_detects_swapped_arms() {
+        // Swapping arm 0 and arm 1 results must be caught (sel==0 / sel==1).
+        let swapped = r#"
+            (module (func (param i32) (result i32)
+                (block (block (block
+                    local.get 0
+                    br_table 0 1 2)
+                  i32.const 20
+                  return)
+                  i32.const 10
+                  return)
+                i32.const 30
+                return))"#;
+        assert!(
+            !acyclic_equiv(SWITCH_10_20_30, swapped).unwrap(),
+            "swapped arm values must yield a counterexample"
+        );
+    }
+
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_modelable_gate() {
+        // The whole-function gate must reject loops (back-edges).
+        let with_loop =
+            parse::parse_wat("(module (func (param i32) (result i32) (loop local.get 0)))")
+                .unwrap();
+        assert!(
+            !is_acyclic_modelable_body(&with_loop.functions[0].instructions),
+            "loops (back-edges) must be rejected"
+        );
+
+        // PR-C Phase 2: memory ops are now ADMITTED (the executor threads a
+        // memory Array). (Phase 1 rejected them.)
+        let with_memory = parse::parse_wat(
+            "(module (memory 1) (func (param i32) (result i32) local.get 0 i32.load))",
+        )
+        .unwrap();
+        assert!(
+            is_acyclic_modelable_body(&with_memory.functions[0].instructions),
+            "memory ops are modelable in Phase 2"
+        );
+
+        // A pure-integer br_table switch must be admitted.
+        let switch = parse::parse_wat(SWITCH_10_20_30).unwrap();
+        assert!(
+            is_acyclic_modelable_body(&switch.functions[0].instructions),
+            "pure-integer acyclic br_table function must be admitted"
+        );
+    }
+
+    /// M1b: a caller that `call`s a br_table callee must prove equivalent to the
+    /// same logic with the callee inlined — i.e. the executor models `call F`
+    /// by-body identically to the inlined body (the inline-soundness principle
+    /// extended to control flow). func 0 = decide (switch), func 1 = z (calls it).
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_call_by_body_equiv_inlined() {
+        let caller_with_call = r#"
+            (module
+                (func (param i32) (result i32)
+                    (block (block (block
+                        local.get 0
+                        br_table 0 1 2)
+                      i32.const 10
+                      return)
+                      i32.const 20
+                      return)
+                    i32.const 30
+                    return)
+                (func (param i32) (result i32)
+                    local.get 0
+                    call 0))"#;
+        // z (func 1) calls decide (func 0); compare to the inlined switch.
+        assert!(
+            acyclic_equiv_at(caller_with_call, 1, SWITCH_10_20_30, 0).unwrap(),
+            "call-by-body must prove equivalent to the inlined callee body"
+        );
+    }
+
+    /// M1b: a call to a no-return callee (panic→unreachable) must be modeled as
+    /// ⊥ (diverge), never havoc. A function that returns 7 on one path and traps
+    /// on the other proves equivalent to one that simply returns 7 (the trapping
+    /// path drops out of the result; both sides agree on the returning path).
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_noreturn_call_diverges() {
+        // func 0 = panic (no params, unreachable); func 1 = g: if arg!=0 return 7
+        // else call panic (diverge). func 1 should equal "always return 7".
+        let with_trap = r#"
+            (module
+                (func unreachable)
+                (func (param i32) (result i32)
+                    local.get 0
+                    if (result i32)
+                        i32.const 7
+                    else
+                        call 0
+                        i32.const 7
+                    end))"#;
+        let always7 = r#"
+            (module (func (param i32) (result i32) i32.const 7))"#;
+        assert!(
+            acyclic_equiv_at(with_trap, 1, always7, 0).unwrap(),
+            "no-return call path must diverge (drop out), leaving only the return-7 path"
+        );
+    }
+
+    /// A direct `unreachable` (trap) on one path must diverge — same ⊥ discipline
+    /// as a no-return call. The seam decides end their overflow arms in
+    /// `unreachable`, so the executor must model it (not reject the function).
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_unreachable_diverges() {
+        let with_trap = r#"
+            (module (func (param i32) (result i32)
+                local.get 0
+                if (result i32)
+                    i32.const 7
+                else
+                    unreachable
+                end))"#;
+        let always7 = r#"(module (func (param i32) (result i32) i32.const 7))"#;
+        assert!(
+            acyclic_equiv(with_trap, always7).unwrap(),
+            "the unreachable (trap) path must diverge, leaving only the return-7 path"
+        );
+    }
+
+    /// PR-C Phase 2: a store on each branch followed by a load must MERGE memory
+    /// correctly (the load sees ite(cond, mem_then, mem_else)). Equivalent to the
+    /// direct if/else — proves the per-join memory ITE-merge is wired right.
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_memory_merge() {
+        let with_mem = r#"
+            (module (memory 1)
+                (func (param i32) (result i32)
+                    local.get 0
+                    if
+                        i32.const 100
+                        i32.const 5
+                        i32.store
+                    else
+                        i32.const 100
+                        i32.const 6
+                        i32.store
+                    end
+                    i32.const 100
+                    i32.load))"#;
+        let direct = r#"
+            (module (memory 1)
+                (func (param i32) (result i32)
+                    local.get 0
+                    if (result i32)
+                        i32.const 5
+                    else
+                        i32.const 6
+                    end))"#;
+        assert!(
+            acyclic_equiv(with_mem, direct).unwrap(),
+            "store-on-each-branch then load must merge memory (ite) → loaded value matches"
+        );
+    }
+
+    /// PR-C Phase 2b: a call to an impure import is HAVOC'd. Two encodings of the
+    /// same impure call must UNIFY (same per-encode havoc id) → equivalent. (If
+    /// the counter weren't deterministic per-encode, the two `call_0_result_0`
+    /// names would differ and this would spuriously fail.)
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_havoc_call_unifies() {
+        let calls_import = r#"
+            (module
+                (import "env" "f" (func $f (param i32) (result i32)))
+                (func (param i32) (result i32)
+                    local.get 0
+                    call $f))"#;
+        assert!(
+            acyclic_equiv(calls_import, calls_import).unwrap(),
+            "two encodings of the same impure call must unify (havoc id parity)"
+        );
+    }
+
+    /// PR-C Phase 2b SOUNDNESS: a havoc'd call result must NOT be provably equal
+    /// to a concrete value — `call $f(x)` (opaque) is not `x`. This is the guard
+    /// against a havoc-vs-concrete false equivalence (#155/#159 surface).
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_havoc_not_concrete() {
+        let calls_import = r#"
+            (module
+                (import "env" "f" (func $f (param i32) (result i32)))
+                (func (param i32) (result i32)
+                    local.get 0
+                    call $f))"#;
+        let identity = r#"
+            (module
+                (import "env" "f" (func $f (param i32) (result i32)))
+                (func (param i32) (result i32)
+                    local.get 0))"#;
+        assert!(
+            !acyclic_equiv(calls_import, identity).unwrap(),
+            "a havoc'd call result must NOT prove equal to a concrete value"
+        );
+    }
+
+    /// PR-C Phase 2b SOUNDNESS: an impure call havocs memory, so a load AFTER the
+    /// call reads the havoc'd memory — NOT provably equal to loading the initial
+    /// memory. Guards against the havoc failing to invalidate memory.
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_acyclic_havoc_memory() {
+        let load_after_call = r#"
+            (module
+                (import "env" "g" (func $g))
+                (memory 1)
+                (func (result i32)
+                    call $g
+                    i32.const 0
+                    i32.load))"#;
+        let load_initial = r#"
+            (module
+                (import "env" "g" (func $g))
+                (memory 1)
+                (func (result i32)
+                    i32.const 0
+                    i32.load))"#;
+        assert!(
+            !acyclic_equiv(load_after_call, load_initial).unwrap(),
+            "a load after an impure call must read havoc'd memory, not the initial"
+        );
+    }
+
+    /// M3: the public verify_acyclic_equivalence fast-path proves a caller that
+    /// `call`s a br_table decide equivalent to the inlined switch, and rejects a
+    /// non-equivalent optimized body — the exact inline verify-or-revert pattern.
+    #[test]
+    #[cfg(feature = "verification")]
+    fn test_verify_acyclic_equivalence_inline() {
+        let module_wat = r#"
+            (module
+                (func (param i32) (result i32)
+                    (block (block (block
+                        local.get 0
+                        br_table 0 1 2)
+                      i32.const 10
+                      return)
+                      i32.const 20
+                      return)
+                    i32.const 30
+                    return)
+                (func (param i32) (result i32)
+                    local.get 0
+                    call 0))"#;
+        let m = parse::parse_wat(module_wat).unwrap();
+        let ctx = VerificationSignatureContext::from_module(&m);
+        let z = &m.functions[1]; // caller: local.get 0; call decide
+
+        // Equivalent: the inlined switch.
+        let inlined = parse::parse_wat(SWITCH_10_20_30).unwrap();
+        assert!(
+            verify_acyclic_equivalence(z, &inlined.functions[0], &ctx).unwrap(),
+            "z (calls decide) must verify ≡ the inlined switch"
+        );
+
+        // Non-equivalent: a switch with a wrong default (99) must NOT verify.
+        let wrong = parse::parse_wat(
+            r#"(module (func (param i32) (result i32)
+                (block (block (block local.get 0 br_table 0 1 2)
+                  i32.const 10 return) i32.const 20 return) i32.const 99 return))"#,
+        )
+        .unwrap();
+        assert!(
+            !verify_acyclic_equivalence(z, &wrong.functions[0], &ctx).unwrap(),
+            "a wrong-default body must NOT verify equivalent"
+        );
+
+        // A function with a loop is not acyclic-modelable → Err (caller falls back).
+        let loopy = parse::parse_wat(
+            "(module (func (param i32) (result i32) (loop (result i32) local.get 0)))",
+        )
+        .unwrap();
+        assert!(
+            verify_acyclic_equivalence(&loopy.functions[0], &loopy.functions[0], &ctx).is_err(),
+            "non-acyclic functions must Err so the caller falls back"
+        );
+    }
+
     #[test]
     fn test_verify_constant_folding() {
         // Original: (i32.add (i32.const 2) (i32.const 3))
diff --git a/loom-core/tests/fixtures/repro-219/README.md b/loom-core/tests/fixtures/repro-219/README.md
new file mode 100644
index 0000000..7e090fd
--- /dev/null
+++ b/loom-core/tests/fixtures/repro-219/README.md
@@ -0,0 +1,30 @@
+# #219 seam-SROA repro
+
+`sem.loom.wasm` — gale's authentic post-loom-inline dissolved `z_impl_k_sem_give`
+(the C↔Rust k_sem_give decide seam). Provided on issue #219; **sha256
+`f81da42d…`, 5254 B** (verified on decode). From
+`gale-smart-data/benches/engine_control/silicon/wasm-testbed/repro-loom-seam-sroa/`.
+
+It still contains the u64 pack/unpack round-trip the seam-SROA pass must dissolve:
+
+- **pack** (decide builds the u64): `i64.extend_i32_u; i64.shl; i64.or`
+- **unpack** (shim tears it back to scalars): `i64.and`, `i64.shr_u`, `i32.wrap_i64`
+- a dead `i64` carrier local in between
+
+## Kill-criterion (gale, on-silicon G474RE)
+
+- structural: dissolved body has **no i64 pack/unpack**, carrier local gone,
+  ARM body 83 → ~55–60 insns.
+- silicon: sem 860 → toward 471 (LLVM-LTO); mutex 472. gale re-flashes.
+
+## Plan (design confirmed on the issue — proof-carrying)
+
+1. wasm-local mem2reg: promote the single-assignment non-escaping i64 carrier so
+   the pack expression reaches the unpack sites.
+2. SROA / pack-unpack algebraic forwarding via ISLE rewrites (preferred: add
+   `i64.extend_i32_u` / `i32.wrap_i64` ISLE terms), each with a **Z3 proof**
+   discharged before the rule ships: `(extend_u(a) | (extend_u(b)<<k)) & mask → a`,
+   `(...)>>k → b`, `wrap_i64(extend_i32_u(x)) → x`. Then DCE drops the carrier+pack.
+3. const dedup/hoist.
+
+See also the synthetic `../seam_sroa_decide.wat`.
diff --git a/loom-core/tests/fixtures/repro-219/sem.loom.wasm b/loom-core/tests/fixtures/repro-219/sem.loom.wasm
new file mode 100644
index 0000000..04328ba
Binary files /dev/null and b/loom-core/tests/fixtures/repro-219/sem.loom.wasm differ
diff --git a/loom-core/tests/fixtures/seam_sroa_decide.wat b/loom-core/tests/fixtures/seam_sroa_decide.wat
new file mode 100644
index 0000000..da582f5
--- /dev/null
+++ b/loom-core/tests/fixtures/seam_sroa_decide.wat
@@ -0,0 +1,51 @@
+;; #219 — seam SROA / scalar-forwarding fixture.
+;;
+;; Models the gale C<->Rust "decide" seam: `decide` returns two i32 scalars
+;; {action, new_count} PACKED into a single i64 (the u64-return ABI), and the
+;; shim immediately UNPACKS them back to scalars. After `--passes inline`,
+;; `decide` is inlined into `shim`, so on one value we get:
+;;
+;;   pack:   i64.extend_i32_u ; i64.shl ; i64.or       (build the u64)
+;;   carry:  local.set $t / local.get $t               (the dead i64 carrier)
+;;   unpack: i64.and ; i64.shr_u ; i32.wrap_i64         (tear it back apart)
+;;
+;; The u64 is constructed from two i32s and immediately decomposed — textbook
+;; SROA. The kill-criterion (gale #219): after the seam-SROA pass the dissolved
+;; body has NO i64 pack/unpack and the i64 carrier local is gone; the result is
+;; equivalent to operating on `action`/`new_count` directly.
+;;
+;; Layout: action in bits [0..8), new_count in bits [8..40).
+(module
+  ;; decide: pack {action, new_count} -> u64
+  ;;   t = (extend_u(new_count) << 8) | (extend_u(action) & 0xff)
+  (func $decide (param $action i32) (param $new_count i32) (result i64)
+    local.get $new_count
+    i64.extend_i32_u
+    i64.const 8
+    i64.shl
+    local.get $action
+    i64.extend_i32_u
+    i64.const 0xff
+    i64.and
+    i64.or)
+
+  ;; shim: call decide, then unpack both scalars back and combine.
+  ;; Post-inline this is the pure pack/unpack round-trip on $t.
+  ;; Semantically equivalent to: (action & 0xff) + new_count
+  (func (export "run") (param $action i32) (param $new_count i32) (result i32)
+    (local $t i64)
+    local.get $action
+    local.get $new_count
+    call $decide
+    local.set $t
+    ;; unpack action = (t & 0xff)
+    local.get $t
+    i64.const 0xff
+    i64.and
+    i32.wrap_i64
+    ;; unpack new_count = (t >> 8)
+    local.get $t
+    i64.const 8
+    i64.shr_u
+    i32.wrap_i64
+    i32.add))
diff --git a/loom-shared/build.rs b/loom-shared/build.rs
index 8380229..adfd7e7 100644
--- a/loom-shared/build.rs
+++ b/loom-shared/build.rs
@@ -29,6 +29,7 @@ fn main() {
         PathBuf::from("isle/types.isle"),
         PathBuf::from("isle/constructors.isle"),
         PathBuf::from("isle/rules/constant_folding.isle"),
+        PathBuf::from("isle/rules/seam_sroa.isle"),
         PathBuf::from("isle/rules/default.isle"),
     ];
 
@@ -100,6 +101,10 @@ fn fix_extractor_ownership(code: &str) -> String {
         "iadd64_extract",
         "isub64_extract",
         "imul64_extract",
+        // #219 seam-SROA — unary conversion extractors (cranelift-isle emits
+        // these with owned `Value`; rewrite to `&Value` like the rest).
+        "i32_wrap_i64_extract",
+        "i64_extend_i32_u_extract",
     ];
 
     for extractor in &extractors {
diff --git a/loom-shared/isle/constructors.isle b/loom-shared/isle/constructors.isle
index 3c82f6a..fb14ca6 100644
--- a/loom-shared/isle/constructors.isle
+++ b/loom-shared/isle/constructors.isle
@@ -137,6 +137,22 @@
 (decl ipopcnt64 (Value) Value)
 (extern constructor ipopcnt64 ipopcnt64)
 
+;; ============================================================================
+;; Integer width conversions (#219 seam-SROA)
+;; ============================================================================
+;; Both constructor and extractor are declared so the pack/unpack forwarding
+;; rules can match these terms on a rule LHS and rebuild them on the RHS.
+;; NOTE: the *_extract fns must also be listed in build.rs `fix_extractor_
+;; ownership` so the generated trait takes &Value (cranelift-isle emits owned).
+
+(decl i32_wrap_i64 (Value) Value)
+(extern constructor i32_wrap_i64 i32_wrap_i64)
+(extern extractor i32_wrap_i64 i32_wrap_i64_extract)
+
+(decl i64_extend_i32_u (Value) Value)
+(extern constructor i64_extend_i32_u i64_extend_i32_u)
+(extern extractor i64_extend_i32_u i64_extend_i32_u_extract)
+
 ;; ============================================================================
 ;; Other Constructors
 ;; ============================================================================
diff --git a/loom-shared/isle/rules/seam_sroa.isle b/loom-shared/isle/rules/seam_sroa.isle
new file mode 100644
index 0000000..cd827bc
--- /dev/null
+++ b/loom-shared/isle/rules/seam_sroa.isle
@@ -0,0 +1,18 @@
+;; LOOM - #219 seam-SROA rewrite rules (proof-carrying)
+;;
+;; Dissolves the C<->Rust decide seam's u64 pack/unpack residue. Each rule
+;; carries a Z3 soundness obligation discharged before it ships (the pass
+;; output is also translation-validated). The bit-precise masked-forward rules
+;; (`(extend_u(a) | (extend_u(b)<<k)) & mask -> a`, `>>k -> b`) land in a later
+;; step once the carrier mem2reg brings the pack expression to the unpack site;
+;; this file starts with the unconditional identity that needs no side
+;; conditions.
+
+;; wrap_i64(extend_i32_u(x)) = x
+;;
+;; Soundness: extend_i32_u zero-extends an i32 to i64 (Z3: zero_ext(32, x));
+;; wrap_i64 takes the low 32 bits (Z3: extract(31, 0, .)). For all i32 x,
+;; extract(31,0, zero_ext(32, x)) = x. Unconditionally sound — no escape or
+;; mask side conditions. Removes the round-trip a packed scalar pays when its
+;; carrier is widened to the u64 ABI and immediately narrowed back.
+(rule 100 (simplify (i32_wrap_i64 (i64_extend_i32_u x))) x)
diff --git a/loom-shared/isle/types.isle b/loom-shared/isle/types.isle
index f5cf1c3..35f3c23 100644
--- a/loom-shared/isle/types.isle
+++ b/loom-shared/isle/types.isle
@@ -115,6 +115,12 @@
     (I64Ctz (val Value))
     (I64Popcnt (val Value))
 
+    ;; Integer width conversions (#219 seam-SROA — needed so the pack/unpack
+    ;; forwarding rules can match on these terms). Field name `val` mirrors the
+    ;; Rust ValueData variants in loom-shared/src/lib.rs.
+    (I32WrapI64 (val Value))
+    (I64ExtendI32U (val Value))
+
     ;; Select instruction
     (Select (cond Value) (true_val Value) (false_val Value))
 
diff --git a/loom-shared/src/lib.rs b/loom-shared/src/lib.rs
index b1d70bc..2cce36e 100644
--- a/loom-shared/src/lib.rs
+++ b/loom-shared/src/lib.rs
@@ -2132,6 +2132,23 @@ pub fn i64_extend_i32_u(val: Value) -> Value {
     Value(Box::new(ValueData::I64ExtendI32U { val }))
 }
 
+/// Extractor for i32.wrap_i64 — used by ISLE rules to pattern-match the term
+/// on a rule LHS (#219 seam-SROA). Returns the inner operand.
+pub fn i32_wrap_i64_extract(val: &Value) -> Option<Value> {
+    match val.0.as_ref() {
+        ValueData::I32WrapI64 { val } => Some(val.clone()),
+        _ => None,
+    }
+}
+
+/// Extractor for i64.extend_i32_u — ISLE rule LHS matching (#219 seam-SROA).
+pub fn i64_extend_i32_u_extract(val: &Value) -> Option<Value> {
+    match val.0.as_ref() {
+        ValueData::I64ExtendI32U { val } => Some(val.clone()),
+        _ => None,
+    }
+}
+
 // ============================================================================
 // Float-to-Integer Truncation Constructors (trapping)
 // ============================================================================
@@ -3344,6 +3361,51 @@ fn are_values_equal(lhs: &Value, rhs: &Value) -> bool {
     }
 }
 
+/// #219 seam-SROA helper: true if `op` is `(i64.shl _ (i64.const k))` with
+/// `0 < k < 64` and every set bit of `mask` is below k (`(mask as u64) >> k ==
+/// 0`). In that case `(op & mask) == 0` for ALL shift inputs — the shifted
+/// value only occupies bits [k,64), which the mask zeroes. k is restricted to
+/// (0,64) so the wasm shift-amount-mod-64 wrap can't change the effective
+/// shift; k>=64 is left to the verifier rather than reasoned about here.
+fn i64_shl_cleared_by_mask(op: &Value, mask: &Imm64) -> bool {
+    if let ValueData::I64Shl { rhs, .. } = op.data() {
+        if let ValueData::I64Const { val: k } = rhs.data() {
+            let k = k.value();
+            if k > 0 && k < 64 {
+                return (mask.value() as u64) >> (k as u64) == 0;
+            }
+        }
+    }
+    false
+}
+
+/// #219 seam-SROA helper: true if `shl_amt` is the constant `k` and `0 < k <
+/// 64` — i.e. an `(i64.shl _ k)` whose amount equals the enclosing `shr_u`'s.
+/// `(Z << k) >> k == Z & (u64::MAX >> k)` exactly when k is in (0,64) (no wasm
+/// shift-mod-64 wrap).
+fn i64_shr_undoes_shl(shl_amt: &Value, shr_amt: &Imm64) -> bool {
+    if let ValueData::I64Const { val: k2 } = shl_amt.data() {
+        let k = shr_amt.value();
+        return k == k2.value() && k > 0 && k < 64;
+    }
+    false
+}
+
+/// #219 seam-SROA helper: true if `op` is `(i64.shl _ (i64.const amt))` with
+/// `0 < amt < 64`. Used to target the shr_u-over-or distribution at the pack
+/// shape (`(or (shl Z k) B) >> k`) so it only fires where it dissolves.
+fn i64_is_shl_by(op: &Value, amt: i64) -> bool {
+    if amt <= 0 || amt >= 64 {
+        return false;
+    }
+    if let ValueData::I64Shl { rhs, .. } = op.data() {
+        if let ValueData::I64Const { val: k } = rhs.data() {
+            return k.value() == amt;
+        }
+    }
+    false
+}
+
 /// Stateless simplification (expression-level only)
 fn rewrite_pure_impl(val: Value) -> Value {
     match val.data() {
@@ -3713,6 +3775,37 @@ fn rewrite_pure_impl(val: Value) -> Value {
                 // Algebraic: x & -1 = x (all bits set)
                 (_, ValueData::I64Const { val }) if val.value() == -1 => lhs_simplified,
                 (ValueData::I64Const { val }, _) if val.value() == -1 => rhs_simplified,
+
+                // #219 seam-SROA: (shl Z k) & M → 0 when M's set bits are all
+                // below k — the left shift zeroes bits [0,k), the mask zeroes
+                // bits [k,64), so nothing survives. Unconditionally sound for
+                // any Z (Z3: (Z<<k) & M == 0 when M >> k == 0). Dissolves the
+                // high half of a u64 pack under a low-byte unpack mask.
+                (ValueData::I64Shl { .. }, ValueData::I64Const { val: m })
+                    if i64_shl_cleared_by_mask(&lhs_simplified, m) =>
+                {
+                    iconst64(Imm64(0))
+                }
+                (ValueData::I64Const { val: m }, ValueData::I64Shl { .. })
+                    if i64_shl_cleared_by_mask(&rhs_simplified, m) =>
+                {
+                    iconst64(Imm64(0))
+                }
+                // #219 seam-SROA: (or A B) & M → (survivor & M) when one OR
+                // operand is a left shift the mask clears. Recurse so the
+                // survivor (and a both-shifted case) simplifies further.
+                (ValueData::I64Or { lhs: a, rhs: b }, ValueData::I64Const { val: m })
+                    if i64_shl_cleared_by_mask(a, m) || i64_shl_cleared_by_mask(b, m) =>
+                {
+                    let survivor = if i64_shl_cleared_by_mask(a, m) { b } else { a };
+                    rewrite_pure(iand64(survivor.clone(), iconst64(*m)))
+                }
+                (ValueData::I64Const { val: m }, ValueData::I64Or { lhs: a, rhs: b })
+                    if i64_shl_cleared_by_mask(a, m) || i64_shl_cleared_by_mask(b, m) =>
+                {
+                    let survivor = if i64_shl_cleared_by_mask(a, m) { b } else { a };
+                    rewrite_pure(iand64(survivor.clone(), iconst64(*m)))
+                }
                 _ => iand64(lhs_simplified, rhs_simplified),
             }
         }
@@ -3808,6 +3901,31 @@ fn rewrite_pure_impl(val: Value) -> Value {
                 }
                 // Algebraic: x >> 0 = x
                 (_, ValueData::I64Const { val }) if (val.value() & 0x3F) == 0 => lhs_simplified,
+
+                // #219 seam-SROA: (shr_u (shl Z k) k) → Z & (low 64-k bits).
+                // Shifting left by k then logically right by the same k clears
+                // the high k bits and keeps the low 64-k. Unconditionally sound
+                // for 0<k<64 (Z3: (Z<<k)>>k == Z & (2^(64-k)-1)). Dissolves the
+                // high half of a u64 pack under a >>k unpack.
+                (ValueData::I64Shl { lhs: z, rhs: shamt }, ValueData::I64Const { val: k })
+                    if i64_shr_undoes_shl(shamt, k) =>
+                {
+                    let mask = (u64::MAX >> (k.value() as u64)) as i64;
+                    rewrite_pure(iand64(z.clone(), iconst64(Imm64(mask))))
+                }
+                // #219 seam-SROA: (shr_u (or P Q) k) → (or (P>>k) (Q>>k)) when an
+                // OR operand is (shl _ k) — logical right shift distributes over
+                // bitwise OR (sound), and recursing lets the shl side collapse
+                // via the rule above. Targeted to the pack shape (matching shl
+                // present) so it never bloats unrelated `(or _ _) >> k`.
+                (ValueData::I64Or { lhs: p, rhs: q }, ValueData::I64Const { val: k })
+                    if i64_is_shl_by(p, k.value()) || i64_is_shl_by(q, k.value()) =>
+                {
+                    rewrite_pure(ior64(
+                        ishru64(p.clone(), iconst64(*k)),
+                        ishru64(q.clone(), iconst64(*k)),
+                    ))
+                }
                 _ => ishru64(lhs_simplified, rhs_simplified),
             }
         }
@@ -4345,6 +4463,30 @@ fn rewrite_pure_impl(val: Value) -> Value {
             }
         }
 
+        // Integer width conversions (#219 seam-SROA). The live rewrite for the
+        // decide seam's u64 pack/unpack round-trip. Z3-validated per pass.
+        ValueData::I32WrapI64 { val } => {
+            let val_simplified = rewrite_pure(val.clone());
+            match val_simplified.data() {
+                // #219 seam-SROA: wrap_i64(extend_i32_u(x)) = x. zero-extending an
+                // i32 to i64 then taking the low 32 bits is the identity on i32 —
+                // unconditionally sound (Z3: extract(31,0, zero_ext(32,x)) = x).
+                ValueData::I64ExtendI32U { val: inner } => inner.clone(),
+                // Constant folding: wrap_i64(i64.const N) → i32.const (low 32 bits).
+                ValueData::I64Const { val: v } => iconst32(Imm32(v.value() as i32)),
+                _ => i32_wrap_i64(val_simplified),
+            }
+        }
+
+        ValueData::I64ExtendI32U { val } => {
+            let val_simplified = rewrite_pure(val.clone());
+            match val_simplified.data() {
+                // Constant folding: extend_i32_u(i32.const N) → i64.const (zero-extended).
+                ValueData::I32Const { val: v } => iconst64(Imm64((v.value() as u32 as u64) as i64)),
+                _ => i64_extend_i32_u(val_simplified),
+            }
+        }
+
         // Select instruction optimization
         ValueData::Select {
             cond,