Fusion emits identity trampolines that never inline (dead inline_adapters flag); de-export fusion-vestigial symbols
Context
gale's dissolved-codegen work (pulseengine/gale#97) found that un-inlined forwarding wrappers cost ~2× cycles on tiny functions (an extra bl + stack round-trip + second prologue). For the meld-fused demonstrator path (gale-app-demo + gale-kiln → meld fuse --memory shared --address-rebase → loom → synth → 668 B native, gale benches/gust/build-fused.sh), meld introduces the same pathology structurally.
Finding 1 — identity trampolines + a dead inline_adapters flag
meld emits one adapter function per cross-component call edge and rewires the caller to it (meld-core/src/lib.rs:wire_adapter_indices:372; adapter_base :380). For the common shared-memory, no-transcode, no-resource case meld emits a pure forwarding thunk — generate_direct_adapter (adapter/fact.rs:552-567): re-push every param, call $target, end. Left un-inlined into native, each becomes the bl-chain + frame round-trip we measured.
AdapterConfig.inline_adapters (adapter/mod.rs:35) is set true (lib.rs:296) but read nowhere in meld-core (grep finds only the definition + assignment). Ask: honor it — when generate_adapter selects the identity-trampoline case (analyze_call_site → direct adapter, same memory, no post-return/resource ops), rewire the import directly to target_func instead of emitting a thunk. Sound (the body only forwards), and it's the leanest output — fewer functions for loom/synth, no thunk for loom to inline (and loom currently doesn't inline small multi-site funcs anyway — see pulseengine/loom issue).
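The ask above can be sketched roughly as follows. This is a minimal illustration, not meld-core's actual code: `CallSite`, `Wiring`, `FuncIdx`, `wire_call_edge` and every field other than `inline_adapters` and `target_func` are hypothetical stand-ins, and the real call-site analysis may carry different facts.

```rust
/// Hypothetical, simplified stand-ins for meld-core types; the real
/// definitions in adapter/mod.rs and adapter/fact.rs are not reproduced here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FuncIdx(u32);

struct AdapterConfig {
    inline_adapters: bool,
}

/// What the call-site analysis is assumed to report for one call edge.
struct CallSite {
    target_func: FuncIdx,
    same_memory: bool,
    needs_transcode: bool,
    has_post_return: bool,
    has_resource_ops: bool,
}

impl CallSite {
    /// True when the adapter body would only re-push params and call the target.
    fn is_identity_trampoline(&self) -> bool {
        self.same_memory
            && !self.needs_transcode
            && !self.has_post_return
            && !self.has_resource_ops
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Wiring {
    /// The caller is rewired straight to the target; no adapter is emitted.
    Direct(FuncIdx),
    /// An adapter function is emitted at this index and the caller points at it.
    Adapter(FuncIdx),
}

/// Decide how one cross-component call edge is wired.
/// `next_adapter` is the next free adapter function index.
fn wire_call_edge(cfg: &AdapterConfig, site: &CallSite, next_adapter: &mut u32) -> Wiring {
    if cfg.inline_adapters && site.is_identity_trampoline() {
        // Nothing to adapt: skip the thunk and consume no function index.
        return Wiring::Direct(site.target_func);
    }
    let idx = FuncIdx(*next_adapter);
    *next_adapter += 1;
    Wiring::Adapter(idx)
}

fn main() {
    let cfg = AdapterConfig { inline_adapters: true };
    let mut next_adapter = 100;
    let site = CallSite {
        target_func: FuncIdx(7),
        same_memory: true,
        needs_transcode: false,
        has_post_return: false,
        has_resource_ops: false,
    };
    println!("{:?}", wire_call_edge(&cfg, &site, &mut next_adapter));
}
```

The design point is that the flag only changes the identity case: any edge that needs transcoding, post-return, or resource handling still gets a real adapter, so existing behavior is preserved for those edges.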
Finding 2 — de-export fusion-vestigial symbols (generalize meld#298)
gale's build-fused.sh must manually grep -vE '\(export "(cabi_realloc|gale:kernel|__data_end|__heap_base)' AFTER meld so loom/synth can DCE the now-dead canonical-ABI/realloc path. meld emits these exports (merger.rs cabi_realloc :1218-1256). Once fusion closes the import graph, any export that existed only to satisfy a now-resolved cross-component import is vestigial. Ask: de-export them at fusion time (a --strip-vestigial flag or default) — generalizes meld#298 beyond cabi_realloc, removes the manual strip step.
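A rough sketch of the de-export rule, under stated assumptions: `Export`, `externally_required` and `strip_vestigial` are hypothetical names, not meld's merger.rs API, and how meld would learn that an export is still needed outside the fused module (embedder entry points, explicit keep-lists) is left open here. The export name `run` is likewise only an illustrative placeholder.

```rust
use std::collections::HashSet;

/// Hypothetical export record; merger.rs has its own representation.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Export {
    name: String,
    /// Assumed input: true if something outside the fused module still
    /// needs this export.
    externally_required: bool,
}

/// Keep an export unless its only purpose was to satisfy an import that
/// fusion has already resolved internally.
fn strip_vestigial(exports: Vec<Export>, resolved_imports: &HashSet<String>) -> Vec<Export> {
    exports
        .into_iter()
        .filter(|e| e.externally_required || !resolved_imports.contains(&e.name))
        .collect()
}

fn main() {
    let exports = vec![
        Export { name: "cabi_realloc".to_string(), externally_required: false },
        Export { name: "run".to_string(), externally_required: true },
    ];

    // Names of cross-component imports that fusion resolved.
    let mut resolved = HashSet::new();
    resolved.insert("cabi_realloc".to_string());

    for e in strip_vestigial(exports, &resolved) {
        println!("kept export: {}", e.name);
    }
}
```

With a rule like this applied at fusion time, the dropped exports stop acting as DCE roots, which is what the manual grep in build-fused.sh achieves today.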
Not meld's gap (honest)
- The gust_mix 2.81× number itself is not meld's: that module never goes through meld fuse.
- --address-rebase is free on scalar access: the base folds into the static load/store offset immediate (rewriter.rs:convert_memarg:585-587; data segments rebased statically, segments.rs:rebase_const_expr_value:427). Runtime i32.add is inserted only for dynamic-address bulk ops (memory.copy/fill/init, rewriter.rs:append_rebased_address:543), gated on base != 0 and only in functions that contain such ops. Not a per-access tax.
- Per-component handle tables (branch feat/per-component-handle-tables) aren't wired into codegen yet (only reexporter_components exists): zero overhead today, and irrelevant to the gust composition.
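To make the rebase claim concrete, here is a minimal sketch of the two cases described in the list above. The `MemArg` and `Instr` types and both function names are simplified, hypothetical stand-ins; they are not the signatures of convert_memarg or append_rebased_address in rewriter.rs.

```rust
/// Hypothetical, simplified memory-argument immediate of a load/store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MemArg {
    offset: u64,
    align: u8,
}

/// Scalar access: the component's base is folded into the static offset,
/// so no extra instruction is emitted.
fn rebase_memarg(m: MemArg, base: u64) -> MemArg {
    MemArg { offset: m.offset + base, ..m }
}

/// Tiny hypothetical instruction subset, just enough for the bulk-op case.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Instr {
    LocalGet(u32),
    I32Const(i32),
    I32Add,
}

/// Bulk ops take a dynamic address from the stack, so the base has to be
/// added at runtime; nothing is emitted when the base is zero.
fn rebase_dynamic_address(out: &mut Vec<Instr>, base: u32) {
    if base != 0 {
        out.push(Instr::I32Const(base as i32));
        out.push(Instr::I32Add);
    }
}

fn main() {
    let load = MemArg { offset: 16, align: 2 };
    println!("{:?}", rebase_memarg(load, 4096));

    // Address operand of a bulk op, already on the stack as local 0.
    let mut code = vec![Instr::LocalGet(0)];
    rebase_dynamic_address(&mut code, 4096);
    println!("{:?}", code);
}
```

In this sketch the scalar path changes only an immediate, while the bulk path adds two instructions per address operand, which matches the claim that rebasing is not a per-access cost.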
Boundary / beat-LLVM
Keep the meld/loom split: meld emits minimal verifiable structure, loom does cost-based inlining + layout DCE (build-fused.sh already runs loom --passes inline). The exception is Finding 1 — not emitting provably-no-op indirection isn't "doing loom's job," it's not creating garbage. meld's real leverage: fusion turns cross-"object" calls into inlinable intra-module calls (the LTO win, structurally) and a single flat address space with no per-access tax — and the adapter/fact.rs + component-provenance machinery can forward typed canonical-ABI facts (owned vs borrowed, layouts) as proof-carrying metadata that a native multi-object link discards. Lean into that; just don't interpose the identity trampoline.
Refs: meld#298 (cabi_realloc de-export — Finding 2 generalizes it). Pipeline: gale benches/gust/build-fused.sh.