Tracks engine performance work: what was tried, what failed, and what's planned.
| # | Optimisation | Date | Result |
|---|---|---|---|
| 1 | Strict-path parity via __path | March 2026 | ✅ Done (correctness first, measurable slowdown) |
| 2 | Single-segment fast path via __get | March 2026 | ✅ Done (partial recovery on compiled hot paths) |
| 3 | Array loop IIFE elimination | June 2026 | ✅ Done (array benchmarks within 3–7 % of baseline) |
| 4 | Batch-level loc annotation | June 2026 | ✅ Done (tool-input/output IIFEs replaced with statement-level try/catch) |
| 5 | Remove .finally() from __memoize | June 2026 | ✅ Done (simple chain +7 %, chained 3-tool +12 %) |
| 6 | Cached tool fn references | June 2026 | ✅ Done (eliminates repeated tools['name'] lookups in getter bodies) |
Benchmarks live in `packages/bridge/bench/engine.bench.ts` (tinybench) under the
`compiled:` suite. Historical tracking is in Bencher; look for benchmark names
prefixed `compiled:`. Run locally with `pnpm bench`.
Hardware: MacBook Air M4 (4th gen, 15″). All numbers in this document are from this machine — compare only against the same hardware.
| Benchmark | ops/sec | avg (ms) |
|---|---|---|
| compiled: passthrough (no tools) | ~650K | 0.002 |
| compiled: short-circuit | ~614K | 0.002 |
| compiled: simple chain (1 tool) | ~589K | 0.002 |
| compiled: chained 3-tool fan-out | ~386K | 0.003 |
| compiled: flat array 10 | ~443K | 0.002 |
| compiled: flat array 100 | ~180K | 0.006 |
| compiled: flat array 1000 | ~26K | 0.039 |
| compiled: nested array 5×5 | ~225K | 0.005 |
| compiled: nested array 10×10 | ~100K | 0.010 |
| compiled: nested array 20×10 | ~55K | 0.019 |
| compiled: array + tool-per-element 10 | ~283K | 0.004 |
| compiled: array + tool-per-element 100 | ~56K | 0.020 |
This table reflects the current performance level; it is updated after each successful optimisation is committed.
**Optimisation 1: Strict-path parity via __path** · Date: March 2026 · Status: ✅ Done
Why:
Runtime source-mapping work tightened strict path traversal semantics so
primitive property access throws at the failing segment instead of silently
flowing through as undefined. Compiled execution still had some strict paths
emitted as raw bracket access, which caused AOT/runtime divergence in parity
fuzzing.
What changed:
appendPathExpr(...) was switched to route compiled path traversal through the
generated __path(...) helper so compiled execution matched runtime semantics.
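As a minimal sketch, a strict traversal helper with the semantics described above could look like the following. The real generated `__path` is not shown in this document; the body and error messages here are assumptions.

```typescript
// Hypothetical sketch of a strict path-traversal helper in the spirit of the
// generated __path(...): it throws at the failing segment instead of letting
// undefined flow through silently.
function __path(
  base: unknown,
  path: string[],
  safe: boolean,
  allowMissingBase: boolean,
): unknown {
  if (base == null) {
    if (safe || allowMissingBase) return undefined;
    throw new TypeError(`Cannot read path '${path.join('.')}' of ${base}`);
  }
  let current: any = base;
  for (const segment of path) {
    if (current == null) {
      if (safe) return undefined;
      throw new TypeError(`Cannot read '${segment}' of ${current}`);
    }
    // Strict semantics: a property access on a primitive that lacks the
    // property fails at this exact segment.
    if (
      typeof current !== 'object' &&
      typeof current !== 'function' &&
      !(segment in Object(current))
    ) {
      if (safe) return undefined;
      throw new TypeError(`Property '${segment}' does not exist on primitive value`);
    }
    current = current[segment];
  }
  return current;
}
```

Every access, including a one-segment one, pays the loop and the path-array allocation in a helper shaped like this, which is the cost the numbers below quantify.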
Result:
Correctness and parity were restored, but this imposed a noticeable cost on the compiled hot path: even one-segment accesses paid for the generic loop-based helper.
Observed branch-level compiled numbers before the follow-up optimisation:
| Benchmark | Baseline | With __path everywhere | Change |
|---|---|---|---|
| compiled: passthrough (no tools) | ~644K | ~561K | -13% |
| compiled: simple chain (1 tool) | ~612K | ~536K | -12% |
| compiled: flat array 1000 | ~27.9K | ~14.1K | -49% |
| compiled: array + tool-per-element 100 | ~58.7K | ~45.2K | -23% |
**Optimisation 2: Single-segment fast path via __get** · Date: March 2026 · Status: ✅ Done
Hypothesis:
The vast majority of compiled property reads in the benchmark suite are short,
especially one-segment accesses. Running every one of them through the generic
__path(base, path, safe, allowMissingBase) loop was overpaying for the common
case.
What changed:
- Added a generated `__get(base, segment, accessSafe, allowMissingBase)` helper for the one-segment case.
- Kept the strict primitive-property failure semantics from `__path(...)`.
- Left multi-segment accesses on `__path(...)` so correctness stays uniform.
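As a sketch, a one-segment fast path with the same strict semantics might look like this (the signature matches the list above; the body and error messages are assumptions):

```typescript
// Hypothetical one-segment fast path: same strict primitive-property
// semantics as the generic __path loop, without the per-segment loop or a
// path-array allocation.
function __get(
  base: unknown,
  segment: string,
  accessSafe: boolean,
  allowMissingBase: boolean,
): unknown {
  if (base == null) {
    if (accessSafe || allowMissingBase) return undefined;
    throw new TypeError(`Cannot read '${segment}' of ${base}`);
  }
  const b: any = base;
  // Strict semantics preserved: a missing property on a primitive throws.
  if (typeof b !== 'object' && typeof b !== 'function' && !(segment in Object(b))) {
    if (accessSafe) return undefined;
    throw new TypeError(`Property '${segment}' does not exist on primitive value`);
  }
  return b[segment];
}
```

The compiler would then emit `__get(base, 'name', false, false)`-style calls for one-segment accesses and keep the generic loop helper for longer paths.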
Result:
This recovered a meaningful portion of the compiled regression while preserving the stricter source-mapping semantics.
| Benchmark | Before __get |
After __get |
Change |
|---|---|---|---|
| compiled: passthrough (no tools) | ~561K | ~639K | +14% |
| compiled: simple chain (1 tool) | ~536K | ~583K | +9% |
| compiled: flat array 1000 | ~14.1K | ~15.7K | +11% |
| compiled: nested array 20×10 | ~36.0K | ~39.1K | +9% |
| compiled: array + tool-per-element 100 | ~45.2K | ~50.0K | +11% |
What remains:
Compiled performance is much closer to baseline now, but still below the March 2026 table on some heavy array benchmarks. The obvious next step, if needed, is specialising short strict paths of length 2–3 rather than routing every multi-segment path through the generic loop helper.
**Optimisation 3: Array loop IIFE elimination** · Date: March 2026 · Status: ✅ Done
Problem:
Array loop bodies were emitting a per-field IIFE with try/catch for bridgeLoc
error annotation:

```js
__el_0.id = await (async () => { try { return __el_0.id; } catch (__e) { ... wrapErr(bridgeLoc({...})) ... } })();
```

Three separate sources of overhead in the hot loop:
- Per-field IIFE closure: one closure allocation + call per field per element.
- `Object.values().find()` sentinel check: ran every iteration even when no `break`/`continue` was possible.
- Per-iteration loc object allocation: `bridgeLoc({startLine:7,...})` allocated a fresh object per field per element.
What changed:
- Static analysis: Added `bodyHasControlFlow(body)` / `exprHasControlFlow(expr)` helpers that recursively scan the array body AST for `break`/`continue` expressions. When absent, the sentinel check (`Object.values().find(v => v === SENTINEL_BREAK)`) is elided entirely.
- Consolidated try/catch with `loopLocInfo`: Instead of per-field IIFEs, a single try/catch is hoisted outside the for-loop. Each field expression becomes a comma expression that sets an integer index before evaluating: `(__li_0 = 2, __el_0.id)`. In the catch handler, the actual loc is looked up from a precomputed array: `[loc0, loc1, loc2][__li_0]`.
- Hoisted try/catch: The try block wraps the entire for-loop rather than each iteration, removing per-iteration overhead.
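A sketch of the post-optimisation loop shape, using the names from the description above (`__el_0`, `__li_0`, the precomputed loc array); the `bridgeLoc`/`wrapErr` stand-ins are simplified assumptions, not the real helpers:

```typescript
type Loc = { startLine: number };
const bridgeLoc = (loc: Loc): Loc => loc; // hypothetical stand-in
const wrapErr = (e: unknown, loc: Loc) =>
  Object.assign(new Error(String(e)), { loc });

function runLoop(items: Array<{ id: number; name: string }>) {
  const out: Array<{ id: number; name: string }> = [];
  // Loc objects are allocated once, not per field per element.
  const __locs: Loc[] = [bridgeLoc({ startLine: 7 }), bridgeLoc({ startLine: 8 })];
  let __li_0 = 0;
  try {
    // Single hoisted try/catch around the whole loop; no per-field IIFE.
    for (const __el_0 of items) {
      out.push({
        // Comma expression records which field is currently evaluating.
        id: (__li_0 = 0, __el_0.id),
        name: (__li_0 = 1, __el_0.name),
      });
    }
  } catch (__e) {
    // The failing field's loc is recovered from the precomputed array.
    throw wrapErr(__e, __locs[__li_0]);
  }
  return out;
}
```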
Result:
| Benchmark | Before | After | Change |
|---|---|---|---|
| compiled: flat array 10 | ~283K | ~424K | +50% |
| compiled: flat array 100 | ~61K | ~176K | +189% |
| compiled: flat array 1000 | ~7K | ~22K | +216% |
| compiled: nested array 5×5 | ~80K | ~220K | +175% |
| compiled: nested array 10×10 | ~46K | ~92K | +100% |
| compiled: nested array 20×10 | ~24K | ~49K | +104% |
| compiled: tool-per-element 10 | ~217K | ~278K | +28% |
Array benchmarks went from 50–75 % below baseline to within 3–7 %.
**Optimisation 4: Batch-level loc annotation** · Date: March 2026 · Status: ✅ Done
Problem:
Outside of array loops, every output wire and tool-input field was still wrapped
in an async IIFE for bridgeLoc annotation:
```js
__result.foo = await (async () => { try { return expr; } catch (__e) { ... } })();
```

For wires going into emitParallelAssignments (Promise.all batches), this
per-expression IIFE was unnecessary; error annotation could happen at the batch
level instead.
What changed:
- `compileBody` pending wires: For single-source expressions without `wireCatch`, uses `compileSourceChain` (raw expression, no IIFE) and captures `locExpr` separately. Falls back to `compileSourceChainWithLoc` for multi-source or wireCatch cases.
- Tool input field wires: Same pattern; single-source without `wireCatch` uses `compileSourceChain` + `locExpr`.
- `emitParallelAssignments`: Accepts `locExpr?: string` per item. For sync items with a loc, wraps the assignment in a statement-level try/catch. For async batches, builds a `__locs` array and annotates errors in the existing rethrow loop. Single async items with a loc get a try/catch around the assignment.
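As an illustration of batch-level annotation, here is a simplified sketch. It uses `Promise.allSettled` so the failing item's index stays recoverable; the real emitter instead annotates inside its existing rethrow loop over `Promise.all` batches, and the helper shapes here are assumptions.

```typescript
type Loc = { startLine: number };
const wrapErr = (e: unknown, loc: Loc) =>
  Object.assign(new Error(String(e)), { loc });

async function runBatch(
  sources: Array<() => Promise<number>>,
  __locs: Loc[],
): Promise<number[]> {
  // One batch-level await instead of one async IIFE per expression.
  const settled = await Promise.allSettled(sources.map((fn) => fn()));
  const results: number[] = [];
  settled.forEach((r, i) => {
    if (r.status === 'rejected') {
      // Annotate the error with the loc of the item that failed.
      throw wrapErr(r.reason, __locs[i]);
    }
    results.push(r.value);
  });
  return results;
}
```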
Result:
| Benchmark | Before | After | Change |
|---|---|---|---|
| compiled: simple chain (1 tool) | ~536K | ~551K | +3% |
| compiled: chained 3-tool fan-out | ~329K | ~343K | +4% |
Modest gains because most expressions were already single-segment. The remaining
gap on chained 3-tool (~343K vs ~523K baseline, −34 %) comes from feature
additions in tool getter bodies that the baseline did not have: sync tool
detection, timeout handling, __checkAbort() calls, and conditional await.
These are correctness requirements and are not optimisable without removing
features.
**Optimisation 5: Remove .finally() from __memoize** · Date: June 2026 · Status: ✅ Done
Problem:
The __memoize helper wrapped every getter’s Promise in .finally(() => { active = false; }). This added an extra microtask per tool getter invocation.
Because cached is set on the first call and re-returned on every subsequent
call, active is never checked after the first invocation completes and
therefore never needs to be reset.
What changed:
fn().finally(() => { active = false; }) → fn(). One fewer .finally()
allocation per memoized tool getter.
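A minimal sketch of the memoization pattern makes the reasoning concrete: once `cached` is set on the first call, every later call returns it directly, so the `active` re-entrancy flag never needs resetting. Names follow the document; the circular-reference error message is an assumption.

```typescript
function __memoize<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  let active = false;
  return () => {
    // After the first call this branch always wins; `active` is never read again.
    if (cached) return cached;
    if (active) throw new Error('circular tool reference');
    active = true;
    cached = fn(); // previously: fn().finally(() => { active = false; })
    return cached;
  };
}
```

Dropping the `.finally()` saves one promise allocation and one microtask per memoized getter invocation.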
Result:
| Benchmark | Before | After | Change |
|---|---|---|---|
| compiled: simple chain (1 tool) | ~551K | ~589K | +7% |
| compiled: chained 3-tool fan-out | ~343K | ~386K | +12% |
**Optimisation 6: Cached tool fn references** · Date: June 2026 · Status: ✅ Done
Problem:
Tool getter bodies referenced tools via tools['name'] on every access (type
check, sync detection, trace detection, invocation, etc.). The preamble already
declared const __toolFn_name_0 = tools['name'], but resolveToolFnExpr
ignored this cached variable and returned the dynamic lookup.
What changed:
resolveToolFnExpr now returns the cached __toolFn_ variable from the scope
binding. The tool function reference is resolved once at declaration time and
reused in all getter body accesses.
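Sketched in isolation, the cached-reference pattern looks like this; only the `__toolFn_` naming and the `tools['name']` lookup come from the document, and the surrounding shape is a made-up minimal example.

```typescript
type Tools = Record<string, (input: unknown) => unknown>;

function compileExample(tools: Tools) {
  // Preamble: the tool function is resolved once at declaration time.
  const __toolFn_add_0 = tools['add'];
  return {
    run(input: unknown): unknown {
      // Getter body reuses the cached reference for the type check and the
      // call, instead of repeating tools['add'] on every access.
      if (typeof __toolFn_add_0 !== 'function') {
        throw new TypeError("tool 'add' is not a function");
      }
      return __toolFn_add_0(input);
    },
  };
}
```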
Result:
Combined with optimisation #5 (measured together):
| Benchmark | Start | After #5 + #6 | Baseline | Gap |
|---|---|---|---|---|
| compiled: simple chain (1 tool) | ~551K | ~589K | ~612K | −4% |
| compiled: chained 3-tool fan-out | ~343K | ~386K | ~523K | −26% |
| compiled: array + tool-per-element 100 | ~49K | ~56K | ~59K | −6% |
What remains:
The remaining chained 3-tool gap (−26 %) comes from per-tool correctness
overhead that the baseline lacked: sync tool validation
(tool.bridge?.sync && typeof __raw.then), timeout handling (Promise.race),
__checkAbort() calls, and conditional await. These are not optimisable
without reducing features.
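For illustration, a getter body carrying those correctness features might look like the following sketch. `invokeTool`, `timeoutMs`, and the error messages are invented; `__checkAbort()` and the sync-tool check mirror the description above.

```typescript
type Tool = ((input: unknown) => unknown) & { bridge?: { sync?: boolean } };

async function invokeTool(
  tool: Tool,
  input: unknown,
  timeoutMs: number,
  __checkAbort: () => void,
): Promise<unknown> {
  __checkAbort(); // abort check before invocation
  const __raw = tool(input);
  const isThenable = typeof (__raw as any)?.then === 'function';
  // Sync tool validation: a tool declared sync must not return a thenable.
  if (tool.bridge?.sync && isThenable) {
    throw new TypeError('sync tool returned a Promise');
  }
  // Conditional await: skip the microtask (and the timeout race) entirely
  // when the result is not a thenable.
  const result = isThenable
    ? await Promise.race([
        __raw as Promise<unknown>,
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error('tool timeout')), timeoutMs),
        ),
      ])
    : __raw;
  __checkAbort(); // abort check after completion
  return result;
}
```

A real implementation would also clear the timeout timer; it is left dangling here for brevity.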