Two full runs of the IDENTICAL tree report different WAST assertion totals: 65776 pass vs 65794 pass (a difference of 18 between runs); total executed assertions also drift (66325 vs 66307). So before/after assertion-count comparison is unreliable: a real change's delta (+2 for #359) is smaller than the ~18 run-to-run noise, which can mask or fake a regression.
The FILE-level result is stable: both the "Files: N pass/M fail" summary and the failing-file SET reproduce. The #149 no-regression check had to be done by diffing failing-file sets (comm), not counts:
files failing AFTER but not BASE: (none) # no regression
files failing BASE but not AFTER: type-subtyping.wast # the fix
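The set comparison above can be sketched in Python as a rough equivalent of the comm step. This is a minimal illustration, not the project's tooling: the function names and the one-file-name-per-line input format are assumptions, and every file name in the sample data other than type-subtyping.wast is made up.

```python
def diff_failing_sets(base_failing, after_failing):
    """Return (regressions, fixes) as sorted lists of file names."""
    base = set(base_failing)
    after = set(after_failing)
    regressions = sorted(after - base)  # failing AFTER but not BASE
    fixes = sorted(base - after)        # failing BASE but not AFTER
    return regressions, fixes


def read_failing_list(path):
    """Read one failing file name per line, skipping blank lines.

    The one-name-per-line format is an assumption for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Illustrative data only: example-a.wast is a made-up file name.
    base = ["type-subtyping.wast", "example-a.wast"]
    after = ["example-a.wast"]
    regressions, fixes = diff_failing_sets(base, after)
    print("files failing AFTER but not BASE:", regressions or "(none)")
    print("files failing BASE but not AFTER:", fixes)
```

Because the comparison works on sets of file names, it is unaffected by how many assertions execute inside each file.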
Impact: blocks gating conformance in CI on the assertion count (not reproducible); slowed #149 verification (multiple ~20-min full runs + set-diffing).
Suspected cause: order/state-dependence (a module failing instantiation changes how many downstream assert_returns execute) and/or runner parallelism.
Asks: (1) make the executed-assertion count deterministic (fixed order, isolated per-module state), and/or (2) emit a stable machine-readable per-file report (--wast-report JSON) so CI gates on the file-level set/counts (already stable). Surfaced while landing #359 (GC call_indirect subtyping fix, #149).
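For ask (2), a CI gate on a per-file report might look like the sketch below. The --wast-report flag does not exist yet, so the JSON shape used here (a top-level "files" object mapping each file name to an entry with a "status" of "pass" or "fail") is purely an assumption for illustration, as are the function names and the sample file names.

```python
import json


def failing_files(report):
    """Collect names of files whose status is 'fail' in a parsed report.

    Assumes the hypothetical shape {"files": {name: {"status": ...}}}.
    """
    return {
        name
        for name, entry in report["files"].items()
        if entry["status"] == "fail"
    }


def new_failures(baseline_report, current_report):
    """Return sorted files failing now but not in the baseline.

    An empty list means the gate passes.
    """
    return sorted(failing_files(current_report) - failing_files(baseline_report))


if __name__ == "__main__":
    # Made-up reports; a real gate would load these from files on disk.
    baseline = json.loads(
        '{"files": {"a.wast": {"status": "pass"}, "b.wast": {"status": "fail"}}}'
    )
    current = json.loads(
        '{"files": {"a.wast": {"status": "pass"}, "b.wast": {"status": "pass"}}}'
    )
    regressions = new_failures(baseline, current)
    print("gate:", "FAIL" if regressions else "PASS", regressions)
```

Gating on the failing-file set keeps the check independent of the assertion totals, which is the part that currently drifts between runs.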