Test262: run unflagged tests as sloppy+strict variant pairs (split from #882) #885

@nickna

Summary

Test262's canonical harness runs each unflagged test (no flags: [onlyStrict] / [noStrict]) as two variants — once in sloppy mode and once with a "use strict" prologue. SharpTS currently runs each unflagged test once, sloppy only. This issue tracks adopting variant-pair execution so we match the canonical harness coverage.
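The variant selection described above can be sketched as follows. This is a minimal illustration, not the SharpTS runner: `Variant`, `plan_variants`, and `STRICT_PROLOGUE` are hypothetical names, and other Test262 flags (such as `raw` or `module`) are deliberately not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Directive placed ahead of the program text for the strict variant.
STRICT_PROLOGUE = '"use strict";\n'


@dataclass(frozen=True)
class Variant:
    name: str    # "sloppy" or "strict"
    source: str  # program text handed to the engine


def plan_variants(source: str, flags: frozenset[str]) -> list[Variant]:
    """Return the variants to execute for one test (hypothetical helper).

    - onlyStrict -> strict variant only
    - noStrict   -> sloppy variant only
    - unflagged  -> both, sloppy first
    """
    sloppy = Variant("sloppy", source)
    strict = Variant("strict", STRICT_PROLOGUE + source)
    if "onlyStrict" in flags:
        return [strict]
    if "noStrict" in flags:
        return [sloppy]
    return [sloppy, strict]
```

Under this sketch, today's behaviour corresponds to always returning `[sloppy]` for unflagged tests; the proposal changes only the final `return`.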

Split out from #882 (now closed), where the investigation and the codegen prerequisites were completed. This is the remaining deliberate, maintainer-gated piece — a coverage/infrastructure enhancement, not a bug.

Why it was deferred from #882

Running unflagged tests as both variants changes baseline semantics and coverage; it is not a drop-in replacement. Measured impact during #882:

  • Baseline model changes — every unflagged test gains a second path → bucket entry (a strict variant). The current model is one result per path; variant-pairs make it two.
  • Run time roughly doubles — every unflagged test executes twice.
  • It exposes real strict-mode failures: in a 495-test unflagged sample run force-strict, ~10% (49/495) initially passed sloppy but failed strict, with 0 strict-only gains.
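One way to measure that last figure is to classify each test by its (sloppy, strict) outcome pair. The sketch below is illustrative only: `tally` and `regression_rate` are hypothetical helpers, and the sample data is synthetic rather than taken from the #882 run.

```python
from collections import Counter


def tally(results):
    """Classify (sloppy_passed, strict_passed) pairs, one pair per test."""
    counts = Counter()
    for sloppy_ok, strict_ok in results:
        if sloppy_ok and strict_ok:
            counts["both_pass"] += 1
        elif sloppy_ok:
            counts["strict_regression"] += 1  # passes sloppy, fails strict
        elif strict_ok:
            counts["strict_only_gain"] += 1   # fails sloppy, passes strict
        else:
            counts["both_fail"] += 1
    return counts


def regression_rate(counts):
    """Fraction of tests that pass sloppy but fail strict."""
    total = sum(counts.values())
    return counts["strict_regression"] / total if total else 0.0


# Synthetic four-test sample, for illustration only.
sample = [(True, True), (True, False), (False, False), (True, True)]
counts = tally(sample)
rate = regression_rate(counts)

# The figure quoted in the issue, as a percentage rounded to one decimal.
reported_pct = round(100 * 49 / 495, 1)
```

The same four buckets could be reported per directory to spot clusters during triage.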

That made it a planned rollout decision rather than an autonomous change, consistent with #882's "regenerate the baseline only after triage" stance.

What's already done (prerequisite work, shipped in #884)

The concrete strict-mode codegen bugs that variant-pair coverage would expose were fixed proactively in #882/#884, so the runner change won't land on a pile of red. Across the sweep, the number of sloppy-Pass→strict-Fail tests in the 495-test force-strict sample went 49 → 40 → 34 → 27 → 8 → 4 → 0:

  • strict dynamic property writes on built-in objects persist
  • strict writes on Date/RegExp/Promise/Error (PDS) objects persist
  • strict delete removes Object.defineProperty (PDS) properties + honors configurability
  • strict writes honor non-writable / accessor descriptors (TypeError / setter invocation)
  • strict indexed writes on $Object receivers persist + honor preventExtensions
  • strict symbol-keyed and globalThis-sentinel writes persist
  • onlyStrict tests now actually run strict (Assemble() prepends a program-level "use strict")

So the codegen floor is in place; the sampled systematic strict clusters are resolved. A full (non-sampled) regen is still needed to confirm this suite-wide before or at rollout.

Scope of this issue

  1. Harness runner — for each unflagged test, execute both a sloppy variant and a "use strict"-prefixed strict variant (the strict prologue mechanism already exists from the onlyStrict fix). onlyStrict / noStrict-flagged tests keep running their single designated variant.
  2. Baseline model — extend path → bucket to distinguish the two variants per unflagged path (e.g. a variant suffix/qualifier on the key). This applies to both baselines/compiled.txt and baselines/interpreted.txt, i.e. to both the interpreter and compiled runners.
  3. Full regeneration — regenerate both baselines under the new model (SHARPTS_TEST262_UPDATE_BASELINE=1) and record the resulting strict-variant Fail entries; triage any clusters a full run surfaces beyond the 495-test sample.
  4. Run-time budget — confirm the ~2× wall-clock is acceptable for the subset config / CI, or gate strict variants behind a config flag if not.
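For item 2, one possible key encoding is sketched below. Everything here is an assumption for discussion: the `#strict` suffix, the `baseline_key` / `split_key` helpers, and the premise that test paths never end in that suffix. The actual baseline file format is not shown in this issue.

```python
STRICT_SUFFIX = "#strict"  # hypothetical qualifier; the real encoding is an open decision


def baseline_key(path: str, variant: str, paired: bool) -> str:
    """Build the baseline key for one executed variant.

    Only the strict half of a variant pair gets a suffix. Sloppy variants and
    single-variant tests (onlyStrict / noStrict) keep the bare path, so
    existing baseline entries would stay valid and the regen diff is additive.
    """
    if paired and variant == "strict":
        return path + STRICT_SUFFIX
    return path


def split_key(key: str):
    """Inverse of baseline_key: return (path, is_strict_half_of_pair)."""
    if key.endswith(STRICT_SUFFIX):
        return key[: -len(STRICT_SUFFIX)], True
    return key, False
```

The trade-off in this encoding is that a bare key does not say which mode ran; a symmetric scheme (suffixing both variants) would be more explicit but would rewrite every existing entry.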

Open decisions for the maintainer

  • Go / no-go on adopting variant-pairs at all (vs. keeping sloppy-only and relying on onlyStrict-flagged tests for strict coverage).
  • Baseline key encoding for the second variant.
  • Default-on vs. opt-in flag, given the run-time cost.
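If the opt-in route is chosen, the gate could be as small as the sketch below. The environment variable name `SHARPTS_TEST262_STRICT_VARIANTS` is hypothetical (only `SHARPTS_TEST262_UPDATE_BASELINE` appears in this issue), and flipping `default` is all that separates opt-in from default-on.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name, chosen to mirror SHARPTS_TEST262_UPDATE_BASELINE.
FLAG = "SHARPTS_TEST262_STRICT_VARIANTS"


def strict_variants_enabled(
    env: Optional[Mapping[str, str]] = None, default: bool = False
) -> bool:
    """Decide whether unflagged tests also run their strict variant.

    `default=False` models opt-in; `default=True` models default-on with an
    explicit opt-out (e.g. setting the flag to "0").
    """
    env = os.environ if env is None else env
    raw = env.get(FLAG)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Passing `env` explicitly keeps the decision testable without mutating the process environment.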
