Eliminate 44 PEP-conformance false positives (170→126), zero regression + FP-ceiling/mutation hardening #74

Merged: MelbourneDeveloper merged 6 commits into main from eliminatedfalsepositives on Jun 3, 2026
@MelbourneDeveloper (Collaborator)

TLDR

Eliminate 44 PEP-conformance false positives (170 → 126) with zero conformance regression, and lock the win in place with a ratcheting FP ceiling plus mutation-kill tests.

What Was Added?

  • crates/basilisk-checker/src/rules/e0014/alias_match.rs — recursive value-matcher for legacy Name = Union[...] aliases (including self-referential ones like Json). Uses positive-match semantics so genuinely-incompatible assignments still fire; Unknown/Any values never spuriously match a concrete target.
  • crates/basilisk-checker/tests/fp_elimination_tests.rs — 9 bidirectional coarse e2e tests; each asserts the eliminated false positive is gone AND the paired true positive still fires.
  • FP-ceiling gate in crates/basilisk-cli/tests/conformance_tests.rs — reads conformance.max_false_positives from coverage-thresholds.json and fails CI if the suite-wide FP total exceeds it (ratchets DOWN only — the mirror of the pass-percentage gate).
  • scripts/fp_verify.sh — rebuild + conformance + CSV-diff harness that flags any PASS→FAIL flip, caught-count drop, or missed-count rise, plus the FP delta.
  • Two #[mutation_safe(rule = "e0014", fns = "check_vars")] tests in mutation_kill_tests.rs covering the two new check_vars branches.
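
The ratcheting FP ceiling described above can be pictured with a minimal sketch. It assumes the gate reduces to comparing one measured total against one configured ceiling; `CeilingCheck` and `check_fp_ceiling` are hypothetical names for illustration, not the code in conformance_tests.rs, and reading the value out of coverage-thresholds.json is left out.

```rust
use std::cmp::Ordering;

/// Outcome of comparing a measured false-positive total with the ceiling.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum CeilingCheck {
    /// Measured total equals the ceiling: the gate passes, nothing to tighten.
    AtCeiling,
    /// Measured total is below the ceiling: the gate passes and the
    /// ceiling could be ratcheted down to the measured value.
    CanTighten { new_ceiling: u32 },
    /// Measured total is above the ceiling: the gate fails.
    Exceeded { over_by: u32 },
}

/// Compare the suite-wide FP total against the configured ceiling.
fn check_fp_ceiling(measured: u32, ceiling: u32) -> CeilingCheck {
    match measured.cmp(&ceiling) {
        Ordering::Equal => CeilingCheck::AtCeiling,
        Ordering::Less => CeilingCheck::CanTighten { new_ceiling: measured },
        Ordering::Greater => CeilingCheck::Exceeded {
            over_by: measured - ceiling,
        },
    }
}
```

With the figures from this PR, a measured total of 126 against a ceiling of 126 passes, while a single reintroduced false positive (127) is reported as exceeding the ceiling by 1.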

What Was Changed or Deleted?

False positives removed across several engine paths (all behind the no-regression invariant):

  • types_parsing.rs — bare Callable/type parse as Any-like; tuple[()] → empty-tuple type; non-decimal int literals (0x/0o/0b) compare equal across spellings; shared parse_key_value_args helper de-duplicates dict[…]/Mapping[…] parsing.
  • types.rs — PEP 646 variadic-tuple star matching (tuple[int, *tuple[str, ...], int] etc.; unhandled shapes never manufacture an FP); gradual tuple[Any, ...] source assignable to a fixed tuple; None is Hashable.
  • e0014/mod.rs — whole-quoted forward-reference annotations are skipped; bare references to legacy Union aliases are intercepted for value-level matching.
  • e0013.rs: is_unverifiable_return_type recursively skips return-type checks E0013 cannot verify (Named/Literal and their nestings), while preserving the tuple[X, ...] terminator path.
  • suppression.rs — honours a file-level # type: ignore (PEP 484) placed before any substantial line, and treats mypy-style # type: ignore[assignment] (non-BSK- codes) as a blanket line suppression.
  • protocol_ext.rs — E0099 no longer arg-checks type-checking utilities (assert_type/reveal_type/cast).
  • coverage-thresholds.json: max_false_positives 161 → 126 (tightened to the measured value). mutation_scores.json: working-scope baseline ratcheted 90.54% → 90.79%.
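
As an illustration of the non-decimal literal equivalence mentioned for types_parsing.rs, here is a minimal sketch that normalises Python integer literal spellings to one numeric value before comparing. `parse_int_literal` and `literals_equal` are hypothetical helpers, not the functions in the crate, and underscore handling is simplified (every underscore is dropped without checking its placement).

```rust
/// Parse a Python-style integer literal (decimal, 0x, 0o or 0b) into a value.
/// Hypothetical helper; returns None for anything it does not recognise.
fn parse_int_literal(text: &str) -> Option<u128> {
    // Simplification: strip all digit-group underscores without validating them.
    let cleaned: String = text.chars().filter(|c| *c != '_').collect();
    let lower = cleaned.to_ascii_lowercase();
    let (radix, digits) = if let Some(rest) = lower.strip_prefix("0x") {
        (16, rest)
    } else if let Some(rest) = lower.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = lower.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, lower.as_str())
    };
    // Reject empty digit strings and characters outside the radix (including signs).
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u128::from_str_radix(digits, radix).ok()
}

/// Two literal spellings are equivalent when both parse to the same value.
fn literals_equal(a: &str, b: &str) -> bool {
    match (parse_int_literal(a), parse_int_literal(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Under this sketch `0x10`, `0o20`, `0b10000` and `16` all compare equal, while an unparseable spelling never matches anything, so a malformed literal cannot be treated as equal by accident.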

How Do The Automated Tests Prove It Works?

  • Conformance gate (conformance_score test): suite-wide false positives 170 → 126 (−44) while PASS = 136/146, caught = 858, missed = 95 are all unchanged — no file flips PASS→FAIL, caught never drops, missed never rises. The new FP ceiling (126) fails CI if even one FP returns.
  • fp_elimination_tests.rs (9 tests): e.g. recursive_union_alias_accepts_valid_and_rejects_invalid (valid Json values pass, two 3j complex values still fire), variadic_tuple_star_targets (only the too-short tuple fires), gradual_variadic_tuple_source_is_assignable, bare_callable_and_type_are_any_like, none_is_hashable_but_not_iterable, empty_tuple_type, non_decimal_literal_equivalence, quoted_forward_ref_annotation_is_skipped.
  • Mutation testing: the two new check_vars branches introduced mutants e0014/mod.rs:140 replace || with && and :219 delete !; the two added mutation_safe tests kill both — working-scope score 67→69 caught, 7 missed unchanged, kill_rate 90.79% ≥ 90.54% baseline.
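
The no-regression invariant behind these numbers can be sketched as a comparison of before and after totals. This is a simplified illustration: `SuiteTotals`, `regressions` and `fp_delta` are hypothetical names, and the per-file PASS→FAIL flip check is approximated here by the suite-wide PASS count only.

```rust
/// Suite-wide conformance totals. Hypothetical struct, for illustration only.
#[derive(Debug, Clone, Copy)]
struct SuiteTotals {
    passed: u32,
    caught: u32,
    missed: u32,
    false_positives: u32,
}

/// List every way `after` is worse than `before`; empty means no regression.
fn regressions(before: SuiteTotals, after: SuiteTotals) -> Vec<&'static str> {
    let mut found = Vec::new();
    if after.passed < before.passed {
        found.push("PASS count dropped");
    }
    if after.caught < before.caught {
        found.push("caught count dropped");
    }
    if after.missed > before.missed {
        found.push("missed count rose");
    }
    found
}

/// Signed change in false positives (negative means false positives were removed).
fn fp_delta(before: SuiteTotals, after: SuiteTotals) -> i64 {
    i64::from(after.false_positives) - i64::from(before.false_positives)
}
```

Plugging in the PR's figures (PASS 136, caught 858, missed 95, FP 170 before and 126 after) yields an empty regression list and a delta of −44.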

Spec / Doc Changes

  • docs/plans/CHECK-ELIMINATE-FALSE-POSITIVES.md — execution log, the −44 breakdown, and a bottom checklist (done items ticked; remaining engine-blocked clusters — callable/protocol structural subtyping, TypedDict structural, narrowing, constructors — documented as deferred with unblocking notes). All code references [CHKARCH-DIAG-TYPESAFETY] / [CHKARCH-TESTING].

Breaking Changes

  • None

…e FP 128→126; tighten ceiling, harden tests)

Final batch of the FP-elimination plan. Net branch impact vs main:
PEP conformance false positives 170 → 126 (−44) with PASS=136/146,
caught=858, missed=95 all UNCHANGED — no file flips PASS→FAIL, caught
never drops, missed never rises.

This commit (128 → 126):
- e0014/alias_match: recursive Union-alias value matcher refinements
  (positive-match semantics; Mapping/tuple structural; Unknown/Any do
  NOT match concrete targets so genuine TPs keep firing).
- types: PEP 646 variadic-tuple star matching (tuple_assignable_with_star,
  prefix/suffix; unhandled shapes never manufacture an FP) + gradual
  tuple[Any, ...] source; None is Hashable.
- types_parsing: bare Callable/type → Any-like, tuple[()] empty tuple,
  non-decimal literal equivalence (0x/0o/0b), shared parse_key_value_args
  (dedup dict[]/Mapping[]).
- e0014/mod: skip whole-quoted forward-ref annotations; intercept bare
  legacy Union aliases for value-level matching.
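
The prefix/suffix idea behind the PEP 646 star matching can be sketched as follows. This is only an illustration under strong assumptions: types are plain strings compared by equality rather than real assignability, and `StarTuple` and `fixed_tuple_matches` are hypothetical names, not the `tuple_assignable_with_star` implementation.

```rust
/// A target such as tuple[int, *tuple[str, ...], int]:
/// fixed prefix, one repeated element type, fixed suffix.
/// Hypothetical model, for illustration only.
struct StarTuple<'a> {
    prefix: &'a [&'a str],
    star: &'a str,
    suffix: &'a [&'a str],
}

/// Check a fixed-length source tuple against a starred target.
fn fixed_tuple_matches(source: &[&str], target: &StarTuple<'_>) -> bool {
    // The source must be long enough to cover the fixed prefix and suffix.
    let needed = target.prefix.len() + target.suffix.len();
    if source.len() < needed {
        return false;
    }
    // Split into prefix, variadic middle (possibly empty), and suffix.
    let (head, rest) = source.split_at(target.prefix.len());
    let (middle, tail) = rest.split_at(rest.len() - target.suffix.len());
    head == target.prefix
        && tail == target.suffix
        && middle.iter().all(|t| *t == target.star)
}
```

For a target of tuple[int, *tuple[str, ...], int], a source of `(int, int)` matches because the variadic middle may be empty, whereas `(int,)` is too short to cover both fixed ends and is rejected.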

Mutation hardening:
- tests/fp_elimination_tests.rs: 9 bidirectional coarse e2e tests — each
  asserts the eliminated FP is gone AND the paired true positive still
  fires, killing mutants that would silently regress either direction.
- coverage-thresholds.json: max_false_positives 161 → 126 (tight to the
  measured value — any reintroduced FP pushes 127 > 126 and fails CI).

Refs [CHKARCH-DIAG-TYPESAFETY], docs/plans/CHECK-ELIMINATE-FALSE-POSITIVES.md.
…baseline up

The FP-elimination commit added two branches to e0014 check_vars (the
quoted forward-reference annotation skip and the bare recursive-Union-alias
interception), which introduced two new cargo-mutants targets that the
existing mutation_safe suite did not cover:

  e0014/mod.rs:140 `replace || with &&` (quoted-annotation guard)
  e0014/mod.rs:219 `delete !`           (bare-alias guard)

Add two bidirectional #[mutation_safe(rule = "e0014", fns = "check_vars")]
tests that assert the valid form is NOT flagged AND the genuinely-invalid
form still fires, so flipping either guard changes an observable E0014 count.

Working-scope mutation score: 67→69 caught, 7 missed unchanged, kill_rate
90.54% → 90.79%. Baseline (mutation_scores.json) ratcheted up accordingly.
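
The kill-rate figures follow from caught / (caught + missed). A minimal sketch of that arithmetic, assuming the score is rounded to two decimal places (`kill_rate_percent` is a hypothetical helper, not part of the mutation tooling):

```rust
/// Kill rate as a percentage, rounded to two decimal places.
/// Returns 0.0 when there are no mutants at all.
fn kill_rate_percent(caught: u32, missed: u32) -> f64 {
    let total = caught + missed;
    if total == 0 {
        return 0.0;
    }
    let raw = f64::from(caught) / f64::from(total) * 100.0;
    // Round to two decimals: scale up, round, scale back down.
    (raw * 100.0).round() / 100.0
}
```

With 7 missed in both cases, 67 caught out of 74 gives 90.54% and 69 caught out of 76 gives 90.79%, matching the baseline and the new score quoted above.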

Refs [CHKARCH-TESTING].
cargo fmt was not re-run after adding the kill tests in de8acff; CI's
fmt check flagged the multi-arg assert_eq! at mutation_kill_tests.rs:502
and :540. Apply canonical rustfmt (each assert_eq! operand on its own
line). Formatting-only; no behavioural change.
@MelbourneDeveloper merged commit 75aa31c into main on Jun 3, 2026
12 checks passed
@MelbourneDeveloper deleted the eliminatedfalsepositives branch on June 3, 2026, 12:35