Eliminate 44 PEP-conformance false positives (170→126), zero regression + FP-ceiling/mutation hardening #74
Merged
…e FP 128→126; tighten ceiling, harden tests)

Final batch of the FP-elimination plan. Net branch impact vs main: PEP conformance false positives 170 → 126 (−44) with PASS=136/146, caught=858, missed=95 all UNCHANGED: no file flips PASS→FAIL, caught never drops, missed never rises.

This commit (128 → 126):
- e0014/alias_match: recursive Union-alias value matcher refinements (positive-match semantics; Mapping/tuple structural; Unknown/Any do NOT match concrete targets so genuine TPs keep firing).
- types: PEP 646 variadic-tuple star matching (tuple_assignable_with_star, prefix/suffix; unhandled shapes never manufacture an FP) + gradual tuple[Any, ...] source; None is Hashable.
- types_parsing: bare Callable/type → Any-like, tuple[()] empty tuple, non-decimal literal equivalence (0x/0o/0b), shared parse_key_value_args (dedup dict[]/Mapping[]).
- e0014/mod: skip whole-quoted forward-ref annotations; intercept bare legacy Union aliases for value-level matching.

Mutation hardening:
- tests/fp_elimination_tests.rs: 9 bidirectional coarse e2e tests. Each asserts the eliminated FP is gone AND the paired true positive still fires, killing mutants that would silently regress either direction.
- coverage-thresholds.json: max_false_positives 161 → 126 (tight to the measured value; any reintroduced FP pushes 127 > 126 and fails CI).

Refs [CHKARCH-DIAG-TYPESAFETY], docs/plans/CHECK-ELIMINATE-FALSE-POSITIVES.md.
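To illustrate the non-decimal literal equivalence mentioned in this commit, here is a minimal sketch of comparing integer literals by value rather than by spelling. It is not the code in types_parsing; the function names `parse_int_literal` and `literals_equivalent` are hypothetical, and the sketch assumes unsigned literals with optional `_` separators.

```rust
/// Hypothetical helper: parse an unsigned Python-style integer literal
/// (decimal, 0x, 0o or 0b, with optional `_` separators) into its value.
/// Returns None for anything that is not such a literal.
fn parse_int_literal(text: &str) -> Option<u128> {
    let cleaned: String = text
        .trim()
        .chars()
        .filter(|c| *c != '_')
        .collect::<String>()
        .to_ascii_lowercase();
    let (radix, digits) = if let Some(d) = cleaned.strip_prefix("0x") {
        (16, d)
    } else if let Some(d) = cleaned.strip_prefix("0o") {
        (8, d)
    } else if let Some(d) = cleaned.strip_prefix("0b") {
        (2, d)
    } else {
        (10, cleaned.as_str())
    };
    // Reject signs and other stray characters that from_str_radix might accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    u128::from_str_radix(digits, radix).ok()
}

/// Hypothetical helper: two literal spellings are equivalent when both parse
/// to the same integer; otherwise fall back to exact text comparison.
fn literals_equivalent(a: &str, b: &str) -> bool {
    match (parse_int_literal(a), parse_int_literal(b)) {
        (Some(x), Some(y)) => x == y,
        _ => a == b,
    }
}
```

Under these assumptions, `literals_equivalent("0x10", "16")` holds while `literals_equivalent("0x10", "17")` does not, so a mismatch in value still counts as a genuine difference.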
…baseline up

The FP-elimination commit added two branches to e0014 check_vars (the quoted forward-reference annotation skip and the bare recursive-Union-alias interception), which introduced two new cargo-mutants targets that the existing mutation_safe suite did not cover:
- e0014/mod.rs:140 `replace || with &&` (quoted-annotation guard)
- e0014/mod.rs:219 `delete !` (bare-alias guard)

Add two bidirectional #[mutation_safe(rule = "e0014", fns = "check_vars")] tests that assert the valid form is NOT flagged AND the genuinely-invalid form still fires, so flipping either guard changes an observable E0014 count.

Working-scope mutation score: 67→69 caught, 7 missed unchanged, kill_rate 90.54% → 90.79%. Baseline (mutation_scores.json) ratcheted up accordingly.

Refs [CHKARCH-TESTING].
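The idea of a bidirectional kill test can be sketched with a toy model. Everything below is hypothetical: `is_whole_quoted`, its mutant, and `e0014_count` are stand-ins invented for illustration, not the real check_vars code or the real guard at e0014/mod.rs:140. The sketch only shows why asserting both directions makes a `replace || with &&` mutation observable.

```rust
/// Hypothetical guard: true when the whole annotation is one quoted string.
fn is_whole_quoted(annotation: &str) -> bool {
    let t = annotation.trim();
    let wrapped = |q: char| t.len() >= 2 && t.starts_with(q) && t.ends_with(q);
    wrapped('"') || wrapped('\'')
}

/// The same guard with the `replace || with &&` mutation applied.
fn is_whole_quoted_mutant(annotation: &str) -> bool {
    let t = annotation.trim();
    let wrapped = |q: char| t.len() >= 2 && t.starts_with(q) && t.ends_with(q);
    wrapped('"') && wrapped('\'')
}

/// Toy stand-in for an E0014 diagnostic count on one annotated assignment:
/// quoted annotations are skipped, otherwise an incompatible value fires once.
fn e0014_count(guard: fn(&str) -> bool, annotation: &str, value_is_compatible: bool) -> usize {
    if guard(annotation) {
        return 0;
    }
    if value_is_compatible { 0 } else { 1 }
}

/// Bidirectional check: the skipped form must NOT be flagged AND the
/// genuinely invalid form must still fire.
fn bidirectional_check(guard: fn(&str) -> bool) -> bool {
    let quoted_is_skipped = e0014_count(guard, "\"Node\"", false) == 0;
    let invalid_still_fires = e0014_count(guard, "int", false) == 1;
    quoted_is_skipped && invalid_still_fires
}
```

In this toy model `bidirectional_check(is_whole_quoted)` passes and `bidirectional_check(is_whole_quoted_mutant)` fails, because the mutated guard can never be true and the quoted form starts producing a diagnostic.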
cargo fmt was not re-run after adding the kill tests in de8acff; CI's fmt check flagged the multi-arg assert_eq! at mutation_kill_tests.rs:502 and :540. Apply canonical rustfmt (each assert_eq! operand on its own line). Formatting-only; no behavioural change.
TLDR
Eliminate 44 PEP-conformance false positives (170 → 126) with zero conformance regression, and lock the win in place with a ratcheting FP ceiling plus mutation-kill tests.
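As a rough sketch of what "zero regression plus a ratcheting ceiling" means as a gate, the following uses suite-wide totals only. The `Metrics` struct and `check_gate` function are hypothetical names, and the real harness is described as also checking per-file PASS→FAIL flips, which this simplification does not model.

```rust
/// Hypothetical suite-wide totals from one conformance run.
#[derive(Debug, Clone, Copy)]
struct Metrics {
    passed_files: u32,
    caught: u32,
    missed: u32,
    false_positives: u32,
}

/// Hypothetical gate: fail if any no-regression invariant is violated or the
/// false-positive total exceeds the configured ceiling.
fn check_gate(before: Metrics, after: Metrics, max_false_positives: u32) -> Result<(), String> {
    if after.passed_files < before.passed_files {
        return Err(format!(
            "PASS dropped: {} -> {}",
            before.passed_files, after.passed_files
        ));
    }
    if after.caught < before.caught {
        return Err(format!("caught dropped: {} -> {}", before.caught, after.caught));
    }
    if after.missed > before.missed {
        return Err(format!("missed rose: {} -> {}", before.missed, after.missed));
    }
    if after.false_positives > max_false_positives {
        return Err(format!(
            "false positives {} exceed ceiling {}",
            after.false_positives, max_false_positives
        ));
    }
    Ok(())
}
```

With the figures from this PR (PASS 136, caught 858, missed 95, FPs 170 before and 126 after, ceiling 126) the gate returns `Ok(())`; a run with 127 false positives would return an error because 127 > 126.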
What Was Added?
- `crates/basilisk-checker/src/rules/e0014/alias_match.rs`: recursive value-matcher for legacy `Name = Union[...]` aliases (including self-referential ones like `Json`). Uses positive-match semantics so genuinely incompatible assignments still fire; `Unknown`/`Any` values never spuriously match a concrete target.
- `crates/basilisk-checker/tests/fp_elimination_tests.rs`: 9 bidirectional coarse e2e tests; each asserts the eliminated false positive is gone AND the paired true positive still fires.
- `crates/basilisk-cli/tests/conformance_tests.rs`: reads `conformance.max_false_positives` from `coverage-thresholds.json` and fails CI if the suite-wide FP total exceeds it (ratchets DOWN only, the mirror of the pass-percentage gate).
- `scripts/fp_verify.sh`: rebuild + conformance + CSV-diff harness that flags any PASS→FAIL flip, caught-count drop, or missed-count rise, plus the FP delta.
- `#[mutation_safe(rule = "e0014", fns = "check_vars")]` tests in `mutation_kill_tests.rs` covering the two new `check_vars` branches.

What Was Changed or Deleted?
False positives removed across several engine paths (all behind the no-regression invariant):
- `types_parsing.rs`: bare `Callable`/`type` parse as Any-like; `tuple[()]` → empty-tuple type; non-decimal int literals (`0x`/`0o`/`0b`) compare equal across spellings; shared `parse_key_value_args` helper de-duplicates `dict[…]`/`Mapping[…]` parsing.
- `types.rs`: PEP 646 variadic-tuple star matching (`tuple[int, *tuple[str, ...], int]` etc.; unhandled shapes never manufacture an FP); gradual `tuple[Any, ...]` source assignable to a fixed tuple; `None` is `Hashable`.
- `e0014/mod.rs`: whole-quoted forward-reference annotations are skipped; bare references to legacy `Union` aliases are intercepted for value-level matching.
- `e0013.rs`: `is_unverifiable_return_type` recursively skips return-type checks E0013 cannot verify (`Named`/`Literal` and their nestings), while preserving the `tuple[X, ...]` terminator path.
- `suppression.rs`: honours a file-level `# type: ignore` (PEP 484) placed before any substantial line, and treats mypy-style `# type: ignore[assignment]` (non-BSK codes) as a blanket line suppression.
- `protocol_ext.rs`: E0099 no longer arg-checks type-checking utilities (`assert_type`/`reveal_type`/`cast`).
- `coverage-thresholds.json`: `max_false_positives` 161 → 126 (tightened to the measured value).
- `mutation_scores.json`: working-scope baseline ratcheted 90.54% → 90.79%.

How Do The Automated Tests Prove It Works?
- `conformance_score` test: suite-wide false positives 170 → 126 (−44) while PASS = 136/146, caught = 858, missed = 95 are all unchanged: no file flips PASS→FAIL, `caught` never drops, `missed` never rises. The new FP ceiling (126) fails CI if even one FP returns.
- `fp_elimination_tests.rs` (9 tests): e.g. `recursive_union_alias_accepts_valid_and_rejects_invalid` (valid `Json` values pass, two `3j` complex values still fire), `variadic_tuple_star_targets` (only the too-short tuple fires), `gradual_variadic_tuple_source_is_assignable`, `bare_callable_and_type_are_any_like`, `none_is_hashable_but_not_iterable`, `empty_tuple_type`, `non_decimal_literal_equivalence`, `quoted_forward_ref_annotation_is_skipped`.
- The two new `check_vars` branches introduced mutants `e0014/mod.rs:140 replace || with &&` and `:219 delete !`; the two added `mutation_safe` tests kill both. Working-scope score: 67→69 caught, 7 missed unchanged, kill_rate 90.79% ≥ 90.54% baseline.

Spec / Doc Changes
- `docs/plans/CHECK-ELIMINATE-FALSE-POSITIVES.md`: execution log, the −44 breakdown, and a bottom checklist. Done items are ticked; the remaining engine-blocked clusters (callable/protocol structural subtyping, TypedDict structural, narrowing, constructors) are documented as deferred with unblocking notes. All code references `[CHKARCH-DIAG-TYPESAFETY]` / `[CHKARCH-TESTING]`.

Breaking Changes