Skip to content

fix: batch of 73 bug/doc fixes (#739–#823)#957

Merged
dekobon merged 51 commits into
mainfrom
fix/batch-2026-06-14
Jun 15, 2026
Merged

fix: batch of 73 bug/doc fixes (#739–#823)#957
dekobon merged 51 commits into
mainfrom
fix/batch-2026-06-14

Conversation

@dekobon

@dekobon dekobon commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Summary

Batch fix of the open #739#823 backlog: 73 issues fixed across the
library, CLI, output serializers, VCS subsystem, and docs. Two issues were
investigated and found to be non-bugs (left open with evidence), and one
hardening follow-up was filed.

Each fix is its own atomic commit carrying a Fixes #N trailer (so the
issues close on merge), with a regression test verified by reverting the
production change.

Highlights by area

Not fixed (investigated → non-bug)

Follow-up

Quality assurance

Survived five independent passes — simplify-rust, rust-optimize,
audit-tests (each of ~150 new tests verified by revert; 2 weak assertions
strengthened), review, and code-review max (one C# predicate dedup
applied). make pre-commit is green: fmt, clippy ×2 -D warnings, workspace
tests --all-features, doc, udeps, all lint families, manpages, self-scan
both tiers, and the Python stages.

Snapshots

Four metric changes shifted integration snapshots in the
big-code-analysis-output submodule; those commits are pushed
(edacd90f..3fba6241 on main) and the parent records the matching SHA.

dekobon added 30 commits June 14, 2026 13:40
bus_factor_rows handed raw directory paths to write_md_table, which
only pads cells. A `|` in a Unix directory path injected a spurious
column separator and corrupted the Markdown table; the doc comment on
write_markdown_bus_factor falsely claimed the renderer escaped cells.

Return typed Cells from bus_factor_rows so each format applies its own
escaping the way the ranked-files path does: Markdown maps through
render_cell_md (GFM-escapes the directory Cell::Path), HTML maps
through cell_text (write_table_with_tooltips escape_html-escapes).
Numerics are pre-formatted Cell::Num and pass through raw.

Fixes #739
`run_command_preproc` finalized the shared worker accumulator with two
`.expect()` calls -- the panic-on-poison / panic-on-un-joined-Arc
pattern #445 removed from the sibling count collector. A worker panic
poisoned the mutex and crashed `bca preproc` with a backtrace instead
of a clean exit.

Extract a `into_preproc_data` helper that mirrors
`CountCollector::into_count`: `Arc::try_unwrap` then poison-tolerant
`into_inner` on the sole-owned path, and `mem::take` from behind a
poison-recovered guard on the still-shared path. The recovered data
still holds every joined worker's file entry, so finalization degrades
rather than panics. Add regression tests for both the poisoned-mutex
and shared-Arc paths (verified by revert).

Fixes #740
`Touched::churn()` used a plain `+` to sum `added` and `deleted`, the
only churn computation in the VCS subsystem that was not saturating.
This contradicts the subsystem's documented no-panic-on-overflow
invariant and would panic in a debug build on overflow. Switch to
`saturating_add`, matching every sibling accumulator (stats.rs,
diff_parse.rs, size_features), and add a regression test that would
panic under the old `+`.

Fixes #742
read_file_with_eol previously classified its 64-byte prefix with
String::from_utf8_lossy + an unconditional pop() + a U+FFFD scan. That
heuristic dropped the probe's last character unconditionally, hiding a
genuinely invalid trailing byte (wrongly accepting non-UTF-8 input) and
rejecting files that legitimately contained the U+FFFD replacement
scalar within the probe window.

Replace it with a std::str::from_utf8 check: accept on Ok; tolerate an
error only when it is a trailing incomplete multibyte sequence
(error_len().is_none()) AND the file continues past the probe, so the
split character is completed by later bytes; reject otherwise. When the
probe is the whole file, an incomplete tail is genuine corruption.

The public Ok(None)/Ok(Some(..)) contract and the 64-byte window (now a
named UTF8_PROBE_BYTES constant) are unchanged.

Fixes #746
Fixes #758
CountCollector::into_count's Err(shared) arm silently returned a
race-dependent partial tally when the collector was still shared,
violating its own "after every worker has joined" contract and
turning a coordination bug into undercounted metrics with no signal.

Add a debug_assert!(false, ...) in the still-shared arm so misuse
trips loudly in debug and test builds, while release continues to
degrade to the best-effort snapshot rather than panicking, honoring
the no-panic-in-production contract. Sharpen the doc comment to state
that a still-shared collector indicates a caller-side coordination
bug and that the returned Count is a non-final snapshot, not the
complete aggregate the #[must_use] return implies.

into_count is pub and re-exported from lib.rs, so its signature must
stay stable until 2.0; the debug_assert path makes the misuse
explicit without a SemVer break, mirroring the sibling
into_preproc_data recovery pattern.

Add a debug_assertions-gated #[should_panic] regression test for the
still-shared misuse path (verified by revert: it does not panic
without the assert).

Fixes #757
Ops::new initialised every space name from get_func_space_name, which
returns Some("<anonymous>") for root Unit nodes that have no name field.
ops_inner overwrites only the final top-level node, so any non-top-level
Unit space kept the synthetic "<anonymous>" placeholder even though the
API docs describe None as the "name could not be resolved" state.

Mirror the SpaceKind::Unit special-case already in FuncSpace::new:
skip name derivation for Unit kinds and leave name = None, so the ops
and metrics paths agree. The top-level override in ops_inner still sets
the caller-supplied name, leaving top-level behaviour unchanged.

Fixes #755
Under --exclude-tests, the prune hook records each pruned test
subtree's row span on its innermost enclosing func-space via
Sloc::exclude_span. Sloc::sloc() subtracts only that space's own
excluded_lines, and Sloc::merge folded only sloc_min/sloc_max, not
excluded_lines. For a #[test] fn inside a retained impl/trait/closure
(all Rust func-spaces), only that space's SLOC dropped; the unit's
span-based SLOC stayed inflated, so MI's 16.2*ln(SLOC) term still
inherited the test lines. Ploc was already correct because its
line-set merges upward.

Fold excluded_lines in Sloc::merge so the pruned count propagates to
every enclosing space up to the unit, mirroring Ploc's upward line-set
union. Each ancestor's span includes the pruned rows exactly once and
pruned subtrees never descend, so the count is subtracted once per
altitude with no double-counting.

Strengthen sloc_drops_for_test_fn_nested_in_impl to assert the unit
root (not just the impl). Add cases for a #[test] fn in a production
impl, a test item nested in a closure, and a non-test impl that must
be unaffected. Verified via test-via-revert: the unit assertions fail
without the fold.

Fixes #741
MetricsOptions::with_metric_set stored the caller-supplied MetricSet
verbatim, so a public caller could pass empty().with(Metric::Mi) and
have the walker compute a maintainability index from zero-valued Loc /
Cyclomatic / Halstead inputs with no error.

Add MetricSet::resolved() (the set-in/set-out counterpart of
from_slice_with_deps) and apply it inside with_metric_set so the set is
closed under Metric::dependencies before storing. This matches the
existing with_only behaviour, which already resolves the closure and
emits the dependencies; compute and emit share one selected set by
design, so no output widens beyond what with_only(&[Mi]) already does.
The operation is idempotent: an already-closed set is unchanged.

Keeps the builder infallible (additive correctness fix, minor bump).

Fixes #743
A top-level comma list whose first operand is `not(...)` and whose
last operand ends in `)` (e.g. `not(foo), all(test)`) was misread as a
single `not(...)` operand: the `not` short-circuit stripped `not` to
leave `(foo), all(test)`, which both starts with `(` and ends with
`)`, so the whole list was dropped and the trailing `test` operand was
lost. Items so annotated were not recognised as test-only and survived
exclude_tests.

Move the bare-comma-list split ahead of the single-operand
not/all/any prefix checks, so each operand is classified in isolation.
The existing cfg_split_top_level_args already respects paren depth, so
a comma nested inside a predicate's own parens (`not(foo, bar)`,
`all(test, unix)`) remains a single operand and is unaffected.

Fixes #763
The byte-level C/C++ lexer prepass in `replace` treated every `'` as a
character-literal opener. C++14 ([lex.icon]/[lex.fcon]) and C23 allow a
single-quote digit separator inside numeric literals (`1'000`,
`0xDEAD'BEEF`, `3.14'159`). With an odd separator count the opening `'`
had no closing `'`, so the scanner stayed in a spurious Char state to
EOF and every later macro identifier leaked through unmasked.

Track numeric-literal context (`in_number`) and recognize a `'` flanked
by hex digits inside a number as a separator rather than a Char opener.
The numeric-context guard distinguishes separators from length-prefixed
char literals (`L'x'`, `u8'x'`), whose prefix is an identifier run, not
a numeric one, so those still open Char correctly.

Bundle the masking-pass cursors into a `MaskState` struct to keep the
scanner's working state in one place.

Fixes #765
remove_from_code substituted bare LF for every line a removed multi-line
comment spanned, so stripping comments from a CRLF source produced mixed
CRLF/LF output and `bca strip-comments --in-place` silently rewrote line
endings. Detect the source's dominant line ending from its first newline
and substitute the matching sequence (\r\n for CRLF, \n otherwise) per
removed-comment line, via a CRLF fast-path buffer mirroring the existing
LF buffer. Also restore the \r when a single-line comment node absorbed
the CR of its terminating CRLF, so the kept line keeps a full \r\n.

Regression test strips a multi-line comment from a CRLF fixture and
asserts no bare LF survives; the LF test now asserts no CR is introduced.

Fixes #767
`Node::child_by_field_name` returned `Option<Node<'_>>`, tying the
child to the borrow of `&self` rather than the underlying tree
lifetime `'a`. tree_sitter::Node is Copy and its children are valid
for the whole tree, so the narrowing was needless and inconsistent
with the sibling accessors (`child`, `parent`, `children`,
`next/previous_sibling`), which already return `Node<'a>`.

Widen the return to `Option<Node<'a>>` so callers may hold a child
past the parent borrow. This is additive: every existing call site
uses the result transiently, and a wider return lifetime only adds
capability, so nothing breaks across lib/cli/web/py.

Add a regression test whose helper returns a child past an inner
`&parent` reborrow; it fails to compile (E0515) under the old
signature and passes under the widened one.

Fixes #786
ops_inner lacked metrics_inner's synthetic-Unit-root wrapper, so
`Ast::ops()` returned Err(EmptyRoot) for ERROR-root parses (e.g. Lua
partial input) where `Ast::metrics()` succeeds. Mirror the metrics
seam's push_synthetic_unit_root so the two seams agree: ops() now
yields a top-level Unit Ops (name = caller-supplied Source::name)
in the same cases metrics() does. Additive correctness, not a
breaking shape change to the public Ops type.

Also correct the Ops::operands / Ops::operators doc comments: they
return the DISTINCT (deduplicated) Halstead operand/operator
vocabularies (n2 / n1, HashMap keys), not "all" occurrences.

Fixes #789
Fixes #790
The dedicated `(tree_sitter_cpp)` arm of `get_language!` produced
`tree_sitter_cpp::LANGUAGE.into()`, which is byte-identical to what
the generic `($name:ident)` arm produces for `$name = tree_sitter_cpp`
(`$name::LANGUAGE.into()`). Before #718 the arm differed — it routed
to `tree_sitter_mozcpp::LANGUAGE` — but commit 95fccbb (the upstream
cpp default / opt-in mozcpp work) repointed it at `tree_sitter_cpp`,
leaving it dead. `tree_sitter_cpp` exposes `pub const LANGUAGE`, so
the generic arm covers Cpp unchanged. Pure no-op: the language→grammar
mapping is identical before and after. The typescript/tsx/php arms are
retained because they use distinct `LANGUAGE_*` constants.

Fixes #770
Add the missing Objective-C bullet to the crate-level rustdoc
`## Supported Languages` list and correct two stale slugs the same
list carried: TSX (`tsx`, not `typescript`) and the internal
`ccomment` / `preproc` C-family helpers (lowercase canonical slugs).

Add a drift-guard `#[test]` that derives the canonical slug set from
`LANG::into_enum_iter()`/`name()` (the single source of truth, always
compiled in regardless of per-language Cargo features) and asserts
each distinct slug appears as a backtick token within the section, so
a future add-lang cannot silently desync the docs.rs landing page.

Fixes #769
Fix five independent documentation/comment accuracy issues. All are
comment/doc-only — no behavior change.

- concurrent_files_tests.rs: the consumer error-handling test comment
  overclaimed it exercises both sides of the call-site guard; its
  `invocations == 2` assertion cannot distinguish swallow vs emit.
  State that it verifies only loop-continuation and point at the
  per_file_error_* unit tests for the branch distinction.
- concurrent_files.rs: document ConcurrentRunner::new's max(2,n)-1
  worker-count transform (total budget = producer + consumers).
- macros/mod.rs: add Mi to the bracketed-arm trait list (Mi uses the
  [Trait] default-body arm alongside Tokens/Nom/NArgs).
- metrics/wmc.rs: Kotlin companion_object IS a func_space with its own
  Class scope (#431); correct the comment that claimed otherwise.
- book metrics.md: the mi.sei comment term is a percentage in [0,100],
  not a [0,1] ratio; relabel comment_ratio -> comment_percentage.

Fixes #764
Fixes #766
Fixes #771
Fixes #788
Fixes #774
The #765 digit-separator fix grows step_normal's cyclomatic to the hard
limit (a flat hand-rolled lexer dispatch) and the #767 CRLF helper adds a
small LineEnding type that lifts comment_rm's file-level nargs sum. Both
pass the hard gate but encroach the 95% headroom band. Mark them with
in-source suppression + rationale rather than contorting the code: the
lexer is clearest as one state machine and file-level nargs is an
aggregation artifact, not a wide-signature smell.
The predefined-macro table listed 14 `UINT*_MIN` entries that are not
real C standard macros: ISO C defines `UINTN_MAX` but no `UINTN_MIN`
(the minimum of any unsigned type is 0, so no macro exists). As a
result `is_predefined_macros` returned true for non-macros and the
macro-masking prepass could clobber a user identifier such as
`UINT16_MIN`. The legitimate signed `INT*_MIN` macros are retained.

The preproc specials table likewise listed `char64_t` and `charptr_t`,
which the `enums/data/special.py` generator documents as not real
C/C++ types and never produces (it widens `char{8,16,32}_t` only and
omits `charptr_t`). Because that generator only adds entries
(diff = specials - old) it can never prune stale rows, so these made
`is_specials` return true and a `#define char64_t ...` was wrongly
dropped from macro extraction.

Removed the entries from the `.txt` sources and regenerated the `.rs`
slices via `cargo run -p enums -- --language c_macros`; the slices
stay sorted (binary_search lookup), and regeneration is idempotent.
Added regression tests pinning the pruned names (the codegen smoke
tests only cover generic `names[0]`/`names[1]` placeholders).

Fixes #760
Fixes #762
Bash `heredoc_body` (Bash::HeredocBody2) and Perl
`heredoc_body_statement` (Perl::HeredocBodyStatement) match
Checker::is_string but were omitted from their Alterator flatten
arms, so a heredoc body with interpolation kept its structured
interpolation/fragment children in the AST dump while is_string
treated the whole node as one string -- the dump asymmetry #699
eliminated elsewhere (Ruby HeredocBody, PHP Heredoc/Nowdoc).

Add both variants to the flatten arms, mirroring Ruby/PHP. This is
a dump-only change: find/count already treated these nodes as
strings, so no metric value moves and no snapshot shifts. Adds
regression tests verifying the dumped node is a flat string with
its interpolation text preserved (test-via-revert confirmed both
fail against the pre-fix arms).

Fixes #761
Issue #768 reported that the LPAREN2/LBRACK2/LBRACK3 second-alias
opener arms in get_op_type (Cpp/C/Objc/Mozcpp/Tcl/iRules/Php/Elixir/
Ruby) escape the #695 pair-glyph fold, inflating Halstead n1 and
rendering a bare "(" instead of the folded "()".

Investigation shows the premise does not hold. tree-sitter's runtime
collapses each second alias to its base via the grammar's
public_symbol_map before Node::kind_id() (ts_node_symbol) returns, so
the alias kind_id is never observable to the metric layer. Verified
empirically across the exact constructs that produce the alias
internally (#if defined(FOO), f(1), a[0], in Point[1,2]): the alias
kind_ids 20/47/46/95/96/155 appear zero times; only base openers reach
compute_halstead, n1 is not inflated, and openers already render
folded. No metric value changes; no snapshots shift.

The alias match arms are therefore defensive, not active. Rather than
apply the no-op fold #768 proposed (which would imply a fold is needed
when it is not), this documents each alias arm as defensive and adds
second_alias_opener_collapses_to_base_kind_id, which pins the
runtime-collapse invariant. If a future grammar bump drops the
public_symbol_map collapse, the test goes red and signals that the
alias arms must then fold to their pair glyph in
get_operator_id_as_str.

Fixes #768
The ABC condition-counting terminal-bool sets for Python and the JS
family (JavaScript, Mozjs, TypeScript, Tsx) omitted the numeric-literal
kind, so a numeric-truthy operand in an if/while/&&/|| slot scored zero
conditions -- the exact gap fixed for Lua.

Python treats every non-zero number as truthy and the JS family treats
every non-zero/non-NaN number as truthy, so `if 5:` (Python) and
`while (5)` / `x && 5` (JS/TS) each count their numeric literal as a
Fitzpatrick unary condition. Add Python::Integer / Python::Float and the
JS-family Number literal kind to the affected macros.

TypeScript and Tsx emit `number` under two ids: the numeric literal
(Number) and the `predefined_type` keyword `number` in a type annotation
(Number2). Only the literal is a value, so Number2 is deliberately
omitted. Statically-typed languages (Go/Rust/Java/Kotlin/C#) correctly
omit numerics -- a bare int in a bool slot is a type error -- and are
left untouched.

Adds regression tests mirroring lua_number_truthy_condition_counts and
records the JS integration-fixture snapshot shifts in the submodule.

Fixes #772
Kotlin's ABC compute had no Phase-2B condition-slot arm, so a
non-comparison predicate (`if (flag)`, `while (running())`,
`do {} while (ok)`) counted zero ABC conditions — diverging from every
sibling language and dropping below Kotlin's own cyclomatic decision
count. Route the `if`/`while`/`do-while` `condition` field through a new
`kotlin_count_condition` helper that counts a bare terminal once and
unwraps paren/negation via the existing `kotlin_inspect_container`. A
comparison (`a == b`) or `&&`/`||` predicate is a nested binary_expression
already counted by the token and chain-walker arms, so it adds nothing
here — no double-count.

Regression tests cover the three bare-predicate forms plus the
parenthesised case, with guard tests pinning that comparison and
short-circuit predicates are unchanged. No snapshots shifted.

Fixes #773
PHP `Cognitive::compute` was missing the function-definition boundary
arm that every sibling language gained in #696. It never reset
structural `nesting` to 0 nor bumped the function-depth surcharge at a
`FunctionDefinition` / `MethodDeclaration`, so a PHP function or method
nested inside control flow inherited the enclosing nesting and missed
the SonarSource B-nesting amplification.

Add the boundary arm mirroring Java exactly: `nesting = 0` plus
`increment_function_depth(&mut depth, node, &[FunctionDefinition,
MethodDeclaration])`. Closures (`AnonymousFunction` / `ArrowFunction`)
keep their existing lambda arm and intentionally do not reset nesting,
matching how the siblings treat lambdas vs named functions.

A function `inner()` defined inside `if ($a) { if ($c) { … } }` whose
body is `if ($b) { if ($d) { … } }` drops from cognitive 7 to 5; the
same body as a top-level function is unchanged at 3. No snapshot
fixtures contain nested function definitions, so none shifted.

Fixes #775
The Elixir cyclomatic impl added +1 for every `stab_clause`, but the
first `stab_clause` of an `anonymous_function` is the closure's
head/definition, not a pattern-dispatch decision. Because the closure
already opens its own function space seeded with base cyclomatic 1,
a trivial `fn x -> x end` reported cyclomatic 2 for zero decisions,
diverging from cognitive (which already scores it 0) and from trivial
closures in Python/Rust/JS.

Discriminate on the parent: a `stab_clause` keeps counting unless it
is the first `stab_clause` child of an `anonymous_function` parent.
`case`/`cond`/`with`/`try` arms have a `do_block` parent and are
unaffected; multi-clause `fn`s skip only their head clause, so N
clauses add N-1 branches.

Modified cyclomatic is untouched (the anon fn is not a Call).

Fixes #776
The JS-family LOC impls (Mozjs, Javascript, Typescript, Tsx) listed
`StatementBlock` among the LLOC contributors, so every `{ … }` brace
block added +1 logical line. No other language does this: Rust's
`Block`, C/C++/Objc/Mozcpp's `CompoundStatement`, and
Java/Groovy/C#'s `Block` all contribute 0 lloc. A statement_block is
a syntactic grouping, not a logical statement, so the count inflated
JS/TS lloc and broke cross-language parity.

Remove `StatementBlock` from all four JS-family LLOC match arms; it
now falls through to the PLOC catch-all like every sibling block
node. For `if (x) { return 1 } else { return -1 }` the JS lloc drops
from 6 to 3, matching the C and Rust translations.

Update the existing anchored lloc tests to the corrected values and
add `js_family_if_lloc_matches_c_and_rust`, which pins parity across
Mozjs/JS/TS/TSX and the C/Rust baselines. Refresh the lib insta
snapshots and the integration-output submodule fixtures (every
changed lloc value decreases; no other metric shifts).

Fixes #777
Interior rows of a multi-line string literal were classified
inconsistently across languages. Python credits every row of a
non-docstring multi-line string to PLOC (#415: "real code, not blank
lines"), but most other languages no-op'd their string nodes, so the
interior rows reached neither PLOC nor CLOC and
`blank = sloc - ploc - cloc` mislabelled them as blank.

Add a shared `add_multiline_string_ploc` helper mirroring Python's #415
mechanism exactly (insert the opening row only when the enclosing
statement begins earlier, then insert `start + 1..=end`) and route every
multi-line-capable string node through it: Perl (quoted strings +
heredoc body), Ruby (strings + heredoc body), PHP (encapsed / single
string), Go (raw string), Rust (string / raw string), C / C++ / Mozcpp /
Objc (string / raw string), Java + Groovy (text blocks), C# (verbatim /
raw string), Kotlin (multiline string), Lua (long-bracket string), and
JS / TS / TSX / Mozjs (template literals). Python is the canonical
reference and is unchanged.

A 3-line string assignment now reports ploc == 3 / blank == 0 in every
language with a multi-line string literal, matching Python. Tcl, Bash,
and PHP heredoc already behaved correctly.

Fixes #778
Go's panic() and Lua's error()/os.exit() are built-in abrupt-exit
calls that unwind the stack like throw/raise in the exception
languages, but were silently uncounted while every other exit-language
counts its throw/raise equivalent.

Match them by callee text, mirroring the Bash/Elixir builtin-by-name
pattern already in this file:

- Go: a call_expression whose `function` field is a bare identifier
  spelling `panic`. A package-qualified `foo.panic()` has a
  selector_expression callee and is correctly not counted.
- Lua: a function_call whose `name` field text is `error` or
  `os.exit`. A user call such as `foo()` or `myError(...)` is not
  counted.

`return` is still counted; panic/error/os.exit are additional exits.

Fixes #779
C# 8+ permits explicit private/protected modifiers on interface
members, but CsharpCode::Npa::compute counted every interface field
as a public attribute (interface_npa = interface_na), inheriting
Java's idiom without Java's invariant that interface fields are
always public.

Interface members default to public, so gate interface_npa per
declaration on the inverse of the class rule: a field counts as a
public attribute unless it carries an explicit private/protected
modifier. The new csharp_interface_member_is_public helper mirrors
ts_member_is_public!'s default-public gate. The modifier applies to
every declarator of a field, so the public/private split is
per-declaration. Also replaces the assign-the-whole-sum smell
(stats.interface_npa = stats.interface_na) with a proper increment.

Fixes #780
PHP was the only language whose Npa impl counted enum cases as class
attributes (`enum Color { case Red; case Green; case Blue; }` reported
class_na = class_npa = 3). Java, Kotlin, Rust, and C# all treat enum
cases/variants as sum-type tags, not data fields, and exclude them.

Remove the `EnumDeclarationList` arm so PHP enum bodies contribute no
npa attributes, matching the established cross-language convention. PHP
enums cannot declare instance properties, and class-level `const`s are
not counted as attributes outside an enum either (the class-body arm
counts only `PropertyDeclaration`), so excluding the body entirely keeps
PHP internally consistent as well.

Adds regression tests: enum cases report 0, enum-with-const reports 0,
and a cross-language parity check that PHP and Java equivalent enums
both report (0, 0). Regenerates the traits_enums.php integration
snapshot in the big-code-analysis-output submodule.

Fixes #781
The Display impl for nargs::Stats emitted the function_args/closure_args
headline from the per-space accumulators (function_args/closure_args),
while every structured serializer (JSON/YAML/TOML/CBOR via wire.rs) emits
them from the cross-space sum (function_args_sum/closure_args_sum). The
accumulators are not combined by Stats::merge, so at a parent space that
rolls up child function-spaces the text output disagreed with JSON for
the same field name (e.g. python_nested_functions: JSON 3, Display 2).

Switch the two Display lines to the *_sum accessors so text matches JSON.
total/average/*_average already sourced from the summed accessors and are
unaffected; this is an output-format fix, not a metric-value change, so
no JSON/YAML snapshots shift. Adds a parity regression test asserting the
Display headline equals the summed accessors for a nested-function space.

Fixes #782
dekobon added 16 commits June 15, 2026 06:51
Add per-language smoke tests for C, Objective-C, Elixir, Ruby, and
iRules, the five registered languages that lacked the non-zero-tokens
positive test required by Lesson 1. Each mirrors the existing
smoke-test idiom (check_metrics + tokens_sum() > 0) on small realistic
source.

Fix cpp_tokens_nested_attribution: the docstring described a
forward-declaration fixture the test never used; correct it to match
the actual lambda fixture and replace the vacuous max >= 7 / max <= sum
inequalities with exact assertions (sum == 24, max == 24, min == 0).
A C++ lambda is a closure but opens no FuncSpace, so all leaves
attribute to the single outer scope; the exact values fail if a
regression split the lambda into its own scope or credited the unit,
which the prior inequalities would have passed. Drop the unfulfilled
"unit-level sum must equal total of all scopes" comment.

Fixes #785
Fixes #787
- csv.rs: numeric metric cells CAN start with `-` (negative MI), but
  are safe as self-contained numeric literals, not "cannot begin with
  a trigger". State the real invariant.
- dump_ops/dump_metrics/dump.rs: drop dead `[`Result`]: #variant.Result`
  doc-links; bare `[`Result`]` resolves via intra-doc.
- dump_metrics.rs: format_number's whole-float-as-integer DIVERGES from
  JSON (which keeps `.0`); it is a legibility choice, not a match.
- dump.rs: document the `# Panics` precondition that `code` must be the
  exact source `node` was parsed from, on dump_node and
  dump_node_with_color.
- dump.rs: name `dump_tree_helper` in the depth-limit test comment; the
  old `dump_children` no longer exists post-#700.
- warning_line.rs: drop the false GitHub Actions `::warning::` /
  "out of the box" claims; CI annotators need a registered problem
  matcher for Clang/MSVC lines.
- parser.rs + nodes.md: document numeric -t/--type as a raw, grammar-
  version-unstable kind_id match.

Fixes #791 #792 #793 #795 #796 #797 #799
read_file_with_eol stripped a leading UTF-16 BE/LE BOM and kept the
interleaved-NUL body, which then passed the byte-level UTF-8 probe
(each NUL is a valid single-byte scalar) and reached the parser as
garbage. A UTF-16 BOM now skips the file (Ok(None)), consistent with
is_generated's documented stance that UTF-16 source is unsupported.
The UTF-8 BOM is still stripped and read, and the BOM checks now use
bounds-safe slice prefixes so a sub-3-byte probe cannot panic.

The probe read also collapsed every read_exact failure to Ok(None),
swallowing real I/O faults (permission, hardware) that the documented
contract promises to surface as Err. The read now distinguishes a
clean short read (UnexpectedEof -> Ok(None)) from any other error
kind, which propagates as Err.

Fixes #803
Fixes #804
The UnknownMetric error hint re-derived the suppressible vocabulary
from Metric::NAMES with a hardcoded "tokens" filter, contradicting
Metric::suppressible()'s documented claim of being the single source
of truth. Route the hint through suppressible() (sorted to keep the
list alphabetised) and guard the derivation with a test assertion.

Also fix two doc inaccuracies on published API:
- Search::act_on_node's trait-declaration param was named `pred`
  though the callback is an action with no return; rename to `action`
  to match act_on_child and the node.rs impl (cosmetic, non-breaking).
- PreprocFile::new_macros's doc said it "adds new macros" but it
  constructs a fresh PreprocFile; reword as a constructor.

Fixes #801
Fixes #802
Fixes #805
Five VCS-internal integer-arithmetic sites used bare +/-/sum that
panic in a debug build (overflow checks on) and wrap in release,
violating the no-panic invariant the subsystem documents
(src/vcs/stats.rs: "saturating_add everywhere ... the crate forbids
panics in non-test code"). None changes a metric value for normal
inputs; each makes one degenerate-input path total.

- blame line math: convert a blame hunk's 0-based start + length to a
  1-based inclusive LineRun via a named saturating constructor
  (LineRun::from_blame_hunk), so a line number at the u32 ceiling
  clamps instead of overflowing the `+ 1` / `+ len - 1`.
- window cutoffs: `now - *_window_secs` underflows i64 for a garbage
  extreme `--as-of` (e.g. i64::MIN). saturating_sub floors the
  boundary at i64::MIN ("no lower bound" = include all history), the
  safe total behavior. Fixed uniformly at all four sites:
  git/history.rs, replay.rs (long + recent), git/jit.rs (long +
  recent).
- secs_to_days: saturating_add the rounding term so secs_to_days(i64::MAX)
  saturates to u32::MAX instead of wrapping negative and flooring to 0.
- cross-author edit sum: fold with u32::saturating_add (the per-author
  counts were already saturating_add-clamped; std Sum<u32> panicked on
  overflow).

Each fix carries an overflow regression test that panics against the
pre-fix bare arithmetic in debug. The pre-existing
secs_to_days_saturates_huge_and_floors_negative test was
false-confidence (it only exercised an in-range magnitude); it now
asserts secs_to_days(i64::MAX) == u32::MAX, the case that actually
triggered the overflow.

Fixes #809
Fixes #814
Fixes #820
Fixes #821
Two precision regressions in commit-message classification, both
violating the module's "favour precision over recall" contract.

#806: rollback detection ran an unanchored `\brollback\b` against the
full multi-line message while revert was subject-anchored, so a
body-prose "rollback" mention flipped the whole commit to revert. Give
rollback the same discipline as revert: match `^rollback\b` against the
subject line only. A leading "Rollback the migration" still classifies;
a buried mention no longer does.

#808: the security regex matched bare `injection` and `overflow`, which
fire on routine non-security commits ("dependency injection", "text
overflow", arithmetic "integer overflow"). Replace them with qualified
attack-vector forms: `(?:sql|command|code|html|ldap|os|xml)\s+injection`
and `(?:buffer|heap)\s+overflow`. "integer overflow" and "stack
overflow" are deliberately excluded as ambiguous (arithmetic bug /
website name); precision wins over recall.

Adds negative/positive regression tests for both, verified via revert
against the old regexes.

Fixes #806
Fixes #808
Two author-identity bugs in VCS participant resolution that skew
bus-factor / ownership / JIT-experience metrics.

#812: ParticipantResolver::participants ran the Co-authored-by regex
over the whole commit message, so a `Co-authored-by:` line quoted in
the body (a pasted log, a reverted commit, an indented code block) was
falsely attributed as a real co-author. Git only honours trailers in
the message's final block, so scan only the detected trailer block: the
last paragraph (after the final blank-line separator), treated as
trailers only when at least one line is a recognised `Token: value`
trailer and at least 25% of its non-blank lines are (mirroring
git-interpret-trailers' threshold). A genuine trailing
`Co-authored-by:` alongside a `Signed-off-by:` still counts.

#817: AuthorId::new keyed an author with empty name AND empty email to
the empty string, collapsing every such author into one phantom
identity that accrued all their edits, ownership, and first-authorship
and deflated the distinct-author count. Add AuthorId::has_identity and
drop keyless authors in push_if_human, mirroring the existing bot
guard. A commit whose only author is keyless yields an empty
participant set, which process_commit already skips as unknown-author.

Eq/Hash on AuthorId are unchanged; has_identity is additive.

Fixes #812
Fixes #817
The persistent VCS history cache wrote a `truncated_shallow_clone`
flag but `is_compatible` never read it back. An entry written from a
shallow clone was reused verbatim after the repo was deepened
(`git fetch --unshallow` leaves HEAD unmoved, so the head-SHA key is
unchanged and the pure-hit/incremental paths fire), silently replaying
truncated event counts while reporting `shallow = false`. The reverse
(cached full, now shallow) reported a false truncation flag over
complete counts.

Thread the repository's current shallow state through `is_compatible`
and `load_compatible` and require `truncated_shallow_clone` to equal
it for reuse. A mismatch in either direction now falls through to the
re-walk path, which produces the correct counts and flag. The normal
same-shallow-state pure-hit and incremental-splice paths are
unaffected.

Fixes #810
A pure rename (similarity 100%, no body) emits no `+++ b/<new>` line, so
diff_git_new_path took the new path solely from the ambiguous
`diff --git a/<old> b/<new>` header. The `rsplit(' ')` fallback walked
from the right and yielded the first `b/`-prefixed token, truncating a
spaced new name (`b/new name.rs` -> `new`) and corrupting the diffusion
key.

Capture the unambiguous `rename to <path>` line in the stanza parser and
set the new-side path from it (decoded with the same C-style unquoting),
in preference to splitting the spaceful header. Content-change renames
and plain modifies are unaffected; non-rename stanzas see no
`rename to` line and keep the existing logic.

Fixes #813
blob_line_stats diffed every touched text blob twice per scored
commit: line_counts() for the added/deleted totals and lines()
only to count hunks. In gix 0.83 each runs a full blob diff, so
every modified file was diffed twice.

Prepare the diff once and read insertions, removals, and the hunk
count off a single gix_diff::blob::Diff::compute pass, mirroring
line_counts()'s own internals. Because this uses the same
un-slider-postprocessed algorithm line_counts() uses, the
added/deleted totals stay bit-identical to the old two-pass code
and to the file-level history walk (history.rs still calls
line_counts()), so the JIT size features and the history walk
cannot diverge. The full integration suite (which pins exact
added/deleted/hunks values) and all insta snapshots are unchanged.

Fixes #815
The #810 module-doc reference to the pub(crate) HistoryCache::is_compatible
tripped -D rustdoc::private-intra-doc-links; use a plain code span.
The unsalted SHA-256 of a canonical email emitted by
`--emit-author-details` (and stored in the persistent cache) was
described as "irreversible" / "injective for practical purposes" /
publishable "without disclosing the underlying email". An email is
low-entropy and enumerable and commit histories are public, so the
digest is recoverable against a candidate email set (the Gravatar
weakness). Replace the false guarantees with an honest threat model:
the digest is a stable pseudonym that avoids emitting plaintext email
and deters casual disclosure, but is not cryptographically
irreversible and is not anonymization.

Doc/comment + test-rename only; no change to the hashing scheme.
Hardening (keyed HMAC / KDF) is deferred as a follow-up because it must
be reconciled with the #334 cache-replay stable-digest invariant.

- identity.rs / cache.rs / git/history.rs: honest threat-model notes
- book vcs.md: new "Author-detail privacy" section; corrected cache note
- book python/vcs.md: corrected cache note, links to the new section
- rename test hashed_is_stable_and_irreversible_looking ->
  hashed_is_stable_and_avoids_plaintext_email; assert it avoids the
  plaintext email rather than "looks irreversible"

Fixes #811
Fix six VCS documentation/test-accuracy issues, all comment/doc/test
changes with no production logic change:

- #807: assert Display for all 16 vcs::Error variants (was 11); add
  InvalidFileTypeScope, InvalidBusFactorThreshold, InvalidTrend, Blame,
  InvalidDiff cases so the test lives up to its "pin each variant" claim.
- #816: JitDiffReport doc claimed the partial diff score is "always
  lower" than the full commit score, but the experience term is
  negatively signed, so the full score can fall below the partial; the
  two are not ordered. Removed the false ordering claim.
- #818: workdir_root doctest asserted None == None (relative paths never
  resolve). Rewrote it to illustrate two same-checkout paths sharing one
  Some(root), asserting only when both resolve.
- #819: trend.rs comment claimed each point is an O(commits) lookup
  rather than its own walk, but since #648 every per-point build
  re-anchors and re-walks the first-parent timeline. Corrected to
  describe the shared timeline's actual (oid-resolution) role and the
  per-point re-walk cost.
- #822: wire.rs test comment claimed VcsTrend(Point) use serde(flatten);
  they carry a nested vcs key. Removed the false flatten/CBOR-footgun
  claim, kept the write-only-untested rationale.
- #823: score.rs called size_factor a "tiny tie-breaker" / "well under
  one churn-point", but it enters base at coefficient 1.0 (~0.85 at 10k
  SLOC, >1.0 past ~50k). Corrected both comments to cite real magnitudes.

Fixes #807 #816 #818 #819 #822 #823
Record the Unreleased entries for the #739-#823 batch: cross-language
metric consistency fixes, output edge cases, VCS correctness/arithmetic/
security fixes, the read_file_with_eol hardening, new APIs
(MetricSet::resolved, public defang_formula, AuthorId::has_identity),
the JIT single-pass diff, and the author-hash privacy honesty fix.
audit-tests found two new tests that passed without fully exercising
their fix:

- ccomment_remove_comments_preserves_crlf (#767) only asserted CRLF
  consistency, so an impl that *dropped* removed-comment lines would
  still pass. Add a line-count-preservation assertion (comment removal
  blanks lines, never deletes them).
- coauthor_in_prose_last_paragraph_is_not_counted (#812) used a mid-line
  Co-authored-by mention that the ^-anchored regex never matched, so it
  passed regardless of trailer-block scoping. Rewrite to a column-0
  co-author in a non-final paragraph (renamed
  coauthor_outside_final_trailer_block_is_not_counted) that goes red if
  scoping is dropped.
code-review --fix: npa.rs (#780) and npm.rs (#783) independently computed
"node has an explicit private/protected Modifier" with inconsistent casts
(Csharp::Private vs Private as u16). Extract one pub(crate)
csharp_node_has_private_or_protected_modifier in npa.rs and reuse it from
both gates — single source of truth for the C# narrowing rule.

Also scope a clippy::naive_bytecount allow on the small CRLF test fixture
counter added in the prior audit-tests commit (the bytecount crate is
unwarranted there).
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.94872% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.69%. Comparing base (48cd18a) to head (6581ae4).

Files with missing lines Patch % Lines
src/metrics/loc.rs 95.08% 9 Missing ⚠️
src/comment_rm.rs 87.50% 5 Missing and 1 partial ⚠️
src/vcs/git/diff_parse.rs 81.25% 1 Missing and 2 partials ⚠️
src/alterator.rs 90.00% 2 Missing ⚠️
src/vcs/git/jit.rs 93.33% 0 Missing and 2 partials ⚠️
src/c_macro.rs 99.01% 0 Missing and 1 partial ⚠️
src/metrics/abc.rs 98.83% 0 Missing and 1 partial ⚠️
src/metrics/cyclomatic.rs 95.45% 1 Missing ⚠️
src/metrics/npa.rs 99.13% 1 Missing ⚠️
src/output/checkstyle.rs 0.00% 0 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #957      +/-   ##
==========================================
+ Coverage   85.50%   85.69%   +0.19%     
==========================================
  Files         106      101       -5     
  Lines       64642    65364     +722     
  Branches    64248    65364    +1116     
==========================================
+ Hits        55270    56015     +745     
+ Misses       8943     8918      -25     
- Partials      429      431       +2     
Flag Coverage Δ
python ?
rust 85.69% <97.94%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/c_langs_macros/c_macros.rs 100.00% <ø> (ø)
src/c_langs_macros/c_specials.rs 100.00% <ø> (ø)
src/c_langs_macros/mod.rs 87.50% <100.00%> (+15.08%) ⬆️
src/cfg_predicate.rs 98.74% <100.00%> (+0.11%) ⬆️
src/concurrent_files.rs 90.37% <ø> (ø)
src/count.rs 90.90% <100.00%> (+1.30%) ⬆️
src/getter.rs 91.01% <ø> (-1.13%) ⬇️
src/langs.rs 91.44% <100.00%> (+1.00%) ⬆️
src/macros/mod.rs 91.97% <ø> (ø)
src/metric_set.rs 99.02% <100.00%> (+0.08%) ⬆️
... and 46 more

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant