fix: batch of 73 bug/doc fixes (#739–#823)#957
Merged
Conversation
bus_factor_rows handed raw directory paths to write_md_table, which only pads cells. A `|` in a Unix directory path injected a spurious column separator and corrupted the Markdown table; the doc comment on write_markdown_bus_factor falsely claimed the renderer escaped cells. Return typed Cells from bus_factor_rows so each format applies its own escaping the way the ranked-files path does: Markdown maps through render_cell_md (GFM-escapes the directory Cell::Path), HTML maps through cell_text (write_table_with_tooltips escape_html-escapes). Numerics are pre-formatted Cell::Num and pass through raw. Fixes #739
`run_command_preproc` finalized the shared worker accumulator with two `.expect()` calls -- the panic-on-poison / panic-on-un-joined-Arc pattern #445 removed from the sibling count collector. A worker panic poisoned the mutex and crashed `bca preproc` with a backtrace instead of a clean exit. Extract a `into_preproc_data` helper that mirrors `CountCollector::into_count`: `Arc::try_unwrap` then poison-tolerant `into_inner` on the sole-owned path, and `mem::take` from behind a poison-recovered guard on the still-shared path. The recovered data still holds every joined worker's file entry, so finalization degrades rather than panics. Add regression tests for both the poisoned-mutex and shared-Arc paths (verified by revert). Fixes #740
`Touched::churn()` used a plain `+` to sum `added` and `deleted`, the only churn computation in the VCS subsystem that was not saturating. This contradicts the subsystem's documented no-panic-on-overflow invariant and would panic in a debug build on overflow. Switch to `saturating_add`, matching every sibling accumulator (stats.rs, diff_parse.rs, size_features), and add a regression test that would panic under the old `+`. Fixes #742
read_file_with_eol previously classified its 64-byte prefix with String::from_utf8_lossy + an unconditional pop() + a U+FFFD scan. That heuristic dropped the probe's last character unconditionally, hiding a genuinely invalid trailing byte (wrongly accepting non-UTF-8 input) and rejecting files that legitimately contained the U+FFFD replacement scalar within the probe window. Replace it with a std::str::from_utf8 check: accept on Ok; tolerate an error only when it is a trailing incomplete multibyte sequence (error_len().is_none()) AND the file continues past the probe, so the split character is completed by later bytes; reject otherwise. When the probe is the whole file, an incomplete tail is genuine corruption. The public Ok(None)/Ok(Some(..)) contract and the 64-byte window (now a named UTF8_PROBE_BYTES constant) are unchanged. Fixes #746 Fixes #758
CountCollector::into_count's Err(shared) arm silently returned a race-dependent partial tally when the collector was still shared, violating its own "after every worker has joined" contract and turning a coordination bug into undercounted metrics with no signal. Add a debug_assert!(false, ...) in the still-shared arm so misuse trips loudly in debug and test builds, while release continues to degrade to the best-effort snapshot rather than panicking, honoring the no-panic-in-production contract. Sharpen the doc comment to state that a still-shared collector indicates a caller-side coordination bug and that the returned Count is a non-final snapshot, not the complete aggregate the #[must_use] return implies. into_count is pub and re-exported from lib.rs, so its signature must stay stable until 2.0; the debug_assert path makes the misuse explicit without a SemVer break, mirroring the sibling into_preproc_data recovery pattern. Add a debug_assertions-gated #[should_panic] regression test for the still-shared misuse path (verified by revert: it does not panic without the assert). Fixes #757
Ops::new initialised every space name from get_func_space_name, which
returns Some("<anonymous>") for root Unit nodes that have no name field.
ops_inner overwrites only the final top-level node, so any non-top-level
Unit space kept the synthetic "<anonymous>" placeholder even though the
API docs describe None as the "name could not be resolved" state.
Mirror the SpaceKind::Unit special-case already in FuncSpace::new:
skip name derivation for Unit kinds and leave name = None, so the ops
and metrics paths agree. The top-level override in ops_inner still sets
the caller-supplied name, leaving top-level behaviour unchanged.
Fixes #755
Under --exclude-tests, the prune hook records each pruned test subtree's row span on its innermost enclosing func-space via Sloc::exclude_span. Sloc::sloc() subtracts only that space's own excluded_lines, and Sloc::merge folded only sloc_min/sloc_max, not excluded_lines. For a #[test] fn inside a retained impl/trait/closure (all Rust func-spaces), only that space's SLOC dropped; the unit's span-based SLOC stayed inflated, so MI's 16.2*ln(SLOC) term still inherited the test lines. Ploc was already correct because its line-set merges upward. Fold excluded_lines in Sloc::merge so the pruned count propagates to every enclosing space up to the unit, mirroring Ploc's upward line-set union. Each ancestor's span includes the pruned rows exactly once and pruned subtrees never descend, so the count is subtracted once per altitude with no double-counting. Strengthen sloc_drops_for_test_fn_nested_in_impl to assert the unit root (not just the impl). Add cases for a #[test] fn in a production impl, a test item nested in a closure, and a non-test impl that must be unaffected. Verified via test-via-revert: the unit assertions fail without the fold. Fixes #741
MetricsOptions::with_metric_set stored the caller-supplied MetricSet verbatim, so a public caller could pass empty().with(Metric::Mi) and have the walker compute a maintainability index from zero-valued Loc / Cyclomatic / Halstead inputs with no error. Add MetricSet::resolved() (the set-in/set-out counterpart of from_slice_with_deps) and apply it inside with_metric_set so the set is closed under Metric::dependencies before storing. This matches the existing with_only behaviour, which already resolves the closure and emits the dependencies; compute and emit share one selected set by design, so no output widens beyond what with_only(&[Mi]) already does. The operation is idempotent: an already-closed set is unchanged. Keeps the builder infallible (additive correctness fix, minor bump). Fixes #743
A top-level comma list whose first operand is `not(...)` and whose last operand ends in `)` (e.g. `not(foo), all(test)`) was misread as a single `not(...)` operand: the `not` short-circuit stripped `not` to leave `(foo), all(test)`, which both starts with `(` and ends with `)`, so the whole list was dropped and the trailing `test` operand was lost. Items so annotated were not recognised as test-only and survived exclude_tests. Move the bare-comma-list split ahead of the single-operand not/all/any prefix checks, so each operand is classified in isolation. The existing cfg_split_top_level_args already respects paren depth, so a comma nested inside a predicate's own parens (`not(foo, bar)`, `all(test, unix)`) remains a single operand and is unaffected. Fixes #763
The byte-level C/C++ lexer prepass in `replace` treated every `'` as a character-literal opener. C++14 ([lex.icon]/[lex.fcon]) and C23 allow a single-quote digit separator inside numeric literals (`1'000`, `0xDEAD'BEEF`, `3.14'159`). With an odd separator count the opening `'` had no closing `'`, so the scanner stayed in a spurious Char state to EOF and every later macro identifier leaked through unmasked. Track numeric-literal context (`in_number`) and recognize a `'` flanked by hex digits inside a number as a separator rather than a Char opener. The numeric-context guard distinguishes separators from length-prefixed char literals (`L'x'`, `u8'x'`), whose prefix is an identifier run, not a numeric one, so those still open Char correctly. Bundle the masking-pass cursors into a `MaskState` struct to keep the scanner's working state in one place. Fixes #765
remove_from_code substituted bare LF for every line a removed multi-line comment spanned, so stripping comments from a CRLF source produced mixed CRLF/LF output and `bca strip-comments --in-place` silently rewrote line endings. Detect the source's dominant line ending from its first newline and substitute the matching sequence (\r\n for CRLF, \n otherwise) per removed-comment line, via a CRLF fast-path buffer mirroring the existing LF buffer. Also restore the \r when a single-line comment node absorbed the CR of its terminating CRLF, so the kept line keeps a full \r\n. Regression test strips a multi-line comment from a CRLF fixture and asserts no bare LF survives; the LF test now asserts no CR is introduced. Fixes #767
`Node::child_by_field_name` returned `Option<Node<'_>>`, tying the child to the borrow of `&self` rather than the underlying tree lifetime `'a`. tree_sitter::Node is Copy and its children are valid for the whole tree, so the narrowing was needless and inconsistent with the sibling accessors (`child`, `parent`, `children`, `next/previous_sibling`), which already return `Node<'a>`. Widen the return to `Option<Node<'a>>` so callers may hold a child past the parent borrow. This is additive: every existing call site uses the result transiently, and a wider return lifetime only adds capability, so nothing breaks across lib/cli/web/py. Add a regression test whose helper returns a child past an inner `&parent` reborrow; it fails to compile (E0515) under the old signature and passes under the widened one. Fixes #786
ops_inner lacked metrics_inner's synthetic-Unit-root wrapper, so `Ast::ops()` returned Err(EmptyRoot) for ERROR-root parses (e.g. Lua partial input) where `Ast::metrics()` succeeds. Mirror the metrics seam's push_synthetic_unit_root so the two seams agree: ops() now yields a top-level Unit Ops (name = caller-supplied Source::name) in the same cases metrics() does. Additive correctness, not a breaking shape change to the public Ops type. Also correct the Ops::operands / Ops::operators doc comments: they return the DISTINCT (deduplicated) Halstead operand/operator vocabularies (n2 / n1, HashMap keys), not "all" occurrences. Fixes #789 Fixes #790
The dedicated `(tree_sitter_cpp)` arm of `get_language!` produced `tree_sitter_cpp::LANGUAGE.into()`, which is byte-identical to what the generic `($name:ident)` arm produces for `$name = tree_sitter_cpp` (`$name::LANGUAGE.into()`). Before #718 the arm differed — it routed to `tree_sitter_mozcpp::LANGUAGE` — but commit 95fccbb (the upstream cpp default / opt-in mozcpp work) repointed it at `tree_sitter_cpp`, leaving it dead. `tree_sitter_cpp` exposes `pub const LANGUAGE`, so the generic arm covers Cpp unchanged. Pure no-op: the language→grammar mapping is identical before and after. The typescript/tsx/php arms are retained because they use distinct `LANGUAGE_*` constants. Fixes #770
Add the missing Objective-C bullet to the crate-level rustdoc `## Supported Languages` list and correct two stale slugs the same list carried: TSX (`tsx`, not `typescript`) and the internal `ccomment` / `preproc` C-family helpers (lowercase canonical slugs). Add a drift-guard `#[test]` that derives the canonical slug set from `LANG::into_enum_iter()`/`name()` (the single source of truth, always compiled in regardless of per-language Cargo features) and asserts each distinct slug appears as a backtick token within the section, so a future add-lang cannot silently desync the docs.rs landing page. Fixes #769
Fix five independent documentation/comment accuracy issues. All are comment/doc-only — no behavior change. - concurrent_files_tests.rs: the consumer error-handling test comment overclaimed it exercises both sides of the call-site guard; its `invocations == 2` assertion cannot distinguish swallow vs emit. State that it verifies only loop-continuation and point at the per_file_error_* unit tests for the branch distinction. - concurrent_files.rs: document ConcurrentRunner::new's max(2,n)-1 worker-count transform (total budget = producer + consumers). - macros/mod.rs: add Mi to the bracketed-arm trait list (Mi uses the [Trait] default-body arm alongside Tokens/Nom/NArgs). - metrics/wmc.rs: Kotlin companion_object IS a func_space with its own Class scope (#431); correct the comment that claimed otherwise. - book metrics.md: the mi.sei comment term is a percentage in [0,100], not a [0,1] ratio; relabel comment_ratio -> comment_percentage. Fixes #764 Fixes #766 Fixes #771 Fixes #788 Fixes #774
The #765 digit-separator fix grows step_normal's cyclomatic to the hard limit (a flat hand-rolled lexer dispatch) and the #767 CRLF helper adds a small LineEnding type that lifts comment_rm's file-level nargs sum. Both pass the hard gate but encroach the 95% headroom band. Mark them with in-source suppression + rationale rather than contorting the code: the lexer is clearest as one state machine and file-level nargs is an aggregation artifact, not a wide-signature smell.
The predefined-macro table listed 14 `UINT*_MIN` entries that are not
real C standard macros: ISO C defines `UINTN_MAX` but no `UINTN_MIN`
(the minimum of any unsigned type is 0, so no macro exists). As a
result `is_predefined_macros` returned true for non-macros and the
macro-masking prepass could clobber a user identifier such as
`UINT16_MIN`. The legitimate signed `INT*_MIN` macros are retained.
The preproc specials table likewise listed `char64_t` and `charptr_t`,
which the `enums/data/special.py` generator documents as not real
C/C++ types and never produces (it widens `char{8,16,32}_t` only and
omits `charptr_t`). Because that generator only adds entries
(diff = specials - old) it can never prune stale rows, so these made
`is_specials` return true and a `#define char64_t ...` was wrongly
dropped from macro extraction.
Removed the entries from the `.txt` sources and regenerated the `.rs`
slices via `cargo run -p enums -- --language c_macros`; the slices
stay sorted (binary_search lookup), and regeneration is idempotent.
Added regression tests pinning the pruned names (the codegen smoke
tests only cover generic `names[0]`/`names[1]` placeholders).
Fixes #760
Fixes #762
Bash `heredoc_body` (Bash::HeredocBody2) and Perl `heredoc_body_statement` (Perl::HeredocBodyStatement) match Checker::is_string but were omitted from their Alterator flatten arms, so a heredoc body with interpolation kept its structured interpolation/fragment children in the AST dump while is_string treated the whole node as one string -- the dump asymmetry #699 eliminated elsewhere (Ruby HeredocBody, PHP Heredoc/Nowdoc). Add both variants to the flatten arms, mirroring Ruby/PHP. This is a dump-only change: find/count already treated these nodes as strings, so no metric value moves and no snapshot shifts. Adds regression tests verifying the dumped node is a flat string with its interpolation text preserved (test-via-revert confirmed both fail against the pre-fix arms). Fixes #761
Issue #768 reported that the LPAREN2/LBRACK2/LBRACK3 second-alias opener arms in get_op_type (Cpp/C/Objc/Mozcpp/Tcl/iRules/Php/Elixir/ Ruby) escape the #695 pair-glyph fold, inflating Halstead n1 and rendering a bare "(" instead of the folded "()". Investigation shows the premise does not hold. tree-sitter's runtime collapses each second alias to its base via the grammar's public_symbol_map before Node::kind_id() (ts_node_symbol) returns, so the alias kind_id is never observable to the metric layer. Verified empirically across the exact constructs that produce the alias internally (#if defined(FOO), f(1), a[0], in Point[1,2]): the alias kind_ids 20/47/46/95/96/155 appear zero times; only base openers reach compute_halstead, n1 is not inflated, and openers already render folded. No metric value changes; no snapshots shift. The alias match arms are therefore defensive, not active. Rather than apply the no-op fold #768 proposed (which would imply a fold is needed when it is not), this documents each alias arm as defensive and adds second_alias_opener_collapses_to_base_kind_id, which pins the runtime-collapse invariant. If a future grammar bump drops the public_symbol_map collapse, the test goes red and signals that the alias arms must then fold to their pair glyph in get_operator_id_as_str. Fixes #768
The ABC condition-counting terminal-bool sets for Python and the JS family (JavaScript, Mozjs, TypeScript, Tsx) omitted the numeric-literal kind, so a numeric-truthy operand in an if/while/&&/|| slot scored zero conditions -- the exact gap fixed for Lua. Python treats every non-zero number as truthy and the JS family treats every non-zero/non-NaN number as truthy, so `if 5:` (Python) and `while (5)` / `x && 5` (JS/TS) each count their numeric literal as a Fitzpatrick unary condition. Add Python::Integer / Python::Float and the JS-family Number literal kind to the affected macros. TypeScript and Tsx emit `number` under two ids: the numeric literal (Number) and the `predefined_type` keyword `number` in a type annotation (Number2). Only the literal is a value, so Number2 is deliberately omitted. Statically-typed languages (Go/Rust/Java/Kotlin/C#) correctly omit numerics -- a bare int in a bool slot is a type error -- and are left untouched. Adds regression tests mirroring lua_number_truthy_condition_counts and records the JS integration-fixture snapshot shifts in the submodule. Fixes #772
Kotlin's ABC compute had no Phase-2B condition-slot arm, so a
non-comparison predicate (`if (flag)`, `while (running())`,
`do {} while (ok)`) counted zero ABC conditions — diverging from every
sibling language and dropping below Kotlin's own cyclomatic decision
count. Route the `if`/`while`/`do-while` `condition` field through a new
`kotlin_count_condition` helper that counts a bare terminal once and
unwraps paren/negation via the existing `kotlin_inspect_container`. A
comparison (`a == b`) or `&&`/`||` predicate is a nested binary_expression
already counted by the token and chain-walker arms, so it adds nothing
here — no double-count.
Regression tests cover the three bare-predicate forms plus the
parenthesised case, with guard tests pinning that comparison and
short-circuit predicates are unchanged. No snapshots shifted.
Fixes #773
PHP `Cognitive::compute` was missing the function-definition boundary arm that every sibling language gained in #696. It never reset structural `nesting` to 0 nor bumped the function-depth surcharge at a `FunctionDefinition` / `MethodDeclaration`, so a PHP function or method nested inside control flow inherited the enclosing nesting and missed the SonarSource B-nesting amplification. Add the boundary arm mirroring Java exactly: `nesting = 0` plus `increment_function_depth(&mut depth, node, &[FunctionDefinition, MethodDeclaration])`. Closures (`AnonymousFunction` / `ArrowFunction`) keep their existing lambda arm and intentionally do not reset nesting, matching how the siblings treat lambdas vs named functions. A function `inner()` defined inside `if ($a) { if ($c) { … } }` whose body is `if ($b) { if ($d) { … } }` drops from cognitive 7 to 5; the same body as a top-level function is unchanged at 3. No snapshot fixtures contain nested function definitions, so none shifted. Fixes #775
The Elixir cyclomatic impl added +1 for every `stab_clause`, but the first `stab_clause` of an `anonymous_function` is the closure's head/definition, not a pattern-dispatch decision. Because the closure already opens its own function space seeded with base cyclomatic 1, a trivial `fn x -> x end` reported cyclomatic 2 for zero decisions, diverging from cognitive (which already scores it 0) and from trivial closures in Python/Rust/JS. Discriminate on the parent: a `stab_clause` keeps counting unless it is the first `stab_clause` child of an `anonymous_function` parent. `case`/`cond`/`with`/`try` arms have a `do_block` parent and are unaffected; multi-clause `fn`s skip only their head clause, so N clauses add N-1 branches. Modified cyclomatic is untouched (the anon fn is not a Call). Fixes #776
The JS-family LOC impls (Mozjs, Javascript, Typescript, Tsx) listed
`StatementBlock` among the LLOC contributors, so every `{ … }` brace
block added +1 logical line. No other language does this: Rust's
`Block`, C/C++/Objc/Mozcpp's `CompoundStatement`, and
Java/Groovy/C#'s `Block` all contribute 0 lloc. A statement_block is
a syntactic grouping, not a logical statement, so the count inflated
JS/TS lloc and broke cross-language parity.
Remove `StatementBlock` from all four JS-family LLOC match arms; it
now falls through to the PLOC catch-all like every sibling block
node. For `if (x) { return 1 } else { return -1 }` the JS lloc drops
from 6 to 3, matching the C and Rust translations.
Update the existing anchored lloc tests to the corrected values and
add `js_family_if_lloc_matches_c_and_rust`, which pins parity across
Mozjs/JS/TS/TSX and the C/Rust baselines. Refresh the lib insta
snapshots and the integration-output submodule fixtures (every
changed lloc value decreases; no other metric shifts).
Fixes #777
Interior rows of a multi-line string literal were classified inconsistently across languages. Python credits every row of a non-docstring multi-line string to PLOC (#415: "real code, not blank lines"), but most other languages no-op'd their string nodes, so the interior rows reached neither PLOC nor CLOC and `blank = sloc - ploc - cloc` mislabelled them as blank. Add a shared `add_multiline_string_ploc` helper mirroring Python's #415 mechanism exactly (insert the opening row only when the enclosing statement begins earlier, then insert `start + 1..=end`) and route every multi-line-capable string node through it: Perl (quoted strings + heredoc body), Ruby (strings + heredoc body), PHP (encapsed / single string), Go (raw string), Rust (string / raw string), C / C++ / Mozcpp / Objc (string / raw string), Java + Groovy (text blocks), C# (verbatim / raw string), Kotlin (multiline string), Lua (long-bracket string), and JS / TS / TSX / Mozjs (template literals). Python is the canonical reference and is unchanged. A 3-line string assignment now reports ploc == 3 / blank == 0 in every language with a multi-line string literal, matching Python. Tcl, Bash, and PHP heredoc already behaved correctly. Fixes #778
Go's panic() and Lua's error()/os.exit() are built-in abrupt-exit calls that unwind the stack like throw/raise in the exception languages, but were silently uncounted while every other exit-language counts its throw/raise equivalent. Match them by callee text, mirroring the Bash/Elixir builtin-by-name pattern already in this file: - Go: a call_expression whose `function` field is a bare identifier spelling `panic`. A package-qualified `foo.panic()` has a selector_expression callee and is correctly not counted. - Lua: a function_call whose `name` field text is `error` or `os.exit`. A user call such as `foo()` or `myError(...)` is not counted. `return` is still counted; panic/error/os.exit are additional exits. Fixes #779
C# 8+ permits explicit private/protected modifiers on interface members, but CsharpCode::Npa::compute counted every interface field as a public attribute (interface_npa = interface_na), inheriting Java's idiom without Java's invariant that interface fields are always public. Interface members default to public, so gate interface_npa per declaration on the inverse of the class rule: a field counts as a public attribute unless it carries an explicit private/protected modifier. The new csharp_interface_member_is_public helper mirrors ts_member_is_public!'s default-public gate. The modifier applies to every declarator of a field, so the public/private split is per-declaration. Also replaces the assign-the-whole-sum smell (stats.interface_npa = stats.interface_na) with a proper increment. Fixes #780
PHP was the only language whose Npa impl counted enum cases as class
attributes (`enum Color { case Red; case Green; case Blue; }` reported
class_na = class_npa = 3). Java, Kotlin, Rust, and C# all treat enum
cases/variants as sum-type tags, not data fields, and exclude them.
Remove the `EnumDeclarationList` arm so PHP enum bodies contribute no
npa attributes, matching the established cross-language convention. PHP
enums cannot declare instance properties, and class-level `const`s are
not counted as attributes outside an enum either (the class-body arm
counts only `PropertyDeclaration`), so excluding the body entirely keeps
PHP internally consistent as well.
Adds regression tests: enum cases report 0, enum-with-const reports 0,
and a cross-language parity check that PHP and Java equivalent enums
both report (0, 0). Regenerates the traits_enums.php integration
snapshot in the big-code-analysis-output submodule.
Fixes #781
The Display impl for nargs::Stats emitted the function_args/closure_args headline from the per-space accumulators (function_args/closure_args), while every structured serializer (JSON/YAML/TOML/CBOR via wire.rs) emits them from the cross-space sum (function_args_sum/closure_args_sum). The accumulators are not combined by Stats::merge, so at a parent space that rolls up child function-spaces the text output disagreed with JSON for the same field name (e.g. python_nested_functions: JSON 3, Display 2). Switch the two Display lines to the *_sum accessors so text matches JSON. total/average/*_average already sourced from the summed accessors and are unaffected; this is an output-format fix, not a metric-value change, so no JSON/YAML snapshots shift. Adds a parity regression test asserting the Display headline equals the summed accessors for a nested-function space. Fixes #782
Add per-language smoke tests for C, Objective-C, Elixir, Ruby, and iRules, the five registered languages that lacked the non-zero-tokens positive test required by Lesson 1. Each mirrors the existing smoke-test idiom (check_metrics + tokens_sum() > 0) on small realistic source. Fix cpp_tokens_nested_attribution: the docstring described a forward-declaration fixture the test never used; correct it to match the actual lambda fixture and replace the vacuous max >= 7 / max <= sum inequalities with exact assertions (sum == 24, max == 24, min == 0). A C++ lambda is a closure but opens no FuncSpace, so all leaves attribute to the single outer scope; the exact values fail if a regression split the lambda into its own scope or credited the unit, which the prior inequalities would have passed. Drop the unfulfilled "unit-level sum must equal total of all scopes" comment. Fixes #785 Fixes #787
- csv.rs: numeric metric cells CAN start with `-` (negative MI), but are safe as self-contained numeric literals, not "cannot begin with a trigger". State the real invariant. - dump_ops/dump_metrics/dump.rs: drop dead `[`Result`]: #variant.Result` doc-links; bare `[`Result`]` resolves via intra-doc. - dump_metrics.rs: format_number's whole-float-as-integer DIVERGES from JSON (which keeps `.0`); it is a legibility choice, not a match. - dump.rs: document the `# Panics` precondition that `code` must be the exact source `node` was parsed from, on dump_node and dump_node_with_color. - dump.rs: name `dump_tree_helper` in the depth-limit test comment; the old `dump_children` no longer exists post-#700. - warning_line.rs: drop the false GitHub Actions `::warning::` / "out of the box" claims; CI annotators need a registered problem matcher for Clang/MSVC lines. - parser.rs + nodes.md: document numeric -t/--type as a raw, grammar- version-unstable kind_id match. Fixes #791 #792 #793 #795 #796 #797 #799
read_file_with_eol stripped a leading UTF-16 BE/LE BOM and kept the interleaved-NUL body, which then passed the byte-level UTF-8 probe (each NUL is a valid single-byte scalar) and reached the parser as garbage. A UTF-16 BOM now skips the file (Ok(None)), consistent with is_generated's documented stance that UTF-16 source is unsupported. The UTF-8 BOM is still stripped and read, and the BOM checks now use bounds-safe slice prefixes so a sub-3-byte probe cannot panic. The probe read also collapsed every read_exact failure to Ok(None), swallowing real I/O faults (permission, hardware) that the documented contract promises to surface as Err. The read now distinguishes a clean short read (UnexpectedEof -> Ok(None)) from any other error kind, which propagates as Err. Fixes #803 Fixes #804
The UnknownMetric error hint re-derived the suppressible vocabulary from Metric::NAMES with a hardcoded "tokens" filter, contradicting Metric::suppressible()'s documented claim of being the single source of truth. Route the hint through suppressible() (sorted to keep the list alphabetised) and guard the derivation with a test assertion. Also fix two doc inaccuracies on published API: - Search::act_on_node's trait-declaration param was named `pred` though the callback is an action with no return; rename to `action` to match act_on_child and the node.rs impl (cosmetic, non-breaking). - PreprocFile::new_macros's doc said it "adds new macros" but it constructs a fresh PreprocFile; reword as a constructor. Fixes #801 Fixes #802 Fixes #805
Five VCS-internal integer-arithmetic sites used bare +/-/sum that
panic in a debug build (overflow checks on) and wrap in release,
violating the no-panic invariant the subsystem documents
(src/vcs/stats.rs: "saturating_add everywhere ... the crate forbids
panics in non-test code"). None changes a metric value for normal
inputs; each makes one degenerate-input path total.
- blame line math: convert a blame hunk's 0-based start + length to a
1-based inclusive LineRun via a named saturating constructor
(LineRun::from_blame_hunk), so a line number at the u32 ceiling
clamps instead of overflowing the `+ 1` / `+ len - 1`.
- window cutoffs: `now - *_window_secs` underflows i64 for a garbage
extreme `--as-of` (e.g. i64::MIN). saturating_sub floors the
boundary at i64::MIN ("no lower bound" = include all history), the
safe total behavior. Fixed uniformly at all four sites:
git/history.rs, replay.rs (long + recent), git/jit.rs (long +
recent).
- secs_to_days: saturating_add the rounding term so secs_to_days(i64::MAX)
saturates to u32::MAX instead of wrapping negative and flooring to 0.
- cross-author edit sum: fold with u32::saturating_add (the per-author
counts were already saturating_add-clamped; std Sum<u32> panicked on
overflow).
Each fix carries an overflow regression test that panics against the
pre-fix bare arithmetic in debug. The pre-existing
secs_to_days_saturates_huge_and_floors_negative test was
false-confidence (it only exercised an in-range magnitude); it now
asserts secs_to_days(i64::MAX) == u32::MAX, the case that actually
triggered the overflow.
Fixes #809
Fixes #814
Fixes #820
Fixes #821
Two precision regressions in commit-message classification, both violating the module's "favour precision over recall" contract. #806: rollback detection ran an unanchored `\brollback\b` against the full multi-line message while revert was subject-anchored, so a body-prose "rollback" mention flipped the whole commit to revert. Give rollback the same discipline as revert: match `^rollback\b` against the subject line only. A leading "Rollback the migration" still classifies; a buried mention no longer does. #808: the security regex matched bare `injection` and `overflow`, which fire on routine non-security commits ("dependency injection", "text overflow", arithmetic "integer overflow"). Replace them with qualified attack-vector forms: `(?:sql|command|code|html|ldap|os|xml)\s+injection` and `(?:buffer|heap)\s+overflow`. "integer overflow" and "stack overflow" are deliberately excluded as ambiguous (arithmetic bug / website name); precision wins over recall. Adds negative/positive regression tests for both, verified via revert against the old regexes. Fixes #806 Fixes #808
Two author-identity bugs in VCS participant resolution that skew bus-factor / ownership / JIT-experience metrics. #812: ParticipantResolver::participants ran the Co-authored-by regex over the whole commit message, so a `Co-authored-by:` line quoted in the body (a pasted log, a reverted commit, an indented code block) was falsely attributed as a real co-author. Git only honours trailers in the message's final block, so scan only the detected trailer block: the last paragraph (after the final blank-line separator), treated as trailers only when at least one line is a recognised `Token: value` trailer and at least 25% of its non-blank lines are (mirroring git-interpret-trailers' threshold). A genuine trailing `Co-authored-by:` alongside a `Signed-off-by:` still counts. #817: AuthorId::new keyed an author with empty name AND empty email to the empty string, collapsing every such author into one phantom identity that accrued all their edits, ownership, and first-authorship and deflated the distinct-author count. Add AuthorId::has_identity and drop keyless authors in push_if_human, mirroring the existing bot guard. A commit whose only author is keyless yields an empty participant set, which process_commit already skips as unknown-author. Eq/Hash on AuthorId are unchanged; has_identity is additive. Fixes #812 Fixes #817
The persistent VCS history cache wrote a `truncated_shallow_clone` flag but `is_compatible` never read it back. An entry written from a shallow clone was reused verbatim after the repo was deepened (`git fetch --unshallow` leaves HEAD unmoved, so the head-SHA key is unchanged and the pure-hit/incremental paths fire), silently replaying truncated event counts while reporting `shallow = false`. The reverse (cached full, now shallow) reported a false truncation flag over complete counts. Thread the repository's current shallow state through `is_compatible` and `load_compatible` and require `truncated_shallow_clone` to equal it for reuse. A mismatch in either direction now falls through to the re-walk path, which produces the correct counts and flag. The normal same-shallow-state pure-hit and incremental-splice paths are unaffected. Fixes #810
A pure rename (similarity 100%, no body) emits no `+++ b/<new>` line, so
diff_git_new_path took the new path solely from the ambiguous
`diff --git a/<old> b/<new>` header. The `rsplit(' ')` fallback walked
from the right and yielded the first `b/`-prefixed token, truncating a
spaced new name (`b/new name.rs` -> `new`) and corrupting the diffusion
key.
Capture the unambiguous `rename to <path>` line in the stanza parser and
set the new-side path from it (decoded with the same C-style unquoting),
in preference to splitting the spaceful header. Content-change renames
and plain modifies are unaffected; non-rename stanzas see no
`rename to` line and keep the existing logic.
Fixes #813
blob_line_stats diffed every touched text blob twice per scored commit: line_counts() for the added/deleted totals and lines() only to count hunks. In gix 0.83 each runs a full blob diff, so every modified file was diffed twice. Prepare the diff once and read insertions, removals, and the hunk count off a single gix_diff::blob::Diff::compute pass, mirroring line_counts()'s own internals. Because this uses the same un-slider-postprocessed algorithm line_counts() uses, the added/deleted totals stay bit-identical to the old two-pass code and to the file-level history walk (history.rs still calls line_counts()), so the JIT size features and the history walk cannot diverge. The full integration suite (which pins exact added/deleted/hunks values) and all insta snapshots are unchanged. Fixes #815
The #810 module-doc reference to the pub(crate) HistoryCache::is_compatible tripped -D rustdoc::private-intra-doc-links; use a plain code span.
The unsalted SHA-256 of a canonical email emitted by `--emit-author-details` (and stored in the persistent cache) was described as "irreversible" / "injective for practical purposes" / publishable "without disclosing the underlying email". An email is low-entropy and enumerable and commit histories are public, so the digest is recoverable against a candidate email set (the Gravatar weakness). Replace the false guarantees with an honest threat model: the digest is a stable pseudonym that avoids emitting plaintext email and deters casual disclosure, but is not cryptographically irreversible and is not anonymization. Doc/comment + test-rename only; no change to the hashing scheme. Hardening (keyed HMAC / KDF) is deferred as a follow-up because it must be reconciled with the #334 cache-replay stable-digest invariant. - identity.rs / cache.rs / git/history.rs: honest threat-model notes - book vcs.md: new "Author-detail privacy" section; corrected cache note - book python/vcs.md: corrected cache note, links to the new section - rename test hashed_is_stable_and_irreversible_looking -> hashed_is_stable_and_avoids_plaintext_email; assert it avoids the plaintext email rather than "looks irreversible" Fixes #811
Fix six VCS documentation/test-accuracy issues, all comment/doc/test changes with no production logic change: - #807: assert Display for all 16 vcs::Error variants (was 11); add InvalidFileTypeScope, InvalidBusFactorThreshold, InvalidTrend, Blame, InvalidDiff cases so the test lives up to its "pin each variant" claim. - #816: JitDiffReport doc claimed the partial diff score is "always lower" than the full commit score, but the experience term is negatively signed, so the full score can fall below the partial; the two are not ordered. Removed the false ordering claim. - #818: workdir_root doctest asserted None == None (relative paths never resolve). Rewrote it to illustrate two same-checkout paths sharing one Some(root), asserting only when both resolve. - #819: trend.rs comment claimed each point is an O(commits) lookup rather than its own walk, but since #648 every per-point build re-anchors and re-walks the first-parent timeline. Corrected to describe the shared timeline's actual (oid-resolution) role and the per-point re-walk cost. - #822: wire.rs test comment claimed VcsTrend(Point) use serde(flatten); they carry a nested vcs key. Removed the false flatten/CBOR-footgun claim, kept the write-only-untested rationale. - #823: score.rs called size_factor a "tiny tie-breaker" / "well under one churn-point", but it enters base at coefficient 1.0 (~0.85 at 10k SLOC, >1.0 past ~50k). Corrected both comments to cite real magnitudes. Fixes #807 #816 #818 #819 #822 #823
Record the Unreleased entries for the #739-#823 batch: cross-language metric consistency fixes, output edge cases, VCS correctness/arithmetic/ security fixes, the read_file_with_eol hardening, new APIs (MetricSet::resolved, public defang_formula, AuthorId::has_identity), the JIT single-pass diff, and the author-hash privacy honesty fix.
audit-tests found two new tests that passed without fully exercising their fix: - ccomment_remove_comments_preserves_crlf (#767) only asserted CRLF consistency, so an impl that *dropped* removed-comment lines would still pass. Add a line-count-preservation assertion (comment removal blanks lines, never deletes them). - coauthor_in_prose_last_paragraph_is_not_counted (#812) used a mid-line Co-authored-by mention that the ^-anchored regex never matched, so it passed regardless of trailer-block scoping. Rewrite to a column-0 co-author in a non-final paragraph (renamed coauthor_outside_final_trailer_block_is_not_counted) that goes red if scoping is dropped.
code-review --fix: npa.rs (#780) and npm.rs (#783) independently computed "node has an explicit private/protected Modifier" with inconsistent casts (Csharp::Private vs Private as u16). Extract one pub(crate) csharp_node_has_private_or_protected_modifier in npa.rs and reuse it from both gates — single source of truth for the C# narrowing rule. Also scope a clippy::naive_bytecount allow on the small CRLF test fixture counter added in the prior audit-tests commit (the bytecount crate is unwarranted there).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #957 +/- ##
==========================================
+ Coverage 85.50% 85.69% +0.19%
==========================================
Files 106 101 -5
Lines 64642 65364 +722
Branches 64248 65364 +1116
==========================================
+ Hits 55270 56015 +745
+ Misses 8943 8918 -25
- Partials 429 431 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This was referenced Jun 15, 2026
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Batch fix of the open
#739–#823backlog: 73 issues fixed across thelibrary, CLI, output serializers, VCS subsystem, and docs. Two issues were
investigated and found to be non-bugs (left open with evidence), and one
hardening follow-up was filed.
Each fix is its own atomic commit carrying a
Fixes #Ntrailer (so theissues close on merge), with a regression test verified by reverting the
production change.
Highlights by area
numeric-truthy operands + Kotlin bare predicates (abc: Python/JS-family terminal-bool sets omit numeric literals (numeric-truthy operands undercounted) #772, fix(abc): Kotlin bare if/while/do-while predicate counts 0 conditions #773); PHP
cognitive function-boundary nesting (php: cognitive complexity skips function-boundary nesting reset / depth surcharge (#696 gap) #775); Elixir cyclomatic head-clause
(fix(cyclomatic): Elixir single-clause anonymous fn over-counts (head stab_clause counted as a branch) #776); JS-family LLOC
statement_block(metrics(loc): JS-family LLOC counts statement_block as a logical line, inflating JS/TS LLOC vs other languages #777); multi-line-string PLOCacross 17 languages (metrics(loc): multi-line string interior rows counted as PLOC in Python but as blank in Perl (cross-language inconsistency) #778); Go
panic/Luaerrorexits (metrics(nexits): Go panic and Lua error/os.exit uncounted as exits (cross-language inconsistency) #779); C#/PHPnpa/npmvisibility & enum-case counting (metrics(npa): C# interface fields counted public, ignoring explicit private/protected #780, metrics(npa): PHP counts enum cases as attributes; Java/Kotlin/Rust/C# do not (cross-language inconsistency) #781, metrics(npm): C# narrowed accessor visibility miscounted as public method #783);exclude_testsSLOC propagation (metrics: exclude_tests still leaves unit-level loc.sloc inflated for impl/trait/closure-nested tests (#722 follow-up) #741);with_metric_setdependencyresolution (big-code-analysis: MetricsOptions::with_metric_set accepts unresolved metric dependencies #743);
nargstext/JSON parity (metrics(nargs): Display function_args/closure_args use per-space counter, diverging from JSON sum #782).read_file_with_eol(big-code-analysis: read_file_with_eol misclassifies some valid UTF-8 files as non-UTF-8 #746/big-code-analysis: read_file_with_eol rejects valid UTF-8 containing U+FFFD in the probe window #758, big-code-analysis: read_file_with_eol strips UTF-16 BOM then accepts UTF-16 body as garbage #803, big-code-analysis: read_file_with_eol swallows probe read_exact I/O errors as Ok(None) #804); C++14/C23 digit-separatorlexer (big-code-analysis: c_macro lexer treats C++14 digit-separator ' as a char-literal opener, leaking later macros #765); CRLF-preserving comment removal (big-code-analysis: comment removal emits LF newlines, corrupting CRLF files #767);
Nodelifetimewiden (node: child_by_field_name over-narrows return lifetime to Node<'_> vs Node<'a> #786); ops synthetic-Unit root (big-code-analysis: ops_inner lacks metrics_inner's synthetic-Unit-root wrapper, erroring (EmptyRoot) where metrics() succeeds #789); cfg comma-list (cfg: not(...)-led comma list swallows later test operand (false negative) #763).
/SARIF/numfmt edge cases (output(checkstyle): start_col=Some(0) emits invalid column="0" (siblings clamp to 1) #784, output(sarif): colon in relative path first segment emits scheme-ambiguous uri-reference #798, output(numfmt): MessageMetric renders tiny negatives as "-0" #800); CSV formula-injection
defang for the VCS report (CWE-1236) (security(vcs): CSV writer path column not defanged against formula injection (CWE-1236), bypassing #703 #794).
co-author scoping (vcs: Co-authored-by matched outside the trailer block yields phantom co-authors #812); empty-identity author drop (vcs: AuthorId::new collapses all name-and-email-less authors into one empty-keyed identity #817); shallow-clone
cache invalidation (vcs: cache reuses shallow-walked entry after unshallow; truncated_shallow_clone written but never read #810); rename-path parsing (vcs(diff): rename-only file with spaced path truncated at first space #813); single-pass JIT
blob diff (vcs(jit): blob_line_stats diffs each touched blob twice (line_counts + lines) #815); saturating arithmetic restoring the no-panic invariant
(vcs: Touched::churn uses non-saturating add, breaking the subsystem's no-panic invariant #742, vcs: blame line-number conversion uses non-saturating u32 add, breaking the no-panic invariant #809, vcs: collect_events long_boundary subtraction can overflow i64 on extreme --as-of #814, vcs: secs_to_days non-checked add overflows on near-i64::MAX window, breaking no-panic + totality claim #820, vcs(stats): total_edits uses non-saturating sum, breaking the no-panic invariant #821); honest author-hash privacy docs (vcs: --emit-author-details hash of email is not practically irreversible #811).
per-language
tokenssmoke tests (c_macros: predefined table lists nonexistent UINT*_MIN macros #760, preproc: c_specials contains non-type entries char64_t/charptr_t the generator excludes #762, tests: consumer error-handling test comment overclaims guard coverage #764, lib: ConcurrentRunner::new doc omits the max(2,n)-1 worker-count transform #766, docs(lib): Supported Languages rustdoc omits Objective-C (LANG::Objc) #769, macros: implement_metric_trait! comment omits Mi from the bracketed-arm trait list #771,docs(metrics): book mi.sei formula labels the comment term comment_ratio but code uses a [0,100] percentage #774, tests(tokens): no smoke test for C, Objc, Elixir, Ruby, iRules #785, tests(tokens): cpp_tokens_nested_attribution has stale docstring + vacuous assertions #787, docs(wmc): Kotlin companion_object comment contradicts #431 (it IS a func_space / own Class scope) #788, docs(output): csv comment wrongly claims metric cells can't start with a formula trigger (negative MI starts with '-') #791–docs(dump_metrics): format_number comment claims a JSON trailing-.0 elision that serde_json does not do #793, output(dump): public dump_node panics on mismatched code/node, undocumented precondition #795–docs(output): warning_line module doc claims GitHub Actions ::warning:: recognition out of the box; wrong syntax + needs problem matcher #797, docs(parser): numeric -t/--type filter silently matches by grammar-unstable kind_id #799, traits: Search::act_on_node callback param misnamed 'pred' #801, docs(preproc): PreprocFile::new_macros doc describes mutation, not construction #802, docs(suppression): UnknownMetric hint re-derives suppressible set, contradicting Metric::suppressible() single-source-of-truth doc #805,
tests(vcs): display_covers_every_variant covers 11 of 16 Error variants #807, docs(vcs/jit): partial diff score is not 'always lower' than the commit score (experience term is negative) #816, docs(vcs): workdir_root doctest asserts None == None, not the convergence it claims #818, vcs: build_trend per-point build re-walks the first-parent timeline (stale O(commits)-lookup comment) #819, docs(wire): VcsTrendPoint/VcsTrend test comment claims #[serde(flatten)] Vcs, but the field is nested #822, docs(vcs/score): size_factor is not a 'tiny tie-breaker'; comment understates its weight #823).
Not fixed (investigated → non-bug)
functions/closuresare a partition (shared
computeearly-return), and Go/Rust/C#/C++ allexclude their closure kinds from
is_functoo. Recommend wontfix.compute_go_args+ a pre-existing regression test.Follow-up
--emit-author-details(deferred hardening from vcs: --emit-author-details hash of email is not practically irreversible #811; must reconcile with the perf: persistent VCS history cache keyed by HEAD SHA #334 cache-replay
invariant).
Quality assurance
Survived five independent passes —
simplify-rust,rust-optimize,audit-tests(each of ~150 new tests verified by revert; 2 weak assertionsstrengthened),
review, andcode-review max(one C# predicate dedupapplied).
make pre-commitis green: fmt, clippy ×2-D warnings, workspacetests
--all-features, doc, udeps, all lint families, manpages, self-scanboth tiers, and the Python stages.
Snapshots
Four metric changes shifted integration snapshots in the
big-code-analysis-outputsubmodule; those commits are pushed(
edacd90f..3fba6241onmain) and the parent records the matching SHA.