fix: batch of 73 bug/doc fixes (#739–#823) by dekobon · Pull Request #957 · dekobon/big-code-analysis

dekobon · 2026-06-15T18:12:21Z

Summary

Batch fix of the open #739–#823 backlog: 73 issues fixed across the
library, CLI, output serializers, VCS subsystem, and docs. Two issues were
investigated and found to be non-bugs (left open with evidence), and one
hardening follow-up was filed.

Each fix is its own atomic commit carrying a Fixes #N trailer (so the
issues close on merge), with a regression test verified by reverting the
production change.

Highlights by area

Cross-language metric consistency (snapshots re-baselined): ABC
numeric-truthy operands + Kotlin bare predicates (abc: Python/JS-family terminal-bool sets omit numeric literals (numeric-truthy operands undercounted) #772, fix(abc): Kotlin bare if/while/do-while predicate counts 0 conditions #773); PHP
cognitive function-boundary nesting (php: cognitive complexity skips function-boundary nesting reset / depth surcharge (#696 gap) #775); Elixir cyclomatic head-clause
(fix(cyclomatic): Elixir single-clause anonymous fn over-counts (head stab_clause counted as a branch) #776); JS-family LLOC statement_block (metrics(loc): JS-family LLOC counts statement_block as a logical line, inflating JS/TS LLOC vs other languages #777); multi-line-string PLOC
across 17 languages (metrics(loc): multi-line string interior rows counted as PLOC in Python but as blank in Perl (cross-language inconsistency) #778); Go panic/Lua error exits (metrics(nexits): Go panic and Lua error/os.exit uncounted as exits (cross-language inconsistency) #779); C#/PHP
npa/npm visibility & enum-case counting (metrics(npa): C# interface fields counted public, ignoring explicit private/protected #780, metrics(npa): PHP counts enum cases as attributes; Java/Kotlin/Rust/C# do not (cross-language inconsistency) #781, metrics(npm): C# narrowed accessor visibility miscounted as public method #783);
exclude_tests SLOC propagation (metrics: exclude_tests still leaves unit-level loc.sloc inflated for impl/trait/closure-nested tests (#722 follow-up) #741); with_metric_set dependency
resolution (big-code-analysis: MetricsOptions::with_metric_set accepts unresolved metric dependencies #743); nargs text/JSON parity (metrics(nargs): Display function_args/closure_args use per-space counter, diverging from JSON sum #782).
Core: byte-level UTF-8 probe + UTF-16-BOM/IO handling in
read_file_with_eol (big-code-analysis: read_file_with_eol misclassifies some valid UTF-8 files as non-UTF-8 #746/big-code-analysis: read_file_with_eol rejects valid UTF-8 containing U+FFFD in the probe window #758, big-code-analysis: read_file_with_eol strips UTF-16 BOM then accepts UTF-16 body as garbage #803, big-code-analysis: read_file_with_eol swallows probe read_exact I/O errors as Ok(None) #804); C++14/C23 digit-separator
lexer (big-code-analysis: c_macro lexer treats C++14 digit-separator ' as a char-literal opener, leaking later macros #765); CRLF-preserving comment removal (big-code-analysis: comment removal emits LF newlines, corrupting CRLF files #767); Node lifetime
widen (node: child_by_field_name over-narrows return lifetime to Node<'_> vs Node<'a> #786); ops synthetic-Unit root (big-code-analysis: ops_inner lacks metrics_inner's synthetic-Unit-root wrapper, erroring (EmptyRoot) where metrics() succeeds #789); cfg comma-list (cfg: not(...)-led comma list swallows later test operand (false negative) #763).
Output / security: GFM-escaped VCS bus-factor cells (cli: bus-factor directory cells not GFM-escaped in Markdown VCS report #739); Checkstyle
/SARIF/numfmt edge cases (output(checkstyle): start_col=Some(0) emits invalid column="0" (siblings clamp to 1) #784, output(sarif): colon in relative path first segment emits scheme-ambiguous uri-reference #798, output(numfmt): MessageMetric renders tiny negatives as "-0" #800); CSV formula-injection
defang for the VCS report (CWE-1236) (security(vcs): CSV writer path column not defanged against formula injection (CWE-1236), bypassing #703 #794).
VCS: commit-classification precision (vcs: rollback revert-detection unanchored, matches body prose (asymmetric with ^revert) #806, vcs: SECURITY_FIX regex 'injection'/'overflow' false-positive on dependency injection / UI overflow #808); trailer-block
co-author scoping (vcs: Co-authored-by matched outside the trailer block yields phantom co-authors #812); empty-identity author drop (vcs: AuthorId::new collapses all name-and-email-less authors into one empty-keyed identity #817); shallow-clone
cache invalidation (vcs: cache reuses shallow-walked entry after unshallow; truncated_shallow_clone written but never read #810); rename-path parsing (vcs(diff): rename-only file with spaced path truncated at first space #813); single-pass JIT
blob diff (vcs(jit): blob_line_stats diffs each touched blob twice (line_counts + lines) #815); saturating arithmetic restoring the no-panic invariant
(vcs: Touched::churn uses non-saturating add, breaking the subsystem's no-panic invariant #742, vcs: blame line-number conversion uses non-saturating u32 add, breaking the no-panic invariant #809, vcs: collect_events long_boundary subtraction can overflow i64 on extreme --as-of #814, vcs: secs_to_days non-checked add overflows on near-i64::MAX window, breaking no-panic + totality claim #820, vcs(stats): total_edits uses non-saturating sum, breaking the no-panic invariant #821); honest author-hash privacy docs (vcs: --emit-author-details hash of email is not practically irreversible #811).
Docs / test-quality: numerous comment/rustdoc/book corrections and new
per-language tokens smoke tests (c_macros: predefined table lists nonexistent UINT*_MIN macros #760, preproc: c_specials contains non-type entries char64_t/charptr_t the generator excludes #762, tests: consumer error-handling test comment overclaims guard coverage #764, lib: ConcurrentRunner::new doc omits the max(2,n)-1 worker-count transform #766, docs(lib): Supported Languages rustdoc omits Objective-C (LANG::Objc) #769, macros: implement_metric_trait! comment omits Mi from the bracketed-arm trait list #771,
docs(metrics): book mi.sei formula labels the comment term comment_ratio but code uses a [0,100] percentage #774, tests(tokens): no smoke test for C, Objc, Elixir, Ruby, iRules #785, tests(tokens): cpp_tokens_nested_attribution has stale docstring + vacuous assertions #787, docs(wmc): Kotlin companion_object comment contradicts #431 (it IS a func_space / own Class scope) #788, docs(output): csv comment wrongly claims metric cells can't start with a formula trigger (negative MI starts with '-') #791–docs(dump_metrics): format_number comment claims a JSON trailing-.0 elision that serde_json does not do #793, output(dump): public dump_node panics on mismatched code/node, undocumented precondition #795–docs(output): warning_line module doc claims GitHub Actions ::warning:: recognition out of the box; wrong syntax + needs problem matcher #797, docs(parser): numeric -t/--type filter silently matches by grammar-unstable kind_id #799, traits: Search::act_on_node callback param misnamed 'pred' #801, docs(preproc): PreprocFile::new_macros doc describes mutation, not construction #802, docs(suppression): UnknownMetric hint re-derives suppressible set, contradicting Metric::suppressible() single-source-of-truth doc #805,
tests(vcs): display_covers_every_variant covers 11 of 16 Error variants #807, docs(vcs/jit): partial diff score is not 'always lower' than the commit score (experience term is negative) #816, docs(vcs): workdir_root doctest asserts None == None, not the convergence it claims #818, vcs: build_trend per-point build re-walks the first-parent timeline (stale O(commits)-lookup comment) #819, docs(wire): VcsTrendPoint/VcsTrend test comment claims #[serde(flatten)] Vcs, but the field is nested #822, docs(vcs/score): size_factor is not a 'tiny tie-breaker'; comment understates its weight #823).

Not fixed (investigated → non-bug)

big-code-analysis: PHP NOM excludes anonymous_function and arrow_function from functions_sum #744 (PHP NOM): the "outlier" premise is false — functions/closures
are a partition (shared compute early-return), and Go/Rust/C#/C++ all
exclude their closure kinds from is_func too. Recommend wontfix.
big-code-analysis: Go method receivers are counted as ordinary nargs #756 (Go nargs receiver): already excluded via the dedicated
compute_go_args + a pre-existing regression test.

Follow-up

vcs: opt-in keyed/salted hashing for --emit-author-details (harden against email enumeration) #956: opt-in keyed/salted hashing for --emit-author-details
(deferred hardening from vcs: --emit-author-details hash of email is not practically irreversible #811; must reconcile with the perf: persistent VCS history cache keyed by HEAD SHA #334 cache-replay
invariant).

Quality assurance

Survived five independent passes — simplify-rust, rust-optimize,
audit-tests (each of ~150 new tests verified by revert; 2 weak assertions
strengthened), review, and code-review max (one C# predicate dedup
applied). make pre-commit is green: fmt, clippy ×2 -D warnings, workspace
tests --all-features, doc, udeps, all lint families, manpages, self-scan
both tiers, and the Python stages.

Snapshots

Four metric changes shifted integration snapshots in the
big-code-analysis-output submodule; those commits are pushed
(edacd90f..3fba6241 on main) and the parent records the matching SHA.

bus_factor_rows handed raw directory paths to write_md_table, which only pads cells. A `|` in a Unix directory path injected a spurious column separator and corrupted the Markdown table; the doc comment on write_markdown_bus_factor falsely claimed the renderer escaped cells. Return typed Cells from bus_factor_rows so each format applies its own escaping the way the ranked-files path does: Markdown maps through render_cell_md (GFM-escapes the directory Cell::Path), HTML maps through cell_text (write_table_with_tooltips escape_html-escapes). Numerics are pre-formatted Cell::Num and pass through raw. Fixes #739

`run_command_preproc` finalized the shared worker accumulator with two `.expect()` calls -- the panic-on-poison / panic-on-un-joined-Arc pattern #445 removed from the sibling count collector. A worker panic poisoned the mutex and crashed `bca preproc` with a backtrace instead of a clean exit. Extract a `into_preproc_data` helper that mirrors `CountCollector::into_count`: `Arc::try_unwrap` then poison-tolerant `into_inner` on the sole-owned path, and `mem::take` from behind a poison-recovered guard on the still-shared path. The recovered data still holds every joined worker's file entry, so finalization degrades rather than panics. Add regression tests for both the poisoned-mutex and shared-Arc paths (verified by revert). Fixes #740

`Touched::churn()` used a plain `+` to sum `added` and `deleted`, the only churn computation in the VCS subsystem that was not saturating. This contradicts the subsystem's documented no-panic-on-overflow invariant and would panic in a debug build on overflow. Switch to `saturating_add`, matching every sibling accumulator (stats.rs, diff_parse.rs, size_features), and add a regression test that would panic under the old `+`. Fixes #742

read_file_with_eol previously classified its 64-byte prefix with String::from_utf8_lossy + an unconditional pop() + a U+FFFD scan. That heuristic dropped the probe's last character unconditionally, hiding a genuinely invalid trailing byte (wrongly accepting non-UTF-8 input) and rejecting files that legitimately contained the U+FFFD replacement scalar within the probe window. Replace it with a std::str::from_utf8 check: accept on Ok; tolerate an error only when it is a trailing incomplete multibyte sequence (error_len().is_none()) AND the file continues past the probe, so the split character is completed by later bytes; reject otherwise. When the probe is the whole file, an incomplete tail is genuine corruption. The public Ok(None)/Ok(Some(..)) contract and the 64-byte window (now a named UTF8_PROBE_BYTES constant) are unchanged. Fixes #746 Fixes #758

CountCollector::into_count's Err(shared) arm silently returned a race-dependent partial tally when the collector was still shared, violating its own "after every worker has joined" contract and turning a coordination bug into undercounted metrics with no signal. Add a debug_assert!(false, ...) in the still-shared arm so misuse trips loudly in debug and test builds, while release continues to degrade to the best-effort snapshot rather than panicking, honoring the no-panic-in-production contract. Sharpen the doc comment to state that a still-shared collector indicates a caller-side coordination bug and that the returned Count is a non-final snapshot, not the complete aggregate the #[must_use] return implies. into_count is pub and re-exported from lib.rs, so its signature must stay stable until 2.0; the debug_assert path makes the misuse explicit without a SemVer break, mirroring the sibling into_preproc_data recovery pattern. Add a debug_assertions-gated #[should_panic] regression test for the still-shared misuse path (verified by revert: it does not panic without the assert). Fixes #757

Ops::new initialised every space name from get_func_space_name, which returns Some("<anonymous>") for root Unit nodes that have no name field. ops_inner overwrites only the final top-level node, so any non-top-level Unit space kept the synthetic "<anonymous>" placeholder even though the API docs describe None as the "name could not be resolved" state. Mirror the SpaceKind::Unit special-case already in FuncSpace::new: skip name derivation for Unit kinds and leave name = None, so the ops and metrics paths agree. The top-level override in ops_inner still sets the caller-supplied name, leaving top-level behaviour unchanged. Fixes #755

Under --exclude-tests, the prune hook records each pruned test subtree's row span on its innermost enclosing func-space via Sloc::exclude_span. Sloc::sloc() subtracts only that space's own excluded_lines, and Sloc::merge folded only sloc_min/sloc_max, not excluded_lines. For a #[test] fn inside a retained impl/trait/closure (all Rust func-spaces), only that space's SLOC dropped; the unit's span-based SLOC stayed inflated, so MI's 16.2*ln(SLOC) term still inherited the test lines. Ploc was already correct because its line-set merges upward. Fold excluded_lines in Sloc::merge so the pruned count propagates to every enclosing space up to the unit, mirroring Ploc's upward line-set union. Each ancestor's span includes the pruned rows exactly once and pruned subtrees never descend, so the count is subtracted once per altitude with no double-counting. Strengthen sloc_drops_for_test_fn_nested_in_impl to assert the unit root (not just the impl). Add cases for a #[test] fn in a production impl, a test item nested in a closure, and a non-test impl that must be unaffected. Verified via test-via-revert: the unit assertions fail without the fold. Fixes #741

MetricsOptions::with_metric_set stored the caller-supplied MetricSet verbatim, so a public caller could pass empty().with(Metric::Mi) and have the walker compute a maintainability index from zero-valued Loc / Cyclomatic / Halstead inputs with no error. Add MetricSet::resolved() (the set-in/set-out counterpart of from_slice_with_deps) and apply it inside with_metric_set so the set is closed under Metric::dependencies before storing. This matches the existing with_only behaviour, which already resolves the closure and emits the dependencies; compute and emit share one selected set by design, so no output widens beyond what with_only(&[Mi]) already does. The operation is idempotent: an already-closed set is unchanged. Keeps the builder infallible (additive correctness fix, minor bump). Fixes #743

A top-level comma list whose first operand is `not(...)` and whose last operand ends in `)` (e.g. `not(foo), all(test)`) was misread as a single `not(...)` operand: the `not` short-circuit stripped `not` to leave `(foo), all(test)`, which both starts with `(` and ends with `)`, so the whole list was dropped and the trailing `test` operand was lost. Items so annotated were not recognised as test-only and survived exclude_tests. Move the bare-comma-list split ahead of the single-operand not/all/any prefix checks, so each operand is classified in isolation. The existing cfg_split_top_level_args already respects paren depth, so a comma nested inside a predicate's own parens (`not(foo, bar)`, `all(test, unix)`) remains a single operand and is unaffected. Fixes #763

The byte-level C/C++ lexer prepass in `replace` treated every `'` as a character-literal opener. C++14 ([lex.icon]/[lex.fcon]) and C23 allow a single-quote digit separator inside numeric literals (`1'000`, `0xDEAD'BEEF`, `3.14'159`). With an odd separator count the opening `'` had no closing `'`, so the scanner stayed in a spurious Char state to EOF and every later macro identifier leaked through unmasked. Track numeric-literal context (`in_number`) and recognize a `'` flanked by hex digits inside a number as a separator rather than a Char opener. The numeric-context guard distinguishes separators from length-prefixed char literals (`L'x'`, `u8'x'`), whose prefix is an identifier run, not a numeric one, so those still open Char correctly. Bundle the masking-pass cursors into a `MaskState` struct to keep the scanner's working state in one place. Fixes #765

remove_from_code substituted bare LF for every line a removed multi-line comment spanned, so stripping comments from a CRLF source produced mixed CRLF/LF output and `bca strip-comments --in-place` silently rewrote line endings. Detect the source's dominant line ending from its first newline and substitute the matching sequence (\r\n for CRLF, \n otherwise) per removed-comment line, via a CRLF fast-path buffer mirroring the existing LF buffer. Also restore the \r when a single-line comment node absorbed the CR of its terminating CRLF, so the kept line keeps a full \r\n. Regression test strips a multi-line comment from a CRLF fixture and asserts no bare LF survives; the LF test now asserts no CR is introduced. Fixes #767

`Node::child_by_field_name` returned `Option<Node<'_>>`, tying the child to the borrow of `&self` rather than the underlying tree lifetime `'a`. tree_sitter::Node is Copy and its children are valid for the whole tree, so the narrowing was needless and inconsistent with the sibling accessors (`child`, `parent`, `children`, `next/previous_sibling`), which already return `Node<'a>`. Widen the return to `Option<Node<'a>>` so callers may hold a child past the parent borrow. This is additive: every existing call site uses the result transiently, and a wider return lifetime only adds capability, so nothing breaks across lib/cli/web/py. Add a regression test whose helper returns a child past an inner `&parent` reborrow; it fails to compile (E0515) under the old signature and passes under the widened one. Fixes #786

ops_inner lacked metrics_inner's synthetic-Unit-root wrapper, so `Ast::ops()` returned Err(EmptyRoot) for ERROR-root parses (e.g. Lua partial input) where `Ast::metrics()` succeeds. Mirror the metrics seam's push_synthetic_unit_root so the two seams agree: ops() now yields a top-level Unit Ops (name = caller-supplied Source::name) in the same cases metrics() does. Additive correctness, not a breaking shape change to the public Ops type. Also correct the Ops::operands / Ops::operators doc comments: they return the DISTINCT (deduplicated) Halstead operand/operator vocabularies (n2 / n1, HashMap keys), not "all" occurrences. Fixes #789 Fixes #790

The dedicated `(tree_sitter_cpp)` arm of `get_language!` produced `tree_sitter_cpp::LANGUAGE.into()`, which is byte-identical to what the generic `($name:ident)` arm produces for `$name = tree_sitter_cpp` (`$name::LANGUAGE.into()`). Before #718 the arm differed — it routed to `tree_sitter_mozcpp::LANGUAGE` — but commit 95fccbb (the upstream cpp default / opt-in mozcpp work) repointed it at `tree_sitter_cpp`, leaving it dead. `tree_sitter_cpp` exposes `pub const LANGUAGE`, so the generic arm covers Cpp unchanged. Pure no-op: the language→grammar mapping is identical before and after. The typescript/tsx/php arms are retained because they use distinct `LANGUAGE_*` constants. Fixes #770

Add the missing Objective-C bullet to the crate-level rustdoc `## Supported Languages` list and correct two stale slugs the same list carried: TSX (`tsx`, not `typescript`) and the internal `ccomment` / `preproc` C-family helpers (lowercase canonical slugs). Add a drift-guard `#[test]` that derives the canonical slug set from `LANG::into_enum_iter()`/`name()` (the single source of truth, always compiled in regardless of per-language Cargo features) and asserts each distinct slug appears as a backtick token within the section, so a future add-lang cannot silently desync the docs.rs landing page. Fixes #769

Fix five independent documentation/comment accuracy issues. All are comment/doc-only — no behavior change. - concurrent_files_tests.rs: the consumer error-handling test comment overclaimed it exercises both sides of the call-site guard; its `invocations == 2` assertion cannot distinguish swallow vs emit. State that it verifies only loop-continuation and point at the per_file_error_* unit tests for the branch distinction. - concurrent_files.rs: document ConcurrentRunner::new's max(2,n)-1 worker-count transform (total budget = producer + consumers). - macros/mod.rs: add Mi to the bracketed-arm trait list (Mi uses the [Trait] default-body arm alongside Tokens/Nom/NArgs). - metrics/wmc.rs: Kotlin companion_object IS a func_space with its own Class scope (#431); correct the comment that claimed otherwise. - book metrics.md: the mi.sei comment term is a percentage in [0,100], not a [0,1] ratio; relabel comment_ratio -> comment_percentage. Fixes #764 Fixes #766 Fixes #771 Fixes #788 Fixes #774

The #765 digit-separator fix grows step_normal's cyclomatic to the hard limit (a flat hand-rolled lexer dispatch) and the #767 CRLF helper adds a small LineEnding type that lifts comment_rm's file-level nargs sum. Both pass the hard gate but encroach the 95% headroom band. Mark them with in-source suppression + rationale rather than contorting the code: the lexer is clearest as one state machine and file-level nargs is an aggregation artifact, not a wide-signature smell.

The predefined-macro table listed 14 `UINT*_MIN` entries that are not real C standard macros: ISO C defines `UINTN_MAX` but no `UINTN_MIN` (the minimum of any unsigned type is 0, so no macro exists). As a result `is_predefined_macros` returned true for non-macros and the macro-masking prepass could clobber a user identifier such as `UINT16_MIN`. The legitimate signed `INT*_MIN` macros are retained. The preproc specials table likewise listed `char64_t` and `charptr_t`, which the `enums/data/special.py` generator documents as not real C/C++ types and never produces (it widens `char{8,16,32}_t` only and omits `charptr_t`). Because that generator only adds entries (diff = specials - old) it can never prune stale rows, so these made `is_specials` return true and a `#define char64_t ...` was wrongly dropped from macro extraction. Removed the entries from the `.txt` sources and regenerated the `.rs` slices via `cargo run -p enums -- --language c_macros`; the slices stay sorted (binary_search lookup), and regeneration is idempotent. Added regression tests pinning the pruned names (the codegen smoke tests only cover generic `names[0]`/`names[1]` placeholders). Fixes #760 Fixes #762

Bash `heredoc_body` (Bash::HeredocBody2) and Perl `heredoc_body_statement` (Perl::HeredocBodyStatement) match Checker::is_string but were omitted from their Alterator flatten arms, so a heredoc body with interpolation kept its structured interpolation/fragment children in the AST dump while is_string treated the whole node as one string -- the dump asymmetry #699 eliminated elsewhere (Ruby HeredocBody, PHP Heredoc/Nowdoc). Add both variants to the flatten arms, mirroring Ruby/PHP. This is a dump-only change: find/count already treated these nodes as strings, so no metric value moves and no snapshot shifts. Adds regression tests verifying the dumped node is a flat string with its interpolation text preserved (test-via-revert confirmed both fail against the pre-fix arms). Fixes #761

Issue #768 reported that the LPAREN2/LBRACK2/LBRACK3 second-alias opener arms in get_op_type (Cpp/C/Objc/Mozcpp/Tcl/iRules/Php/Elixir/ Ruby) escape the #695 pair-glyph fold, inflating Halstead n1 and rendering a bare "(" instead of the folded "()". Investigation shows the premise does not hold. tree-sitter's runtime collapses each second alias to its base via the grammar's public_symbol_map before Node::kind_id() (ts_node_symbol) returns, so the alias kind_id is never observable to the metric layer. Verified empirically across the exact constructs that produce the alias internally (#if defined(FOO), f(1), a[0], in Point[1,2]): the alias kind_ids 20/47/46/95/96/155 appear zero times; only base openers reach compute_halstead, n1 is not inflated, and openers already render folded. No metric value changes; no snapshots shift. The alias match arms are therefore defensive, not active. Rather than apply the no-op fold #768 proposed (which would imply a fold is needed when it is not), this documents each alias arm as defensive and adds second_alias_opener_collapses_to_base_kind_id, which pins the runtime-collapse invariant. If a future grammar bump drops the public_symbol_map collapse, the test goes red and signals that the alias arms must then fold to their pair glyph in get_operator_id_as_str. Fixes #768

The ABC condition-counting terminal-bool sets for Python and the JS family (JavaScript, Mozjs, TypeScript, Tsx) omitted the numeric-literal kind, so a numeric-truthy operand in an if/while/&&/|| slot scored zero conditions -- the exact gap fixed for Lua. Python treats every non-zero number as truthy and the JS family treats every non-zero/non-NaN number as truthy, so `if 5:` (Python) and `while (5)` / `x && 5` (JS/TS) each count their numeric literal as a Fitzpatrick unary condition. Add Python::Integer / Python::Float and the JS-family Number literal kind to the affected macros. TypeScript and Tsx emit `number` under two ids: the numeric literal (Number) and the `predefined_type` keyword `number` in a type annotation (Number2). Only the literal is a value, so Number2 is deliberately omitted. Statically-typed languages (Go/Rust/Java/Kotlin/C#) correctly omit numerics -- a bare int in a bool slot is a type error -- and are left untouched. Adds regression tests mirroring lua_number_truthy_condition_counts and records the JS integration-fixture snapshot shifts in the submodule. Fixes #772

Kotlin's ABC compute had no Phase-2B condition-slot arm, so a non-comparison predicate (`if (flag)`, `while (running())`, `do {} while (ok)`) counted zero ABC conditions — diverging from every sibling language and dropping below Kotlin's own cyclomatic decision count. Route the `if`/`while`/`do-while` `condition` field through a new `kotlin_count_condition` helper that counts a bare terminal once and unwraps paren/negation via the existing `kotlin_inspect_container`. A comparison (`a == b`) or `&&`/`||` predicate is a nested binary_expression already counted by the token and chain-walker arms, so it adds nothing here — no double-count. Regression tests cover the three bare-predicate forms plus the parenthesised case, with guard tests pinning that comparison and short-circuit predicates are unchanged. No snapshots shifted. Fixes #773

PHP `Cognitive::compute` was missing the function-definition boundary arm that every sibling language gained in #696. It never reset structural `nesting` to 0 nor bumped the function-depth surcharge at a `FunctionDefinition` / `MethodDeclaration`, so a PHP function or method nested inside control flow inherited the enclosing nesting and missed the SonarSource B-nesting amplification. Add the boundary arm mirroring Java exactly: `nesting = 0` plus `increment_function_depth(&mut depth, node, &[FunctionDefinition, MethodDeclaration])`. Closures (`AnonymousFunction` / `ArrowFunction`) keep their existing lambda arm and intentionally do not reset nesting, matching how the siblings treat lambdas vs named functions. A function `inner()` defined inside `if ($a) { if ($c) { … } }` whose body is `if ($b) { if ($d) { … } }` drops from cognitive 7 to 5; the same body as a top-level function is unchanged at 3. No snapshot fixtures contain nested function definitions, so none shifted. Fixes #775

The Elixir cyclomatic impl added +1 for every `stab_clause`, but the first `stab_clause` of an `anonymous_function` is the closure's head/definition, not a pattern-dispatch decision. Because the closure already opens its own function space seeded with base cyclomatic 1, a trivial `fn x -> x end` reported cyclomatic 2 for zero decisions, diverging from cognitive (which already scores it 0) and from trivial closures in Python/Rust/JS. Discriminate on the parent: a `stab_clause` keeps counting unless it is the first `stab_clause` child of an `anonymous_function` parent. `case`/`cond`/`with`/`try` arms have a `do_block` parent and are unaffected; multi-clause `fn`s skip only their head clause, so N clauses add N-1 branches. Modified cyclomatic is untouched (the anon fn is not a Call). Fixes #776

The JS-family LOC impls (Mozjs, Javascript, Typescript, Tsx) listed `StatementBlock` among the LLOC contributors, so every `{ … }` brace block added +1 logical line. No other language does this: Rust's `Block`, C/C++/Objc/Mozcpp's `CompoundStatement`, and Java/Groovy/C#'s `Block` all contribute 0 lloc. A statement_block is a syntactic grouping, not a logical statement, so the count inflated JS/TS lloc and broke cross-language parity. Remove `StatementBlock` from all four JS-family LLOC match arms; it now falls through to the PLOC catch-all like every sibling block node. For `if (x) { return 1 } else { return -1 }` the JS lloc drops from 6 to 3, matching the C and Rust translations. Update the existing anchored lloc tests to the corrected values and add `js_family_if_lloc_matches_c_and_rust`, which pins parity across Mozjs/JS/TS/TSX and the C/Rust baselines. Refresh the lib insta snapshots and the integration-output submodule fixtures (every changed lloc value decreases; no other metric shifts). Fixes #777

Interior rows of a multi-line string literal were classified inconsistently across languages. Python credits every row of a non-docstring multi-line string to PLOC (#415: "real code, not blank lines"), but most other languages no-op'd their string nodes, so the interior rows reached neither PLOC nor CLOC and `blank = sloc - ploc - cloc` mislabelled them as blank. Add a shared `add_multiline_string_ploc` helper mirroring Python's #415 mechanism exactly (insert the opening row only when the enclosing statement begins earlier, then insert `start + 1..=end`) and route every multi-line-capable string node through it: Perl (quoted strings + heredoc body), Ruby (strings + heredoc body), PHP (encapsed / single string), Go (raw string), Rust (string / raw string), C / C++ / Mozcpp / Objc (string / raw string), Java + Groovy (text blocks), C# (verbatim / raw string), Kotlin (multiline string), Lua (long-bracket string), and JS / TS / TSX / Mozjs (template literals). Python is the canonical reference and is unchanged. A 3-line string assignment now reports ploc == 3 / blank == 0 in every language with a multi-line string literal, matching Python. Tcl, Bash, and PHP heredoc already behaved correctly. Fixes #778

Go's panic() and Lua's error()/os.exit() are built-in abrupt-exit calls that unwind the stack like throw/raise in the exception languages, but were silently uncounted while every other exit-language counts its throw/raise equivalent. Match them by callee text, mirroring the Bash/Elixir builtin-by-name pattern already in this file: - Go: a call_expression whose `function` field is a bare identifier spelling `panic`. A package-qualified `foo.panic()` has a selector_expression callee and is correctly not counted. - Lua: a function_call whose `name` field text is `error` or `os.exit`. A user call such as `foo()` or `myError(...)` is not counted. `return` is still counted; panic/error/os.exit are additional exits. Fixes #779

C# 8+ permits explicit private/protected modifiers on interface members, but CsharpCode::Npa::compute counted every interface field as a public attribute (interface_npa = interface_na), inheriting Java's idiom without Java's invariant that interface fields are always public. Interface members default to public, so gate interface_npa per declaration on the inverse of the class rule: a field counts as a public attribute unless it carries an explicit private/protected modifier. The new csharp_interface_member_is_public helper mirrors ts_member_is_public!'s default-public gate. The modifier applies to every declarator of a field, so the public/private split is per-declaration. Also replaces the assign-the-whole-sum smell (stats.interface_npa = stats.interface_na) with a proper increment. Fixes #780

PHP was the only language whose Npa impl counted enum cases as class attributes (`enum Color { case Red; case Green; case Blue; }` reported class_na = class_npa = 3). Java, Kotlin, Rust, and C# all treat enum cases/variants as sum-type tags, not data fields, and exclude them. Remove the `EnumDeclarationList` arm so PHP enum bodies contribute no npa attributes, matching the established cross-language convention. PHP enums cannot declare instance properties, and class-level `const`s are not counted as attributes outside an enum either (the class-body arm counts only `PropertyDeclaration`), so excluding the body entirely keeps PHP internally consistent as well. Adds regression tests: enum cases report 0, enum-with-const reports 0, and a cross-language parity check that PHP and Java equivalent enums both report (0, 0). Regenerates the traits_enums.php integration snapshot in the big-code-analysis-output submodule. Fixes #781

The Display impl for nargs::Stats emitted the function_args/closure_args headline from the per-space accumulators (function_args/closure_args), while every structured serializer (JSON/YAML/TOML/CBOR via wire.rs) emits them from the cross-space sum (function_args_sum/closure_args_sum). The accumulators are not combined by Stats::merge, so at a parent space that rolls up child function-spaces the text output disagreed with JSON for the same field name (e.g. python_nested_functions: JSON 3, Display 2). Switch the two Display lines to the *_sum accessors so text matches JSON. total/average/*_average already sourced from the summed accessors and are unaffected; this is an output-format fix, not a metric-value change, so no JSON/YAML snapshots shift. Adds a parity regression test asserting the Display headline equals the summed accessors for a nested-function space. Fixes #782

Add per-language smoke tests for C, Objective-C, Elixir, Ruby, and iRules, the five registered languages that lacked the non-zero-tokens positive test required by Lesson 1. Each mirrors the existing smoke-test idiom (check_metrics + tokens_sum() > 0) on small realistic source. Fix cpp_tokens_nested_attribution: the docstring described a forward-declaration fixture the test never used; correct it to match the actual lambda fixture and replace the vacuous max >= 7 / max <= sum inequalities with exact assertions (sum == 24, max == 24, min == 0). A C++ lambda is a closure but opens no FuncSpace, so all leaves attribute to the single outer scope; the exact values fail if a regression split the lambda into its own scope or credited the unit, which the prior inequalities would have passed. Drop the unfulfilled "unit-level sum must equal total of all scopes" comment. Fixes #785 Fixes #787

- csv.rs: numeric metric cells CAN start with `-` (negative MI), but are safe as self-contained numeric literals, not "cannot begin with a trigger". State the real invariant. - dump_ops/dump_metrics/dump.rs: drop dead `[`Result`]: #variant.Result` doc-links; bare `[`Result`]` resolves via intra-doc. - dump_metrics.rs: format_number's whole-float-as-integer DIVERGES from JSON (which keeps `.0`); it is a legibility choice, not a match. - dump.rs: document the `# Panics` precondition that `code` must be the exact source `node` was parsed from, on dump_node and dump_node_with_color. - dump.rs: name `dump_tree_helper` in the depth-limit test comment; the old `dump_children` no longer exists post-#700. - warning_line.rs: drop the false GitHub Actions `::warning::` / "out of the box" claims; CI annotators need a registered problem matcher for Clang/MSVC lines. - parser.rs + nodes.md: document numeric -t/--type as a raw, grammar- version-unstable kind_id match. Fixes #791 #792 #793 #795 #796 #797 #799

read_file_with_eol stripped a leading UTF-16 BE/LE BOM and kept the interleaved-NUL body, which then passed the byte-level UTF-8 probe (each NUL is a valid single-byte scalar) and reached the parser as garbage. A UTF-16 BOM now skips the file (Ok(None)), consistent with is_generated's documented stance that UTF-16 source is unsupported. The UTF-8 BOM is still stripped and read, and the BOM checks now use bounds-safe slice prefixes so a sub-3-byte probe cannot panic. The probe read also collapsed every read_exact failure to Ok(None), swallowing real I/O faults (permission, hardware) that the documented contract promises to surface as Err. The read now distinguishes a clean short read (UnexpectedEof -> Ok(None)) from any other error kind, which propagates as Err. Fixes #803 Fixes #804

The UnknownMetric error hint re-derived the suppressible vocabulary from Metric::NAMES with a hardcoded "tokens" filter, contradicting Metric::suppressible()'s documented claim of being the single source of truth. Route the hint through suppressible() (sorted to keep the list alphabetised) and guard the derivation with a test assertion. Also fix two doc inaccuracies on published API: - Search::act_on_node's trait-declaration param was named `pred` though the callback is an action with no return; rename to `action` to match act_on_child and the node.rs impl (cosmetic, non-breaking). - PreprocFile::new_macros's doc said it "adds new macros" but it constructs a fresh PreprocFile; reword as a constructor. Fixes #801 Fixes #802 Fixes #805

Five VCS-internal integer-arithmetic sites used bare +/-/sum that panic in a debug build (overflow checks on) and wrap in release, violating the no-panic invariant the subsystem documents (src/vcs/stats.rs: "saturating_add everywhere ... the crate forbids panics in non-test code"). None changes a metric value for normal inputs; each makes one degenerate-input path total. - blame line math: convert a blame hunk's 0-based start + length to a 1-based inclusive LineRun via a named saturating constructor (LineRun::from_blame_hunk), so a line number at the u32 ceiling clamps instead of overflowing the `+ 1` / `+ len - 1`. - window cutoffs: `now - *_window_secs` underflows i64 for a garbage extreme `--as-of` (e.g. i64::MIN). saturating_sub floors the boundary at i64::MIN ("no lower bound" = include all history), the safe total behavior. Fixed uniformly at all four sites: git/history.rs, replay.rs (long + recent), git/jit.rs (long + recent). - secs_to_days: saturating_add the rounding term so secs_to_days(i64::MAX) saturates to u32::MAX instead of wrapping negative and flooring to 0. - cross-author edit sum: fold with u32::saturating_add (the per-author counts were already saturating_add-clamped; std Sum<u32> panicked on overflow). Each fix carries an overflow regression test that panics against the pre-fix bare arithmetic in debug. The pre-existing secs_to_days_saturates_huge_and_floors_negative test was false-confidence (it only exercised an in-range magnitude); it now asserts secs_to_days(i64::MAX) == u32::MAX, the case that actually triggered the overflow. Fixes #809 Fixes #814 Fixes #820 Fixes #821

Two precision regressions in commit-message classification, both violating the module's "favour precision over recall" contract. #806: rollback detection ran an unanchored `\brollback\b` against the full multi-line message while revert was subject-anchored, so a body-prose "rollback" mention flipped the whole commit to revert. Give rollback the same discipline as revert: match `^rollback\b` against the subject line only. A leading "Rollback the migration" still classifies; a buried mention no longer does. #808: the security regex matched bare `injection` and `overflow`, which fire on routine non-security commits ("dependency injection", "text overflow", arithmetic "integer overflow"). Replace them with qualified attack-vector forms: `(?:sql|command|code|html|ldap|os|xml)\s+injection` and `(?:buffer|heap)\s+overflow`. "integer overflow" and "stack overflow" are deliberately excluded as ambiguous (arithmetic bug / website name); precision wins over recall. Adds negative/positive regression tests for both, verified via revert against the old regexes. Fixes #806 Fixes #808

Two author-identity bugs in VCS participant resolution that skew bus-factor / ownership / JIT-experience metrics. #812: ParticipantResolver::participants ran the Co-authored-by regex over the whole commit message, so a `Co-authored-by:` line quoted in the body (a pasted log, a reverted commit, an indented code block) was falsely attributed as a real co-author. Git only honours trailers in the message's final block, so scan only the detected trailer block: the last paragraph (after the final blank-line separator), treated as trailers only when at least one line is a recognised `Token: value` trailer and at least 25% of its non-blank lines are (mirroring git-interpret-trailers' threshold). A genuine trailing `Co-authored-by:` alongside a `Signed-off-by:` still counts. #817: AuthorId::new keyed an author with empty name AND empty email to the empty string, collapsing every such author into one phantom identity that accrued all their edits, ownership, and first-authorship and deflated the distinct-author count. Add AuthorId::has_identity and drop keyless authors in push_if_human, mirroring the existing bot guard. A commit whose only author is keyless yields an empty participant set, which process_commit already skips as unknown-author. Eq/Hash on AuthorId are unchanged; has_identity is additive. Fixes #812 Fixes #817

The persistent VCS history cache wrote a `truncated_shallow_clone` flag but `is_compatible` never read it back. An entry written from a shallow clone was reused verbatim after the repo was deepened (`git fetch --unshallow` leaves HEAD unmoved, so the head-SHA key is unchanged and the pure-hit/incremental paths fire), silently replaying truncated event counts while reporting `shallow = false`. The reverse (cached full, now shallow) reported a false truncation flag over complete counts. Thread the repository's current shallow state through `is_compatible` and `load_compatible` and require `truncated_shallow_clone` to equal it for reuse. A mismatch in either direction now falls through to the re-walk path, which produces the correct counts and flag. The normal same-shallow-state pure-hit and incremental-splice paths are unaffected. Fixes #810

A pure rename (similarity 100%, no body) emits no `+++ b/<new>` line, so diff_git_new_path took the new path solely from the ambiguous `diff --git a/<old> b/<new>` header. The `rsplit(' ')` fallback walked from the right and yielded the first `b/`-prefixed token, truncating a spaced new name (`b/new name.rs` -> `new`) and corrupting the diffusion key. Capture the unambiguous `rename to <path>` line in the stanza parser and set the new-side path from it (decoded with the same C-style unquoting), in preference to splitting the spaceful header. Content-change renames and plain modifies are unaffected; non-rename stanzas see no `rename to` line and keep the existing logic. Fixes #813

blob_line_stats diffed every touched text blob twice per scored commit: line_counts() for the added/deleted totals and lines() only to count hunks. In gix 0.83 each runs a full blob diff, so every modified file was diffed twice. Prepare the diff once and read insertions, removals, and the hunk count off a single gix_diff::blob::Diff::compute pass, mirroring line_counts()'s own internals. Because this uses the same un-slider-postprocessed algorithm line_counts() uses, the added/deleted totals stay bit-identical to the old two-pass code and to the file-level history walk (history.rs still calls line_counts()), so the JIT size features and the history walk cannot diverge. The full integration suite (which pins exact added/deleted/hunks values) and all insta snapshots are unchanged. Fixes #815

The #810 module-doc reference to the pub(crate) HistoryCache::is_compatible tripped -D rustdoc::private-intra-doc-links; use a plain code span.

The unsalted SHA-256 of a canonical email emitted by `--emit-author-details` (and stored in the persistent cache) was described as "irreversible" / "injective for practical purposes" / publishable "without disclosing the underlying email". An email is low-entropy and enumerable and commit histories are public, so the digest is recoverable against a candidate email set (the Gravatar weakness). Replace the false guarantees with an honest threat model: the digest is a stable pseudonym that avoids emitting plaintext email and deters casual disclosure, but is not cryptographically irreversible and is not anonymization. Doc/comment + test-rename only; no change to the hashing scheme. Hardening (keyed HMAC / KDF) is deferred as a follow-up because it must be reconciled with the #334 cache-replay stable-digest invariant. - identity.rs / cache.rs / git/history.rs: honest threat-model notes - book vcs.md: new "Author-detail privacy" section; corrected cache note - book python/vcs.md: corrected cache note, links to the new section - rename test hashed_is_stable_and_irreversible_looking -> hashed_is_stable_and_avoids_plaintext_email; assert it avoids the plaintext email rather than "looks irreversible" Fixes #811

Fix six VCS documentation/test-accuracy issues, all comment/doc/test changes with no production logic change: - #807: assert Display for all 16 vcs::Error variants (was 11); add InvalidFileTypeScope, InvalidBusFactorThreshold, InvalidTrend, Blame, InvalidDiff cases so the test lives up to its "pin each variant" claim. - #816: JitDiffReport doc claimed the partial diff score is "always lower" than the full commit score, but the experience term is negatively signed, so the full score can fall below the partial; the two are not ordered. Removed the false ordering claim. - #818: workdir_root doctest asserted None == None (relative paths never resolve). Rewrote it to illustrate two same-checkout paths sharing one Some(root), asserting only when both resolve. - #819: trend.rs comment claimed each point is an O(commits) lookup rather than its own walk, but since #648 every per-point build re-anchors and re-walks the first-parent timeline. Corrected to describe the shared timeline's actual (oid-resolution) role and the per-point re-walk cost. - #822: wire.rs test comment claimed VcsTrend(Point) use serde(flatten); they carry a nested vcs key. Removed the false flatten/CBOR-footgun claim, kept the write-only-untested rationale. - #823: score.rs called size_factor a "tiny tie-breaker" / "well under one churn-point", but it enters base at coefficient 1.0 (~0.85 at 10k SLOC, >1.0 past ~50k). Corrected both comments to cite real magnitudes. Fixes #807 #816 #818 #819 #822 #823

Record the Unreleased entries for the #739-#823 batch: cross-language metric consistency fixes, output edge cases, VCS correctness/arithmetic/ security fixes, the read_file_with_eol hardening, new APIs (MetricSet::resolved, public defang_formula, AuthorId::has_identity), the JIT single-pass diff, and the author-hash privacy honesty fix.

audit-tests found two new tests that passed without fully exercising their fix: - ccomment_remove_comments_preserves_crlf (#767) only asserted CRLF consistency, so an impl that *dropped* removed-comment lines would still pass. Add a line-count-preservation assertion (comment removal blanks lines, never deletes them). - coauthor_in_prose_last_paragraph_is_not_counted (#812) used a mid-line Co-authored-by mention that the ^-anchored regex never matched, so it passed regardless of trailer-block scoping. Rewrite to a column-0 co-author in a non-final paragraph (renamed coauthor_outside_final_trailer_block_is_not_counted) that goes red if scoping is dropped.

code-review --fix: npa.rs (#780) and npm.rs (#783) independently computed "node has an explicit private/protected Modifier" with inconsistent casts (Csharp::Private vs Private as u16). Extract one pub(crate) csharp_node_has_private_or_protected_modifier in npa.rs and reuse it from both gates — single source of truth for the C# narrowing rule. Also scope a clippy::naive_bytecount allow on the small CRLF test fixture counter added in the prior audit-tests commit (the bytecount crate is unwarranted there).

codecov · 2026-06-15T18:22:09Z

Codecov Report

❌ Patch coverage is 97.94872% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.69%. Comparing base (48cd18a) to head (6581ae4).

Files with missing lines	Patch %	Lines
src/metrics/loc.rs	95.08%	9 Missing ⚠️
src/comment_rm.rs	87.50%	5 Missing and 1 partial ⚠️
src/vcs/git/diff_parse.rs	81.25%	1 Missing and 2 partials ⚠️
src/alterator.rs	90.00%	2 Missing ⚠️
src/vcs/git/jit.rs	93.33%	0 Missing and 2 partials ⚠️
src/c_macro.rs	99.01%	0 Missing and 1 partial ⚠️
src/metrics/abc.rs	98.83%	0 Missing and 1 partial ⚠️
src/metrics/cyclomatic.rs	95.45%	1 Missing ⚠️
src/metrics/npa.rs	99.13%	1 Missing ⚠️
src/output/checkstyle.rs	0.00%	0 Missing and 1 partial ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #957      +/-   ##
==========================================
+ Coverage   85.50%   85.69%   +0.19%     
==========================================
  Files         106      101       -5     
  Lines       64642    65364     +722     
  Branches    64248    65364    +1116     
==========================================
+ Hits        55270    56015     +745     
+ Misses       8943     8918      -25     
- Partials      429      431       +2

Flag	Coverage Δ
python	`?`
rust	`85.69% <97.94%> (+0.28%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
src/c_langs_macros/c_macros.rs	`100.00% <ø> (ø)`
src/c_langs_macros/c_specials.rs	`100.00% <ø> (ø)`
src/c_langs_macros/mod.rs	`87.50% <100.00%> (+15.08%)`	⬆️
src/cfg_predicate.rs	`98.74% <100.00%> (+0.11%)`	⬆️
src/concurrent_files.rs	`90.37% <ø> (ø)`
src/count.rs	`90.90% <100.00%> (+1.30%)`	⬆️
src/getter.rs	`91.01% <ø> (-1.13%)`	⬇️
src/langs.rs	`91.44% <100.00%> (+1.00%)`	⬆️
src/macros/mod.rs	`91.97% <ø> (ø)`
src/metric_set.rs	`99.02% <100.00%> (+0.08%)`	⬆️
... and 46 more

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dekobon added 30 commits June 14, 2026 13:40

dekobon added 16 commits June 15, 2026 06:51

docs(vcs): unlink private is_compatible in cache module doc

1e1312e

The #810 module-doc reference to the pub(crate) HistoryCache::is_compatible tripped -D rustdoc::private-intra-doc-links; use a plain code span.

dekobon merged commit 7e44967 into main Jun 15, 2026
44 checks passed

dekobon deleted the fix/batch-2026-06-14 branch June 15, 2026 18:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: batch of 73 bug/doc fixes (#739–#823)#957

fix: batch of 73 bug/doc fixes (#739–#823)#957
dekobon merged 51 commits into
mainfrom
fix/batch-2026-06-14

dekobon commented Jun 15, 2026

Uh oh!

codecov Bot commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dekobon commented Jun 15, 2026

Summary

Highlights by area

Not fixed (investigated → non-bug)

Follow-up

Quality assurance

Snapshots

Uh oh!

codecov Bot commented Jun 15, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant