docs(vcs/score): size_factor is not a 'tiny tie-breaker'; comment understates its weight #823

@dekobon

Description

Summary

The size_factor term in the weighted risk score is documented as a "deliberately tiny tie-breaker" that for "even a 10k-line file" contributes "well under one churn-point", but the actual magnitude is far larger than those comments imply.

Location

  • src/vcs/score.rs:36 (module doc: "deliberately tiny tie-breaker")
  • src/vcs/score.rs:121-123 (inline comment + size_factor computation)

Evidence

// Size is a tiny tie-breaker: squared log over 100 keeps even a
// 10k-line file contributing well under one churn-point.
let size_factor = ln1p(input.sloc as f64).powi(2) / 100.0;

size_factor enters base with coefficient 1.0 (+ size_factor), whereas the recency-churn "point" enters as 0.30 * ln1p(churn). Computed magnitudes:

sloc      size_factor
100       0.213
300       0.326
1,000     0.477
10,000    0.848
50,000    1.171
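The table can be reproduced with a short sketch. It assumes the `ln1p` helper in the snippet behaves like the standard `f64::ln_1p`; the function names and the `u64` parameter type are illustrative and not taken from score.rs.

```rust
/// Size term as written in the Evidence snippet.
/// Assumption: the crate's `ln1p` helper matches `f64::ln_1p`.
fn size_factor(sloc: u64) -> f64 {
    (sloc as f64).ln_1p().powi(2) / 100.0
}

/// Real-valued SLOC at which the term equals `target`, found by inverting
/// ln1p(sloc)^2 / 100 = target  =>  sloc = exp(10 * sqrt(target)) - 1.
/// Hypothetical helper, added only to locate thresholds.
fn sloc_for_size_factor(target: f64) -> f64 {
    (10.0 * target.sqrt()).exp_m1()
}
```

Under that assumption, `size_factor(10_000)` evaluates to about 0.848, and `sloc_for_size_factor(1.0)` to about 22,025 (that is, e^10 - 1), so the term already reaches 1.0 well below 50k lines.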

For comparison, at the test baseline (churn_recent = 50) the dominant recency-churn contribution is 0.30 * ln1p(50) = 1.18, the long-churn term is 0.05 * ln1p(200) = 0.27, the entropy factor is 0.15, and the baseline fix term is 0.069.

So:

  • At sloc=300 the size term (0.326) already exceeds both the entire entropy factor (0.15) and the long-churn term (0.27), and is ~5x the fix term (0.069).
  • At sloc=10k the size term (0.848) is ~72% of the dominant recency-churn term (1.18), which is not "well under one churn-point."
  • At sloc=50k the size term is 1.17: it exceeds one whole unit and roughly equals the dominant recency-churn term, directly contradicting the "tiny tie-breaker" framing.
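The comparisons above can be checked with a small sketch. The coefficients 0.30 and 0.05 and the baseline inputs (50 and 200) come from this issue; the function and parameter names are hypothetical, and `f64::ln_1p` is assumed to stand in for the crate's `ln1p`.

```rust
/// Size term from the Evidence snippet (assumes `ln1p` == `f64::ln_1p`).
fn size_factor(sloc: u64) -> f64 {
    (sloc as f64).ln_1p().powi(2) / 100.0
}

/// Recency-churn term as quoted in the issue: 0.30 * ln1p(churn).
fn recency_churn_term(churn_recent: u64) -> f64 {
    0.30 * (churn_recent as f64).ln_1p()
}

/// Long-churn term as quoted in the issue: 0.05 * ln1p(churn).
fn long_churn_term(churn_long: u64) -> f64 {
    0.05 * (churn_long as f64).ln_1p()
}

/// Size term expressed as a fraction of the recency-churn term.
fn size_share_of_recency(sloc: u64, churn_recent: u64) -> f64 {
    size_factor(sloc) / recency_churn_term(churn_recent)
}
```

With these definitions, `size_share_of_recency(10_000, 50)` comes out at roughly 0.72, matching the ~72% figure in the second bullet.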

Because the size term is inside base, it is further amplified by the multiplicative (1.0 + dev_bonus + new_file_bonus) factor (up to 1.50x), so a new 10k-line file with >=9 developers sees a size contribution of ~1.27 (0.848 * 1.50).
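A minimal sketch of that amplification follows. The issue states only the combined upper bound of 1.50x, not how it splits between dev_bonus and new_file_bonus, so the multiplier is passed in whole; `amplified_size_contribution` is a hypothetical name, and `f64::ln_1p` is again assumed to match the crate's `ln1p`.

```rust
/// Size term from the Evidence snippet (assumes `ln1p` == `f64::ln_1p`).
fn size_factor(sloc: u64) -> f64 {
    (sloc as f64).ln_1p().powi(2) / 100.0
}

/// Share of the final score attributable to size once `base` is scaled by
/// `multiplier`, i.e. the (1.0 + dev_bonus + new_file_bonus) factor.
/// The split between the two bonuses is not given in the issue.
fn amplified_size_contribution(sloc: u64, multiplier: f64) -> f64 {
    size_factor(sloc) * multiplier
}
```

For a 10k-line file at the maximum multiplier, `amplified_size_contribution(10_000, 1.50)` is about 1.27.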

Expected Behavior

The doc comments should describe the term's true weight, or the formula should bound/down-weight size_factor so it matches the stated "tiny tie-breaker" intent.
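For the second option, two generic ways to keep the term small are sketched below: a hard cap and a smaller coefficient. Both are illustrations only. The `cap` and `weight` parameters and both function names are hypothetical, the issue proposes no specific values, and `f64::ln_1p` is assumed to match the crate's `ln1p`.

```rust
/// Size term from the Evidence snippet (assumes `ln1p` == `f64::ln_1p`).
fn size_factor(sloc: u64) -> f64 {
    (sloc as f64).ln_1p().powi(2) / 100.0
}

/// Option A (hypothetical): clamp the term so it can never exceed `cap`.
fn size_factor_capped(sloc: u64, cap: f64) -> f64 {
    size_factor(sloc).min(cap)
}

/// Option B (hypothetical): enter `base` with a coefficient below 1.0.
fn size_factor_weighted(sloc: u64, weight: f64) -> f64 {
    weight * size_factor(sloc)
}
```

A cap changes the ranking only among large files, which all collapse to the same size contribution, while a smaller weight preserves the ordering by size and shrinks its influence everywhere.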

Actual Behavior

The comments understate the size term by roughly an order of magnitude relative to its real contribution. This could mislead a maintainer tuning the formula, and a future reader who trusts the "tie-breaker" framing may not realize that a large file materially shifts its own rank.

Impact

This affects maintainers who reason about the risk-score weighting from the in-source documentation. The score remains ordinal, so this is a documentation-accuracy / formula-design concern rather than an output-corruption bug.


Resolution

Corrected score.rs comments: size_factor enters base at coefficient 1.0 with real magnitudes (~0.85 at 10k SLOC, >1.0 past ~50k), not a tiny tie-breaker. Formula unchanged. Commit a7e35ad.

Metadata

Labels

documentation (Improvements or additions to documentation)
