11 changes: 11 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -11,6 +11,7 @@ clap = { version = "4.5", features = ["derive"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
tree-sitter = "0.25"
tree-sitter-python = "0.25"
tree-sitter-rust = "0.24"

[dev-dependencies]
15 changes: 10 additions & 5 deletions README.md
@@ -5,6 +5,7 @@
- untracked files included in default stats
- language filtering with `--lang`
- Rust-only, non-test-only stats by default, with `--test`, `--no-test`, and `--no-test-filter`
- test-aware filtering for Rust and Python
- single-commit and revision-range support

This repository also ships `rust-test-audit`, a companion CLI for auditing Rust source trees
@@ -73,21 +74,24 @@ git diff-stat --commit HEAD
git diff-stat --last
git diff-stat --last --no-test-filter
git diff-stat HEAD~1..HEAD --lang py --no-test-filter
git diff-stat --lang py --test
git diff-stat --test
```

## Usage

```bash
git diff-stat [<rev> | <rev1> <rev2> | <rev-range>] [--lang rs,js] [--test | --no-test | --no-test-filter]
git diff-stat [<rev> | <rev1> <rev2> | <rev-range>] [--lang rs,py,js] [--test | --no-test | --no-test-filter]
```

Defaults:

- `--lang` defaults to `rs`
- `--lang` defaults to `rs,py`
- test filtering defaults to `--no-test`
- output always begins with a header line describing the comparison scope, languages, and test scope

That means plain `git diff-stat` already reports Rust and Python non-test changes together.

## Rust Test Audit

```bash
@@ -126,8 +130,9 @@ test regions cross configurable density thresholds.

- `--lang` currently uses file extensions.
- `--test` and `--no-test` treat Rust files under `tests/` and Rust files imported by `#[cfg(test)]` module declarations as whole-file test code. Other Rust files still use code-region splitting for `#[cfg(test)]` modules and test-annotated functions such as `#[test]` and `#[tokio::test]`.
- `--no-test-filter` disables Rust test splitting entirely and reports full-file stats for the selected languages.
- because `--lang` defaults to `rs`, use `--no-test-filter --lang <langs>` when you want non-Rust output.
- `--test` and `--no-test` treat Python files under `tests/`, `test_*.py`, `*_test.py`, and `conftest.py` as whole-file test code. Other Python files split test regions using `def test_*` and `class Test*`.
- `--no-test-filter` disables Rust and Python test splitting entirely and reports full-file stats for the selected languages.
- because `--lang` defaults to `rs,py`, use `--lang rs` or `--lang py` when you want a narrower language set.
- `--last` is sugar for the patch introduced by `HEAD`, equivalent to `HEAD^!`.
- rendered output starts with a Chinese description line such as `未提交的 rs 文件中,非测试代码统计如下:`.
- rendered output starts with a Chinese description line such as `未提交的 rs,py 文件中,非测试代码统计如下:`.
- Output is intentionally close to `git diff --stat`, but not byte-for-byte identical.
178 changes: 178 additions & 0 deletions docs/plans/2026-03-21-python-lang-support-design.md
@@ -0,0 +1,178 @@
# Python Language Support Design

**Context**

`git-diff-stat` currently treats `--lang` as a thin file-extension filter in [`src/lang.rs`](../../src/lang.rs), while Rust test-aware behavior is implemented separately in [`src/filter.rs`](../../src/filter.rs) and [`src/rust_tests.rs`](../../src/rust_tests.rs). This works for a single language, but it couples the main execution path to Rust-specific logic and makes each additional language disproportionately expensive to support.

The immediate goal is to support `--lang py` with the same test-filter semantics that Rust already has:

- default / `--no-test`: report only non-test code
- `--test`: report only test code
- `--no-test-filter`: report full-file stats without test splitting

The target Python test style is the one used by `winq_bt`: `pytest`-style test discovery centered around `tests/`, `test_*.py`, `*_test.py`, `conftest.py`, top-level `def test_*`, and `class Test*`.

**Goal**

Add first-class Python support without turning `main.rs` into a per-language switchboard. The structure should make future languages easier to add, while keeping the current Rust behavior unchanged.

**Approaches**

1. Extend the current code path with Python-specific branches in `main.rs` and `filter.rs`.
   - Lowest short-term cost.
   - Rejected because it keeps the architecture centered on Rust special-cases and makes future additions harder.

2. Introduce lightweight language backends and move test-aware behavior behind a shared interface.
   - Slightly more up-front work.
   - Keeps CLI/rendering stable while moving language-specific rules into isolated modules.
   - Recommended.

3. Build a fully generic plugin system now.
   - Over-designed for the current repository size and only one additional language.
   - Rejected as unnecessary complexity.

**Decision**

Use lightweight language backends.

The refactor should not aim for a public plugin API. It only needs enough structure to answer these questions in one place per language:

- does this path belong to the language?
- which files are whole-file tests?
- for mixed files, which changed lines are test lines vs non-test lines?

`main.rs` should keep orchestrating Git I/O, revision selection, rendering, and CLI interpretation. It should stop knowing Rust-specific details.

**Proposed Structure**

- `src/lang/mod.rs`
  - language parsing and normalization
  - registry of supported languages
  - path-to-language detection
- `src/lang/rust.rs`
  - wraps current Rust-specific behavior
  - owns Rust whole-file test-path detection and line-region splitting
- `src/lang/python.rs`
  - Python path matching and test-region detection
- `src/test_filter.rs`
  - shared orchestration for building test-only or non-test-only stats across requested languages
- `src/rust_tests.rs`
  - can remain as a Rust parser helper used by `src/lang/rust.rs`

This is a moderate refactor, not a rewrite. Existing types such as `FileChange`, `FilePatch`, `DisplayStat`, and `TestFilterMode` remain useful as-is.

**Backend Shape**

The shared test-filter orchestration should operate in terms of a small internal backend contract. The exact Rust type names can vary, but the responsibilities should look like this:

- language identity and aliases, such as `rs` and `py`
- file matching by extension
- optional whole-file-test classification
- optional per-file region splitting for tracked and untracked files

One practical model is:

- `LanguageKind` enum for supported languages
- helper functions in each language module instead of trait objects
- a dispatcher in `test_filter.rs` that groups changed files by language and invokes the relevant backend helpers

This avoids unnecessary dynamic dispatch while still removing language conditionals from `main.rs`.
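
As a rough illustration of that model, the sketch below pairs a `LanguageKind` enum with plain helper functions and a grouping step. Only the `LanguageKind` name comes from this document; `from_alias`, `from_path`, and `group_by_language` are hypothetical names, and returning `None` for an unknown alias is an assumption rather than a description of current behavior.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Supported languages. The variants are an assumption based on `rs` and `py`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LanguageKind {
    Rust,
    Python,
}

impl LanguageKind {
    /// Map a `--lang` token such as `rs` or `py` to a language kind.
    /// Unknown tokens yield `None` (hypothetical policy for this sketch).
    pub fn from_alias(alias: &str) -> Option<Self> {
        match alias.trim().to_ascii_lowercase().as_str() {
            "rs" => Some(Self::Rust),
            "py" => Some(Self::Python),
            _ => None,
        }
    }

    /// Detect the language of a changed path from its file extension.
    pub fn from_path(path: &str) -> Option<Self> {
        match Path::new(path).extension()?.to_str()? {
            "rs" => Some(Self::Rust),
            "py" => Some(Self::Python),
            _ => None,
        }
    }
}

/// Group changed paths by language, keeping only the requested languages.
/// A dispatcher could then call each language module's helpers per group.
pub fn group_by_language<'a>(
    paths: &[&'a str],
    requested: &[LanguageKind],
) -> BTreeMap<LanguageKind, Vec<&'a str>> {
    let mut groups: BTreeMap<LanguageKind, Vec<&'a str>> = BTreeMap::new();
    for &path in paths {
        if let Some(kind) = LanguageKind::from_path(path) {
            if requested.contains(&kind) {
                groups.entry(kind).or_default().push(path);
            }
        }
    }
    groups
}
```

With this shape, the dispatcher matches on `LanguageKind` once per group, so the per-language conditionals live in one place instead of in `main.rs`.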

**Python Test Semantics**

Python support should match common `pytest` conventions first.

Whole-file test rules:

- any `.py` file under a `tests/` path component
- `test_*.py`
- `*_test.py`
- `conftest.py`

Mixed-file region rules:

- top-level `def test_*`
- methods named `test_*`
- `class Test*`

That gives useful behavior for projects like `winq_bt` without trying to model every Python test framework on day one.
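
The rules above could be expressed roughly as follows. This is a minimal sketch, not the actual `src/lang/python.rs`: the names `is_python_whole_file_test`, `PyDefKind`, and `is_python_test_definition` are hypothetical, and it assumes paths are `/`-separated as Git reports them.

```rust
/// Whole-file rule: a `.py` file under a `tests/` path component, or named
/// `test_*.py`, `*_test.py`, or `conftest.py`.
/// Assumes `/`-separated paths as reported by Git.
pub fn is_python_whole_file_test(path: &str) -> bool {
    let mut components: Vec<&str> = path.split('/').collect();
    // `split` always yields at least one item, so the last one is the file name.
    let file_name = components.pop().unwrap_or("");
    let Some(stem) = file_name.strip_suffix(".py") else {
        return false;
    };
    components.iter().any(|c| *c == "tests")
        || stem == "conftest"
        || stem.starts_with("test_")
        || stem.ends_with("_test")
}

/// Kind of definition found in a mixed file (hypothetical helper type).
pub enum PyDefKind {
    Function,
    Class,
}

/// Mixed-file rule: `def test_*` (top-level or method) and `class Test*`
/// mark test regions. Everything else counts as non-test code.
pub fn is_python_test_definition(kind: PyDefKind, name: &str) -> bool {
    match kind {
        PyDefKind::Function => name.starts_with("test_"),
        PyDefKind::Class => name.starts_with("Test"),
    }
}
```

Keeping both checks purely name-based matches the intent of staying narrow: a helper such as `testing_helper` or a class such as `Parser` is never treated as test code.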

Not in scope for the first version:

- full `unittest.TestCase` inference beyond names already covered by `test_*`
- custom pytest discovery configuration
- doctests
- dynamic test generation

**Parsing Strategy**

Use `tree-sitter-python` alongside the existing `tree-sitter` setup.

This aligns with the current Rust implementation style:

- accurate line ranges for test functions and classes
- support for tracked diffs and untracked files
- no need to invent a fragile indentation-based parser

The Python parser only needs to detect class and function definition ranges. It does not need semantic import resolution similar to Rust's `#[cfg(test)] mod` handling.

**Data Flow**

The runtime flow should become:

1. Parse CLI and revision selection.
2. Parse requested languages into supported language kinds.
3. Filter `FileChange` values by requested languages.
4. If `--no-test-filter`, render full-file stats directly.
5. Otherwise, call a shared test-aware stats builder.
6. The builder:
   - parses the diff patch once
   - loads per-revision or worktree sources as needed
   - asks each language backend for whole-file test paths
   - asks each language backend to split changed lines into test/non-test counts
7. Render using the existing header machinery.

The important shift is that the builder should work over "requested supported languages", not over "Rust files only".
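
The last step of the builder, splitting changed lines into test and non-test counts, might look like the sketch below. `SplitCounts` and `split_changed_lines` are hypothetical names, and the sketch assumes the backend has already produced 1-based inclusive line ranges for test regions and a whole-file-test flag for the file.

```rust
use std::ops::RangeInclusive;

/// Changed-line counts for one file, split by test scope (hypothetical type).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SplitCounts {
    pub test: usize,
    pub non_test: usize,
}

/// Classify each changed line number as test or non-test.
/// `test_regions` holds 1-based inclusive line ranges reported by a language
/// backend. If the file is a whole-file test, every changed line counts as test.
pub fn split_changed_lines(
    changed_lines: &[usize],
    test_regions: &[RangeInclusive<usize>],
    whole_file_test: bool,
) -> SplitCounts {
    let mut counts = SplitCounts::default();
    for line in changed_lines {
        let in_test = whole_file_test || test_regions.iter().any(|r| r.contains(line));
        if in_test {
            counts.test += 1;
        } else {
            counts.non_test += 1;
        }
    }
    counts
}
```

Because the function only sees line numbers and ranges, the same code can serve Rust and Python; the language-specific part is limited to producing `test_regions`.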

**Compatibility Rules**

- Default `--lang` remains `rs` for now.
- `--lang py` participates in the same `--test`, `--no-test`, and `--no-test-filter` flags.
- `--lang rs,py` should combine both backends in one run.
- Non-test-filtered output remains full-file diff stats for any selected language.
- Unknown or unsupported language names should be handled the same way they are today, whether that means ignoring or rejecting them; if validation is added, it should happen centrally in `src/lang/mod.rs`.

**Testing Strategy**

Add coverage at three layers:

1. Unit tests for language recognition and normalization.
2. Unit tests for Python test-region detection and whole-file test-path classification.
3. CLI smoke tests proving end-to-end behavior for:
   - `--lang py` default non-test filtering
   - `--lang py --test`
   - `--lang py --no-test-filter`
   - mixed `--lang rs,py`

Python smoke tests should include:

- a production file under `src/`
- a `tests/test_*.py` file
- a mixed file containing both production code and `def test_*`
- optionally `conftest.py` to prove whole-file test classification

**Risks**

- The current loader helpers in `main.rs` are named and shaped around Rust sources. Moving them into shared test-filter orchestration will require careful renaming so behavior does not regress.
- Path normalization rules must stay consistent across languages, especially for whole-file test matching.
- Python region detection should stay intentionally narrow; a too-clever first version is more likely to misclassify production code.

**Outcome**

After this refactor, adding a new language should mainly mean:

1. register a new language kind
2. implement one language module
3. add backend tests and one or two CLI regressions

That is the right level of structure for the repository at its current size.