Detect git worktrees and submodules in identify_vcs#799

Merged
takluyver merged 1 commit into pypa:main from potiuk:fix-worktree-vcs-detection on Apr 25, 2026

@potiuk potiuk commented Apr 24, 2026

Contributor

Closes #798.

Problem

In a linked git worktree (git worktree add ...), or inside a git submodule, .git is a regular file containing a gitdir: <path> pointer rather than a directory. flit/vcs/__init__.py:identify_vcs only checks .is_dir(), so in these contexts it returns None. flit/sdist.py:SdistBuilder.select_files then silently falls back to the non-VCS file set (super().select_files() from flit_core), and the resulting sdist omits tracked files such as docs/, tests/, and anything declared through [tool.flit.sdist] include = [...]. No warning is emitted.
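To make the failure concrete, here is a minimal sketch that reproduces the two on-disk layouts in a temporary directory and runs a directory-only check against each. The helper names (dir_only_check, make_plain_checkout, make_linked_worktree) and the pointer path are hypothetical and are not part of flit; the check only mimics the behaviour described above.

```python
import tempfile
from pathlib import Path


def dir_only_check(root: Path) -> bool:
    """Mimic the pre-fix check: .git only counts if it is a directory."""
    return (root / ".git").is_dir()


def make_plain_checkout(root: Path) -> Path:
    """Create a directory whose .git is a directory, as in a normal clone."""
    (root / ".git").mkdir(parents=True)
    return root


def make_linked_worktree(root: Path, gitdir: str) -> Path:
    """Create a directory whose .git is a pointer file, as in a linked worktree."""
    root.mkdir(parents=True, exist_ok=True)
    (root / ".git").write_text(f"gitdir: {gitdir}\n")
    return root


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    plain = make_plain_checkout(base / "plain")
    # The pointer target is a made-up path; it is never dereferenced here.
    worktree = make_linked_worktree(base / "wt", "/hypothetical/main/.git/worktrees/wt")

    plain_detected = dir_only_check(plain)
    worktree_detected = dir_only_check(worktree)
    print(plain_detected, worktree_detected)
```

The plain checkout is detected and the worktree is not, which is the condition under which the sdist builder falls back to the non-VCS file set.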

Change

Accept .git whether it is a directory or a file:

git_entry = p / '.git'
if git_entry.is_dir() or git_entry.is_file():
    return git

Downstream git ls-files naturally validates the pointer target — no extra parsing needed inside identify_vcs.
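As a rough illustration of how the relaxed check behaves when the build starts in a subdirectory, the sketch below walks from a directory up through its parents and accepts .git as either a directory or a file. It is an assumption-laden stand-in, not flit's code: the name identify_vcs_sketch is hypothetical, the parent walk is inferred from the `p / '.git'` expression and the test_identify_git_parent test name, and it returns the string "git" where the real function returns the git module.

```python
import tempfile
from pathlib import Path
from typing import Optional


def identify_vcs_sketch(directory: Path) -> Optional[str]:
    """Return "git" if directory or any parent has a .git directory or file."""
    directory = directory.resolve()
    for p in [directory, *directory.parents]:
        git_entry = p / ".git"
        # A directory means a plain checkout; a regular file means a
        # linked worktree or a submodule (a "gitdir: <path>" pointer).
        if git_entry.is_dir() or git_entry.is_file():
            return "git"
    return None


with tempfile.TemporaryDirectory() as tmp:
    worktree = Path(tmp) / "wt"
    nested = worktree / "src" / "pkg"
    nested.mkdir(parents=True)
    # Hypothetical pointer target; the sketch never follows it.
    (worktree / ".git").write_text("gitdir: /hypothetical/main/.git/worktrees/wt\n")

    result = identify_vcs_sketch(nested)
    print(result)
```

The sketch deliberately does not read the pointer file, matching the design choice stated above that validating the target is left to git ls-files.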

Test

Added tests/test_vcs.py::test_identify_git_worktree: it creates .git as a regular file containing a gitdir: pointer and checks that identify_vcs now returns the git module. The existing test_identify_git_parent continues to cover the plain-checkout path.

Three other tests in tests/test_sdist.py and tests/test_wheel.py fail in my environment, but they also fail on main without this change, so they are unrelated to this PR and not regressions.

Context

Discovered during Apache Airflow provider release verification — 20 sdists diverged byte-for-byte from the released tarballs because the release was built from a plain checkout while the verifier ran from a worktree (apache/airflow#65771). The same failure shape surfaces when a worktree is bind-mounted into a container and the main repo's .git is not reachable.

In a linked git worktree (`git worktree add ...`) or a git submodule, `.git` is a regular file containing a `gitdir: <path>` pointer rather than a directory. `identify_vcs` only checked `.is_dir()`, so it returned `None` inside a worktree and `flit`'s sdist builder silently fell back to a non-VCS file set via `SdistBuilder.select_files` (`flit/sdist.py`), producing an incomplete sdist that omits tracked files such as `docs/`, `tests/`, and other data files.

Accept `.git` whether it is a directory or a file; downstream `git ls-files` naturally validates the pointer.
potiuk added a commit to potiuk/airflow that referenced this pull request Apr 24, 2026
Add references to pypa/flit#798 and pypa/flit#799 in the docstring of check_flit_worktree_compatibility, and the Airflow tracking issue apache#65772 at both the helper and its call site. CLAUDE.md requires the tracking-issue URL to appear at the workaround site in the code so the follow-up work is discoverable from any grep or code review.
potiuk added a commit to apache/airflow that referenced this pull request Apr 24, 2026
…65771)

* Breeze: fail fast when building provider sdists from a git worktree

flit's --use-vcs silently produces incomplete sdists when run from a
`git worktree add ...` directory, because flit.vcs.identify_vcs() checks
`(p / ".git").is_dir()` and in a worktree `.git` is a file (gitdir:
pointer). flit then falls back to a minimal sdist that omits docs/,
tests/, provider.yaml and other tracked files, and the resulting
packages fail reproducibility checks against released sdists on
dist.apache.org — with no warning.

`breeze release-management prepare-provider-distributions` now exits 1
with a clear explanation and a workaround (use a plain checkout, or
pass `--distribution-format wheel` — wheels are unaffected) when it
detects it is running from a worktree and would build sdists.

Wheels and providers using hatchling with explicit sdist includes are
not affected, so only `sdist` and `both` formats trigger the check.

* Breeze: detect Docker-mounted worktrees in provider-sdist check

When Breeze runs prepare-provider-distributions from a git worktree, the worktree's .git file carries an absolute gitdir: pointer to the main repo's .git/worktrees/<name> directory. Only the worktree folder is bind-mounted into Breeze's Docker container, so that pointer target is unreachable from inside the build. flit's VCS detection then either fails or silently produces an incomplete sdist.

Expand check_flit_worktree_compatibility to parse .git and branch on the failure mode: read error, unexpected format, missing gitdir target (Docker mount case), or healthy host-side worktree. Tests cover all paths.

* Link upstream flit issue/PR + tracking issue at the workaround site

Add references to pypa/flit#798 and pypa/flit#799 in the docstring of check_flit_worktree_compatibility, and the Airflow tracking issue #65772 at both the helper and its call site. CLAUDE.md requires the tracking-issue URL to appear at the workaround site in the code so the follow-up work is discoverable from any grep or code review.
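The failure modes listed in the commit message above (read error, unexpected format, missing gitdir target, healthy worktree) could be distinguished along the following lines. This is only a sketch under stated assumptions: it is not the Breeze implementation of check_flit_worktree_compatibility, and GitEntryState, classify_git_entry, and the two extra states for "plain checkout" and "no .git entry" are hypothetical names introduced for illustration.

```python
import tempfile
from enum import Enum
from pathlib import Path


class GitEntryState(Enum):
    PLAIN_CHECKOUT = "plain checkout"
    NOT_A_REPO = "no .git entry"
    READ_ERROR = "read error"
    UNEXPECTED_FORMAT = "unexpected format"
    MISSING_TARGET = "missing gitdir target"
    HEALTHY_WORKTREE = "healthy worktree"


def classify_git_entry(root: Path) -> GitEntryState:
    """Classify root/.git into one of the states above."""
    git_entry = root / ".git"
    if git_entry.is_dir():
        return GitEntryState.PLAIN_CHECKOUT
    if not git_entry.is_file():
        return GitEntryState.NOT_A_REPO
    try:
        content = git_entry.read_text().strip()
    except (OSError, UnicodeDecodeError):
        return GitEntryState.READ_ERROR
    prefix = "gitdir:"
    if not content.startswith(prefix):
        return GitEntryState.UNEXPECTED_FORMAT
    target = Path(content[len(prefix):].strip())
    if not target.is_absolute():
        # Relative pointers are resolved against the directory holding .git.
        target = root / target
    if not target.is_dir():
        # Typical of a worktree bind-mounted without the main repository.
        return GitEntryState.MISSING_TARGET
    return GitEntryState.HEALTHY_WORKTREE


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    gitdir = base / "main" / ".git" / "worktrees" / "wt"
    gitdir.mkdir(parents=True)

    healthy = base / "wt"
    healthy.mkdir()
    (healthy / ".git").write_text(f"gitdir: {gitdir}\n")

    unreachable = base / "mounted"
    unreachable.mkdir()
    (unreachable / ".git").write_text("gitdir: /hypothetical/host/.git/worktrees/wt\n")

    malformed = base / "odd"
    malformed.mkdir()
    (malformed / ".git").write_text("not a pointer\n")

    states = [classify_git_entry(d) for d in (healthy, unreachable, malformed)]
    print([s.value for s in states])
```

Returning an enum rather than raising keeps the decision about whether to exit, warn, or continue with the caller, which seems to fit a pre-build check that has to print a different explanation for each case.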
@takluyver takluyver added this to the 4.0 milestone Apr 25, 2026
@takluyver takluyver merged commit 2e03393 into pypa:main Apr 25, 2026
16 checks passed
potiuk added a commit to apache/airflow that referenced this pull request Apr 26, 2026
…t worktree (#65771)

(cherry picked from commit ddf3c7a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
potiuk added a commit to apache/airflow that referenced this pull request Apr 26, 2026
…t worktree (#65771) (#65828)

(cherry picked from commit ddf3c7a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit to apache/airflow that referenced this pull request Apr 27, 2026
…t worktree (#65771) (#65828)

(cherry picked from commit ddf3c7a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit to apache/airflow that referenced this pull request May 20, 2026
…t worktree (#65771) (#65828)

(cherry picked from commit ddf3c7a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Development

Successfully merging this pull request may close these issues.

flit sdist silently drops tracked files when built from a git worktree
