flit sdist silently drops tracked files when built from a git worktree #798

Description

@potiuk

Summary

With --use-vcs (the default), flit build --format sdist silently produces an incomplete sdist when run from a linked git worktree (git worktree add ...). Tracked files outside flit.module are dropped, including docs/, tests/, and data files declared in pyproject.toml [tool.flit.external-data]. No warning is emitted.

Building the same commit from a plain git clone produces a correct sdist.

Root cause

flit/vcs/__init__.py:identify_vcs:

if (p / '.git').is_dir():
    return git

In a linked git worktree, <worktree>/.git is a regular file whose contents look like:

gitdir: /path/to/main-repo/.git/worktrees/<name>

is_dir() returns False, identify_vcs returns None, and flit/sdist.py:SdistBuilder.select_files silently falls back to super().select_files() — the non-VCS minimal file set from flit_core. (The same gitdir: file pattern is used by git submodule — submodule roots are affected for the same reason.)

There is no log line that signals the fallback, so breakage is invisible until a downstream check (e.g. reproducibility verification) fires.
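The detection gap can be checked without git at all. The following is a minimal sketch, not flit's code: classify_git_entry is a hypothetical helper, and the worktree is simulated by writing a pointer file by hand with a placeholder target path.

```python
import tempfile
from pathlib import Path


def classify_git_entry(directory: Path) -> str:
    """Describe what kind of `.git` entry, if any, sits in `directory`."""
    entry = directory / ".git"
    if entry.is_dir():
        return "directory"  # plain clone: the repository lives here
    if entry.is_file():
        return "file"  # linked worktree or submodule: a `gitdir:` pointer
    return "missing"


with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "clone"
    worktree = Path(tmp) / "wt"
    (clone / ".git").mkdir(parents=True)
    worktree.mkdir()
    # Simulated pointer file; the target path is a placeholder, not a real repo.
    (worktree / ".git").write_text("gitdir: /path/to/main-repo/.git/worktrees/wt\n")

    results = {
        "clone": classify_git_entry(clone),
        "worktree": classify_git_entry(worktree),
        # This is the only check the current identify_vcs condition performs.
        "worktree_is_dir": (worktree / ".git").is_dir(),
    }

print(results)
# {'clone': 'directory', 'worktree': 'file', 'worktree_is_dir': False}
```

An is_dir()-only check therefore treats a worktree root exactly like a directory with no .git entry.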

Reproducer

tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo
mkdir mypkg && touch mypkg/__init__.py
cat > pyproject.toml <<'TOML'
[build-system]
requires = ["flit_core >=3.11,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "mypkg"
version = "0.0.1"
description = "demo"

[tool.flit.sdist]
include = ["docs/"]
TOML
mkdir docs && echo "hello" > docs/readme.txt
git add -A && git commit -qm init

pip install --quiet flit

# Baseline: build from the plain checkout — sdist includes docs/readme.txt
flit build --format sdist && tar tzf dist/mypkg-0.0.1.tar.gz | grep docs

# Reproduce: build from a worktree of the same commit — docs/readme.txt is missing
cd "$tmp"
git -C demo worktree add ../wt HEAD -q
cd wt
flit build --format sdist && tar tzf dist/mypkg-0.0.1.tar.gz | grep docs

Impact

Discovered during PMC verification of an Apache Airflow provider release: 20 of 20 flit-built provider sdists diverged byte-for-byte from the release tarballs on dist.apache.org because the release was built from a plain checkout and the verifier (via automated tooling) ran from a worktree. Wheels matched (wheels don't go through this path). Context and the Airflow-side guard:

A similar failure shape surfaces when sources are bind-mounted from a worktree into a container: only the worktree dir is mounted, the absolute path inside the gitdir: pointer is not reachable from the container, and git ls-files fails or returns nothing.
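For the bind-mount case, a quick diagnostic is to read the pointer and check whether its target exists from where the build runs. This is a sketch under assumptions: read_gitdir_pointer and pointer_is_reachable are hypothetical helpers, not part of flit or git, and only the first line of the .git file is inspected.

```python
from pathlib import Path
from typing import Optional


def read_gitdir_pointer(git_file: Path) -> Optional[Path]:
    """Return the path named by a `gitdir:` pointer file, or None if it is not one."""
    try:
        first_line = git_file.read_text(encoding="utf-8").splitlines()[0]
    except (OSError, ValueError, IndexError):
        # Missing file, a directory, undecodable bytes, or an empty file.
        return None
    prefix = "gitdir:"
    if not first_line.startswith(prefix):
        return None
    target = Path(first_line[len(prefix):].strip())
    # A relative pointer is resolved against the directory holding the file.
    return target if target.is_absolute() else git_file.parent / target


def pointer_is_reachable(worktree: Path) -> bool:
    """True if `<worktree>/.git` is a pointer whose target directory exists here."""
    target = read_gitdir_pointer(worktree / ".git")
    return target is not None and target.is_dir()
```

Inside a container where only the worktree directory is mounted, pointer_is_reachable(Path(".")) would be expected to return False, which matches the situation where git ls-files cannot find the repository.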

Proposed fix

Small change in flit/vcs/__init__.py to also accept .git when it is a file (worktree or submodule pointer). Submitting as a PR for review:

-        if (p / '.git').is_dir():
+        git_entry = p / '.git'
+        if git_entry.is_dir() or git_entry.is_file():
             return git

Downstream git ls-files naturally validates the pointer; no extra parsing needed inside identify_vcs.

A secondary improvement (out of scope for the fix, but worth considering): log a WARNING from SdistBuilder.select_files when identify_vcs returns None but a .git entry exists at or above cfgdir, so the silent-fallback mode becomes loud when a VCS was clearly expected.
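One possible shape for that warning, as a rough sketch only: find_git_entry and warn_if_vcs_expected are hypothetical names, the message wording is illustrative, and how this would be wired into SdistBuilder.select_files is left open.

```python
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger(__name__)


def find_git_entry(cfgdir: Path) -> Optional[Path]:
    """Return the nearest `.git` entry (file or directory) at or above cfgdir."""
    cfgdir = cfgdir.resolve()
    for p in [cfgdir, *cfgdir.parents]:
        entry = p / ".git"
        if entry.exists():
            return entry
    return None


def warn_if_vcs_expected(cfgdir: Path, vcs: Optional[object]) -> bool:
    """Log a warning when no VCS was identified but a `.git` entry is present.

    `vcs` stands in for whatever identify_vcs returned (None on fallback).
    Returns True if a warning was logged.
    """
    if vcs is not None:
        return False
    entry = find_git_entry(cfgdir)
    if entry is None:
        return False
    log.warning(
        "No VCS identified for %s, but %s exists; "
        "falling back to the minimal file set",
        cfgdir,
        entry,
    )
    return True
```

Returning a bool keeps the helper easy to test; a real implementation might only log.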

PR: will link once opened.
