@rwgk rwgk commented Jan 10, 2026

This PR adds safety mechanisms to prevent version-related issues during builds and ensures version detection from git tags is working as intended.

Version Number Validation

Problem: setuptools-scm silently falls back to version 0.1.x when git tags are unavailable (e.g., due to shallow clones), which can lead to incorrect version detection and unexpected dependency resolution. Silent failures in the procedure producing __version__ can lead to highly confusing behavior and potentially invalidate elaborate QA testing.

Solution: We implement a two-layer defense strategy to catch invalid versions at multiple stages:

First Line of Defense: Import-Time Assertions

Fail-fast assertions are added immediately after importing __version__ in all three package __init__.py files:

  • cuda_bindings/cuda/bindings/__init__.py
  • cuda_core/cuda/core/__init__.py
  • cuda_pathfinder/cuda/pathfinder/__init__.py

Each file includes a minimal one-liner assertion:

assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1), "FATAL: invalid __version__"

This ensures that any attempt to import a package with an invalid version (e.g., 0.1.dev...) fails immediately with a clear error, preventing the package from being used at all. The assertion checks that major.minor > (0, 1), which is sufficient since all three packages are already at higher versions.
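The check below wraps the same comparison in a small function and applies it to version strings from the CI logs discussed later in this PR. The function name `passes_version_guard` is hypothetical. The packages themselves use the bare one-liner above.

```python
def passes_version_guard(version: str) -> bool:
    """Mirror the import-time one-liner: major.minor must be greater than (0, 1)."""
    # Only the first two dot-separated fields are parsed. Everything after them
    # (patch number, ".devN", "+gHASH" local suffix) is ignored.
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor > (0, 1)


# Fallback version produced by setuptools-scm in a shallow clone: fails.
print(passes_version_guard("0.1.dev1+g1b2e4c088"))
# Real development versions from the same CI runs: pass.
print(passes_version_guard("1.3.4.dev77+g1b2e4c088"))
print(passes_version_guard("0.5.1.dev39+gc574a94c2"))
```

The comparison is a tuple comparison, so cuda-core's 0.5.x versions pass even though their major version is 0. A bare "0.0" fallback fails, because (0, 0) < (0, 1).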

Second Line of Defense: Unit Tests

As a backup, we also implement late-stage detection via regular unit tests that validate version numbers after installation:

  • Adds validate_version_number() function to cuda_python_test_helpers for centralized validation logic
  • Creates minimal test_version_number.py files in cuda_bindings, cuda_core, and cuda_pathfinder that import the version and call the validation function
  • Tests are split into separate functions (test_bindings_version, test_core_version, test_pathfinder_version) so all invalid versions are reported in a single test run
  • Each test suite validates its own package version plus dependency versions (e.g., cuda_bindings tests check both cuda-bindings and cuda-pathfinder versions)
  • Provides clear error messages explaining the issue without referencing setuptools-scm internals
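This PR does not show the actual body of validate_version_number(), so the following is only a sketch of what such a helper could look like. It assumes a `(version, package_name)` signature and an AssertionError on failure. It applies the same major.minor > (0, 1) rule as the import-time assertion, and its message avoids mentioning setuptools-scm.

```python
def validate_version_number(version: str, package_name: str) -> None:
    """Raise AssertionError if `version` looks like an automatic-versioning fallback."""
    try:
        major_minor = tuple(int(part) for part in version.split(".")[:2])
    except ValueError:
        # Non-numeric leading fields cannot be a valid release or dev version here.
        major_minor = None
    if major_minor is None or major_minor <= (0, 1):
        raise AssertionError(
            f"Invalid version number for {package_name}: {version!r}. "
            "This usually means the version could not be determined from git "
            "tags (for example, in a clone without tag history), and a "
            "placeholder version was used instead."
        )
```

With a helper like this, each test_version_number.py needs one small test function per package. For example, test_pathfinder_version would import cuda.pathfinder.__version__ and pass it to the helper. pytest then reports every invalid version in a single run.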

The unit tests run during the test phase and provide explicit test coverage with clearer error messages in CI logs.

Why Two Layers?

While the import-time assertions provide immediate feedback and prevent invalid packages from being imported, we maintain the unit tests as a second line of defense because:

  1. Redundancy: If the assertions fail to catch an issue (e.g., due to import path quirks, edge cases, or Python running with -O, which strips assert statements), the unit tests provide a backup check
  2. Explicit Test Coverage: Unit tests make version validation an explicit, testable requirement rather than an implicit assertion
  3. CI Visibility: Test failures in CI logs are more visible and easier to debug than import-time assertion failures
  4. Defense in Depth: Because a silently wrong __version__ can cause highly confusing behavior and invalidate elaborate QA testing, multiple detection points reduce the risk of invalid versions going undetected.

CI Workflow Hardening

To ensure version validation works correctly in CI environments, we've hardened the test workflows:

  • Intentional Shallow Clone: Test workflows explicitly use fetch-depth: 1 (the default) with a comment emphasizing that shallow cloning is intentional. This ensures we're testing wheel installation without full git history, which is the correct behavior for testing pre-built artifacts.

  • Wheel-Only Installation: The "Ensure cuda-python installable" step uses --only-binary=:all: so that pip installs only wheels and never builds from source. This matters because a source build in a shallow clone would pick up a fallback version.

  • Immediate Version Verification: After installing cuda-python, a new "Verify installed package versions" step imports cuda.pathfinder and cuda.bindings to trigger the __version__ assertions immediately. This provides early detection of invalid versions right after installation, before any tests run.

These changes ensure that:

  • We're testing the actual wheel artifacts, not building from source
  • Version issues are caught as early as possible in the CI pipeline
  • The test workflows are resilient even with shallow clones
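The "Verify installed package versions" step works simply by importing the packages, which runs the module-level __version__ assertions. Below is a Python sketch of such a check. The helper `verify_imports` is hypothetical and not taken from the workflow. A one-line `python -c "import cuda.pathfinder, cuda.bindings"` would serve the same purpose.

```python
import importlib


def verify_imports(module_names):
    """Import each module; return {name: error} for every module that fails."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # also catches AssertionError from the __version__ guard
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

In CI, `verify_imports(["cuda.pathfinder", "cuda.bindings"])` would return an empty dict when both packages import cleanly. Any non-empty result should fail the step. The helper collects every failure rather than stopping at the first one, which mirrors the separate per-package test functions.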

Why Not Early-Stage Detection?

We initially attempted early-stage detection in build hooks (build_wheel, build_editable) to catch fallback versions during the build process. However, this approach proved too fragile:

  1. Timing Issues: _version.py files are written by setuptools-scm during prepare_metadata_for_build_wheel, but validation needs to run at the right time in the build process. Attempting to validate too early results in "file not found" errors, while validating too late allows builds to complete with invalid versions.

  2. Shallow Clone Handling: When setuptools-scm detects a shallow clone, it bypasses git_describe_command entirely and falls back to 0.0 or 0.1.x versions before our validation can run. This makes build-time detection unreliable in CI environments that use shallow clones.

  3. Complexity: The build hook approach required careful coordination between PEP 517 hooks (prepare_metadata_for_build_wheel, build_wheel) and custom validation logic, making it error-prone and difficult to maintain.

Given these challenges, we chose the simpler and more reliable approach: import-time assertions for immediate feedback, unit tests for explicit coverage, and CI workflow hardening to ensure the validation works correctly in all environments.

PyPI Fallback Prevention

Problem: During isolated (PEP 517) builds, a just-built cuda-bindings installation could be incorrectly replaced with a PyPI wheel if the installed version didn't match expectations.

Solution: Enhanced cuda_core/build_hooks.py to:

  • Check the installed cuda-bindings version by importing cuda.bindings._version directly (importlib.metadata proved unreliable in isolated build environments, where it could return the cuda-core distribution when queried for cuda-bindings)
  • Detect editable installs by checking if the _version.py file path is within the repository root
  • Prevent replacement of editable installs
  • Ensure version compatibility: if cuda-bindings is installed (non-editable) and its major version doesn't match the CUDA major version, raise an exception

This prevents accidental installation of incompatible cuda-bindings versions from PyPI during builds.
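The decision logic in the bullets above can be sketched as a pure function. Everything here is hypothetical and not the actual build_hooks.py code: the function name, its parameters, and the pinned requirement string `cuda-bindings=={major}.*` for the not-installed case. The editable check follows the description above and treats an install as editable when its _version.py lives under the repository root.

```python
from pathlib import Path


def bindings_build_requirement(installed_version, version_file, repo_root, cuda_major):
    """Decide which cuda-bindings requirement to declare for the cuda-core build.

    installed_version: version string of an importable cuda-bindings, or None.
    version_file: path of that install's cuda/bindings/_version.py.
    repo_root: root of the cuda-python checkout being built.
    cuda_major: CUDA major version used for compilation, e.g. 13.
    """
    if installed_version is None:
        # Nothing installed: ask for the series matching the CUDA major version.
        return [f"cuda-bindings=={cuda_major}.*"]
    # An install whose _version.py is inside the repository is treated as editable
    # and kept regardless of its version.
    if Path(version_file).resolve().is_relative_to(Path(repo_root).resolve()):
        return ["cuda-bindings"]
    installed_major = installed_version.split(".")[0]
    if installed_major != str(cuda_major):
        raise RuntimeError(
            f"cuda-bindings {installed_version} is installed, but its major version "
            f"does not match the CUDA major version {cuda_major}."
        )
    # Still list it so pip makes it available inside the isolated build environment.
    return ["cuda-bindings"]
```

Raising instead of silently downgrading or upgrading is the point. A mismatched wheel surfaces as an error rather than being swapped for a PyPI release.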


Piggy-backed:

Import Sorting Fix

Problem: Ruff's import sorting (I001) was inconsistently reordering the _version import in cuda_pathfinder/cuda/pathfinder/__init__.py, depending on whether _version.py exists or not (e.g., after git clean -fdx).

Solution: Added # isort: skip directive to the _version import line to prevent ruff from moving it. This ensures consistent import ordering regardless of build state.

Note on setuptools-scm RuntimeWarning

We've observed RuntimeWarnings from setuptools-scm that incorrectly display package versions instead of setuptools versions (e.g., ERROR: setuptools==0.5.1.dev20+gf8dddb370 is used in combination with setuptools-scm>=8.x). This appears to be a known issue in setuptools-scm when using custom build backends (see setuptools-scm issue #1192). A minimal reproducer has been created at github.com/rwgk/setuptools-scm-issue-1192.

These warnings don't affect functionality but are noisy. They occur on both main and this branch.

rwgk added 5 commits January 9, 2026 13:30
Add pre-build validation that checks git tag availability directly to ensure
builds fail early with clear error messages before setuptools-scm silently
falls back to version '0.1.x'.

Changes:
- cuda_bindings/setup.py: Validate tags at import time (before setuptools-scm)
- cuda_core/build_hooks.py: Validate tags in _build_cuda_core() before building
- cuda_pathfinder/build_hooks.py: New custom build backend that validates tags
  before delegating to setuptools.build_meta
- cuda_pathfinder/pyproject.toml: Configure custom build backend

Benefits:
- Fails immediately when pip install -e . is run, not during build
- More direct: tests what setuptools-scm actually needs (git describe)
- Cleaner: no dependency on generated files
- Better UX: clear error messages with actionable fixes

Error messages include:
- Clear explanation of the problem
- The actual git error output
- Common causes (tags not fetched, wrong directory, etc.)
- Package-specific debugging commands
- Actionable fix: git fetch --tags
Add validation in _get_cuda_bindings_require() to check if cuda-bindings
is already installed and validate its version compatibility.

Strategy:
- If cuda-bindings is not installed: require matching CUDA major version
- If installed from sources (editable): keep it regardless of version
- If installed from wheel: validate major version matches CUDA major
- Raise clear error if version mismatch detected

This prevents accidentally using PyPI versions that don't match the
CUDA toolkit version being used for compilation.

Changes:
- Add _check_cuda_bindings_installed() to detect installation status
- Check for editable installs via direct_url.json, repo location, or .egg-link
- Validate version compatibility in _get_cuda_bindings_require()
- Move imports to module level (PEP 8 compliance)
- Add noqa: S110 for broad exception handling (intentional)
Remove Methods 2 and 3 for detecting editable installs, keeping only
PEP 610 (direct_url.json) which is the standard for Python 3.10+ and
pip 20.1+.

Changes:
- Remove Method 2: import cuda.bindings to check repo location
  (problematic during build requirement phase)
- Remove Method 3: .egg-link file detection (obsolete for Python 3.10+)
- Keep only PEP 610 method (direct_url.json) which is reliable and
  doesn't require importing modules during build

This fixes build errors caused by importing cuda.bindings during the
build requirement phase, which interfered with Cython compilation.
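PEP 610 editable detection, as kept by the commit above, comes down to reading the `dir_info.editable` field of direct_url.json. The sketch below uses a hypothetical helper name and takes the file's text as input. In practice that text would come from `importlib.metadata.distribution("cuda-bindings").read_text("direct_url.json")`, which returns None when the file is absent. As the next commit explains, that lookup turned out to be unreliable in isolated build environments.

```python
import json


def is_editable_from_direct_url(direct_url_text):
    """Return True if a PEP 610 direct_url.json payload describes an editable install."""
    if not direct_url_text:
        # No direct_url.json: typically an install from an index, not a local path.
        return False
    data = json.loads(direct_url_text)
    # PEP 610: local-directory installs carry "dir_info"; "editable" defaults to false.
    return bool(data.get("dir_info", {}).get("editable", False))
```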
…s detection

Replace importlib.metadata.distribution() with direct import of
cuda.bindings._version module. The former may incorrectly return
the cuda-core distribution when queried for 'cuda-bindings' in
isolated build environments (tested with Python 3.12 and pip 25.3).
This may be due to cuda-core metadata being written during the
build process before cuda-bindings is fully available, causing
importlib.metadata to return the wrong distribution.

Also ensure cuda-bindings is always required in build environment
by returning ['cuda-bindings'] instead of [] when already installed.
This ensures pip makes it available in isolated build environments
even if installed elsewhere.

Fix import sorting inconsistency for _version import in cuda_pathfinder
by adding 'isort: skip' directive.
Make _validate_git_tags_available() take tag_pattern as parameter and
ensure all three implementations (cuda-core, cuda-pathfinder, cuda-bindings)
are identical. Add sync comments to remind maintainers to keep them in sync.

Also fix ruff noqa comments: S603 on subprocess.run() line, S607 on
list argument line.
@rwgk rwgk self-assigned this Jan 10, 2026
copy-pr-bot bot commented Jan 10, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk commented Jan 10, 2026

/ok to test

rwgk added 2 commits January 10, 2026 13:07
Replace _validate_git_tags_available() functions with DRY shim/runner pattern:
- Create scripts/git_describe_command_runner.py: shared implementation
- Create git_describe_command_shim.py in each package: thin wrappers that
  check for scripts/ directory and delegate to the runner
- Update pyproject.toml files to use git_describe_command_shim.py
- Remove all three copies of _validate_git_tags_available() from
  build_hooks.py and setup.py

Benefits:
- DRY: single implementation in scripts/
- Portable: Python is always available (no git in PATH requirement)
- Clear error messages: shims check for scripts/ and provide context
- No import-time validation: only runs when setuptools-scm calls it
- Cleaner code: cuda_pathfinder/build_hooks.py is now just 13 lines

All three shim files are identical copies and must be kept in sync.
rwgk commented Jan 10, 2026

/ok to test

Remove unnecessary shim files and use scripts/git_describe_wrapper.py directly.

Since setuptools_scm runs from repo root (root = ".."), we can call
the shared wrapper script directly without package-specific shims.

Changes:
- Remove all three git_describe_command_shim.py files
- Update pyproject.toml files to use scripts/git_describe_wrapper.py
- Remove cuda_pathfinder/build_hooks.py (was just pass-through)
- Remove pre-commit hook for checking shim files
- Rename git_describe_command_runner.py to git_describe_wrapper.py

This simplifies the codebase while maintaining the same functionality:
- Single shared implementation for git describe
- Clear error messages when tags are missing
- Works correctly from repo root where setuptools-scm runs
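The body of scripts/git_describe_wrapper.py is not shown in this PR, so the following is only a sketch of its core idea. It runs git describe, returns the raw output for setuptools-scm on success, and returns an actionable error otherwise. The function name is hypothetical, and the tag pattern is supplied by the caller because the real per-package patterns are not shown. `--dirty`, `--tags`, `--long` and `--match` are standard git describe options.

```python
import subprocess


def describe_for_scm(match, cwd="."):
    """Run git describe for setuptools-scm; return (exit_code, text).

    On success, text is the raw describe output for setuptools-scm to parse.
    On failure, text is an explanatory error message.
    """
    # --long always emits the "-<distance>-g<hash>" suffix; --match restricts
    # the candidate tags to the caller-supplied pattern.
    cmd = ["git", "describe", "--dirty", "--tags", "--long", "--match", match]
    try:
        res = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=30)
    except FileNotFoundError:
        return 1, "ERROR: could not run git (is it installed and on PATH?)."
    except subprocess.TimeoutExpired:
        return 1, "ERROR: git describe timed out."
    if res.returncode != 0:
        return res.returncode, (
            "ERROR: git describe failed.\n"
            f"git output: {res.stderr.strip()}\n"
            "Common causes: tags not fetched, or running outside the repository.\n"
            "Possible fix: git fetch --tags"
        )
    return 0, res.stdout.strip()
```

A script wrapper would print the text to stdout on success (or to stderr on failure) and exit with the returned code. The later shallow-clone analysis in this PR shows a limit of this approach: setuptools-scm may skip the describe command entirely, so a wrapper like this cannot catch that case.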
@rwgk rwgk force-pushed the version-safety-checks branch from 4dfd289 to 1b2e4c0 January 11, 2026 02:35
rwgk commented Jan 11, 2026

/ok to test

rwgk commented Jan 11, 2026

setuptools-scm Shallow Clone Fallback Issue (exists already on main)

Summary

During CI testing, we discovered that cuda-pathfinder is being built with fallback version 0.1.dev1+g1b2e4c088 instead of the expected 1.3.4.dev77+g1b2e4c088 in some Windows test runs. This occurs when setuptools-scm detects a shallow Git clone and silently falls back to the default version without calling our git_describe_wrapper.py.

Root Cause

  1. Shallow Clone Detection: setuptools-scm detects shallow clones early in its version detection process
  2. Early Fallback: When a shallow clone is detected, setuptools-scm bypasses git_describe_command entirely and falls back to 0.0 or 0.1.x versions
  3. Wrapper Never Called: Our git_describe_wrapper.py is never executed because setuptools-scm decides to skip version detection before attempting to call it

Evidence

Test Results

The fallback version 0.1.dev1+g1b2e4c088 was detected in 18 Windows test files (all Windows tests):

  • Test_win-64___py3.10__12.9.1__wheels__rtx2080__WDDM_.txt
  • Test_win-64___py3.10__13.0.2__local__rtxpro6000__TCC_.txt
  • Test_win-64___py3.10__13.1.0__local__rtxpro6000__TCC_.txt
  • Test_win-64___py3.11__12.9.1__local__v100__MCDM_.txt
  • Test_win-64___py3.11__13.0.2__wheels__rtx4090__WDDM_.txt
  • Test_win-64___py3.11__13.1.0__wheels__rtx4090__WDDM_.txt
  • Test_win-64___py3.12__12.9.1__wheels__l4__MCDM_.txt
  • Test_win-64___py3.12__13.0.2__local__a100__TCC_.txt
  • Test_win-64___py3.12__13.1.0__local__a100__TCC_.txt
  • Test_win-64___py3.13__12.9.1__local__l4__TCC_.txt
  • Test_win-64___py3.13__13.0.2__wheels__rtxpro6000__MCDM_.txt
  • Test_win-64___py3.13__13.1.0__wheels__rtxpro6000__MCDM_.txt
  • Test_win-64___py3.14__12.9.1__wheels__v100__TCC_.txt
  • Test_win-64___py3.14__13.0.2__local__l4__MCDM_.txt
  • Test_win-64___py3.14__13.1.0__local__l4__MCDM_.txt
  • Test_win-64___py3.14t__12.9.1__local__l4__TCC_.txt
  • Test_win-64___py3.14t__13.0.2__wheels__a100__MCDM_.txt
  • Test_win-64___py3.14t__13.1.0__wheels__a100__MCDM_.txt

Notes:

  • All affected tests are Windows-based. Linux tests appear to have proper version numbers.

  • The same 0.1.dev fallback version also appears in the same 18 CI logs from the latest run on main.

Log Evidence

From Test_win-64___py3.14t__12.9.1__local__l4__TCC_.txt:

2026-01-11T03:06:35.1995098Z   C:\Windows\Temp\pip-build-env-qw2xma_m\overlay\Lib\site-packages\setuptools_scm\git.py:202: UserWarning: "C:\actions-runner\_work\cuda-python\cuda-python" is shallow and may cause errors
2026-01-11T03:06:35.1995894Z     warnings.warn(f'"{wd.path}" is shallow and may cause errors')
2026-01-11T03:06:36.0998424Z   Created wheel for cuda-pathfinder: filename=cuda_pathfinder-0.1.dev1+g1b2e4c088-py3-none-any.whl
2026-01-11T03:06:54.4353416Z cuda-bindings 12.9.6.dev2+g563cd83db requires cuda-pathfinder~=1.1, but you have cuda-pathfinder 0.1.dev1+g1b2e4c088 which is incompatible.

Local Testing

We tested locally with a shallow clone and confirmed:

  1. Wrapper Not Called: Even with git_describe_command configured, the wrapper script is never executed
  2. Silent Fallback: setuptools-scm detects the shallow clone, warns about it, but proceeds with fallback version 0.0 or 0.1.x
  3. No Error: The build succeeds with the wrong version, causing dependency conflicts

setuptools-scm Behavior

From setuptools-scm source code (git.py):

def version_from_describe(...):
    if describe_command is not None:
        # Only called if setuptools-scm decides to attempt version detection
        describe_res = _run(describe_command, wd.path)
    else:
        describe_res = wd.default_describe()
    
    return describe_res.parse_success(parse=parse_describe)

# In _git_parse_inner:
version = version_from_describe(wd, config, describe_command)

if version is None:
    # Falls back to 0.0 or configured fallback_version
    tag = config.version_cls(config.fallback_version or "0.0")
    # ... creates version with fallback

Key Finding: When setuptools-scm detects a shallow clone early (via is_shallow()), it may skip calling git_describe_command entirely and go straight to the fallback.

Impact

  1. CI Testing: Windows CI runs are building cuda-pathfinder with incorrect versions
  2. Dependency Conflicts: The fallback version 0.1.dev1 doesn't satisfy cuda-bindings requirement ~=1.1
  3. SWQA Team Risk: If SWQA uses shallow clones, they'll encounter the same issue
  4. Silent Failure: The build succeeds but with wrong version, making it hard to detect

Attempted Solutions

1. git_describe_wrapper.py Enhancement

We enhanced scripts/git_describe_wrapper.py to detect shallow clones proactively:

import subprocess
import sys

# Check whether the repository is a shallow clone
# ("git rev-parse --is-shallow-repository" prints "true" or "false").
result = subprocess.run(
    ["git", "rev-parse", "--is-shallow-repository"],
    capture_output=True,
    text=True,
    timeout=5,
)
if result.returncode == 0 and result.stdout.strip() == "true":
    print("ERROR: Repository is a shallow clone.", file=sys.stderr)
    sys.exit(1)

Result: ❌ Doesn't work - The wrapper is never called when setuptools-scm detects a shallow clone.

2. setuptools-scm fail_on_shallow Option

setuptools-scm provides a pre_parse = "fail_on_shallow" option that should fail builds on shallow clones.

Result: ❌ Didn't work in our tests - May require different configuration or setuptools-scm version.


rwgk added 2 commits January 10, 2026 21:55
Add build-time validation to detect when setuptools-scm falls back to
default versions (0.0.x or 0.1.dev*) due to shallow clones or missing
git tags. This prevents silent failures that cause dependency conflicts.

Changes:
- scripts/validate_version.py: New DRY validation script that checks
  for fallback versions and validates against expected patterns
- cuda_core/build_hooks.py: Add validation in prepare_metadata hooks
- cuda_pathfinder/build_hooks.py: New build hooks with version validation
- cuda_pathfinder/pyproject.toml: Use custom build_hooks backend
- cuda_bindings/setup.py: Add ValidateVersion command class

The validation runs after setuptools-scm generates _version.py files,
ensuring we catch fallback versions before builds complete. This will
cause the 18 Windows CI tests that currently use fallback versions to
fail with clear error messages instead of silently using wrong versions.

Related to shallow clone issue documented in PR NVIDIA#1454.
…etadata

Move validation from prepare_metadata_for_build_* to build_editable/build_wheel
where _version.py definitely exists. This fixes build failures where validation
ran before setuptools-scm wrote the version file.
rwgk commented Jan 11, 2026

/ok to test

rwgk commented Jan 11, 2026

/ok to test

rwgk added 2 commits January 11, 2026 12:14
This commit adds version number validation tests to detect when automatic
version detection fails and falls back to invalid versions (e.g., 0.0.x or
0.1.dev*). This addresses two main concerns:

1. Fallback version numbers going undetected: When setuptools-scm cannot
   detect version from git tags (e.g., due to shallow clones), it silently
   falls back to default versions like 0.1.dev*. These invalid versions can
   cause dependency conflicts and confusion in production/SWQA environments.

2. PyPI wheel replacement: The critical issue of just-built cuda-bindings
   being replaced with PyPI wheels is already handled by
   _check_cuda_bindings_installed() in cuda_core/build_hooks.py.

Rather than attempting complex early detection in build hooks (which proved
fragile due to timing issues with when _version.py files are written), we
implement late-stage detection via test files. This approach is:
- Simpler: No complex build hook timing issues
- Reliable: Tests run after installation when versions are definitely available
- Sufficient: Catches issues before they reach production/SWQA

Changes:
- Add validate_version_number() function to cuda_python_test_helpers for
  centralized validation logic
- Create minimal test_version_number.py files in cuda_bindings, cuda_core,
  and cuda_pathfinder that import the version and call the validation function
- Add helpers/__init__.py files in cuda_bindings/tests and
  cuda_pathfinder/tests to enable importing from cuda_python_test_helpers
- Update cuda_core/tests/helpers/__init__.py to use ModuleNotFoundError
  instead of ImportError for consistency

The validation checks that versions have major.minor > 0.1, which is
sufficient since all three packages are already at higher versions. Error
messages explain the issue without referencing setuptools-scm internals.
rwgk commented Jan 11, 2026

/ok to test

The supports_ipc_mempool function was previously defined in
cuda_python_test_helpers, but it had a hard dependency on
cuda.core._utils.cuda_utils.handle_return. This caused CI failures
when cuda_bindings or cuda_pathfinder tests tried to import
cuda_python_test_helpers, because cuda-core might not be installed
in those test environments.

By moving supports_ipc_mempool to cuda_core/tests/helpers/__init__.py,
we ensure that:
- cuda_python_test_helpers remains free of cuda-core-specific dependencies
- The function is only available where cuda-core is guaranteed to be
  installed (i.e., in cuda_core tests)
- cuda_bindings and cuda_pathfinder can safely import
  cuda_python_test_helpers without requiring cuda-core

Changes:
- Move supports_ipc_mempool from cuda_python_test_helpers to
  cuda_core/tests/helpers/__init__.py
- Update cuda_core/tests/test_memory.py to import from helpers
  instead of cuda_python_test_helpers
- Remove unused imports (functools, Union, handle_return) from
  cuda_python_test_helpers/__init__.py
- Remove supports_ipc_mempool from cuda_python_test_helpers __all__

This fixes CI failures where importing cuda_python_test_helpers
would fail due to missing cuda-core dependencies.
rwgk commented Jan 11, 2026

/ok to test

The main purpose of these tests is to validate that dependencies have
valid version numbers, not just the package being tested. This is
critical for catching cases where a dependency (e.g., cuda-pathfinder)
might be built with a fallback version (0.1.dev...) due to shallow
git clones or missing tags.

To ensure we see all invalid versions in a single test run, we organize
the tests as separate test functions (test_bindings_version,
test_core_version, test_pathfinder_version) rather than combining them
into a single function. This way, if multiple packages have invalid
versions, pytest will report all failures rather than stopping at the
first one.

Changes:
- cuda_bindings/tests/test_version_number.py: Tests both cuda-bindings
  and cuda-pathfinder versions
- cuda_core/tests/test_version_number.py: Tests cuda-bindings,
  cuda-core, and cuda-pathfinder versions
- cuda_pathfinder/tests/test_version_number.py: Tests cuda-pathfinder
  version (renamed function for consistency)
rwgk commented Jan 12, 2026

/ok to test

rwgk added 2 commits January 11, 2026 20:23
Add minimal assertions immediately after importing __version__ in all three
package __init__.py files to fail fast if an invalid version (e.g., 0.1.dev...)
is detected. This prevents packages with fallback versions from being imported
or used, catching the issue at the earliest possible point.

The assertion checks that major.minor > (0, 1) using a minimal one-liner:
    assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1), "FATAL: invalid __version__"

Strictly speaking this makes the unit tests redundant, but we want to keep
the unit tests as a second line of defense. The assertions provide immediate
feedback during import, while the unit tests provide explicit test coverage
and clearer error messages in CI logs.

Changes:
- cuda_bindings/cuda/bindings/__init__.py: Add version assertion
- cuda_core/cuda/core/__init__.py: Add version assertion
- cuda_pathfinder/cuda/pathfinder/__init__.py: Add version assertion
Use an intentionally shallow clone (fetch-depth: 1) to test wheel installation
without full git history. This ensures we're testing the wheel artifacts
themselves, not building from source.

Changes:
- Set fetch-depth: 1 explicitly (although it is the default) with comment
  emphasizing that shallow cloning is intentional
- Add --only-binary=:all: to cuda-python installation to ensure we only test
  wheels, never build from source
- Add "Verify installed package versions" step that imports cuda.pathfinder
  and cuda.bindings to trigger __version__ assertions immediately after
  installation, providing early detection of invalid versions
- Update comments to accurately reflect that we're testing wheel artifacts

This approach hardens the test workflows by:
- Making the shallow clone intentional and explicit
- Actually testing that __version__ assertions work (fail-fast on invalid
  versions)
- Catching version issues immediately after installation, before tests run
- Ensuring we only test wheels, not source builds

Applied consistently to both:
- .github/workflows/test-wheel-windows.yml
- .github/workflows/test-wheel-linux.yml
@rwgk rwgk changed the title Implement build-time version validation: git tag checks and PyPI fallback prevention Add __version__ validation: import-time assertions, unit tests, PyPI fallback prevention, CI hardening Jan 12, 2026
rwgk commented Jan 12, 2026

/ok to test

rwgk commented Jan 12, 2026

CI Logs Analysis: PR #1454 vs Main Branch

Summary

After carefully analyzing the CI logs from PR #1454 (/wrk/logs_20909188905) and comparing them to the main branch logs (/wrk/main_logs_20871705999), I found that:

  1. 0.1.dev versions still appear in both PR and main branch logs (18 Windows test files in each)
  2. However, they are caught and replaced before tests run
  3. All version tests pass - no invalid versions reach the actual test execution
  4. The root cause is a workflow bug in the Windows test workflow

Detailed Findings

0.1.dev Versions Still Appear

PR Logs: Found cuda-pathfinder-0.1.dev1+g233447b1b in 18 Windows test files
Main Logs: Found cuda-pathfinder-0.1.dev1+gacc78f7c0 in 18 Windows test files

The same number of occurrences suggests this is a pre-existing issue, not something introduced by the PR.

Where They Come From

The bad versions are built during the "Install cuda.pathfinder extra wheels for testing" step. The issue is in the Windows workflow:

Windows workflow (.github/workflows/test-wheel-windows.yml line 259):

pip install --only-binary=:all: -v . --group "test-cu${TEST_CUDA_MAJOR}"

Linux workflow (.github/workflows/test-wheel-linux.yml line 292):

pip install --only-binary=:all: -v ./*.whl --group "test-cu${TEST_CUDA_MAJOR}"

The Problem: When pip is given . (a directory), it must build the package from source to obtain its metadata and a wheel. That build invokes setuptools-scm, which in a shallow clone produces the 0.1.dev1 fallback. The --only-binary=:all: flag doesn't prevent this; it only prevents installing from source distributions.

When pip is given ./*.whl, it installs directly from the wheel file, whose metadata is already built in, so no source build (and no setuptools-scm invocation) takes place.

What Happens Next

The sequence in affected Windows tests:

  1. Good version installed first: cuda-pathfinder-1.3.4.dev85+g233447b1b (from wheel artifact)
  2. Version verification passes: The "Verify installed package versions" step successfully imports cuda.pathfinder and cuda.bindings (lines 15404-15405)
  3. Bad version built: During "Install cuda.pathfinder extra wheels", pip builds 0.1.dev1 from source due to the . directory issue
  4. Bad version installed: Successfully installed cuda-pathfinder-0.1.dev1+g233447b1b (line 15677)
  5. Bad version replaced: Uninstalling cuda-pathfinder-0.1.dev1+g233447b1b then Successfully installed cuda-pathfinder-1.3.4.dev85+g233447b1b (lines 15758-15760)
  6. Tests run with good version: All test_version_number tests pass

Version Tests Status

All version tests pass in both PR and main branch logs:

  • test_bindings_version PASSED
  • test_core_version PASSED
  • test_pathfinder_version PASSED

This confirms that:

  • The import-time assertions work correctly (version verification step passes)
  • The unit tests work correctly (all version tests pass)
  • No invalid versions reach actual test execution

Comparison: PR vs Main

Metric                            PR Logs         Main Logs
Files with 0.1.dev versions       18 (Windows)    18 (Windows)
Version test failures             0               0
Import-time assertion failures    0               0

Key Difference: The PR has the "Verify installed package versions" step that explicitly tests the import-time assertions, providing early detection. The main branch doesn't have this step, but also doesn't fail because the bad versions get replaced before tests run.

Remaining Issue

While the bad versions are caught and replaced, they still cause:

  1. Dependency conflict warnings: cuda-bindings requires cuda-pathfinder~=1.1, but you have cuda-pathfinder 0.1.dev1+g233447b1b which is incompatible
  2. Inefficiency: Unnecessary source builds and reinstallations
  3. Risk: If the replacement step were to fail, tests would run with the bad version

Recommended Fix

Change the Windows workflow to match the Linux workflow:

- pip install --only-binary=:all: -v . --group "test-cu${TEST_CUDA_MAJOR}"
+ pip install --only-binary=:all: -v ./*.whl --group "test-cu${TEST_CUDA_MAJOR}"

This will prevent pip from building from source during metadata preparation, eliminating the temporary 0.1.dev versions entirely.

Conclusion

The PR's version validation mechanisms are working correctly:

  • ✅ Import-time assertions catch invalid versions immediately
  • ✅ Unit tests provide explicit coverage and pass
  • ✅ CI workflow hardening provides early detection

The remaining 0.1.dev versions come from a pre-existing workflow bug (Windows passing . instead of ./*.whl to pip install). The bug produces temporary bad versions, but they are caught and replaced before tests run. The fix is straightforward: change the Windows workflow's pip install argument from . to ./*.whl.

Change pip install command from '.' to './*.whl' to prevent pip from
building from source during metadata preparation. This matches the Linux
workflow and eliminates the temporary 0.1.dev versions that were being
built in shallow clones.

See: NVIDIA#1454 (comment)
rwgk commented Jan 12, 2026

/ok to test

@rwgk
Collaborator Author

rwgk commented Jan 12, 2026

CI Logs Analysis: PR #1454 (post commit c574a94)

CI Run: /wrk/logs_20910873985

Executive Summary

All checks passed successfully. The fix to use ./*.whl instead of . in the Windows workflow has eliminated the 0.1.dev fallback versions that were previously appearing. All version validation mechanisms are working as intended.

Key Findings

1. No 0.1.dev Package Versions Detected

Search Results: Comprehensive grep across all log files found zero instances of cuda-pathfinder, cuda-bindings, or cuda-core packages with 0.1.dev versions.

  • All installed packages show valid versions:
    • cuda-pathfinder-1.3.4.dev86+gc574a94c2
    • cuda-bindings-13.1.2.dev72+gc574a94c2
    • cuda-core-0.5.1.dev39+gc574a94c2
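To see why these versions pass while the earlier fallback did not, the import-time assertion expression from the PR can be evaluated on the version strings quoted in this thread. passes_import_check is a hypothetical wrapper for illustration; the expression inside it is the one added to the __init__.py files.

```python
def passes_import_check(version: str) -> bool:
    """Evaluate the PR's import-time assertion expression (hypothetical wrapper)."""
    # Same expression as in __init__.py: compare (major, minor) against (0, 1).
    return tuple(int(_) for _ in version.split(".")[:2]) > (0, 1)


for v in [
    "1.3.4.dev86+gc574a94c2",   # cuda-pathfinder, this run
    "13.1.2.dev72+gc574a94c2",  # cuda-bindings, this run
    "0.5.1.dev39+gc574a94c2",   # cuda-core, this run
    "0.1.dev1+g233447b1b",      # setuptools-scm fallback from the previous run
]:
    print(v, passes_import_check(v))
```

The first three print True. The fallback prints False because its (major, minor) is exactly (0, 1), which is not greater than (0, 1). Note that cuda-core passes with (0, 5) even though its major version is 0.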

2. Windows Workflow Fix Confirmed

The fix to change pip install --only-binary=:all: -v . to pip install --only-binary=:all: -v ./*.whl is working correctly:

Example from Test_win-64___py3.14t__13.1.0__wheels__a100__MCDM_.txt:

2026-01-12T07:27:30.4348704Z + pip install --only-binary=:all: -v ./cuda_pathfinder-1.3.4.dev86+gc574a94c2-py3-none-any.whl --group test-cu13
2026-01-12T07:27:31.1679583Z Processing c:\actions-runner\_work\cuda-python\cuda-python\cuda_pathfinder\cuda_pathfinder-1.3.4.dev86+gc574a94c2-py3-none-any.whl

Pip is now correctly processing the wheel file directly, preventing any source builds that would trigger setuptools-scm in shallow clones.

3. Version Validation Tests Passing

All version validation tests are passing across all platforms:

Windows Examples:

  • tests/test_version_number.py::test_pathfinder_version PASSED
  • tests/test_version_number.py::test_bindings_version PASSED
  • tests/test_version_number.py::test_core_version PASSED

Linux Examples:

  • tests/test_version_number.py::test_pathfinder_version PASSED
  • tests/test_version_number.py::test_bindings_version PASSED
  • tests/test_version_number.py::test_core_version PASSED

4. Import-Time Verification Steps Executing

The CI workflow hardening steps are executing successfully:

Windows (Test_win-64___py3.14t__13.1.0__wheels__a100__MCDM_.txt):

2026-01-12T07:27:30.1648632Z + python -c 'import cuda.pathfinder'
2026-01-12T07:27:30.2797131Z + python -c 'import cuda.bindings'

Linux (Test_linux-64___py3.14t__13.1.0__local__l4.txt):

2026-01-12T07:21:49.6369200Z + python -c 'import cuda.pathfinder'
2026-01-12T07:21:49.7363396Z + python -c 'import cuda.bindings'

These steps successfully trigger the import-time assertions in __init__.py files, providing immediate feedback if invalid versions are present.
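The CI steps above amount to running python -c 'import ...' once per package. A minimal Python equivalent might look like the following sketch. failing_imports is a hypothetical name, not part of the PR.

```python
import subprocess
import sys


def failing_imports(modules: list[str]) -> list[str]:
    """Import each module in a fresh interpreter; return the names that fail.

    A separate process per module mirrors the CI steps (python -c 'import ...'),
    so each import starts from a clean module cache, and an AssertionError
    raised while executing a package's __init__.py shows up as a non-zero exit.
    """
    failures = []
    for name in modules:
        result = subprocess.run(
            [sys.executable, "-c", f"import {name}"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # result.stderr holds the traceback, e.g. "FATAL: invalid __version__".
            failures.append(name)
    return failures
```

For this repository, failing_imports(["cuda.pathfinder", "cuda.bindings", "cuda.core"]) would be expected to return an empty list when all versions are valid.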

5. No Dependency Conflicts

Search Results: No dependency conflict warnings related to version mismatches. The only "incompatible" matches found were from unrelated test names (test_from_buffer_incompatible_dtype_and_itemsize), which are expected test cases.

6. No Assertion Errors

Search Results: Zero instances of:

  • FATAL: invalid __version__
  • AssertionError.*version
  • Invalid version number detected

This confirms that:

  1. All packages have valid versions
  2. Import-time assertions are not triggering
  3. Unit tests are not encountering invalid versions
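The log searches described above can be reproduced with a small script. This is a sketch, not the command actually used for this analysis: the three message patterns are the ones listed above, while the fallback-version regex and the assumption that logs are .txt files in one directory (as with the Test_*.txt files quoted here) are illustrative.

```python
import re
from pathlib import Path

# Approximations of the searches described above, not the exact commands used.
PATTERNS = {
    # Matches e.g. "cuda-pathfinder 0.1.dev1" (pip messages) or
    # "cuda_pathfinder-0.1.dev1" (wheel file names).
    "fallback version": re.compile(r"cuda[-_](?:pathfinder|bindings|core)[- ]0\.1\.dev"),
    "import-time assertion": re.compile(r"FATAL: invalid __version__"),
    "assertion error": re.compile(r"AssertionError.*version"),
    "invalid version message": re.compile(r"Invalid version number detected"),
}


def scan_logs(log_dir: str) -> dict[str, list[str]]:
    """Return {pattern label: ["file:line", ...]} for every match in *.txt logs."""
    hits: dict[str, list[str]] = {label: [] for label in PATTERNS}
    for path in sorted(Path(log_dir).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                for label, pattern in PATTERNS.items():
                    if pattern.search(line):
                        hits[label].append(f"{path.name}:{lineno}")
    return hits
```

A clean run corresponds to every list in the returned dictionary being empty.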

Comparison with Previous Analysis

Previous Issue (Logs /wrk/logs_20909188905)

In the previous CI run, we found that the Windows workflows were building cuda-pathfinder from source with 0.1.dev versions, because pip treated . as a local project directory to build, even with --only-binary=:all:. The bad versions were then replaced by wheels before tests ran, masking the issue.

Current State (Logs /wrk/logs_20910873985)

Issue Resolved: The fix to use ./*.whl instead of . ensures pip installs directly from wheel files, preventing any source builds in shallow clones.

Build Workflow Analysis

Build Jobs

All build jobs show cuda-pathfinder being built from source (expected behavior in build workflows):

Building wheels for collected packages: cuda-pathfinder

However, these builds occur in contexts where full git history is available (build jobs fetch tags), so setuptools-scm correctly generates versions like 1.3.4.dev86+gc574a94c2.
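The structure of these version strings can be made explicit with a short parser. This sketch assumes setuptools-scm's default schemes (guess-next-dev producing <next-version>.dev<distance>, and node-and-date appending +g<commit-hash>); describe_scm_version is a hypothetical helper. Applied to the fallback 0.1.dev1+g233447b1b from the previous run, it yields a base of 0.1 and a distance of 1, which is consistent with a shallow clone in which no release tag is reachable.

```python
import re
from typing import Optional

# Assumes setuptools-scm's default version scheme (guess-next-dev) and local
# scheme (node-and-date): <next-version>.dev<distance>+g<commit-hash>[.d<date>]
SCM_VERSION = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)\.dev(?P<distance>\d+)\+g(?P<node>[0-9a-f]+)"
)


def describe_scm_version(version: str) -> Optional[dict]:
    """Split a setuptools-scm dev version into base, commit distance, and hash."""
    m = SCM_VERSION.match(version)
    return m.groupdict() if m else None


print(describe_scm_version("1.3.4.dev86+gc574a94c2"))
print(describe_scm_version("0.1.dev1+g233447b1b"))
```

Release versions without a dev suffix (for example 13.1.0) do not match and return None.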

Test Jobs (Wheel Installation)

Test jobs that install from wheels show:

  • Processing c:\actions-runner\_work\cuda-python\cuda-python\cuda_pathfinder\cuda_pathfinder-1.3.4.dev86+gc574a94c2-py3-none-any.whl
  • No source builds triggered
  • All packages installed with correct versions

Two-Layer Defense Verification

Layer 1: Import-Time Assertions ✅

The assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1) statements in __init__.py files are:

  • Present in all three packages (cuda.bindings, cuda.core, cuda.pathfinder)
  • Not triggering (no assertion errors in logs)
  • Verified by the explicit import steps in CI workflows

Layer 2: Unit Tests ✅

The test_version_number.py tests are:

  • Running successfully across all platforms
  • Passing for all three packages
  • Providing granular reporting (separate tests for each package)
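The PR description says the shared logic lives in validate_version_number() in cuda_python_test_helpers, but its body is not shown here. The following is a hypothetical sketch of what such a helper could look like; the signature, the regex, and the message wording are assumptions, with the message chosen to match the "Invalid version number detected" string searched for above.

```python
import re

# Leading "major.minor" of a version string; assumption for this sketch.
_MAJOR_MINOR = re.compile(r"^(\d+)\.(\d+)")


def validate_version_number(version: str, package_name: str) -> None:
    """Hypothetical sketch; the real helper may differ in signature and checks."""
    m = _MAJOR_MINOR.match(version)
    assert m is not None, (
        f"Invalid version number detected for {package_name}: {version!r}"
    )
    major_minor = (int(m.group(1)), int(m.group(2)))
    # setuptools-scm falls back to 0.1.devN when no tag is reachable.
    assert major_minor > (0, 1), (
        f"Invalid version number detected for {package_name}: {version!r} "
        "(possible setuptools-scm fallback, e.g. from a shallow clone)"
    )
```

Each test_version_number.py would then import its package's __version__ and call this helper from a separate test function (test_bindings_version, test_core_version, test_pathfinder_version), so one bad package does not hide another.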

Conclusion

The PR's version validation strategy is working as designed:

  1. Import-time assertions provide fail-fast detection of invalid versions
  2. Unit tests provide explicit coverage and granular reporting
  3. CI workflow hardening ensures wheel-only installation in test workflows
  4. Windows workflow fix prevents source builds that would trigger setuptools-scm fallback in shallow clones

No issues detected. The PR is ready for review.

Recommendations

  1. Ready for merge: All validation mechanisms are functioning correctly
  2. Monitoring: Continue monitoring CI logs for any future 0.1.dev versions (should not occur with current fixes)
  3. Documentation: The PR description accurately reflects the implemented solution

@rwgk rwgk marked this pull request as ready for review January 12, 2026 08:03
@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 12, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk rwgk requested a review from mdboom January 12, 2026 08:04
from cuda.bindings import utils
from cuda.bindings._version import __version__

assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1), "FATAL: invalid __version__"
Collaborator

I don't think we should do this outside of an opt-in debug mechanism. If someone grabbed the source code in some weird way that breaks the version resolution via setuptools_scm, we don't want it to error for them unnecessarily, where getting an "incorrect" version would be better than getting this assertion.
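One way to follow this suggestion is to make the check opt-in. The sketch below is hypothetical: neither check_version nor the environment variable CUDA_PYTHON_STRICT_VERSION_CHECK exists in the PR. By default it does nothing, so unusual source checkouts still import. CI could set the variable to keep the fail-fast behavior. It also raises explicitly instead of using assert, because assert statements are removed when Python runs with -O.

```python
import os
from collections.abc import Mapping


def check_version(
    version: str,
    package_name: str,
    environ: Mapping[str, str] = os.environ,
) -> None:
    """Opt-in sanity check for __version__ (hypothetical; not part of the PR)."""
    # Hypothetical opt-in switch; without it, an "incorrect" version is tolerated.
    if environ.get("CUDA_PYTHON_STRICT_VERSION_CHECK") != "1":
        return
    major_minor = tuple(int(_) for _ in version.split(".")[:2])
    if major_minor <= (0, 1):
        # Explicit raise so that `python -O` does not strip the check.
        raise RuntimeError(
            f"FATAL: invalid __version__ for {package_name}: {version!r}"
        )
```

In each __init__.py, the one-line assert would then become a call such as check_version(__version__, "cuda-bindings").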


import importlib

assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1), "FATAL: invalid __version__"
Collaborator

+1

Comment on lines +33 to +34
@functools.cache
def supports_ipc_mempool(device_id: Union[int, object]) -> bool:
Collaborator

Do we need to move this helper in this PR? It would ideally be done in a separate PR.


from cuda.pathfinder._version import __version__ # isort: skip

assert tuple(int(_) for _ in __version__.split(".")[:2]) > (0, 1), "FATAL: invalid __version__"
Collaborator

+1
