
ci(publish): retry testpypi-smoke install to tolerate index propagation lag #57

Merged
kj-podonos merged 2 commits into main from kj-podonos/fix-testpypi-smoke on Jun 18, 2026

@kj-podonos

Summary

  • Fixes the flaky testpypi-smoke job in Publish (TestPyPI).
  • Root cause: TestPyPI's /simple/ index is eventually-consistent (Fastly CDN). The test-pypi job uploads onepin==X.Y.Z.devN and succeeds; testpypi-smoke runs ~8s later and pip installs that exact pinned version, racing the index before it propagates → spurious No matching distribution.
  • Verified on run 27736392953: build, smoke-install, and test-pypi all passed; only testpypi-smoke failed. The version 0.6.1.dev4 was present on TestPyPI ~1h later (HTTP 200). The upload was fine; the smoke job simply ran too early.
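
The after-the-fact HTTP 200 check can be reproduced with a small probe. This is only a sketch: the PR does not say which URL was queried, so the JSON release endpoint used here and the helper names `release_url` and `release_is_visible` are assumptions. The JSON endpoint is also not the same resource as the /simple/ index that pip reads, so a 200 here is a hint rather than proof that pip can already resolve the version.

```shell
# Hypothetical helpers; not part of the PR.

# Build the TestPyPI JSON API URL for one release of a project.
release_url() {
  printf 'https://test.pypi.org/pypi/%s/%s/json\n' "$1" "$2"
}

# Succeed only if the release URL answers with HTTP 200.
# Network errors are swallowed and treated as "not visible yet".
release_is_visible() {
  local code
  code="$(curl -sS -o /dev/null -w '%{http_code}' \
    "$(release_url "$1" "$2")" 2>/dev/null || true)"
  [ "$code" = "200" ]
}

# Example (requires network access):
#   release_is_visible onepin 0.6.1.dev4 && echo "visible"
```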

Change

  • Wrap the single pip install … && onepin --version in a retry loop: 10 attempts × 30s (~5 min ceiling), early-exit on success.
  • Add --no-cache-dir so pip doesn't replay a cached negative index response across attempts.
  • Route the version through env: VERSION (referenced as "${VERSION}") — GitHub-recommended script-injection mitigation, defense-in-depth atop the existing PEP-440 validation.
  • set -euo pipefail + explicit exit 0/exit 1 preserves correct pass/fail semantics. Happy path unchanged (attempt 1 passes if index warm).
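
The retry loop described in the bullets above might look roughly like the following. The actual script in publish.yml is not reproduced in this description, so the function name `retry`, its argument order, and the pip index flags in the usage comment are assumptions made for illustration.

```shell
set -euo pipefail

# retry ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Running the command as an `if` condition keeps `set -e`
    # from aborting the script on a failed attempt.
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed" >&2
    # Skip the sleep after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage in the workflow step, with VERSION supplied via env:
#   retry 10 30 bash -c 'pip install --no-cache-dir \
#     --index-url https://test.pypi.org/simple/ \
#     --extra-index-url https://pypi.org/simple/ \
#     "onepin==${VERSION}" && onepin --version'
```

In this sketch the sleep is skipped after the last attempt, so 10 attempts at 30s spend at most 270s sleeping plus the time of the installs themselves, which is in line with the ~5 min ceiling. The happy path is a single attempt with no sleep.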

Reviews

  • Codex (/codex review, xhigh): PASS — 0 findings. "Retry loop correctly handles TestPyPI propagation delays; passes actionlint + shell syntax; no regressions."
  • Claude (code-review, Opus): APPROVE — 0 critical/high. 1 MEDIUM (awareness: Fastly edge negative-caching is outside --no-cache-dir scope; the ~5-min budget covers it in practice) + 3 LOW (optional polish: empty-VERSION guard, trailing-iteration sleep, --extra-index-url dep-confusion). None blocking.
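
The optional empty-VERSION guard mentioned in the review could be sketched as below. The function name `require_version` is hypothetical, and the character check is a deliberately loose allow-list, not the PEP-440 validation the workflow already performs.

```shell
# Hypothetical guard; not part of the PR.
# Fails if the version is empty or contains characters that
# never appear in a version string (spaces, quotes, semicolons, ...).
require_version() {
  local v="${1-}"
  if [ -z "$v" ]; then
    echo "VERSION is empty" >&2
    return 1
  fi
  case "$v" in
    *[!A-Za-z0-9.+!_-]*)
      echo "VERSION contains unexpected characters: $v" >&2
      return 1
      ;;
  esac
}

# Assumed usage at the top of the smoke step:
#   require_version "${VERSION-}" || exit 1
```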

Verification

  • python -c "import yaml; yaml.safe_load(...)" — parses, 5 jobs intact.
  • actionlint .github/workflows/publish.yml — clean.
  • bash -n on the embedded retry script — clean.
  • Live exercise on next main push touching build inputs (or workflow_dispatch).
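
The `bash -n` check above parses a script without executing it. A minimal sketch of that check follows, assuming the embedded script has already been extracted from the workflow as text; the helper name `syntax_ok` is hypothetical.

```shell
# Hypothetical helper; not part of the PR.
# syntax_ok SCRIPT_TEXT: succeed if bash can parse the text.
# Nothing in SCRIPT_TEXT is executed.
syntax_ok() {
  local tmp rc=0
  tmp="$(mktemp)"
  printf '%s\n' "$1" > "$tmp"
  bash -n "$tmp" 2>/dev/null || rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example:
#   syntax_ok 'for i in 1 2; do echo "$i"; done' && echo "parses"
```

A clean parse only rules out syntax errors; it says nothing about runtime behavior, which is why the list above still includes a live exercise.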

Notes

  • Infra-only change (.github/workflows/); no Python touched, so the diff-cover/onepin._cli gate is not triggered.
  • promote-prod.yml (PyPI lane) has a pre-publish preflight but no post-publish smoke install — no analogous race, no change needed.

🤖 Generated with Claude Code

…on lag

TestPyPI's /simple/ index is eventually-consistent (Fastly CDN): a just-
uploaded version can take tens of seconds to a couple minutes to become
installable. The smoke step ran a single pip install seconds after the
test-pypi upload, racing the index and failing spuriously with "No matching
distribution". Wrap the install in a retry loop (10x30s, ~5min ceiling,
early-exit on success) and add --no-cache-dir so pip does not replay a
cached negative index response. Route the version through env: for
script-injection safety.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kj-podonos kj-podonos self-assigned this Jun 18, 2026
Document the failure mode fixed in the prior commit (TestPyPI /simple/
index propagation lag racing the post-publish smoke install) and its
retry mitigation as row 8 of the pre-mortem table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@kdh-podonos kdh-podonos left a comment

LGTM

@kj-podonos kj-podonos merged commit ee77566 into main Jun 18, 2026
22 checks passed
@kj-podonos kj-podonos deleted the kj-podonos/fix-testpypi-smoke branch June 18, 2026 08:18