
test(e2e): auto-retry transient python-e2e flakes (pytest-rerunfailures) #278

Merged
v1r3n merged 1 commit into main from fix/python-e2e-rerun-flaky
Jun 20, 2026

v1r3n (contributor) commented Jun 20, 2026

Summary

The python-e2e CI job drives a real server + real LLM, so individual tests flake nondeterministically — and a single transient failure fails the whole job. While verifying an unrelated fix (#277), consecutive runs each failed a different, unrelated test:

| Run | Failing test | Cause |
|-----|--------------|-------|
| 1 | test_after_tool_callback_executes | Workflow still RUNNING at client timeout |
| 2 | test_http_lifecycle@credentials | "Run stalled at tool-calling stage" |

These are transient infra/LLM-latency flakes, not regressions.

Fix

Add pytest-rerunfailures and mark every e2e item flaky(reruns=2, reruns_delay=5) via the e2e conftest.py (pytest_collection_modifyitems, scoped to items carrying the e2e marker). Each e2e test therefore gets up to 3 attempts: the original run plus 2 reruns, with a 5-second delay before each rerun. A consistently broken test still fails all 3 attempts, so a deterministic regression is not masked, while a one-off flake recovers.
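A minimal sketch of what such a hook could look like in e2e/conftest.py. It assumes the e2e tests carry a marker literally named e2e and that pytest-rerunfailures is installed (the plugin registers the flaky marker and reads its reruns / reruns_delay keyword arguments). The actual conftest in this PR may differ in detail.

```python
# e2e/conftest.py (sketch; the file in this PR may differ)
import pytest


def pytest_collection_modifyitems(config, items):
    """Mark every collected e2e item as flaky so pytest-rerunfailures retries it."""
    for item in items:
        # Only items carrying the (assumed) e2e marker are touched;
        # unit-test items pass through unchanged.
        if item.get_closest_marker("e2e") is not None:
            # Up to 3 attempts: the original run plus 2 reruns,
            # waiting 5 seconds before each rerun.
            item.add_marker(pytest.mark.flaky(reruns=2, reruns_delay=5))
```

Because the marker is attached at collection time, the same policy applies to CI and local runs alike. pytest-rerunfailures also offers --reruns / --reruns-delay on the command line, but those apply to every collected test rather than only e2e items.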

Configured in conftest rather than the CI YAML so it also applies to local e2e runs (and needs no workflow-file change).
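Why two reruns are usually enough can be estimated with a back-of-the-envelope model. This sketch rests on a strong simplifying assumption: every attempt of every test flakes independently with the same probability. The 1% rate below is a hypothetical figure, not a measurement from these CI runs, and job_failure_probability is an illustrative helper, not part of the PR.

```python
def job_failure_probability(p_flake: float, n_tests: int, reruns: int) -> float:
    """Probability that the whole job fails, assuming each attempt of each
    test flakes independently with probability p_flake (a simplification)."""
    attempts = reruns + 1
    # A single test fails only if every one of its attempts fails.
    p_test_fails = p_flake ** attempts
    # The job passes only if every test passes.
    return 1.0 - (1.0 - p_test_fails) ** n_tests


if __name__ == "__main__":
    # Hypothetical 1% per-attempt flake rate across the 115 collected e2e tests.
    for reruns in (0, 2):
        p = job_failure_probability(0.01, 115, reruns)
        print(f"reruns={reruns}: P(job fails) = {p:.6f}")
    # A deterministically broken test (p_flake = 1.0) fails regardless of reruns.
    print(job_failure_probability(1.0, 1, 2))
```

Under that assumption, a 1% per-attempt flake rate across 115 tests fails roughly 69% of jobs with no reruns, versus roughly 0.01% with reruns=2. Real flakes are often correlated (for example, a slow LLM endpoint affecting several consecutive attempts), so actual gains are likely smaller than this model suggests.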

Verification

  • pytest-rerunfailures imports; the dynamically-added flaky marker carries {reruns: 2, reruns_delay: 5}.
  • pytest e2e/ --collect-only collects 115 tests with the hook in place (the 2 unrelated collection errors come from mcp_test_server, a dependency missing locally that CI installs separately).
  • Scope check: the hook only marks items with the e2e marker, so unit suites are unaffected.
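The marker-contents check and the scope check could also be enforced automatically on every collection instead of inspected by hand. A hedged sketch, assuming the same e2e marker name and that no test is individually decorated with its own flaky marker; flaky_scope_problems, EXPECTED_FLAKY and this pytest_collection_finish check are hypothetical additions, not part of this PR.

```python
# Hypothetical extra check for e2e/conftest.py (not part of this PR).
import pytest

EXPECTED_FLAKY = {"reruns": 2, "reruns_delay": 5}


def flaky_scope_problems(items):
    """Return human-readable problems; an empty list means the scoping holds.

    Assumes no test carries its own @pytest.mark.flaky, so any flaky marker
    on a non-e2e item must have leaked from the collection hook.
    """
    problems = []
    for item in items:
        flaky = item.get_closest_marker("flaky")
        is_e2e = item.get_closest_marker("e2e") is not None
        if is_e2e and (flaky is None or dict(flaky.kwargs) != EXPECTED_FLAKY):
            problems.append(f"{item.nodeid}: e2e item lacks the expected flaky marker")
        elif not is_e2e and flaky is not None:
            problems.append(f"{item.nodeid}: non-e2e item is marked flaky")
    return problems


def pytest_collection_finish(session):
    # Called after pytest_collection_modifyitems (also under --collect-only),
    # so markers added by the hook are already in place.
    problems = flaky_scope_problems(session.items)
    if problems:
        # Abort the session with a usage error listing every mismatch.
        raise pytest.UsageError("\n".join(problems))
```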

Note

This is the real fix for "python-e2e flakes on a different test each run", complementing #277 (which fixes a genuine, deterministic guardrail bug). Recommend merging #277 first, then this.

The python-e2e suite drives a real server + real LLM, so individual
tests flake nondeterministically on transient conditions — workflow
still RUNNING at the client timeout, tool-call batches not returning,
LLM phrasing variance. A single transient failure currently fails the
whole job; observed runs failed a *different* unrelated test each time
(test_after_tool_callback_executes, test_http_lifecycle@credentials, ...).

Add pytest-rerunfailures and mark every e2e item flaky(reruns=2,
reruns_delay=5) via the e2e conftest. A genuinely broken test still
fails all 3 attempts; a one-off flake recovers. Configured in conftest
(not the CI yaml) so it also covers local e2e runs and needs no
workflow-file change.

dev extra + uv.lock updated (pytest-rerunfailures 16.3).
v1r3n merged commit 5c5a709 into main Jun 20, 2026
12 checks passed
v1r3n deleted the fix/python-e2e-rerun-flaky branch June 20, 2026 20:07