test(e2e): auto-retry transient python-e2e flakes (pytest-rerunfailures) by v1r3n · Pull Request #278 · agentspan-ai/agentspan

v1r3n · 2026-06-20T19:20:47Z

Summary

The python-e2e CI job drives a real server + real LLM, so individual tests flake nondeterministically — and a single transient failure fails the whole job. While verifying an unrelated fix (#277), consecutive runs each failed a different, unrelated test:

| Run | Failing test | Cause |
| --- | --- | --- |
| 1 | `test_after_tool_callback_executes` | workflow still `RUNNING` at client timeout |
| 2 | `test_http_lifecycle@credentials` | "Run stalled at tool-calling stage" |

These are transient infra/LLM-latency flakes, not regressions.

Fix

Add `pytest-rerunfailures` and mark every e2e item `flaky(reruns=2, reruns_delay=5)` via the e2e `conftest.py` (`pytest_collection_modifyitems`, scoped to items carrying the `e2e` marker). A genuinely broken test still fails all 3 attempts (the initial run plus 2 reruns), so no real regression is masked, while a one-off flake recovers.
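To make the attempt arithmetic concrete: `reruns=2` means one initial run plus up to two reruns, and `reruns_delay=5` adds a 5-second pause before each rerun. The sketch below is a simplified model of those semantics in plain Python. It is not pytest-rerunfailures' implementation, and `run_with_reruns`, `flaky_test` and `broken_test` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def run_with_reruns(
    test: Callable[[], None],
    reruns: int = 2,
    reruns_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, int]:
    """Hypothetical model of rerun-on-failure; returns (passed, attempts_used).

    Mirrors the meaning of pytest-rerunfailures' `reruns` / `reruns_delay`
    options, not the plugin's actual code.
    """
    for attempt in range(1, reruns + 2):  # attempt 1 is the initial run
        if attempt > 1:
            sleep(reruns_delay)  # pause only before a rerun, never before attempt 1
        try:
            test()
        except Exception:
            continue  # any failure moves on to the next attempt, if one is left
        return True, attempt
    return False, reruns + 1  # every attempt failed, so the failure is reported


# Demo: a flake that fails once and then passes, and a genuinely broken test.
calls = {"n": 0}


def flaky_test() -> None:
    calls["n"] += 1
    if calls["n"] == 1:  # fails on the first call only
        raise TimeoutError("workflow still RUNNING at client timeout")


def broken_test() -> None:
    raise AssertionError("real regression")


flaky_delays: list[float] = []
broken_delays: list[float] = []
flaky_result = run_with_reruns(flaky_test, sleep=flaky_delays.append)
broken_result = run_with_reruns(broken_test, sleep=broken_delays.append)
print(flaky_result, flaky_delays)    # (True, 2) [5.0]
print(broken_result, broken_delays)  # (False, 3) [5.0, 5.0]
```

Under this model a one-off flake costs one extra run and 5 s of delay, while a genuinely broken test costs two extra runs and 2 × 5 = 10 s of delay before it is reported as failed.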

Configured in conftest rather than the CI YAML so it also applies to local e2e runs (and needs no workflow-file change).
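A minimal sketch of what such a conftest hook could look like follows. It assumes that the `e2e` marker is already applied to the e2e tests elsewhere (for example via a module-level `pytestmark`) and that pytest-rerunfailures is installed so the `flaky` marker is honored. The constant name `E2E_RERUN_MARKER` is illustrative and does not come from the PR.

```python
# e2e/conftest.py (sketch; assumes pytest-rerunfailures is installed)
import pytest

# Values from this PR: 2 reruns (3 attempts in total), 5 s pause before each rerun.
# E2E_RERUN_MARKER is an illustrative name, not necessarily the one the PR uses.
E2E_RERUN_MARKER = pytest.mark.flaky(reruns=2, reruns_delay=5)


def pytest_collection_modifyitems(config, items):
    """Attach the `flaky` marker (read by pytest-rerunfailures) to every e2e-marked item."""
    for item in items:
        # Scope check: only items carrying the `e2e` marker get retries,
        # so unit tests collected in the same session are left alone.
        if item.get_closest_marker("e2e") is not None:
            item.add_marker(E2E_RERUN_MARKER)
```

In practice pytest passes this hook the whole session's item list, not only the items under the conftest's directory. That is why filtering on the marker, rather than relying on where the conftest lives, is what keeps unit suites unaffected.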

Verification

`pytest-rerunfailures` imports, and the dynamically added `flaky` marker carries `{reruns: 2, reruns_delay: 5}`.
`pytest e2e/ --collect-only` collects 115 tests with the hook in place. (The 2 unrelated collection errors come from a missing, local-only `mcp_test_server` dependency that CI installs separately.)
Scope check: the hook only marks items with the `e2e` marker, so unit suites are unaffected.
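One way to repeat the marker and scope checks above without executing any test is a small collect-only plugin. The sketch below is hypothetical: `RerunMarkerCheck` and `collect_and_check` are illustrative names that are not part of the PR, and the check assumes no unit test opts into `flaky` on its own. It relies only on pytest's `pytest_collection_finish` hook and `pytest.main(args, plugins=...)`.

```python
# Sketch: check rerun markers at collection time, without running any test.
import pytest

# The values this PR configures; keep in sync with the e2e conftest.
EXPECTED_RERUNS = {"reruns": 2, "reruns_delay": 5}


class RerunMarkerCheck:
    """Hypothetical pytest plugin that records items whose rerun setup looks wrong."""

    def __init__(self) -> None:
        self.problems: list[str] = []

    def pytest_collection_finish(self, session) -> None:
        # By this point pytest_collection_modifyitems (including the conftest hook) has run.
        for item in session.items:
            is_e2e = item.get_closest_marker("e2e") is not None
            flaky = item.get_closest_marker("flaky")
            if is_e2e and (flaky is None or dict(flaky.kwargs) != EXPECTED_RERUNS):
                self.problems.append(f"{item.nodeid}: e2e item lacks the expected flaky marker")
            elif not is_e2e and flaky is not None:
                # Assumes no unit test opts into reruns by itself.
                self.problems.append(f"{item.nodeid}: non-e2e item is marked flaky")


def collect_and_check(args: list[str]) -> list[str]:
    """Collect only (no test executes) and return any marker problems found."""
    checker = RerunMarkerCheck()
    pytest.main([*args, "--collect-only", "-q"], plugins=[checker])
    return checker.problems
```

Called from the repository root, for example as `collect_and_check([])` to collect whatever a bare `pytest` would, it should return an empty list when every e2e item carries the expected marker and nothing else does.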

Note

This is the real fix for "python-e2e flakes on a different test each run", complementing #277 (which fixes a genuine, deterministic guardrail bug). Recommend merging #277 first, then this.

The python-e2e suite drives a real server + real LLM, so individual tests flake nondeterministically on transient conditions: the workflow is still `RUNNING` at the client timeout, tool-call batches don't return, or the LLM's phrasing varies. A single transient failure currently fails the whole job; observed runs failed a *different*, unrelated test each time (`test_after_tool_callback_executes`, `test_http_lifecycle@credentials`, ...).

Add pytest-rerunfailures and mark every e2e item `flaky(reruns=2, reruns_delay=5)` via the e2e conftest. A genuinely broken test still fails all 3 attempts; a one-off flake recovers. Configured in conftest (not the CI YAML) so it also covers local e2e runs and needs no workflow-file change. The `dev` extra and `uv.lock` are updated (pytest-rerunfailures 16.3).

v1r3n merged commit 5c5a709 into main Jun 20, 2026
12 checks passed

v1r3n deleted the fix/python-e2e-rerun-flaky branch June 20, 2026 20:07

v1r3n merged 1 commit into main from fix/python-e2e-rerun-flaky
