SP-0000 Add flakiness detection and visual timeline #25

Closed
soemo wants to merge 3 commits into main from SP-0000-detect-flakiness

soemo commented Mar 2, 2026

Type of Change

  • Enhancement / new feature

Description

Add flakiness detection (fail-after-pass) and visual timeline

The script previously only collected failing test annotations and ranked them by occurrence count. It had no way to distinguish between consistently broken tests (always red) and truly flaky tests (alternating green/red). Both were reported the same way.

This PR extends the script to produce two separate ranked lists (most failing and most flaky) and adds a visual timeline for each flaky test.
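
The fail-after-pass idea can be sketched roughly as follows. This is not the PR's code: the `RunHistory` shape and the names `analyze` and `rankTests` are hypothetical, and the percentage formula (flips divided by runs, rounded) is only inferred from the example output, where 3 flips over 10 runs is reported as 30%.

```typescript
// Hypothetical input shape: one entry per CI run, oldest first; true = passed.
type RunHistory = boolean[];

interface Stats {
  runs: number;
  failures: number;
  flips: number;       // pass -> fail transitions ("fail-after-pass")
  failureRate: number; // percent of runs that failed
  flakyRate: number;   // percent, assumed to be flips / runs
}

function analyze(history: RunHistory): Stats {
  let failures = 0;
  let flips = 0;
  history.forEach((passed, i) => {
    if (passed) return;
    failures += 1;
    // A flip is counted only when the previous run passed.
    if (i > 0 && history[i - 1]) flips += 1;
  });
  const runs = history.length;
  const pct = (n: number) => (runs === 0 ? 0 : Math.round((n / runs) * 100));
  return { runs, failures, flips, failureRate: pct(failures), flakyRate: pct(flips) };
}

// Builds the two ranked lists; `top` is a hypothetical cut-off parameter.
function rankTests(histories: Record<string, RunHistory>, top: number) {
  const rows = Object.keys(histories).map((file) => ({ file, stats: analyze(histories[file]) }));
  const mostFailing = rows.slice().sort((a, b) => b.stats.failures - a.stats.failures).slice(0, top);
  const mostFlaky = rows
    .slice()
    .sort((a, b) => b.stats.flakyRate - a.stats.flakyRate || b.stats.flips - a.stats.flips)
    .slice(0, top);
  return { mostFailing, mostFlaky };
}
```

With this definition a test that never passes ranks first by failures but has 0 flips, so it drops to the bottom of the flaky list.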

💡 Let's discuss it

How to test

Example output (terminal)

Most failing tests (top 3, limit=12):
  8x (100%) e2e/checkout.spec.ts
  5x (50%) src/auth/login.spec.ts
  2x (20%) src/dashboard/widget.spec.ts

Most flaky tests (top 3, limit=12):
  30% flaky (3 flips / 10 runs) src/auth/login.spec.ts
    ▅▅▅▅▅▅▅▅▅▅          ← red/green alternating (ANSI colored)
  10% flaky (1 flips / 10 runs) src/dashboard/widget.spec.ts
    ▅▅▅▅▅▅▅▅▅▅          ← mostly green, one red blip
  0% flaky (0 flips / 8 runs) e2e/checkout.spec.ts
    ▅▅▅▅▅▅▅▅            ← solid red — broken, not flaky

Example output (Slack)

30% flaky (3 flips / 10 runs) `src/auth/login.spec.ts`
    🔴🟢🔴🔴🟢🔴🟢🟢🔴🟢
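
A minimal sketch of how the two renderings shown above could be produced from the same per-run history. The function names are hypothetical, and the choice of the standard ANSI red/green foreground codes is an assumption; the PR does not show which escape sequences the script emits.

```typescript
// Standard ANSI SGR codes: 31 = red foreground, 32 = green foreground, 0 = reset.
const RED = "\x1b[31m";
const GREEN = "\x1b[32m";
const RESET = "\x1b[0m";

// Terminal variant: one colored block per run, oldest first (true = passed).
function renderAnsiTimeline(history: boolean[]): string {
  return history.map((passed) => `${passed ? GREEN : RED}▅${RESET}`).join("");
}

// Slack variant: one emoji per run, since Slack does not render ANSI colors.
function renderSlackTimeline(history: boolean[]): string {
  return history.map((passed) => (passed ? "🟢" : "🔴")).join("");
}
```

For example, `renderSlackTimeline([false, true, false])` yields `🔴🟢🔴`.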

When there are many runs (e.g. 100+ over a week), the timeline auto-buckets:

12% flaky (12 flips / 100 runs) `src/auth/login.spec.ts`
    🟢🟡🔴🟢🟢🟡🔴🟢🟡🟢🔴🟢🟢🟡🔴🟢🟡🟢🟢🔴🟢🟡🔴🟢🟡🟢🔴🟢🟡🟢 _(100 runs)_

🔄 Top 6 flaky tests: green → red flips (limit=10)
19% flaky (15 flips / 80 runs) src/modules/search/search-results.spec.ts 
    🟡🟡🔴🟢🟡🟡🟡🔴🟢🟡🟡🟡🟢🟡🟡🟡🔴🟡🟢🟡🟡🟡🟡🟡🟡🟢🟡🟡🟡🟢 (80 runs)
        
14% flaky (11 flips / 80 runs) src/modules/auth/login.spec.ts 
    🟢🟡🟢🔴🟢🟢🔴🟢🟢🟢🟢🟡🟢🟢🟡🟢🟡🟢🟡🟢🟡🟢🟢🟢🟢🟡🟢🟢🟡🟡 (80 runs)

10% flaky (8 flips / 80 runs) src/components/navigation/sidebar.spec.ts 
    🟢🟢🟢🟡🟢🟢🟢🟡🟡🟢🟡🟢🟡🟡🟢🟢🟢🟢🟢🟡🟢🟡🟢🟢🟢🟢🟢🟢🟢🟢 (80 runs)

6% flaky (5 flips / 80 runs) src/components/dashboard/widget.spec.tsx 
    🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟡🟢🟢🟢🟡🟢🟢🟢🟢🟢🟡🟢🟢🟢🟢🟢🟢🟢 (80 runs)

4% flaky (3 flips / 80 runs) src/modules/settings/profile.spec.tsx 
    🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢 (80 runs)

0% flaky (0 flips / 80 runs) e2e/flows/checkout.spec.ts
    🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴 (80 runs)
Legend: 🟢 = mostly pass · 🔴 = mostly fail · 🟡 = mixed (oldest → newest)
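
The auto-bucketing behind the legend could look roughly like this. It is a sketch under stated assumptions, not the script's implementation: the 30-cell width is read off the bucketed examples above, while the name `bucketTimeline` and the 0.75 / 0.25 cut-offs for "mostly pass" and "mostly fail" are hypothetical.

```typescript
type Cell = "🟢" | "🔴" | "🟡";

// history: one entry per run, oldest first; true = passed.
// maxCells: assumed from the examples, which are 30 cells wide.
// mostly: hypothetical threshold for "mostly pass"; 1 - mostly for "mostly fail".
function bucketTimeline(history: boolean[], maxCells = 30, mostly = 0.75): Cell[] {
  const n = history.length;
  const cells = Math.min(maxCells, n);
  const out: Cell[] = [];
  for (let c = 0; c < cells; c++) {
    // Split the runs into `cells` contiguous, near-equal slices.
    const start = Math.floor((c * n) / cells);
    const end = Math.floor(((c + 1) * n) / cells);
    const slice = history.slice(start, end);
    const passRate = slice.filter(Boolean).length / slice.length;
    if (passRate >= mostly) out.push("🟢");
    else if (passRate <= 1 - mostly) out.push("🔴");
    else out.push("🟡");
  }
  return out;
}
```

When there are no more runs than cells, every slice holds a single run, so the result degrades to the plain one-emoji-per-run timeline.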
 
The key things visible in the screenshot:

  • Top list: checkout.spec.ts is #1 with 80 failures, so it is always broken.
  • Bottom list: search-results.spec.ts is the #1 flaky test with 15 flips, while checkout.spec.ts is dead last (0 flips, solid 🔴) because it never passes: it is broken, not flaky.
  • The timeline stripes make it immediately visible which tests are flickering and which are consistently red.

soemo added the major label (Pull requests with breakable changes) Mar 2, 2026
soemo marked this pull request as ready for review March 2, 2026 20:54
soemo requested a review from a team as a code owner March 2, 2026 20:54
soemo requested review from a team, 0x46616c6b, Stummi, adriansinger87, mirellat and timkante and removed request for a team March 2, 2026 20:54
soemo marked this pull request as draft March 3, 2026 05:05
soemo closed this Mar 10, 2026
soemo deleted the SP-0000-detect-flakiness branch March 10, 2026 14:47
github-actions bot locked and limited conversation to collaborators Mar 10, 2026