Extend existing failure notifications for tracking flaky CI issues by G-D-Petrov · Pull Request #2974 · man-group/ArcticDB

G-D-Petrov · 2026-03-17T09:01:41Z

Reference Issues/PRs

What does this implement or fix?

Extends the existing failure_notification.yaml workflow to automatically track intermittent CI failures as GitHub issues and enrich Slack notifications with known/new status.

What changed

The workflow now has two jobs:

track-failures (runs on all branches) — when any monitored workflow (Build and Test, Build with conda, Build with analysis tools, Coverity Static Analysis, Installation Tests Execution) fails:
- Parses failed test names from run logs (GoogleTest [ FAILED ] and pytest FAILED patterns)
- Identifies failed infrastructure steps via the GitHub API (e.g. Install MongoDB timing out due to network issues)
- For each failure, searches for an existing open issue — if found, adds a comment with the run link; otherwise creates a new issue
- Outputs a summary classifying each failure as Known or New
notify-slack (runs on master only, same condition as before) — sends a Slack message enriched with the failure summary, e.g.:

🔥 Build with conda failure on master
⚠️ Known — StorageLockWithAndWithoutRetry.StressManyWriters (#42)
🚨 New — 3.11 Linux / integration-NoCache / Install MongoDB (issue)
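The log-parsing behaviour described above could be sketched roughly as follows. This is an illustration only, not the workflow's actual code: the function name `parse_failed_tests` is hypothetical, and the regular expressions are assumptions about what GoogleTest `[  FAILED  ]` lines and pytest `FAILED` summary lines look like in the downloaded logs.

```shell
# Sketch: extract unique failing test names from a CI log file.
# parse_failed_tests is a hypothetical helper; the patterns are assumptions.
parse_failed_tests() {
  local log_file="$1"
  {
    # GoogleTest: "[  FAILED  ] Suite.Case", optionally followed by a duration.
    # Requiring "Suite.Case" skips summary lines such as "[  FAILED  ] 1 test, listed below:".
    grep -oE '\[ +FAILED +\] +[A-Za-z0-9_/]+\.[A-Za-z0-9_/]+' "$log_file" \
      | sed -E 's/^\[ +FAILED +\] +//'
    # pytest short summary: "FAILED path/to/test_file.py::test_name - message".
    grep -oE 'FAILED [^ ]+::[^ ]+' "$log_file" \
      | sed -E 's/^FAILED //'
  } | sort -u
}
```

Because `grep -o` prints only the matched part, any job, step, or timestamp prefix on a log line is ignored. Parametrised pytest IDs that contain spaces would be truncated by this pattern.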

Labels

| Label | Colour | Used for |
| --- | --- | --- |
| `flaky-test` | yellow | Test failures (e.g. `Flaky test: StorageLockWithAndWithoutRetry.StressManyWriters`) |
| `flaky-step` | pink | Infrastructure/setup step failures (e.g. `Flaky step: 3.11 Linux / integration-NoCache / Install MongoDB`) |
| `ci-failure` | red | Failures where no specific test or step could be identified |

Labels are auto-created on first run if they don't already exist.

Any other comments?

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

github-actions · 2026-03-17T09:01:53Z

Label error. Requires exactly 1 of: patch, minor, major. Found:

claude · 2026-03-17T09:03:13Z

.github/workflows/flaky_CI_issue.yml

+    types: [completed]
+
+jobs:
+  track-failures:


The workflow triggers on any branch failure (workflow_run fires regardless of the branch the triggering run ran on). The PR description says "on the master branch" but the if: condition only checks conclusion == 'failure' — it does not filter on branch. This means flaky-test issues will be auto-created for failures on feature branches, release branches, forks, etc., generating noise.

Consider adding a branch filter:

Suggested change

track-failures:

if: github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.head_branch == 'master'

claude · 2026-03-17T09:03:21Z

.github/workflows/flaky_CI_issue.yml

+          for job_id in $(echo "$failed_jobs_json" | jq -r '.[].id'); do
+            job_name=$(echo "$failed_jobs_json" | jq -r ".[] | select(.id == $job_id) | .name")
+            step_names=$(gh api "repos/$REPO/actions/jobs/$job_id" \
+              --jq '.steps[] | select(.conclusion == "failure") | .name')


The log download step swallows errors (|| true), so it reports success even when the download fails. If gh run view --log-failed fails (e.g. rate limit, token permission issue), /tmp/failed_logs.txt may be empty or contain only an error message, and the grep patterns will match nothing. The workflow then falls through to the "unparseable failures" path and creates a generic issue for every CI failure. This is the highest-noise fallback and should be avoided.

Consider logging a warning when the log download fails so at least the issue body contains diagnostic context.
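One possible shape for this, as a sketch only: `fetch_logs` is a hypothetical helper, not something in the PR, that wraps whatever download command the step uses. It relies on the `::warning::` workflow command, which GitHub Actions renders as an annotation on the run.

```shell
# Sketch: run a log-fetching command without aborting the step,
# but surface a warning instead of hiding the failure behind "|| true".
# fetch_logs is a hypothetical helper name.
fetch_logs() {
  local out_file="$1"
  shift
  # "$@" is the download command, e.g.:
  #   gh run view "$RUN_ID" --repo "$REPO" --log-failed
  if ! "$@" > "$out_file" 2> "${out_file}.err"; then
    echo "::warning::Log download failed: $(head -n 1 "${out_file}.err")"
    return 1
  fi
  if [ ! -s "$out_file" ]; then
    echo "::warning::Log download succeeded but returned no output"
    return 1
  fi
  return 0
}
```

A caller could record the non-zero return (for example in a `LOG_STATUS` variable, also an assumed name) and include it in the issue body, so a generic issue at least says the logs were unavailable.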

claude · 2026-03-17T09:03:31Z

.github/workflows/flaky_CI_issue.yml

+          echo "$FAILING_TESTS" | while IFS= read -r test_name; do
+            [ -z "$test_name" ] && continue
+
+            echo "Processing test: $test_name"


Injection risk via ${{ steps.parse.outputs.tests }}.

FAILING_TESTS is populated from parsed log output (test names) and reaches this step through ${{ steps.parse.outputs.tests }}. GitHub Actions evaluates ${{ ... }} expressions before the runner executes the step, so if the expression were ever placed directly inside the run: script, a crafted test name containing shell metacharacters could alter step behaviour.

Referencing the output only via the environment variable ($FAILING_TESTS), as the run: script already does, is the standard mitigation. It holds only as long as the script never evaluates the value (no eval, no unquoted expansion). For stronger isolation, prefer writing step outputs to a file (e.g. /tmp/failing_tests.txt) and reading that file in the subsequent step, rather than passing through ${{ steps.parse.outputs.tests }} as an env value.

The same pattern applies to FAILED_STEPS at line 197.
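A minimal sketch of the file-based handoff, assuming both steps run in the same job and therefore share the runner's filesystem. The helper names `save_failing_tests` and `for_each_failing_test` are hypothetical and the path is illustrative.

```shell
# Sketch: pass test names between steps through a file, treating each line as data.
TESTS_FILE="${TESTS_FILE:-/tmp/failing_tests.txt}"

# Producer step: write one test name per line.
save_failing_tests() {
  printf '%s\n' "$@" > "$TESTS_FILE"
}

# Consumer step: call a handler once per non-empty line.
for_each_failing_test() {
  local handler="$1" test_name
  [ -f "$TESTS_FILE" ] || return 0
  while IFS= read -r test_name; do
    [ -z "$test_name" ] && continue
    # Quoted expansion: the name is passed as a single argument and never evaluated.
    "$handler" "$test_name"
  done < "$TESTS_FILE"
}
```

Redirecting the file into the loop (`done < "$TESTS_FILE"`) rather than piping into `while` also keeps the loop in the current shell, so variables set inside it remain visible afterwards.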

claude · 2026-03-17T09:03:38Z

.github/workflows/flaky_CI_issue.yml

+              body="### Another failure observed"
+              body="$body"$'\n\n'"- **Run:** $RUN_URL"
+              body="$body"$'\n'"- **Commit:** \`${COMMIT_SHA:0:10}\`"
+              body="$body"$'\n'"- **Date:** $run_date"


The GitHub Search API (used by gh issue list --search) can return stale results with a lag of a few minutes. If two CI runs fail in quick succession for the same test, the deduplication search may find no existing issue for both runs, causing two issues to be created with the same title.

A safer deduplication approach is to search by exact title using gh issue list --search "is:issue is:open label:flaky-test \"$issue_title\"" and then verify the title with --jq, which gh applies client-side to the returned results. The code already does this, but the underlying search index latency is unavoidable. This is worth noting in the workflow comments so maintainers understand that duplicate issues can appear when failures occur in quick succession.
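For illustration, the exact-title check can be separated from the search itself. In this sketch `title_already_tracked` is a hypothetical helper that reads candidate titles from stdin; it does nothing about index latency, it only guarantees that a partial or fuzzy search hit is not mistaken for the same issue.

```shell
# Sketch: exact-title check over candidate titles read from stdin, one per line.
# title_already_tracked is a hypothetical helper name.
title_already_tracked() {
  # -F: fixed string (no regex), -x: whole line must match, -q: quiet
  grep -Fxq -- "$1"
}

# Assumed usage inside the workflow (REPO and issue_title come from the step):
#   gh issue list --repo "$REPO" --state open --label flaky-test \
#     --search "\"$issue_title\" in:title" --json title --jq '.[].title' \
#     | title_already_tracked "$issue_title"
```

Using `-F` matters because test names contain `.`, which a regex match would treat as a wildcard.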

claude · 2026-03-17T09:03:49Z

.github/workflows/flaky_CI_issue.yml

+          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }}
+          REPO: ${{ github.repository }}
+          COMMIT_SHA: ${{ github.event.workflow_run.head_sha }}
+        run: |


The "unparseable failures" step creates a new issue per run with a unique title (CI failure (unparseable): run #$RUN_ID). If the log download frequently fails or the grep patterns never match (e.g. due to a log format change), this will create an unbounded number of issues — one for every CI failure.

Unlike the flaky-test and flaky-step steps, this step has no deduplication. Consider deduplicating by searching for a generic title (e.g. CI failure (unparseable)), enforcing a rate limit or daily cap, or skipping issue creation and writing only to $GITHUB_STEP_SUMMARY for this fallback case.
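The last option could look roughly like this. It is a sketch under assumptions: `report_unparseable` is a hypothetical helper and the summary wording is invented for illustration. `GITHUB_STEP_SUMMARY` is the file path the Actions runner provides for job summaries.

```shell
# Sketch: report an unparseable failure in the job summary instead of opening an issue.
# report_unparseable is a hypothetical helper name.
report_unparseable() {
  local run_url="$1" sha="$2"
  # GITHUB_STEP_SUMMARY is set by the Actions runner; fall back to stdout elsewhere.
  local target="${GITHUB_STEP_SUMMARY:-/dev/stdout}"
  {
    echo "### CI failure (unparseable)"
    echo ""
    echo "- **Run:** $run_url"
    # Show only the first 10 characters of the commit SHA.
    echo "- **Commit:** \`$(printf '%.10s' "$sha")\`"
    echo "- No failed test or step could be identified; no issue was created."
  } >> "$target"
}
```

In the workflow this would be called as `report_unparseable "$RUN_URL" "$COMMIT_SHA"`, using the environment variables the step already defines.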

claude · 2026-03-17T09:03:55Z

.github/workflows/flaky_CI_issue.yml

+              body="$body"$'\n\n'"---"
+              body="$body"$'\n'"*This issue was automatically created by the flaky test tracker workflow.*"
+              body="$body"$'\n'"*Add further failure occurrences as comments below.*"
+              gh issue create --repo "$REPO" \


The issue body hard-codes "on the master branch" but the workflow is not restricted to master (see the branch-filter comment on line 9). This text will be misleading when issues are filed from non-master runs.

Suggested change

gh issue create --repo "$REPO" \

body="$body"$'\n\n'"This test has been detected as failing in CI."

claude · 2026-03-17T09:04:02Z

.github/workflows/flaky_CI_issue.yml

+          # Get failed jobs with their names and IDs
+          failed_jobs_json=$(gh api "repos/$REPO/actions/runs/$RUN_ID/jobs?filter=latest&per_page=100" \
+            --jq '[.jobs[] | select(.conclusion == "failure") | {id, name}]')
+


The API call fetches up to 100 jobs (per_page=100) but does not handle pagination. If a workflow run has more than 100 jobs (unlikely but possible as the matrix grows), failures in jobs beyond the first 100 are silently missed. Consider adding --paginate, which gh api supports. Note that the --jq filter is then applied to each page separately, so the [...] array wrapper would yield one array per page and the results would need merging. At minimum, document this limitation.

claude · 2026-03-17T09:04:38Z