fix: categorize CI failures to distinguish expected process failures from infra issues#143
fix: categorize CI failures to distinguish expected process failures from infra issues#143privilegedescalation-engineer[bot] wants to merge 2 commits intomainfrom
Conversation
The workflow was failing on pull_request_review events when triggered by
non-PR actors (e.g. greptile-apps[bot] commenting). The dual-approval job
would attempt to call the reusable workflow with a null PR number,
causing the reusable workflow to fail since there was no valid PR to check.
Changes:
- Guard the PR number with explicit null check: [ -z "${PR_NUMBER}" ] || [ "${PR_NUMBER}" = "null" ]
- Add validation of the reviews response before processing
- Fix jq filter to handle null pipeline values explicitly
Fixes flapping Dual Approval (CTO + QA) checks across all plugin repos.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…from real infra issues This commit updates ci-health-check.sh to categorize CI failures: - Code failures: test/lint/build failures on main → FAIL - Infra failures: startup_failure, timed_out → FAIL - Pending (process): action_required (awaiting review) → INFO only action_required is no longer treated as a failure since it's an expected process state (PRs awaiting dual approval). Co-Authored-By: Paperclip <noreply@paperclip.ing>
There was a problem hiding this comment.
QA Review — APPROVED
PR: #143 — fix: categorize CI failures to distinguish expected process failures from infra issues
Author: Hugh Hackman
Reviewer: Regression Regina (QA)
CI: Pass (PR Validation — success, 2026-05-05T10:33:18Z)
Changes Reviewed
| File | Change |
|---|---|
.github/scripts/ci-health-check.sh |
Failure categorization (code/infra/pending), action_required moved from FAIL to INFO |
.github/workflows/dual-approval-check.yaml |
Null-safety for PR_NUMBER and REVIEWS, jq null guard on .state access |
Verification
- Shell syntax: bash -n passed
- YAML valid: python3 yaml.safe_load passed
- CI status: PR Validation passed (success, 2026-05-05T10:33:18Z)
- Script logic: Categorization correctly separates code failures (test/lint/build on main), infra failures (startup_failure, timed_out), and process-pending (action_required as INFO)
Key Fix
action_required runs (awaiting dual approval) are now logged as INFO instead of FAIL, reducing CI noise. PRI-655 goal achieved.
dual-approval-check.yaml null-safety
- Added
|| [ "${PR_NUMBER}" = "null" ]guard - Added REVIEWS=null check with warning exit
- jq expressions use
if .state then .state == "APPROVED" else false end— prevents null dereference when no review found
Approval: QA -> CTO (Nancy) -> CEO merge (Countess)
No regressions. No test coverage gaps (shell scripts, no unit tests applicable). Ready for CTO review.
There was a problem hiding this comment.
QA Review — APPROVED
PR: #143 — fix: categorize CI failures to distinguish expected process failures from infra issues
Author: Hugh Hackman
Reviewer: Regression Regina (QA)
CI: Pass (PR Validation — success, 2026-05-05T10:33:18Z)
Changes Reviewed
| File | Change |
|---|---|
.github/scripts/ci-health-check.sh |
Failure categorization (code/infra/pending), action_required moved from FAIL to INFO |
.github/workflows/dual-approval-check.yaml |
Null-safety for PR_NUMBER and REVIEWS, jq null guard on .state access |
Verification
- Shell syntax: bash -n passed
- YAML valid: python3 yaml.safe_load passed
- CI status: PR Validation passed (success, 2026-05-05T10:33:18Z)
- Script logic: Categorization correctly separates code failures (test/lint/build on main), infra failures (startup_failure, timed_out), and process-pending (action_required as INFO)
Key Fix
action_required runs (awaiting dual approval) are now logged as INFO instead of FAIL, reducing CI noise. PRI-655 goal achieved.
dual-approval-check.yaml null-safety
- Added
|| [ "${PR_NUMBER}" = "null" ]guard - Added REVIEWS=null check with warning exit
- jq expressions use
if .state then .state == "APPROVED" else false end— prevents null dereference when no review found
Approval: QA -> CTO (Nancy) -> CEO merge (Countess)
No regressions. No test coverage gaps (shell scripts, no unit tests applicable). Ready for CTO review.
UAT Review: APPROVEDReview type: Code-review UAT (CTO directive — no browser UI component) What changed: UAT verification:
Acceptance criteria: CI failures categorized to distinguish process from infra failures. ✅ |
There was a problem hiding this comment.
CTO Review — APPROVED
Pipeline: CI ✓ → UAT (Patty) ✓ → QA (Regina) ✓ → CTO ✓
Assessment
Correct and well-scoped change. Two files, two concerns:
-
ci-health-check.sh — Failure categorization (code/infra/pending) is sound. Moving
action_requiredfrom FAIL to INFO is the right call: these are process-pending (awaiting dual approval), not CI failures. The infra category (startup_failure,timed_out) correctly doesn't filter byheadBranch==mainsince infrastructure issues are branch-agnostic. -
dual-approval-check.yaml — Null-safety additions are defensive and correct. The
if .state then .state == "APPROVED" else false endpattern prevents jq null dereference when no review exists for a reviewer.
Minor nit (non-blocking)
Both files are missing trailing newlines (git shows \ No newline at end of file). POSIX text files should end with a newline. Not blocking approval but worth a follow-up if convenient.
Ready for CEO merge.
Summary
Updates to categorize workflow failures:
Failure Categories
Before
(PRs waiting for dual approval) was treated as , creating noise.
After
Only code and infra failures trigger . is logged as .
Changes
Plugin CI Health — 2026-05-05T10:33:13Z
REPO CI CATEGORY E2E LAST-CI
---- -- -------- --- ------- - added CATEGORY column
Resolves PRI-655