test(p9): integration + chaos battery + conformance + CI by broomva · Pull Request #1 · broomva/p9

broomva · 2026-05-05T03:10:47Z

Summary

Closes the v1 test pyramid for P9 with three additions:

`tests/test_p9_integration.py` (15 tests) — full-lifecycle scenarios: happy path through PUSHED→WATCHING→GREEN→MERGE_READY→MERGED, self-heal classifier flow with stall detection via the evaluator, multi-PR concurrency at `max=1` and `max=2`, wait-queue end-to-end drain, events tail filtering, doctor degraded states, and subprocess-level mocks asserting `gh pr checks --watch` is actually the spawned command.
`tests/test_p9_chaos.py` (6 tests) — fault-injection battery: JSONL partial-line truncation recovery, mid-file corruption invariant violation, 5×10 concurrent state writers (flock correctness), policy missing-block fail-closed with no side effects, `heal.lock` timeout under contention, 5×10 concurrent wait-queue pushes with no loss.
`p9 conformance` subcommand + `.github/workflows/test.yml` — runs the full battery on push/PR across Python 3.11/3.12/3.13, plus CLI smoke checks.

Test totals

Unit: 46 (PR 2)
Integration: 15 (this PR)
Chaos: 6 (this PR)
Total: 67 passing locally in 3.7s

Spec coverage

This PR satisfies §8 of the design doc (`docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md`):

§8.1 unit tests — already shipped in PR 2
§8.2 integration tests — happy path, single-heal, two-heal-then-escalate (via evaluator stall), unclassified-immediate-escalate, concurrent watchers ✅
§8.3 chaos battery — partial write, mid-file corruption (invariant violation), concurrent writers (flock), policy fail-closed, lock-timeout, queue durability ✅
§8.4 conformance subcommand ✅
§8.5 explicit non-coverage: real `gh pr checks --watch` polling loop (GitHub's responsibility) and LLM behavior under the skill — out of scope per spec

Notes

One workaround was needed for Python 3.14 multiprocessing's `spawn` start method requiring picklable callables — chaos test helpers were lifted to module level.
`vcrpy` was considered for `gh` interaction recording but rejected: `gh` is a subprocess that makes its own HTTP calls, so direct `subprocess` monkeypatching is more honest. README still mentions vcrpy in the original spec for posterity; future enhancements may revisit.

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added fault-injection tests to verify behavior under concurrent operations and data corruption scenarios.
- Added comprehensive integration tests for end-to-end CLI workflows and state transitions.
Chores
- Configured automated testing in CI/CD to validate across multiple Python versions.
- Added conformance command to execute the full test suite.

Three additions complete the test pyramid for P9: 1. tests/test_p9_integration.py (15 tests): full-lifecycle scenarios — happy path through PUSHED→WATCHING→GREEN→MERGE_READY→MERGED, self-heal classifier flow with stall detection, multi-PR concurrency at max=1 and max=2, wait-queue end-to-end drain, events tail, doctor degraded states, and subprocess-level mocks asserting `gh pr checks --watch` is actually the spawned command. 2. tests/test_p9_chaos.py (6 tests): fault-injection battery — JSONL partial-line truncation recovery, mid-file corruption invariant violation, 5×10 concurrent state writers (flock correctness), policy missing-block fail-closed (no side effects), heal.lock timeout under contention, 5×10 concurrent wait-queue pushes with no loss. 3. scripts/p9.py: new `p9 conformance` subcommand that runs the full pytest battery (unit + integration + chaos) — used as the CI-lane validator and as a local pre-merge check. Honors BROOMVA_P9_PYTEST env var for interpreter pinning. 4. .github/workflows/test.yml: matrix CI across Python 3.11/3.12/3.13 running `p9 conformance` plus CLI smoke tests on push and PR. Test totals: 46 unit + 15 integration + 6 chaos = 67 passing in 3.7s. Closes the v1 test pyramid for P9. Spec coverage: docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md §8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-05T03:10:57Z

📝 Walkthrough

Walkthrough

The PR establishes comprehensive testing infrastructure for the p9 system by adding GitHub Actions CI workflow, a CLI conformance subcommand to invoke pytest, and two extensive test suites covering chaos fault-injection scenarios and end-to-end integration flows.

Changes

Testing Infrastructure & CI

Layer / File(s)	Summary
CLI Enhancement `scripts/p9.py`	Adds `cmd_conformance` handler and `conformance` subcommand to `build_parser()`, invoking pytest against `tests/` directory with optional `-v` and `-k` passthrough flags. Returns `EXIT_OK` or `EXIT_DEGRADED` based on test outcomes.
Chaos Test Suite `tests/test_p9_chaos.py`	Implements six fault-injection and concurrency stress tests: JSONL truncation recovery, mid-file corruption detection, concurrent writer durability, policy validation, heal-lock contention, and concurrent queue reliability. Uses multiprocessing and direct state manipulation.
Integration Test Suite `tests/test_p9_integration.py`	Adds end-to-end lifecycle tests organized in seven test classes covering happy-path state transitions, heal classification, concurrency ceilings, wait-queue lifecycle, event tailing, doctor validation, and subprocess mocking (watcher spawning, gh CLI absence).
CI Workflow `.github/workflows/test.yml`	Configures GitHub Actions to run on `push`/`pull_request` to `main` and `workflow_dispatch`. Executes `pytest` matrix across Python 3.11–3.13 via the new conformance subcommand, plus CLI smoke tests for `--help`, `doctor`, `status`, and `wait-queue list`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

Hop-hop, the tests now run with pride, ✨🐰
GitHub Actions keeps us verified!
Chaos comes, but p9 holds tight,
Locks and queues—concurrent and right.
From PUSHED to MERGED, our workflow's might!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(p9): integration + chaos battery + conformance + CI' directly summarizes the main changes—adding integration tests, chaos fault-injection tests, a conformance subcommand, and CI workflow, all as described in the raw summary.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/test-battery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

scripts/p9.py (1)
871-877: 💤 Low value

Use shlex.split for runner parsing.

runner.split() breaks on any path or runner spec containing whitespace or quoted arguments — e.g., sys.executable resolving to /path with space/python (common in macOS framework builds and some venvs) or a user setting BROOMVA_P9_PYTEST="uv run pytest --tb=short". shlex.split is the standard, robust replacement.
♻️ Proposed fix
+import shlex
@@
-    runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest")
+    runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest")
     tests_dir = Path(__file__).resolve().parent.parent / "tests"
     if not tests_dir.exists():
         print(f"p9 conformance: tests directory not found at {tests_dir}",
               file=sys.stderr)
         return EXIT_DEGRADED
-    cmd = runner.split() + [str(tests_dir)]
+    cmd = [*shlex.split(runner), str(tests_dir)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/p9.py` around lines 871 - 877, The code uses runner.split() to build
the pytest command which breaks on paths/arguments containing whitespace;
replace runner.split() with shlex.split(runner) (import shlex at top if missing)
when constructing cmd so that the runner string (from the runner variable /
BROOMVA_P9_PYTEST env) is correctly tokenized before appending the tests_dir
path in the cmd assignment.
.github/workflows/test.yml (2)
1-13: 💤 Low value

Pin permissions: to least privilege.

This workflow only checks out code, installs deps, and runs pytest — it doesn't write to the repo, comment on PRs, or publish artifacts. Without an explicit permissions: block, GITHUB_TOKEN falls back to repo defaults, which can be contents: write. Pinning to read-only is a one-line hardening that closes a supply-chain blast radius if any installed test dep is ever compromised.
🔒 Proposed fix
 on:
   push:
     branches: [main]
   pull_request:
     branches: [main]
   workflow_dispatch:
 
+permissions:
+  contents: read
+
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 1 - 13, Add a minimal permissions
block to the workflow to pin GITHUB_TOKEN to least privilege: include a
top-level permissions: mapping (e.g., permissions: contents: read) in the "name:
tests" workflow so the actions that only checkout code and run pytest cannot use
write permissions; update the file that defines the workflow (the workflow with
"name: tests" and "concurrency") to add this permissions block at top-level.
14-22: 💤 Low value

Add a timeout-minutes to bound chaos-test hangs.

Chaos tests (heal.lock contention, multiprocessing writers) are the most likely to deadlock or wedge a runner. Default GitHub-hosted timeout is 360 minutes — a tight bound (e.g., 10 min) means a wedge cancels quickly instead of burning a full hour of runner time per matrix entry on three Python versions.
⏱️ Proposed fix
 jobs:
   pytest:
     name: pytest (${{ matrix.python-version }})
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     strategy:
       fail-fast: false
       matrix:
         python-version: ["3.11", "3.12", "3.13"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 14 - 22, Add a job-level timeout to
bound long-running chaos tests by setting the GitHub Actions job timeout for the
pytest job (named "pytest (${{ matrix.python-version }})")—insert a
timeout-minutes field (e.g., 10) under the pytest job definition so each matrix
entry cannot run past that limit, preventing deadlocked tests from consuming the
default 360-minute runner time.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/test.yml:
- Around line 1-13: Add a minimal permissions block to the workflow to pin
GITHUB_TOKEN to least privilege: include a top-level permissions: mapping (e.g.,
permissions: contents: read) in the "name: tests" workflow so the actions that
only checkout code and run pytest cannot use write permissions; update the file
that defines the workflow (the workflow with "name: tests" and "concurrency") to
add this permissions block at top-level.
- Around line 14-22: Add a job-level timeout to bound long-running chaos tests
by setting the GitHub Actions job timeout for the pytest job (named "pytest (${{
matrix.python-version }})")—insert a timeout-minutes field (e.g., 10) under the
pytest job definition so each matrix entry cannot run past that limit,
preventing deadlocked tests from consuming the default 360-minute runner time.

In `@scripts/p9.py`:
- Around line 871-877: The code uses runner.split() to build the pytest command
which breaks on paths/arguments containing whitespace; replace runner.split()
with shlex.split(runner) (import shlex at top if missing) when constructing cmd
so that the runner string (from the runner variable / BROOMVA_P9_PYTEST env) is
correctly tokenized before appending the tests_dir path in the cmd assignment.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7f5a208-f168-46c8-910a-e0a9e9826cda

📥 Commits

Reviewing files that changed from the base of the PR and between 34dfc82 and b9d851a.

📒 Files selected for processing (4)

.github/workflows/test.yml
scripts/p9.py
tests/test_p9_chaos.py
tests/test_p9_integration.py

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

broomva merged commit 03a380b into main May 5, 2026
4 checks passed

broomva deleted the feat/test-battery branch May 5, 2026 12:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(p9): integration + chaos battery + conformance + CI#1

test(p9): integration + chaos battery + conformance + CI#1
broomva merged 1 commit into
mainfrom
feat/test-battery

broomva commented May 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 5, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

broomva commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test totals

Spec coverage

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

broomva commented May 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading