Skip to content

test(p9): integration + chaos battery + conformance + CI#1

Merged
broomva merged 1 commit into
mainfrom
feat/test-battery
May 5, 2026
Merged

test(p9): integration + chaos battery + conformance + CI#1
broomva merged 1 commit into
mainfrom
feat/test-battery

Conversation

@broomva
Copy link
Copy Markdown
Owner

@broomva broomva commented May 5, 2026

Summary

Closes the v1 test pyramid for P9 with three additions:

  1. `tests/test_p9_integration.py` (15 tests) — full-lifecycle scenarios: happy path through PUSHED→WATCHING→GREEN→MERGE_READY→MERGED, self-heal classifier flow with stall detection via the evaluator, multi-PR concurrency at `max=1` and `max=2`, wait-queue end-to-end drain, events tail filtering, doctor degraded states, and subprocess-level mocks asserting `gh pr checks --watch` is actually the spawned command.
  2. `tests/test_p9_chaos.py` (6 tests) — fault-injection battery: JSONL partial-line truncation recovery, mid-file corruption invariant violation, 5×10 concurrent state writers (flock correctness), policy missing-block fail-closed with no side effects, `heal.lock` timeout under contention, 5×10 concurrent wait-queue pushes with no loss.
  3. `p9 conformance` subcommand + `.github/workflows/test.yml` — runs the full battery on push/PR across Python 3.11/3.12/3.13, plus CLI smoke checks.

Test totals

  • Unit: 46 (PR 2)
  • Integration: 15 (this PR)
  • Chaos: 6 (this PR)
  • Total: 67 passing locally in 3.7s

Spec coverage

This PR satisfies §8 of the design doc (`docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md`):

  • §8.1 unit tests — already shipped in PR 2
  • §8.2 integration tests — happy path, single-heal, two-heal-then-escalate (via evaluator stall), unclassified-immediate-escalate, concurrent watchers ✅
  • §8.3 chaos battery — partial write, mid-file corruption (invariant violation), concurrent writers (flock), policy fail-closed, lock-timeout, queue durability ✅
  • §8.4 conformance subcommand ✅
  • §8.5 explicit non-coverage: real `gh pr checks --watch` polling loop (GitHub's responsibility) and LLM behavior under the skill — out of scope per spec

Notes

  • One workaround was needed for Python 3.14 multiprocessing's `spawn` start method requiring picklable callables — chaos test helpers were lifted to module level.
  • `vcrpy` was considered for `gh` interaction recording but rejected: `gh` is a subprocess that makes its own HTTP calls, so direct `subprocess` monkeypatching is more honest. README still mentions vcrpy in the original spec for posterity; future enhancements may revisit.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • Added fault-injection tests to verify behavior under concurrent operations and data corruption scenarios.
    • Added comprehensive integration tests for end-to-end CLI workflows and state transitions.
  • Chores

    • Configured automated testing in CI/CD to validate across multiple Python versions.
    • Added conformance command to execute the full test suite.

Three additions complete the test pyramid for P9:

1. tests/test_p9_integration.py (15 tests): full-lifecycle scenarios —
   happy path through PUSHED→WATCHING→GREEN→MERGE_READY→MERGED, self-heal
   classifier flow with stall detection, multi-PR concurrency at max=1
   and max=2, wait-queue end-to-end drain, events tail, doctor degraded
   states, and subprocess-level mocks asserting `gh pr checks --watch` is
   actually the spawned command.

2. tests/test_p9_chaos.py (6 tests): fault-injection battery — JSONL
   partial-line truncation recovery, mid-file corruption invariant
   violation, 5×10 concurrent state writers (flock correctness), policy
   missing-block fail-closed (no side effects), heal.lock timeout under
   contention, 5×10 concurrent wait-queue pushes with no loss.

3. scripts/p9.py: new `p9 conformance` subcommand that runs the full
   pytest battery (unit + integration + chaos) — used as the CI-lane
   validator and as a local pre-merge check. Honors BROOMVA_P9_PYTEST env
   var for interpreter pinning.

4. .github/workflows/test.yml: matrix CI across Python 3.11/3.12/3.13
   running `p9 conformance` plus CLI smoke tests on push and PR.

Test totals: 46 unit + 15 integration + 6 chaos = 67 passing in 3.7s.

Closes the v1 test pyramid for P9. Spec coverage:
docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md §8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

The PR establishes comprehensive testing infrastructure for the p9 system by adding GitHub Actions CI workflow, a CLI conformance subcommand to invoke pytest, and two extensive test suites covering chaos fault-injection scenarios and end-to-end integration flows.

Changes

Testing Infrastructure & CI

Layer / File(s) Summary
CLI Enhancement
scripts/p9.py
Adds cmd_conformance handler and conformance subcommand to build_parser(), invoking pytest against tests/ directory with optional -v and -k passthrough flags. Returns EXIT_OK or EXIT_DEGRADED based on test outcomes.
Chaos Test Suite
tests/test_p9_chaos.py
Implements six fault-injection and concurrency stress tests: JSONL truncation recovery, mid-file corruption detection, concurrent writer durability, policy validation, heal-lock contention, and concurrent queue reliability. Uses multiprocessing and direct state manipulation.
Integration Test Suite
tests/test_p9_integration.py
Adds end-to-end lifecycle tests organized in seven test classes covering happy-path state transitions, heal classification, concurrency ceilings, wait-queue lifecycle, event tailing, doctor validation, and subprocess mocking (watcher spawning, gh CLI absence).
CI Workflow
.github/workflows/test.yml
Configures GitHub Actions to run on push/pull_request to main and workflow_dispatch. Executes pytest matrix across Python 3.11–3.13 via the new conformance subcommand, plus CLI smoke tests for --help, doctor, status, and wait-queue list.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

Hop-hop, the tests now run with pride, ✨🐰
GitHub Actions keeps us verified!
Chaos comes, but p9 holds tight,
Locks and queues—concurrent and right.
From PUSHED to MERGED, our workflow's might!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(p9): integration + chaos battery + conformance + CI' directly summarizes the main changes—adding integration tests, chaos fault-injection tests, a conformance subcommand, and CI workflow, all as described in the raw summary.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/test-battery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
scripts/p9.py (1)

871-877: 💤 Low value

Use shlex.split for runner parsing.

runner.split() breaks on any path or runner spec containing whitespace or quoted arguments — e.g., sys.executable resolving to /path with space/python (common in macOS framework builds and some venvs) or a user setting BROOMVA_P9_PYTEST="uv run pytest --tb=short". shlex.split is the standard, robust replacement.

♻️ Proposed fix
+import shlex
@@
-    runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest")
+    runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest")
     tests_dir = Path(__file__).resolve().parent.parent / "tests"
     if not tests_dir.exists():
         print(f"p9 conformance: tests directory not found at {tests_dir}",
               file=sys.stderr)
         return EXIT_DEGRADED
-    cmd = runner.split() + [str(tests_dir)]
+    cmd = [*shlex.split(runner), str(tests_dir)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/p9.py` around lines 871 - 877, The code uses runner.split() to build
the pytest command which breaks on paths/arguments containing whitespace;
replace runner.split() with shlex.split(runner) (import shlex at top if missing)
when constructing cmd so that the runner string (from the runner variable /
BROOMVA_P9_PYTEST env) is correctly tokenized before appending the tests_dir
path in the cmd assignment.
.github/workflows/test.yml (2)

1-13: 💤 Low value

Pin permissions: to least privilege.

This workflow only checks out code, installs deps, and runs pytest — it doesn't write to the repo, comment on PRs, or publish artifacts. Without an explicit permissions: block, GITHUB_TOKEN falls back to repo defaults, which can be contents: write. Pinning to read-only is a one-line hardening that closes a supply-chain blast radius if any installed test dep is ever compromised.

🔒 Proposed fix
 on:
   push:
     branches: [main]
   pull_request:
     branches: [main]
   workflow_dispatch:
 
+permissions:
+  contents: read
+
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 1 - 13, Add a minimal permissions
block to the workflow to pin GITHUB_TOKEN to least privilege: include a
top-level permissions: mapping (e.g., permissions: contents: read) in the "name:
tests" workflow so the actions that only checkout code and run pytest cannot use
write permissions; update the file that defines the workflow (the workflow with
"name: tests" and "concurrency") to add this permissions block at top-level.

14-22: 💤 Low value

Add a timeout-minutes to bound chaos-test hangs.

Chaos tests (heal.lock contention, multiprocessing writers) are the most likely to deadlock or wedge a runner. Default GitHub-hosted timeout is 360 minutes — a tight bound (e.g., 10 min) means a wedge cancels quickly instead of burning a full hour of runner time per matrix entry on three Python versions.

⏱️ Proposed fix
 jobs:
   pytest:
     name: pytest (${{ matrix.python-version }})
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     strategy:
       fail-fast: false
       matrix:
         python-version: ["3.11", "3.12", "3.13"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 14 - 22, Add a job-level timeout to
bound long-running chaos tests by setting the GitHub Actions job timeout for the
pytest job (named "pytest (${{ matrix.python-version }})")—insert a
timeout-minutes field (e.g., 10) under the pytest job definition so each matrix
entry cannot run past that limit, preventing deadlocked tests from consuming the
default 360-minute runner time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/test.yml:
- Around line 1-13: Add a minimal permissions block to the workflow to pin
GITHUB_TOKEN to least privilege: include a top-level permissions: mapping (e.g.,
permissions: contents: read) in the "name: tests" workflow so the actions that
only checkout code and run pytest cannot use write permissions; update the file
that defines the workflow (the workflow with "name: tests" and "concurrency") to
add this permissions block at top-level.
- Around line 14-22: Add a job-level timeout to bound long-running chaos tests
by setting the GitHub Actions job timeout for the pytest job (named "pytest (${{
matrix.python-version }})")—insert a timeout-minutes field (e.g., 10) under the
pytest job definition so each matrix entry cannot run past that limit,
preventing deadlocked tests from consuming the default 360-minute runner time.

In `@scripts/p9.py`:
- Around line 871-877: The code uses runner.split() to build the pytest command
which breaks on paths/arguments containing whitespace; replace runner.split()
with shlex.split(runner) (import shlex at top if missing) when constructing cmd
so that the runner string (from the runner variable / BROOMVA_P9_PYTEST env) is
correctly tokenized before appending the tests_dir path in the cmd assignment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7f5a208-f168-46c8-910a-e0a9e9826cda

📥 Commits

Reviewing files that changed from the base of the PR and between 34dfc82 and b9d851a.

📒 Files selected for processing (4)
  • .github/workflows/test.yml
  • scripts/p9.py
  • tests/test_p9_chaos.py
  • tests/test_p9_integration.py

@broomva broomva merged commit 03a380b into main May 5, 2026
4 checks passed
@broomva broomva deleted the feat/test-battery branch May 5, 2026 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant