feat: v4.0 — pipeline simplification + per-task adversarial review by ducdmdev · Pull Request #19 · ducdmdev/agent-team-plugin

ducdmdev · 2026-05-10T11:24:47Z

Summary

Major refactor collapsing the 4-skill pipeline into 2 skills with a per-task adversarial review pipeline and 3-role universal taxonomy.

Skills: 4 → 2 (execute + audit). start and plan deleted.
Roles: 13 specialized templates → 3 universal (Executor / Reviewer / Challenger) + Lead, with archetype-specific extensions
Per-task pipeline: every task spawns ephemeral Executor → Reviewer → Challenger
Auto-chaining: execute auto-invokes superpowers:writing-plans if no plan, then auto-chains to audit on completion
Audit scope reduction: deep code review → integration-only review of files that appear in the impact_files of two or more tasks
Workspace schema additions: reviews/ subdirectory, impact_files/review_status/challenge_status fields, in_review status
Hooks: unchanged (13 hooks, 16 scripts)

Spec: docs/specs/2026-05-09-agent-team-v4-refactor-design.md
Plan: docs/plans/2026-05-09-agent-team-v4-refactor.md

BREAKING

/agent-team:start removed → use /agent-team:execute
/agent-team:plan removed → use /superpowers:writing-plans then /agent-team:execute
13-role taxonomy collapsed to 3 universal roles
Per-task spawn count is ~3× v3.x (Executor + Reviewer + Challenger ephemeral per task)
Custom roles (docs/custom-roles.md) need porting — porting guide included

Test plan

Commits

26 commits on feat/v4-refactor. Highlights:

Spec design + audit findings incorporation
File moves + content updates (prior-context-loading, plan-mode-protocol, plan-proposal-example)
Phase 0 prepended to execute, Phase 4 rewritten to E/R/C pipeline
spawn-templates collapse (13 → 3) + impact-analysis directive moved to Implementation archetype only
teammate-roles + custom-roles + team-archetypes + workspace-templates + audit/SKILL.md + audit/agents/reviewer.md rewrites
README + CLAUDE.md + demo scripts rewritten for v4.0
Version bump to 4.0.0 + CHANGELOG entry
3 new integration test files (schema, lifecycle, skill coherence) — 146 new assertions

Collapse the four-skill pipeline into two skills (execute + audit) by absorbing decomposition, archetype detection, plan-mode marking, and the user approval gate into a new Phase 0 of execute. Auto-chain to superpowers:writing-plans when no plan file is available.

Adds the previously-uncaptured edits in execute's own supporting files: coordination-patterns.md (9 Phase 2 references), communication-protocol.md (Plan Stage Messages section + PLAN_REVIEW subsection), spawn-templates.md (plan-mode marking attribution). Adds shared-doc edits in workspace-templates (Plan Audit Result table removal, phase checklist), team-archetypes (10+ phase references), teammate-roles, and custom-roles. Establishes a phase numbering convention table mapping the dropped phases to Phase 0 sub-steps. Restructures the implementation order into 6 staged groups (30 steps).

…apse to v4 spec

Bundles two major design additions into v4.0:

1. Per-task ephemeral Executor/Reviewer/Challenger pipeline replacing the current per-task light review. Each task now goes through 3 sequential stages with shutdown between each. Adds CHALLENGE_REVIEW message format, reviews/ workspace subdirectory, and impact_files / review_status / challenge_status fields on task-graph.json.
2. Role taxonomy collapse from 13 specialized templates to 3 universal roles (Executor / Reviewer / Challenger) with archetype-specific prompt extensions. Each archetype (Implementation/Research/Audit/Planning) maps the same role triple onto archetype-appropriate work. Audit deep code review reduced to integration-only since per-task Challenger covers within-task adversarial review.

Implementation order expanded to 9 stages, 37 steps. Migration guide updated for custom-roles porting and v3.x workspace forward-compat. Risk catalog expanded for spawn cost, reviewer/challenger bottlenecks, and grep-based impact-analysis limits.

Audit ran against the actual codebase. Found and resolved 8 gaps and 2 new findings the spec had missed:

- README scope was 5x larger than spec claimed (Teammate Roles table at lines 188-202 + Team Types table at 207-216 + 80-line Quickstart demo all need rewrite)
- CLAUDE.md "Adding a New Teammate Role" common-task guide assumes the legacy 13-role taxonomy and needs full rewrite
- plan-mode-protocol.md needs 4 sections updated (not just Ownership Boundary): Activation, Spawn Directive (ephemeral semantics), Revision Limits (per-task model), Workspace Tracking (Plan Proposals semantics)
- prior-context-loading.md move is not pure: 3 inline content updates needed
- custom-roles.md is a near-total rewrite (single-role template → 3 extension templates; 36-line example rewritten; new Porting subsection)
- workspace-templates.md needs 2 more line updates (Stage enum, line 365)
- demo scripts (NEW finding) hardcode Phase 1/Phase 2/Implementer references
- validate-task-graph.sh schema flexibility (NEW finding) confirmed safe for the new task-graph.json fields; stale comment on line 46 is cosmetic

Audit Findings table added to spec for traceability. Implementation Order expanded from 37 to 40 numbered steps.

Bite-sized 25-task plan executing the spec at docs/specs/2026-05-09-agent-team-v4-refactor-design.md. Organized into 9 stages: pre-flight, file moves, support file edits, role taxonomy collapse, Phase 0 + Phase 4 rewrites, audit scope reduction, workspace schema updates, dead-skill deletion, tests + meta + docs, and validate. Each task has multiple steps with concrete edits (Find/Replace patterns and full content blocks for major rewrites). Every task ends with a verification step and a commit. Self-review at the end of the plan confirms spec coverage, no placeholders, and consistent naming.

Moves prior-context-loading.md, plan-mode-protocol.md, and plan-proposal-example.md into the execute skill. Content updates follow in subsequent commits.

Updates Activation, Ownership Boundary, Spawn Directive, Revision Limits, and Workspace Tracking sections to reflect v4.0's Phase 0 sub-steps and ephemeral per-task spawn pattern.

Renames 9 Phase 2 + 1 Phase 1 references. Adds Per-Task Pipeline Coordination section describing the default Executor → Reviewer → Challenger lifecycle. Updates Adversarial Review Rounds to reflect that adversarial review is now default behavior.

…le diagram + add TOC entry

Code review caught two issues:

- Bounds bullet listed 6 agents (incl. re-Reviewer) but lifecycle diagram skipped the re-review step. Diagram now shows fix-Executor → re-Reviewer/re-Challenger → final verdict explicitly.
- Per-Task Pipeline Coordination was missing from the Contents TOC. Added entry between Plan-Mode Coordination and Advanced Patterns.

…_REVIEW

Renames the Plan Stage Messages section to Research Messages (FINDING and ANALYSIS are now reusable across archetypes during execute, not plan-stage-specific). Adds CHALLENGE_REVIEW message format for the new per-task Challenger role. Drops PLAN_REVIEW (no plan-reviewer agent in v4.0).

Code Review Messages H2 was already in the file but never listed in the Contents block. Adding the entry makes CHALLENGE_REVIEW (newly added in the prior commit) discoverable via TOC navigation.

Replaces specialized templates (Implementer, Migrator, Researcher, Tester, Auditor, Planner, Writer, Strategist, etc.) with Executor / Reviewer / Challenger templates parameterized by archetype directives (Implementation, Research, Audit, Planning). Reviewer template includes the impact-analysis directive (grep call sites, glob importers, populate impact_files in task-graph.json). Challenger template includes the rules-compliance + missed-issues directive.

…e only

Code review caught the universal Reviewer template baking in impact-analysis behavior that only makes sense for Implementation tasks. Moved the directive (and the task-graph.json write permission) into the Implementation Reviewer archetype directive block.

Also:

- Removed unused {RESULTS_FORMAT} and {REPORT_FORMAT} placeholders from the header note (never substituted anywhere in the file).
- Made Challenger 'Impact files' field conditional.
- Softened universal Executor 'owned area' rule to acknowledge archetype overrides.
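The layout this commit arrives at can be pictured roughly as follows: a universal prompt per role, plus one directive block per archetype and role, with impact analysis attached only to the Implementation Reviewer. This is a sketch under assumptions. The template strings, the `ARCHETYPE_DIRECTIVES` table, and `build_prompt` are hypothetical names for illustration and do not reproduce spawn-templates.md.

```python
# Hypothetical sketch: none of these strings come from spawn-templates.md.
ROLES = ("Executor", "Reviewer", "Challenger")
ARCHETYPES = ("Implementation", "Research", "Audit", "Planning")

# Universal role prompts, shared by every archetype (placeholder wording).
UNIVERSAL_TEMPLATES = {
    "Executor": "You are the Executor for task {task_id}. Do the work the task describes.",
    "Reviewer": "You are the Reviewer for task {task_id}. Check the Executor's output.",
    "Challenger": "You are the Challenger for task {task_id}. Look for rule violations and missed issues.",
}

# One directive block per (archetype, role) pair: 4 archetypes x 3 roles = 12 blocks.
ARCHETYPE_DIRECTIVES = {
    (archetype, role): f"[{archetype} {role} directive placeholder]"
    for archetype in ARCHETYPES
    for role in ROLES
}

# Impact analysis lives only in the Implementation Reviewer block,
# not in the universal Reviewer template.
ARCHETYPE_DIRECTIVES[("Implementation", "Reviewer")] = (
    "Run impact analysis: grep call sites, glob importers, "
    "and populate impact_files in task-graph.json."
)


def build_prompt(role: str, archetype: str, task_id: str) -> str:
    """Layer the archetype directive on top of the universal role prompt."""
    if (archetype, role) not in ARCHETYPE_DIRECTIVES:
        raise ValueError(f"unknown role/archetype pair: {role}/{archetype}")
    base = UNIVERSAL_TEMPLATES[role].format(task_id=task_id)
    return base + "\n\n" + ARCHETYPE_DIRECTIVES[(archetype, role)]
```

With this shape, a Research Reviewer prompt never mentions impact_files, which is the behavior the code review asked for.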

Replaces 13-role catalog with Lead + 3 universal roles (Executor / Reviewer / Challenger) parameterized by archetype. Legacy role names preserved as colloquial archetype specializations in a lookup table. Notes hook agnosticism — no hook hardcodes role names, so the collapse has zero hook impact.

Code review caught 2 unlinked references in Selection Guide (team-archetypes guide, docs/custom-roles.md) — both now markdown links. Added pointer to error-recovery-protocol.md for the canonical recovery class definitions.

Replaces single-role custom template with three archetype-specific extension templates (Custom Executor / Reviewer / Challenger). Each extension layers directives on top of the standard prompt rather than defining a standalone role. Adds Database Migration Specialist example demonstrating the new pattern. Adds Porting v3.x custom roles section with three concrete migration examples (Implementer, Reviewer, Researcher patterns).

Replaces 10+ Phase 1 / Phase 2 references with Phase 0 sub-step references. Removes /agent-team:plan from the commands list. Adds Reviewer focus and Challenger focus subsections per archetype to guide what the per-task pipeline checks for each task type.

Phase 0 absorbs the work formerly done by skills/plan/ (resolution, prior-context loading, archetype detection, decomposition, plan-mode marking, user approval gate). Auto-chains to superpowers:writing-plans if no plan is found. Updates frontmatter description and trigger phrases to reflect the new entry-point role. Updates Preconditions to remove the require-approved-plan-in-workspace constraint.

Every task now spawns Executor → Reviewer → Challenger ephemerally. Reviewer performs grep-based impact analysis and populates impact_files in task-graph.json. Challenger deep-dives looking for missed issues. Each role shuts down before the next spawns. Adds reviews/ workspace subdirectory for per-task review artifacts. Updates task-graph.json schema usage with review_status and challenge_status fields. Replaces 'When chained via /agent-team:start' end-of-stage messaging with auto-chain to ../audit/SKILL.md.

Also: cleans up 2 stale Phase 2 references (Worktree Isolation, Anti-Patterns) and updates the Overview paragraph for v4.0 framing.
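The sequencing described here, together with the one-fix-cycle bound and the approved_with_findings outcome mentioned elsewhere in this PR, can be sketched as a small driver loop. Everything below is an assumption made for illustration: `spawn` is a hypothetical stand-in for starting an ephemeral agent and waiting for it to shut down, the verdict strings are simplified, and the point at which a task enters in_review is a guess rather than something the PR states.

```python
from typing import Callable, Dict, List

Verdict = str  # "approved" or "findings" in this sketch


def run_task_pipeline(
    task: Dict,
    spawn: Callable[[str, Dict], Verdict],
    max_fix_cycles: int = 1,
) -> List[str]:
    """Run Executor -> Reviewer -> Challenger, with at most one fix cycle.

    Returns the roles in spawn order so the agent count can be inspected.
    """
    spawned: List[str] = []

    def run(role: str) -> Verdict:
        # Each agent is assumed to shut down before this call returns,
        # so stages never overlap.
        spawned.append(role)
        return spawn(role, task)

    run("Executor")
    task["status"] = "in_review"  # assumption: review starts after execution

    fix_cycles = 0
    while True:
        review = run("Reviewer")
        challenge = run("Challenger")
        if review == "approved" and challenge == "approved":
            break
        if fix_cycles >= max_fix_cycles:
            # Fix budget exhausted: keep the findings on record and move on.
            if review != "approved":
                review = "approved_with_findings"
            if challenge != "approved":
                challenge = "approved_with_findings"
            break
        fix_cycles += 1
        run("Executor")  # fix-Executor, followed by re-Reviewer/re-Challenger

    task["review_status"] = review
    task["challenge_status"] = challenge
    task["status"] = "completed"
    return spawned
```

In this sketch a clean task costs three spawns and a task that exhausts its fix cycle costs six, which matches the six-agent bound noted in the coordination-patterns fix. Logging unresolved findings to issues.md on exhaustion is left out.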

Per-task Challengers now cover within-task adversarial review during execute Phase 4. Audit's Step 4 reduces to cross-task integration: read challenge docs for context, focus end-to-end review on impact_files overlapping multiple tasks, verify integrated build/test. Removes /agent-team:start references and legacy/manual workspace backward-compat framing.
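As a rough illustration of the reduced audit scope, the files worth an integration review are those listed in the impact_files of more than one task. The sketch below assumes task-graph.json can be read as a mapping from task id to a task object; that shape and the `integration_scope` helper are hypothetical and not taken from the plugin.

```python
from collections import Counter
from typing import Dict, List


def integration_scope(tasks: Dict[str, dict]) -> List[str]:
    """Return files that appear in the impact_files of two or more tasks."""
    counts: Counter = Counter()
    for task in tasks.values():
        # Use a set so a file listed twice within one task counts once.
        for path in set(task.get("impact_files", [])):
            counts[path] += 1
    return sorted(path for path, n in counts.items() if n >= 2)


# Hypothetical graph: two tasks touch src/auth.py, so only that file is in scope.
example_tasks = {
    "task-1": {"subject": "add login", "impact_files": ["src/auth.py", "src/routes.py"]},
    "task-2": {"subject": "add logout", "impact_files": ["src/auth.py"]},
    "task-3": {"subject": "update docs"},  # older graphs may lack impact_files
}
overlap = integration_scope(example_tasks)
```

Tasks without an impact_files field are simply skipped, which keeps the helper usable on graphs written before the field existed.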

Updates the audit-stage reviewer agent prompt: read per-task challenge docs for context, focus end-to-end review on impact_files overlapping multiple tasks, surface coverage gaps the per-task pipeline missed.

… challenge_status fields

Documents the new reviews/ subdirectory created during Phase 4 by Reviewer and Challenger. Adds three new task-graph.json fields: impact_files (Reviewer's grep-based impact area), review_status, challenge_status. Confirms validate-task-graph.sh tolerates the extension (only requires subject/status/depends_on). Updates phase checklist and Stage enum for v4.0 vocabulary. Removes Plan Audit Result template (the 7-check audit is dropped).
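To make the schema extension concrete, here is a hypothetical task entry carrying the three new fields, checked by a validator that only insists on subject, status, and depends_on. The Python function mimics the tolerance described above. It is not a port of validate-task-graph.sh, and the example values are invented.

```python
import json
from typing import List

REQUIRED_FIELDS = ("subject", "status", "depends_on")

# Hypothetical entry; field names follow the PR, values are made up.
EXAMPLE_TASK_JSON = """
{
  "subject": "Add rate limiting to the login endpoint",
  "status": "in_review",
  "depends_on": [],
  "impact_files": ["src/auth.py", "src/middleware.py"],
  "review_status": "approved",
  "challenge_status": "approved"
}
"""


def validate_task(task: dict) -> List[str]:
    """Report missing required fields; extra fields are accepted as-is."""
    errors = []
    for field in REQUIRED_FIELDS:
        # An empty depends_on list is valid; only absent or null values fail.
        if task.get(field) is None:
            errors.append(f"missing required field: {field}")
    return errors


example_task = json.loads(EXAMPLE_TASK_JSON)
```

Because unknown keys are ignored, a graph written before the new fields existed and one written after them pass the same check.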

Reviewer caught: line 74 still listed (plan, execute, or audit) for the Stage field, contradicting the template enum {execute|audit} 50 lines above. Updated to (`execute` or `audit`). Also tightened Pipeline status values (approved → Phase 0.6, executed → Phase 4).

… comment

Tasks 14, 15, 16, 20 bundled:

- Delete skills/start/ and skills/plan/ entirely (Phase 0 of execute now absorbs all their work).
- Update tests/structure/test-doc-references.sh: drop start/plan existence checks, add execute/examples/ check, replace Elegance Reviewer assertion with 3-role taxonomy check (Executor, Reviewer, Challenger), update message-types list (drop PLAN_REVIEW, add CHALLENGE_REVIEW + CODE_REVIEW), update spawn templates check to verify 3 universal roles, update plan-stage assertions to execute-stage.
- Update See Also list in docs/team-archetypes.md to drop /agent-team:start (was missed in Task 8).
- Update line 46 comment in scripts/validate-task-graph.sh to reflect Phase 0 (cosmetic).

Test suite: 16 files, all passing.

Updates 14+ sections per the v4 refactor scope:

- Tagline: 3 stages (plan/execute/audit) → 2 stages (execute/audit)
- What It Does intro: per-task adversarial pipeline + auto-chain framing
- Quickstart demo (~80 lines) full rewrite for Phase 0 + per-task E/R/C
- Pipeline Commands table: drop /agent-team:start and /agent-team:plan rows
- How It Works diagram: replace plan→execute→audit with execute(Phase 0) → execute(Phase 4 pipeline) → audit
- Stage table: collapse 4 rows (Start/Plan/Execute/Audit) to 2 (Execute/Audit) with detailed phase breakdown
- Drop "Plan team" / "Execute team" / "Audit team" composition list
- Plan-aware paragraph: rewrite to Phase 0 vocabulary
- Teammate Roles table: 13-row catalog → 4-row taxonomy (Lead, Executor, Reviewer, Challenger) with archetype specializations note
- Team Types table: rewrite Default Roles column → per-archetype Reviewer/Challenger focus
- Workspace tree: add reviews/ subdir + impact_files/review_status/challenge_status fields
- Plugin Structure tree: drop skills/start, skills/plan; expand skills/execute/ and skills/audit/ subfolder annotations

Updates 6+ sections per the v4 refactor scope:

- Architecture diagram: drop skills/start, skills/plan; expand skills/execute and skills/audit subfolder annotations to include Phase 0 prep, Phase 4 per-task pipeline, integration-only review
- Key Design Decisions: extend Team-per-stage bullet to mention the per-task ephemeral E/R/C pipeline; add 3-role universal taxonomy bullet noting legacy role names are colloquial archetype specializations
- File Ownership table: delete 5 plan/start rows; rewrite execute rows to highlight Phase 0/4 + 3 universal templates; add skills/execute/examples/ row
- Adding a New Teammate Role guide: full rewrite to point to docs/custom-roles.md archetype-extension model instead of direct teammate-roles.md edits
- Adding a New Pipeline Stage guide: drop reference to deleted skills/start/SKILL.md routing table

Replaces Phase 1/Phase 2 demo blocks with Phase 0 (resolve + decompose + approve) and Phase 4 (per-task Executor -> Reviewer -> Challenger ephemeral pipeline). Updates teammate names from auth-impl-1/2 (Implementer) to executor-task-1/2 + ephemeral reviewer-task-N + challenger-task-N identifiers, demonstrating the new ephemeral spawn-shutdown-respawn pattern.

Both scripts produce visually equivalent output:

- generate-demo-cast.sh: asciicast-style with timing
- record-demo.sh: plain stdout with colors

v4.0.0 collapses the 4-skill pipeline (start/plan/execute/audit) to 2 skills (execute/audit) with per-task adversarial review pipeline (Executor → Reviewer → Challenger ephemeral) and 3-role universal taxonomy. CHANGELOG entry covers BREAKING/Migration/Added/Changed/Removed sections.

Three new integration test files covering the v4.0 schema additions and per-task pipeline lifecycle:

- tests/integration/test-v4-schema.sh (17 tests): Verifies validate-task-graph.sh accepts new fields (impact_files, review_status, challenge_status), the in_review status, and the approved_with_findings status values. Verifies compute-critical-path and detect-resume work on v4.0 graphs. Verifies regression: graphs missing required fields are still blocked.
- tests/integration/test-v4-pipeline-lifecycle.sh (15 tests): Simulates a task progressing through the per-task pipeline: pending → in_review → review_status approved → challenge_status approved → completed. Tests fix-cycle exhaustion (approved_with_findings) and cross-task impact_files overlap (the audit-stage integration review scope).
- tests/integration/test-v4-skill-coherence.sh (93 tests): Static analysis of v4.0 skill content. Verifies SKILL.md files contain the expected v4.0 instructions, cross-references resolve, and demo scripts use v4.0 vocabulary. The bash-level equivalent of the manual smoke tests (Task 23 in the plan). These don't invoke the actual Claude skills (which require TeamCreate at runtime), but they verify the skill instructions themselves describe the v4.0 flow.

Total test suite: 19 files, 336 test assertions, all passing. Auto-discovery via find works without runner changes.

A thorough audit caught several issues in the new integration tests:

**v4-schema.sh** (17 → 24 tests):

- Tests 1-3 were tautological: claimed to verify v4 status enum acceptance but validate-task-graph.sh has no enum guard (it accepts any non-null string). Reframed to honestly test what's actually verified: the script doesn't choke on graphs containing the new fields.
- Tests 3, 4 had missing 'cwd' in JSON input causing validate-task-graph to early-exit before validation. Fixed.
- Test 5 (detect-resume) used empty {} input, hitting the no-CWD silent fallback. Fixed: pass cwd, assert stdout shows the workspace name and remaining tasks.
- Tests 7a-7f wrote test docs and then grep'd them for fields the test itself wrote, which made them pure tautologies. Replaced with checks against the PRODUCTION docs/workspace-templates.md (impact_files semantics, who populates each field, validate-task-graph tolerance).
- Added Test 7 (regression: cycle detection still works).

**v4-pipeline-lifecycle.sh** (15 → 21 tests):

- Stages 3, 6 had missing cwd in JSON input. Fixed.
- Reframed jq-on-fixture assertions as 'schema integrity checks' (verifying the field names jq reads match production spec) and added spec contract checks at every stage:
  - Stage 3: spawn-templates.md describes the impact-analysis directive
  - Stage 4: communication-protocol.md documents CODE_REVIEW + CHALLENGE_REVIEW
  - Stage 6: coordination-patterns.md documents max-1-fix-cycle bound and log-to-issues.md on exhaustion
  - Stage 7: audit/SKILL.md describes integration review using impact_files
- Removed sleep-based timing assumptions; now uses precise jq queries.

**v4-skill-coherence.sh** (93 → 101 tests):

- Test 5 'Phase 1|Phase 2' regex was too broad (would match any incidental string). Narrowed to detect actual phase headings or numbered Phase 1a/1b mentions.
- Test 9 archetype-directive count used substring matching (would pass for partial matches). Now checks each role × archetype combination with exact-match patterns: 4 archetypes × 3 roles = 12 directive blocks (Executor, Reviewer, Challenger).

Net: 357 assertions (up from 336), no tautologies, every assertion verifies meaningful behavior or a documented contract. All tests pass.
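The Test 5 narrowing can be illustrated with a pair of patterns: a broad alternation that fires on incidental text, and a narrower one limited to headings and lettered sub-phase mentions. The patterns below are a guess at the idea, written in Python for readability. The real test is a bash script and its exact expressions are not shown in this PR.

```python
import re

# Broad pattern quoted in the audit note: matches any incidental occurrence.
BROAD = re.compile(r"Phase 1|Phase 2")

# Hypothetical narrowed pattern: a markdown heading naming Phase 1 or Phase 2,
# or a lettered sub-phase mention such as "Phase 1a" / "Phase 1b".
NARROW = re.compile(r"^#{1,6}\s+Phase [12]\b|\bPhase 1[ab]\b", re.MULTILINE)


def has_stale_phase_reference(text: str) -> bool:
    """True if the text still carries a legacy phase heading or sub-phase mention."""
    return NARROW.search(text) is not None
```

The broad pattern reports a hit on a string like "Phase 10 cleanup", while the narrowed one stays quiet there and still catches "## Phase 2: Decompose" or "handled in Phase 1a". A check like this only makes sense for files where such headings are no longer expected.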

ducdmdev added 30 commits March 26, 2026 13:26

fix: skip release creation if already exists (prevents CI 422 on manu…

33472f7

…al releases)

refactor: move plan-stage refs into skills/execute/

d181f06

Moves prior-context-loading.md, plan-mode-protocol.md, and plan-proposal-example.md into the execute skill. Content updates follow in subsequent commits.

refactor: update moved plan-stage refs for Phase 0 vocabulary

3ec00e1

Updates Activation, Ownership Boundary, Spawn Directive, Revision Limits, and Workspace Tracking sections to reflect v4.0's Phase 0 sub-steps and ephemeral per-task spawn pattern.

fix(communication-protocol): add missing Code Review Messages TOC entry

6648140

Code Review Messages H2 was already in the file but never listed in the Contents block. Adding the entry makes CHALLENGE_REVIEW (newly added in the prior commit) discoverable via TOC navigation.

refactor(audit): scope reviewer agent to integration-only

ea7ea01

Updates the audit-stage reviewer agent prompt: read per-task challenge docs for context, focus end-to-end review on impact_files overlapping multiple tasks, surface coverage gaps the per-task pipeline missed.

ducdmdev merged commit f17a9e5 into main May 11, 2026
1 check passed

ducdmdev deleted the feat/v4-refactor branch May 11, 2026 14:29

ducdmdev mentioned this pull request May 11, 2026

chore: sync repo metadata + landing page with v4.0 concept #20

Merged

5 tasks

ducdmdev merged 31 commits into main from feat/v4-refactor
