feat: apex-forge v0.3.0 — multi-agent adaptation + development principles#11
Open
d-wwei wants to merge 39 commits into
…ign review
Implement Phase 2 of the design-baseline gate: 5 expert persona sub-agents (UX Designer, Accessibility Specialist, Brand Guardian, Performance Analyst, End User) dispatched in parallel for independent design review.
Key additions:
- Persona prompt templates with evaluation questions and anti-rubber-stamping rules
- Conflict detection (action conflicts + severity conflicts) with surfacing protocol
- Verdict aggregation via confidence voting with explicit precedence (BLOCK > ESCALATE > PASS_WITH_NOTE > PASS)
- CONCERN verdict classification, minimum 3-persona guardrail, all-disabled guard clause
- Output transformation mapping persona findings to canonical review format
- Artifact integration section for per-persona findings in review doc
Architecture change: Phase 2 subsumes tasteful-frontend in Review layer. tasteful-frontend retained in Execute stage for design spec generation.
Also includes: quick compound for previous iteration (compound memory write enforcement).
Requirements: docs/brainstorms/design-baseline-phase2-requirements.md
Plan: docs/plans/design-baseline-phase2-plan.md
Review: docs/reviews/design-baseline-phase2-review.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
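The precedence rule in the verdict aggregation bullet can be sketched as follows. This is a minimal illustration, not the PR's implementation: only the four verdict names, their ordering, and the 3-persona minimum come from the commit message. The `PersonaFinding` shape and `aggregateVerdict` name are hypothetical, and the confidence-voting step is left out.

```typescript
// Hypothetical types: only the verdict names and their precedence come from the commit message.
type Verdict = "BLOCK" | "ESCALATE" | "PASS_WITH_NOTE" | "PASS";

interface PersonaFinding {
  persona: string;
  verdict: Verdict;
}

// Higher rank wins: BLOCK > ESCALATE > PASS_WITH_NOTE > PASS.
const PRECEDENCE: Record<Verdict, number> = {
  BLOCK: 3,
  ESCALATE: 2,
  PASS_WITH_NOTE: 1,
  PASS: 0,
};

// The commit mentions a minimum 3-persona guardrail.
const MIN_PERSONAS = 3;

function aggregateVerdict(findings: PersonaFinding[]): Verdict {
  if (findings.length < MIN_PERSONAS) {
    throw new Error(`need at least ${MIN_PERSONAS} persona findings, got ${findings.length}`);
  }
  // Keep the highest-precedence verdict seen so far, starting from PASS.
  return findings.reduce<Verdict>(
    (worst, f) => (PRECEDENCE[f.verdict] > PRECEDENCE[worst] ? f.verdict : worst),
    "PASS",
  );
}
```

With this ordering a single ESCALATE outranks any number of PASS or PASS_WITH_NOTE verdicts, and a single BLOCK outranks everything.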
… + agent rules
Implement Phase 1 of the Plan Agent architecture spec:
- terminal.ts: add sendKey() to TerminalAdapter interface + CmuxAdapter + TmuxAdapter
- interrupt.ts: new module with per-agent interrupt key sequences and adapter awareness
- worker.ts: wire worker_agent_rules[category] into resolveAgent priority chain
- protocol-template.ts: add sectionDirectiveCheck for Worker directive/escalation protocol; make agentStartCommand async, read config.adapters first with hardcoded fallback
- cross-model.ts: update agentStartCommand call to await
- state.ts: runStructuralGate handles orchestrate:* stages with artifact checks
- event-log.ts: materializeState handles orchestration.event type
- types/state.ts: add OrchestrationEvent interface + optional field on StageState
- master.md: complete rewrite (Initiation/M&C/Closure three-phase model, daemon integration, directive/escalation protocol, orchestrate: stage prefix)
Tests: 123 pass across 8 test files (sendKey, interrupt, directive, async agentStartCommand)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace non-functional Hero page controls:
- "New Project" button → "Guide" with popover showing `apex dashboard` usage
- Search input: live filter by project name/description
- Status filter: dropdown for all/running/active/archived
- Sort: dropdown for recent/name/success rate
All filters compose via applyProjectFilters() pipeline. applyLocale() now supports data-i18n-placeholder for input elements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rch CLI
Implement the orchestration daemon process (spec Section 3):
- orchestrator/daemon.ts: DaemonState, tick loop (monitors workers, auto-integrates/merges passing tasks, spawns unblocked tasks, detects crashes/escalations, checks M&C closure condition)
- orchestrator/integrate.ts: autoIntegrate (tmp worktree merge+test), autoMerge (git merge --ff on main, race-condition handling)
- orchestrator/notify.ts: notifyPlanAgent (terminal idle check + send, file-based notification queue fallback), readPendingNotifications
- commands/orch.ts: apex orch start/stop/status with atomic lock (wx flag), PID tracking, stale lock detection, --force takeover
- cli.ts: register 'orch' subcommand
Tests: 8 new tests (integrate contract, notify queue CRUD + ordering + idempotency)
Full suite: 314/315 pass (1 pre-existing timeout)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Define AgentAdapter with capabilities, protocol injection method, interrupt keys. Built-in adapters for claude/codex/gemini/opencode. resolveAdapterWithConfig with config > builtin > error priority. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
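The "config > builtin > error" priority can be sketched as below. This is an illustration under assumptions: the `AgentAdapter` fields shown, the interrupt key values, and the name `resolveAdapterSketch` are hypothetical. Only the four builtin agent names and the resolution order come from the commit message.

```typescript
// Hypothetical, trimmed-down adapter shape; the real interface also carries capabilities
// and a protocol injection method according to the commit message.
interface AgentAdapter {
  name: string;
  binary: string;
  interruptKeys: string[]; // key values below are placeholders, not the real sequences
}

const BUILTIN_ADAPTERS = new Map<string, AgentAdapter>([
  ["claude", { name: "claude", binary: "claude", interruptKeys: ["Escape"] }],
  ["codex", { name: "codex", binary: "codex", interruptKeys: ["C-c"] }],
  ["gemini", { name: "gemini", binary: "gemini", interruptKeys: ["C-c"] }],
  ["opencode", { name: "opencode", binary: "opencode", interruptKeys: ["C-c"] }],
]);

// Resolution order: user config first, then builtin, otherwise throw.
// Unknown agents never fall back silently to a default adapter.
function resolveAdapterSketch(
  agent: string,
  configured: Map<string, AgentAdapter> = new Map(),
): AgentAdapter {
  const adapter = configured.get(agent) ?? BUILTIN_ADAPTERS.get(agent);
  if (!adapter) {
    throw new Error(`Unknown agent "${agent}": no configured or builtin adapter`);
  }
  return adapter;
}
```

Throwing on an unknown name makes a typo in the agent name fail loudly instead of starting the wrong agent.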
When `apex init` runs in a non-git parent directory while the real project lives in a git-enabled subdirectory, both get registered as separate projects. The canonical-root dedup missed this because non-git dirs fall back to themselves as canonical root. Add path containment check: if two entries have an ancestor/descendant relationship, keep the git repo and discard the non-git ghost. When both are git repos, they are treated as genuinely separate projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
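A minimal sketch of the containment check described above, assuming POSIX-style absolute paths and a hypothetical `RegistryEntry` shape. The commit does not say how two nested non-git directories are handled, so this sketch keeps both.

```typescript
// Hypothetical registry entry; field names are assumptions for illustration.
interface RegistryEntry {
  path: string;       // absolute, POSIX-style, no trailing slash
  isGitRepo: boolean;
}

// True when `child` lies strictly inside `parent`.
// The trailing slash prevents "/a/work" from matching "/a/workshop".
function isAncestor(parent: string, child: string): boolean {
  const prefix = parent.endsWith("/") ? parent : parent + "/";
  return child !== parent && child.startsWith(prefix);
}

function dropGhostProjects(entries: RegistryEntry[]): RegistryEntry[] {
  return entries.filter((entry) => {
    // Git repos are always kept; two nested git repos count as separate projects.
    if (entry.isGitRepo) return true;
    // A non-git entry is a ghost when it is an ancestor or descendant of a git repo.
    const related = entries.some(
      (other) =>
        other.isGitRepo &&
        (isAncestor(entry.path, other.path) || isAncestor(other.path, entry.path)),
    );
    return !related;
  });
}
```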
- ProtocolInjectionMethod: discriminated union with type+flag fields (was string)
- skipProxyEnv: codex/gemini/opencode set to true (was false)
- opencode buildStartCommand: add run -p "$(cat 'path')" protocol injection
- Strengthen opencode test to verify protocol injection in command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
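One way a discriminated union with `type` and `flag` fields might look. The variant names (`"flag"`, `"stdin"`, `"prompt-arg"`) and the helper names are assumptions. Only the `-p "$(cat 'path')"` form for opencode and the idea of a stdin variant come from the commit messages.

```typescript
// Hypothetical variants; the real union's members are not shown in the PR text.
type ProtocolInjectionMethod =
  | { type: "flag"; flag: string }        // e.g. <flag> '<path>'
  | { type: "stdin" }                     // protocol file fed on stdin
  | { type: "prompt-arg"; flag: string }; // e.g. -p "$(cat '<path>')"

// Single-quote a value for a POSIX shell, escaping embedded single quotes.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Build the command fragment that injects the protocol file.
function injectionArgs(method: ProtocolInjectionMethod, protocolPath: string): string {
  const quoted = shellQuote(protocolPath);
  switch (method.type) {
    case "flag":
      return `${method.flag} ${quoted}`;
    case "stdin":
      return `< ${quoted}`;
    case "prompt-arg":
      return `${method.flag} "$(cat ${quoted})"`;
  }
}
```

Compared with a plain string, the union lets the compiler check that every variant is handled in the `switch` and that `flag` is only read where it exists.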
…lookup agentStartCommand now resolves via resolveAdapterWithConfig (config > builtin > error). codex gets stdin pipe, gemini gets single-quoted path, opencode supported. Unknown agents throw instead of falling back to claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erence All section functions accept lang parameter. English translations for: sectionTask, sectionExecution, sectionCoreRules, sectionCommunication, sectionBoundaries, sectionCrossModel, sectionDirectiveCheck. Language auto-selected from adapter.capabilities.preferredLanguage. JSON field names stay English in both languages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three tiers: full bash (heredoc + apex CLI), file-write (Write instructions), minimal (Create file only). Selected by adapter capabilities. Bilingual support in all three tiers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
checkAgent: which -> --version -> functional trust. checkAllAgents: checks all 4 builtin agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace which-based check with three-level checkAgent in spawn. New 'apex worker check' lists all agents with status, version, protocol type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All code changes, including those from investigation, debugging, and ad-hoc tasks, now require walking Execute → Review → Ship → Compound before committing. The pre-commit hook reads the session's pipeline state and blocks `git commit` if the stage is not "ship". Manual commits (without APEX_SESSION_ID) are unaffected.
Also fixes:
- Broken symlink detection in `apex init`: lstatSync succeeded for dangling links, so the check treated the hook as installed and never reinstalled it
- Session ID sanitization in the hook to prevent path traversal
- SKILL.md Code Change Escalation Rule + anti-pattern entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
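The hook's decision logic can be sketched as a pure function. This is an illustration in TypeScript rather than the hook's own code: the allowed character set, the length limit, and the names `sanitizeSessionId` and `commitAllowed` are assumptions. The rules it encodes (manual commits pass, the stage must be "ship", session IDs are sanitized) come from the commit message.

```typescript
// Accept only characters that cannot form a path: no "/", no "..", no whitespace.
// The exact allowed set and the 128-character limit are assumptions.
function sanitizeSessionId(raw: string): string | null {
  return /^[A-Za-z0-9_-]{1,128}$/.test(raw) ? raw : null;
}

// `readStage` is a hypothetical callback that returns the pipeline stage recorded for a session.
function commitAllowed(
  sessionId: string | undefined,
  readStage: (id: string) => string | undefined,
): boolean {
  // Manual commits (no APEX_SESSION_ID) are unaffected.
  if (sessionId === undefined || sessionId === "") return true;
  const id = sanitizeSessionId(sessionId);
  // An unsafe ID is rejected before it is ever used to build a file path.
  if (id === null) return false;
  return readStage(id) === "ship";
}
```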
Reads agent type from meta.json, resolves interrupt keys via interruptKeys(), sends via terminal.sendKey(). Checks isAgentIdle after sending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… args, capability check fixes
C1: interruptKeys from interrupt.ts as single source of truth
I1: quote protocolPath and model in all buildStartCommand
I2: pass adapter.binary to checkAgent in spawn
I3: functional tied to --version success
I4: sectionDirectiveCheck respects capability degradation
I5: gemini autoApprovalFlag added
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents were skipping stage file reading (stages/{stage}.md), missing
required steps and exit gates. Root cause: the "MUST Read" rule only
existed in the "Explicit Stage Commands" section, not in Phase Discipline.
CLI provided no reminder when entering a stage.
Two-layer fix:
- apex stage set now prints MANDATORY reminder with stage file path
- Phase Discipline now has "Stage File Reading Rule (HARD GATE)" section
Requirements: user conversation log showing 3x Ship push skip
Plan: inline (brainstorm-plan fast-track)
Review: docs/reviews/stage-skip-protection-review.md
…recovery
Phase 4: New CLI commands
- `apex orchestrate event`: record orchestration events to state.jsonl
- `apex worker directive`: write directive.json to worker (amend|pause|abort|info)
- Help text updated for all new + existing orch commands
Phase 5: Singleton mutex + recovery
- `updateLock` / `parseHandleFlag` for Plan Agent handle registration
- `--handle` flag in `apex orch start` for recovery takeover
- Double-shutdown guard in daemon cmdStart
- `discoverWorkers` enhanced: logs recovery, sets accurate initial health
- master.md: cross-session recovery section with CLI-first directive usage
- master-recovery.md: dedicated Plan Agent recovery protocol
Tests: 22 new (398/398 pass, 0 fail)
- orchestrate-event (4), worker-directive (5), orch-lock+acquireLock (9), daemon-recovery (4)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 7 had three issues:
(1) --limit 1 only checked the latest workflow run, missing other triggered workflows;
(2) the "No bypass" text contradicted a "继续创建 PR" ("continue creating the PR") option three lines below;
(3) there was no explicit requirement to wait for all jobs to complete before proceeding.
Step 7 now checks all workflow runs (--limit 10), waits until every run completes (no proceeding while in_progress/queued), offers no bypass options, raises the timeout to 15 minutes, and requires re-running Step 7 after fixes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
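The "wait until every run completes" rule can be expressed as a small decision function over the list of runs. This is a sketch, not the text of Step 7: the `WorkflowRun` shape assumes the runs were fetched as JSON with `name`, `status`, and `conclusion` fields, and the treatment of an empty list is an assumption.

```typescript
// Assumed shape of one workflow run as fetched from the CI provider.
interface WorkflowRun {
  name: string;
  status: string;            // "queued", "in_progress", "completed", ...
  conclusion: string | null; // e.g. "success" or "failure" once completed
}

type CiDecision = "wait" | "fail" | "pass";

function decideCi(runs: WorkflowRun[]): CiDecision {
  // Any run that has not completed means: keep waiting, never proceed.
  if (runs.some((run) => run.status !== "completed")) return "wait";
  // All runs completed: a single non-success conclusion fails the step.
  if (runs.some((run) => run.conclusion !== "success")) return "fail";
  // Assumption: an empty list passes vacuously; a stricter gate could treat it as "wait".
  return "pass";
}
```

Checking every run, instead of only the latest, is what closes the gap described in issue (1).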
…l-based-audit alias
product-review requires a running product, so it doesn't belong in the Review stage (which examines code diffs). Relocated to a new post-Ship evaluation point where users can optionally run UX review or goal-based audit before Compound.
Changes:
- New aliases: product-user-review, product-goal-based-audit
- bindings.yaml: removed product-review from review section, added post-ship section
- ship.md: inserted Post-Ship Evaluation choice point before Compound Transition
- SKILL.md: updated command table and external skills list
- Added review-dedup-optimization spec for future reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. event-log.ts: materializePerSession() initialized last_updated to new Date() (query time), causing sessions with only orchestration or skill events to never go stale. Fixed to use first event timestamp. Before: 31 ghost "active" sessions. After: 0.
2. init.ts: auto-register project in ~/.apex-forge/registry.json on `apex init`, so Dashboard Hub discovers projects without requiring a separate `apex dashboard` invocation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
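The staleness bug in item 1 can be illustrated with a reduced event fold. Everything here is a sketch: the event shape, the `stage.set` type name, and the function names are hypothetical. The point it shows is the one from the commit message: seed `last_updated` from the first event's own timestamp, not from the time of the query.

```typescript
// Hypothetical event and state shapes for illustration only.
interface SessionEvent {
  session: string;
  type: string; // e.g. "orchestration.event" or a stage event
  ts: string;   // ISO 8601 timestamp
}

interface SessionState {
  last_updated: string;
}

function materializeSessions(events: SessionEvent[]): Map<string, SessionState> {
  const sessions = new Map<string, SessionState>();
  for (const event of events) {
    const state = sessions.get(event.session);
    if (!state) {
      // Seed from the event itself. Using new Date() here would make every
      // session look freshly updated each time the log is read.
      sessions.set(event.session, { last_updated: event.ts });
    } else if (event.type === "stage.set") {
      // Assumption: only stage events advance last_updated afterwards.
      state.last_updated = event.ts;
    }
  }
  return sessions;
}

function isStale(state: SessionState, now: Date, maxAgeMs: number): boolean {
  return now.getTime() - Date.parse(state.last_updated) > maxAgeMs;
}
```

A session that only ever logged an orchestration event now ages like any other session, so it can be marked stale.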
…oint tracking
Ship stage `runStructuralGate` previously only checked 1 item (review artifact exists). ship.md defined 8 structural checks but the CLI didn't enforce them, allowing agents to skip mandatory steps like great-writer invocation, push prompts, and iteration summaries.
- Expand ship gate from 1 to 6 programmatic checks (S1,S2,S3,S6,S7,S8)
- Add `apex ship checkpoint` CLI command for recording conversation-flow steps
- Add `ship.checkpoint` event type in event-sourcing pipeline
- Register great-writer as required ship-stage binding in bindings.yaml
- Add checkpoint recording instructions to ship.md at each step boundary
- 8 new tests covering gate blocking and passing scenarios
S4 (preflight) and S5 (CI) remain in SubAgent-based gate (external tool dependency).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move multi-agent orchestration code to the independent apex-manager repo (github.com/d-wwei/apex-manager). This decouples orchestration (managing parallel Worker agents) from protocol (how each Worker works).
Removed from apex-forge:
- src/worker/ (terminal, monitor, agent-adapter, cost, proxy, etc.)
- src/orchestrator/ (daemon, integrate, notify)
- src/commands/ (worker, orch)
- skill/roles/master.md
- All associated tests
Preserved:
- src/worker/ directory (empty, for future AF-specific worker utils)
Added:
- apex-manager as companion in install.sh
- Extraction spec and review docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ss 5 stages
Audit found 40 structural checks defined in stage files but only 15 implemented in CLI runStructuralGate(). This closes all 25 gaps:
- Brainstorm: +5 checks (acceptance criteria, constraints, scope, status, decisions)
- Plan: +4 checks (file manifest, test paths, task decomposition, status)
- Execute: +2 checks (test files exist, execution log)
- Review: +6 checks (4 persona sections, status, no unresolved P0)
- Compound: +5 checks (root cause, prevention, roadmap, memory, re-entry prompt)
Also: add hasSection()/parseFrontmatter() helpers, compound checkpoint command, fix CRLF frontmatter parsing, fix regex false-positives, fix artifact ordering (use last not first for current iteration).
19 new tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
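The two helpers named above might look roughly like this. Only the names `hasSection()` and `parseFrontmatter()` and the CRLF fix come from the commit message. The signatures, the flat `key: value` frontmatter format, and the heading-matching rule are assumptions for illustration.

```typescript
// Parse a leading "---" frontmatter block of flat "key: value" lines.
// Line endings are normalized first so CRLF files parse the same as LF files.
function parseFrontmatter(doc: string): Record<string, string> {
  const text = doc.replace(/\r\n/g, "\n");
  const match = /^---\n([\s\S]*?)\n---(?:\n|$)/.exec(text);
  const fields: Record<string, string> = {};
  if (!match) return fields;
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return fields;
}

// True when the document has a Markdown heading whose text is exactly `title`.
// Matching a whole heading line avoids hits on the same words inside prose.
function hasSection(doc: string, title: string): boolean {
  const escaped = title.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const heading = new RegExp(`^#{1,6}\\s+${escaped}\\s*$`, "im");
  return heading.test(doc.replace(/\r\n/g, "\n"));
}
```

A structural gate could then combine the two, for example requiring `status` in the frontmatter and an "Acceptance Criteria" section in the body.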
…ment
All previous constraints operated on Think axis (text instructions, voluntary CLI gates). Agent could bypass the entire pipeline by running git commit directly. This adds structural behavioral enforcement:
- PreToolUse hook (apex-forge-gate.sh): blocks git commit/push/gh-pr outside Ship stage, blocks code edits during Brainstorm, validates ship checkpoints before push
- Fix pre-commit hook: remove broken APEX_SESSION_ID dependency, use mtime-based state discovery
- Fix apex init: find hooks from skill directory (works in any repo, not just apex-forge)
- install.sh: auto-register PreToolUse hook in settings.json
- 18 new tests covering all gate scenarios
This is the L2-Deny layer from the unified constraint framework (Think+Do): agent literally cannot execute git commit without being in the correct pipeline stage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…all layers
New `apex doctor` command audits 23 checks across 5 categories:
- L2-Deny: PreToolUse hook registration, gate script, pre-commit hook, python3
- L4-Gate: per-stage check coverage (brainstorm 7/7, plan 7/7, execute 3/3, review 8/9, ship 6/8, compound 6/6)
- Checkpoints: ship + compound checkpoint commands functional
- Bindings: required companion skills installed
- Project: .apex/ state, .gitignore
Weighted scoring (CRITICAL=3x, HIGH=2x, MEDIUM=1x, LOW=0.5x) with A-F grading. CRITICAL fail caps grade at C. Fix instructions for every FAIL. Uses internal runStructuralGate API, with no state mutation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
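The weighted scoring can be sketched as follows. The severity weights and the "CRITICAL fail caps grade at C" rule come from the commit message. The percentage thresholds for each letter grade and all names are assumptions, since the PR text does not state them.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type Grade = "A" | "B" | "C" | "D" | "F";

interface CheckResult {
  id: string;
  severity: Severity;
  passed: boolean;
}

// Weights from the commit message.
const WEIGHT: Record<Severity, number> = { CRITICAL: 3, HIGH: 2, MEDIUM: 1, LOW: 0.5 };
const GRADE_ORDER: Grade[] = ["A", "B", "C", "D", "F"];

// Assumed thresholds: 90/80/70/60. The real cut-offs are not given in the PR.
function gradeFor(percent: number): Grade {
  if (percent >= 90) return "A";
  if (percent >= 80) return "B";
  if (percent >= 70) return "C";
  if (percent >= 60) return "D";
  return "F";
}

function scoreChecks(results: CheckResult[]): { percent: number; grade: Grade } {
  let earned = 0;
  let total = 0;
  for (const result of results) {
    total += WEIGHT[result.severity];
    if (result.passed) earned += WEIGHT[result.severity];
  }
  const percent = total === 0 ? 0 : (earned / total) * 100;
  let grade = gradeFor(percent);
  // A failed CRITICAL check caps the grade at C, however high the percentage is.
  const criticalFailed = results.some((r) => r.severity === "CRITICAL" && !r.passed);
  if (criticalFailed && GRADE_ORDER.indexOf(grade) < GRADE_ORDER.indexOf("C")) {
    grade = "C";
  }
  return { percent, grade };
}
```

The cap only lowers a grade: a result that already scores D or F stays where it is.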
Convert design-review and codex-consult from companion to builtin skills (aliases/) since they have no independent repos and their logic lives inside AF. Add great-writer and product-goal-based-audit to install.sh DEPS so all bindings.yaml declarations are now covered by auto-install.
- bindings.yaml: skill → builtin for design-review, codex-consult
- New: aliases/codex-consult.md (independent second-opinion dispatch)
- install.sh: DEPS 10 → 12 (+great-writer, +product-goal-based-audit)
- skill-trace hook: COMPANIONS list synced with new topology
- README + SKILL.md: updated counts and skill listings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… stage ordering, fix idle-toggle bypass
Three layers to prevent agents from bypassing pipeline gates:
1. PreToolUse hook Rule 0: deny any Bash command containing --skip-gate
2. Stage ordering in setStage(): each stage requires predecessor completed via explicit gate (not auto-closed by transition)
3. CLI: --skip-gate flag removed, gate always enforced
Also: completed_via field on StageHistory distinguishes "gate" (explicit apex stage complete) from "transition" (auto-closed by stage change). Prevents idle-toggle bypass where brainstorm→idle→plan faked completion.
Proper pipeline artifacts: requirements doc (7/7 gate), plan doc (7/7 gate), multi-persona review (8/8 gate including P0 resolution verification).
9 new tests, all existing tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
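The ordering rule in layer 2 can be sketched like this. The stage names and the `completed_via` values come from the commit messages in this PR. The history shape, the `canEnter` name, and the choice to check only the immediate predecessor are assumptions.

```typescript
type Stage = "brainstorm" | "plan" | "execute" | "review" | "ship" | "compound";

const STAGE_ORDER: Stage[] = ["brainstorm", "plan", "execute", "review", "ship", "compound"];

// Hypothetical history entry; only the completed_via values come from the commit message.
interface StageHistoryEntry {
  stage: Stage;
  completed_via: "gate" | "transition";
}

// A stage may be entered only if its predecessor was completed through an explicit gate.
// Entries closed by "transition" (auto-closed on stage change) do not count.
function canEnter(target: Stage, history: StageHistoryEntry[]): boolean {
  const index = STAGE_ORDER.indexOf(target);
  if (index === 0) return true; // the first stage has no predecessor
  const predecessor = STAGE_ORDER[index - 1];
  return history.some(
    (entry) => entry.stage === predecessor && entry.completed_via === "gate",
  );
}
```

Under this rule the brainstorm→idle→plan toggle leaves only a `"transition"` entry for brainstorm, so entering plan is refused.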
Plant 3 sentences at the top of SKILL.md (before any rules) to establish that artifact quality matters more than response speed, that hollow gate-passing has zero value, and that every artifact must be self-contained for independent judgment. This addresses the Goodhart effect where agents optimize for "shortest path through gate" rather than production-usable output. Strategy 1 of 4 in the protocol restructuring initiative. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…f duplicating logic The full implementation (3 modes, Codex CLI detection, filesystem boundary enforcement, telemetry) lives in workflow/roles/codex-consult.md. The alias should route there, not reimplement a subset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…layer
Restructure SKILL.md and 6 stage files from 2270 to 988 lines (-56%)
by moving detailed checklists, templates, and procedures to 14 new
files in skill/details/. Main files keep flow skeletons, exit gates,
and essential tables. Each moved section replaced with a pointer:
"→ See details/{file}.md".
This addresses the attention dilution problem: agent compliance rate
drops exponentially with instruction count. The 3-layer architecture
(skeleton → key requirements → full details) ensures agents read only
what they need for the current step.
Strategy 2 of 4 in the protocol restructuring initiative.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation Adds `apex audit` command that checks pipeline execution quality beyond process compliance. Three layers: L1 process integrity (stages, gates, checkpoints), L2 artifact content quality (AC depth, review substance), L3 cross-verification against unforgeable sources (git diff, tests, timestamps). Scoring: Process 30%, Content 20%, Verified 50%. Any L3 FAIL caps at C. CLI flags: --session, --json, --no-test, --all. Requirements: docs/brainstorms/audit-command-requirements.md Plan: docs/plans/audit-command-plan.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
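The layer weighting can be written out as a short calculation. The 30/20/50 split and the "any L3 FAIL caps at C" rule come from the commit message. The 0 to 100 layer scores, the grade thresholds, and the names are assumptions.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";
const GRADES: Grade[] = ["A", "B", "C", "D", "F"];

// Each layer score is assumed to be a number from 0 to 100.
interface LayerScores {
  process: number;  // L1 process integrity
  content: number;  // L2 artifact content quality
  verified: number; // L3 cross-verification
}

// Assumed thresholds; the PR text does not state the real cut-offs.
function toGrade(score: number): Grade {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

function auditGrade(layers: LayerScores, l3Failed: boolean): { score: number; grade: Grade } {
  // Integer weights avoid floating-point drift: Process 30%, Content 20%, Verified 50%.
  const score = (30 * layers.process + 20 * layers.content + 50 * layers.verified) / 100;
  const grade = toGrade(score);
  // Any failed L3 check caps the result at C.
  const capped: Grade =
    l3Failed && GRADES.indexOf(grade) < GRADES.indexOf("C") ? "C" : grade;
  return { score, grade: capped };
}
```

For example, layer scores of 80, 90, and 100 give (2400 + 1800 + 5000) / 100 = 92, which the cap turns into a C if any L3 check failed.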
…heckpoint
Three automations to shift agent attention from mechanical to content work:
1. `apex stage set` now prints 3-4 key requirements inline, so the agent knows the core rules without reading the full stage file first.
2. `apex stage set brainstorm|plan|review` auto-creates artifact templates with correct frontmatter skeletons if no artifact exists yet.
3. PostToolUse hook detects AskUserQuestion during ship stage and auto-records push-prompt checkpoint, removing a manual step.
Strategy 3 of 4 in the protocol restructuring initiative.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two improvements to make protocol compliance visible:
1. `apex stage complete ship` now prints a pipeline compliance summary showing per-stage artifact status and an overall grade (A-D).
2. `apex doctor` gains 3 content quality checks:
- CQ1: Brainstorm acceptance criteria count >= 3
- CQ2: Plan file manifest paths exist on disk
- CQ3: Review persona sections have substantive content (>50 chars)
These shift quality enforcement from form (section exists) to substance (section has meaningful content), addressing the Goodhart effect where agents fill sections with minimal content to pass gates.
Strategy 4 of 4 in the protocol restructuring initiative.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two enforcement upgrades that shift gates from form-checking to substance:
1. Content quality gates (real-time blockers):
- CQ1: brainstorm must have >= 3 numbered acceptance criteria
- CQ3: review persona sections must have > 50 chars each
2. Adversarial verification gates (Tier 2+ only):
- ADV1: brainstorm requires .apex/verifications/brainstorm-adversarial.md
- ADV2: review requires .apex/verifications/review-adversarial.md
Gate prints sub-agent dispatch prompt when blocked. Agent must spawn independent sub-agent to challenge the artifact. Lightweight scope exempt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
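The two content quality gates can be sketched as plain predicates. The thresholds (at least 3 numbered criteria, more than 50 characters per persona section) come from the commit message. How a numbered criterion is recognized and how sections are passed in are assumptions.

```typescript
// Count lines that look like numbered list items: "1. text" or "1) text".
function countNumberedCriteria(section: string): number {
  return section
    .split(/\r?\n/)
    .filter((line) => /^\s*\d+[.)]\s+\S/.test(line)).length;
}

// CQ1: the brainstorm artifact needs at least 3 numbered acceptance criteria.
function passesCq1(acceptanceCriteriaSection: string): boolean {
  return countNumberedCriteria(acceptanceCriteriaSection) >= 3;
}

// CQ3: every review persona section needs more than 50 characters of content.
// `sections` maps a persona name to the body of its section (hypothetical shape).
function passesCq3(sections: Record<string, string>): boolean {
  return Object.values(sections).every((body) => body.trim().length > 50);
}
```

A length threshold is a coarse proxy for substance, which is presumably why the same commit pairs it with the adversarial verification gates.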
`apex audit --quick` produces a concise one-page summary (28 lines) for human decision support: task name, scope, acceptance criteria, git changes, sub-agent findings, gate status per stage, and a decision prompt (approve/modify/reject). Designed so a human can scan the pipeline output in 30 seconds and focus on the highest-signal section: what the adversarial sub-agent found that the main agent missed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ge explanations
Covers all 6 commits from the protocol restructuring initiative: value anchor, progressive disclosure, do-axis automation, compliance reporting, gate hardening with adversarial verification, and apex audit --quick for human review. Each section includes a plain-language ("大白话") explanation of what it does and why.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…detection
Root cause: cognitive-kernel L1 output protocol fires unconditionally on "proposing" scenarios, overriding apex-forge's "initialize pipeline first" requirement. Agent produces full analysis without ever running apex init or entering brainstorm stage.
Two-path fix:
1. cognitive-kernel L1 Step 0: new "Pipeline 协议优先" ("pipeline protocol first") exemption. When a pipeline skill is loaded but not initialized, defer proposing template until pipeline init completes.
2. PostToolUse hook: detect when Skill tool loads apex-forge/better-work and .apex/ is missing or stage is idle, then output a stderr warning to remind agent to run Dashboard Gate + apex init.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d compat, security
Systematically codify 13 development principles across 4 enforcement layers:
- CONTRIBUTING.md: 7 new sections (dependency policy, backward compat, changelog, ADR, security, linting, test requirement)
- CI: Biome zero-warning gate + CLI backward compatibility schema test
- Brainstorm checklist: Capability Audit, Evidence of Need, Anti-Double-Counting
- ADR directory with template + example (hybrid changelog format decision)
Biome auto-fix applied to 85 source files (import sorting, node: protocol, template literals). 3 rules disabled pending codebase-wide cleanup.
Requirements: docs/brainstorms/dev-principles-optimization-requirements.md
Plan: docs/plans/dev-principles-optimization-plan.md
Review: docs/reviews/dev-principles-optimization-review.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- biome.json: 3 disabled rules (noExplicitAny, noNonNullAssertion, noAssignInExpressions) upgraded from "off" to "warn", so violations are now visible locally without breaking CI
- state.ts: S8 ADR gate programmatic enforcement, which checks docs/decisions/ for non-template ADR files when scope is Standard/Deep with ≥2 approaches
- CONTRIBUTING.md: Security Principles expanded with sandbox, browser automation, and MCP server security boundaries
Requirements: docs/brainstorms/dev-principles-hardening-requirements.md
Review: docs/reviews/dev-principles-hardening-review.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `apex worker check/interrupt`
- `apex audit --quick`
Key Changes (39 commits)
- `docs/decisions/`: ADR system with template
- `src/state/state.ts`: S8 ADR gate enforcement
- `src/adapters/`: Multi-agent adapter registry
- `skill/`: Protocol restructuring (Progressive Disclosure)
- `src/commands/audit.ts`: Pipeline quality audit + `--quick` mode
Test plan
- `bun test --filter dev-principles`: 20 pass
- `bunx biome ci src/`: exit 0 (116 warnings, 0 errors)
- `bunx tsc --noEmit`: 0 type errors
🤖 Generated with Claude Code