diff --git a/.gitignore b/.gitignore index 4b81f2b38..e7d3afb61 100644 --- a/.gitignore +++ b/.gitignore @@ -243,6 +243,7 @@ benchmarks/** skill-creator spec-annotate changes-header +review-output.json apm.yml .playwright-*/ .apm @@ -250,11 +251,13 @@ apm.yml .codegraph .vercel .codex +.review .mypy_cache .mcp.json demo skills-lock.json sx.json +context-hud .env** playwright/ .claude.backup.* diff --git a/README.md b/README.md index 651756b6c..1dd0d4ab5 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ Pilot Shell -### How real engineers run Claude Code. +### How real engineers run Claude Code and Codex From requirement to production-grade code — planned, tested, verified.
**Spec-driven plans. Enforced quality gates. Persistent knowledge.** @@ -39,23 +39,21 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install. ## Why Pilot Shell -**Claude Code writes code fast** — but without structure, it skips tests, loses context, and produces inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output. +**Claude Code and Codex CLI write code fast** — but without structure, they skip tests, lose context, and produce inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output. **Pilot Shell is different.** Every component solves a real problem with an engineered solution: +- **`/prd`** — brainstorm ideas into clear requirements with optional deep research - **`/spec`** — plans, implements, and verifies features end-to-end with TDD - **`/fix`** — bugfix workflow with TDD; bails out when complexity exceeds the standard fix lane -- **`/prd`** — brainstorm ideas into clear requirements through with optional deep research - **Spec collaboration** — share specs with teammates, annotations flow back grouped by author - **Quality hooks** — enforce linting, formatting, type checking, and tests as quality gates - **Context engineering** — preserves decisions and knowledge across sessions - **Code intelligence** — semantic search (Semble) + code knowledge graph (CodeGraph) - **Token optimization** — 60–90% cost reduction via RTK compression and Semble code search +- **Pilot Bot** — persistent automation agent with scheduled tasks and background jobs - **Extensions** — reusable rules, skills, and MCP servers with team sharing and customization - **Console** — local web dashboard with real-time notifications and session management -- **Pilot Bot** — persistent automation agent with scheduled tasks and background jobs - -Run `pilot` for Spec-Driven Development with `/spec`, or `pilot bot` for 24/7 automations. --- @@ -63,25 +61,22 @@ Run `pilot` for Spec-Driven Development with `/spec`, or `pilot bot` for 24/7 au ### Prerequisites -**Claude Code:** Install [Claude Code](https://code.claude.com/docs/en/quickstart) using the **native installer** before setting up Pilot Shell. If you have the `npm` or `brew` version installed, uninstall it first. If no Claude Code installation is detected, the Pilot installer will attempt to set it up for you. +**At least one AI agent:** Pilot Shell supports **Claude Code** (primary — full feature coverage) and **Codex CLI** (all workflows, fewer platform features). Install at least one before running the Pilot installer: -**Claude Subscription:** Solo developers should choose [Max 5x](https://claude.com/pricing) for moderate usage or [Max 20x](https://claude.com/pricing) for heavy usage. Teams should use [Team Premium](https://claude.com/pricing) (6.25x usage per member, SSO, admin tools, billing management). Companies with stricter compliance or procurement requirements should use [Enterprise](https://claude.com/pricing) (API based pricing applies per usage). +- **Claude Code:** Install via the [native installer](https://code.claude.com/docs/en/quickstart). If you have the `npm` or `brew` version, uninstall it first. Requires a Claude subscription — [Max 5x or 20x](https://claude.com/pricing) for solo, [Team Premium](https://claude.com/pricing) for teams, [Enterprise](https://claude.com/pricing) for organizations. +- **Codex CLI:** Install via the [native installer](https://developers.openai.com/codex/cli). If you have the `npm` or `brew` version, uninstall it first. Requires an OpenAI subscription — [Plus or Pro](https://developers.openai.com/codex/pricing) for solo, [Business or Enterprise](https://developers.openai.com/codex/pricing) for teams. **Terminal (Recommended):** [cmux](https://cmux.com) works great with Pilot Shell — its vertical tab layout lets you run multiple sessions side by side. Any modern terminal works: [Ghostty](https://ghostty.org/), [iTerm2](https://iterm2.com/), or the built-in macOS/Linux terminal. -**Claude Chrome (Recommended):** Install the [Claude Code Chrome extension](https://code.claude.com/docs/en/chrome) for browser automation and E2E testing. Pilot automatically detects it and uses it as the preferred tool. When the extension isn't available, Pilot falls back to [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (direct CDP access, Lighthouse, performance tracing), then [playwright-cli](https://github.com/microsoft/playwright-cli) (persistent sessions, tracing) or [agent-browser](https://agent-browser.dev/) (lightweight, fast startup). - -**Codex Plugin (Included):** The [Codex plugin](https://github.com/openai/codex-plugin-cc) is installed automatically with Pilot. It provides adversarial code review powered by OpenAI Codex — an independent second opinion during `/spec` planning and verification. Run `/codex:setup` once to authenticate, then enable reviewers in Console Settings → Reviewers. A [ChatGPT Plus](https://chatgpt.com/#pricing) subscription ($20/mo) covers the Codex API usage. - ### Installation -**Works with any existing project.** Pilot Shell is installed on top of Claude Code and uses its built-in concepts like commands, rules, hooks, skills, subagents, MCP, LSP and optimized settings to improve your experience: +**Works with any existing project.** Pilot Shell integrates with **Claude Code** and **Codex CLI**, using their built-in concepts (rules, hooks, skills, subagents, MCP) to improve your experience: ```bash curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash ``` -Installs globally on macOS, Linux, and Windows (WSL2). All tools and rules go to `~/.pilot/` and `~/.claude/`. After installation, `cd` into any project and run `pilot` or `ccp` to start. +Installs globally on macOS, Linux, and Windows (WSL2). After installation, just run `claude` or `codex` directly — Pilot Shell is loaded automatically. Run `pilot update` to check for updates.
Downgrade @@ -107,23 +102,24 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstal
Reset & Refresh -Over time, Claude Code's accumulated session logs and Pilot Shell's caches can slow things down. A periodic reset gives you a clean baseline: +Over time, accumulated session logs and Pilot Shell's caches can slow things down. A periodic reset gives you a clean baseline: ```bash -# 1. Inside Claude Code, log out +# 1. If using Claude Code, log out first /logout # 2. Back up your current config (just in case) mv ~/.claude.json ~/.claude.json.bak mv ~/.claude ~/.claude.bak +mv ~/.codex ~/.codex.bak mv ~/.pilot ~/.pilot.bak # 3. Reinstall Pilot Shell from the official installer curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash -# 4. Start Pilot, sign in to Claude, and re-activate your license -pilot +# 4. Re-activate your license, then start your agent pilot activate +claude # or: codex ``` Once Pilot Shell is running smoothly again, you can delete the `.bak` copies. Forgot your license key? Recover it in the [Pilot members area](https://polar.sh/max-ritter/portal). @@ -141,32 +137,81 @@ For tighter isolation when working with untrusted code, combine the dev containe
What the installer does -7-step installer with progress tracking, rollback on failure, and idempotent re-runs: +9-step installer with progress tracking, rollback on failure, and idempotent re-runs. Steps 3 and 4 are agent-conditional — they skip cleanly when the matching agent CLI is not detected. The installer **does not install Claude Code or Codex CLI itself**; install at least one yourself per the prerequisites above. -1. **Prerequisites** — Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq -2. **Claude files** — Installs into `~/.claude/` (native layout) — rules, commands, hooks, MCP servers, agents -3. **Config files** — Creates `.nvmrc` and project config -4. **Dependencies** — Installs Semble, RTK, CodeGraph, [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp), [playwright-cli](https://github.com/microsoft/playwright-cli), [agent-browser](https://agent-browser.dev/), language servers -5. **Shell integration** — Auto-configures bash, fish, and zsh with `pilot` alias -6. **VS Code extensions** — Installs recommended extensions for your stack -7. **Finalize** — Success message with next steps +1. **Prerequisites** — Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq. Verifies at least one supported agent (Claude Code or Codex CLI) is on the system; aborts with a clear error otherwise. +2. **Pilot files** — Agent-neutral Pilot Shell-managed assets. Hooks → `~/.pilot/hooks/`, Console scripts/UI → `~/.pilot/`, MCP server template → `~/.pilot/.mcp.json`, raw rule sources → `~/.pilot/rules/` (read by Codex's adapter), plus the canonical skill source at `~/.claude/skills/` (Claude reads natively; Codex adapts in step 4). Always runs. +3. **Claude files** — Claude-specific assets under `~/.claude/`: rules, sub-agents, `settings.json` (three-way merged), plus the Claude post-install merges (hooks into settings, `~/.claude.json` MCP block, model config migration). **Skipped when Claude Code CLI is not detected.** +4. **Codex files** — Codex-specific assets: adapted skills → `~/.agents/skills/`, review agents → `~/.codex/agents/`, `~/.codex/AGENTS.md`, `~/.codex/config.toml`, `~/.codex/hooks.json`. Per-category counts mirror the Claude section. **Skipped when Codex CLI is not detected.** +5. **Config files** — Creates `.nvmrc` and project config. +6. **Dependencies** — Installs Semble, RTK, CodeGraph, [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp), [playwright-cli](https://github.com/microsoft/playwright-cli), [agent-browser](https://agent-browser.dev/), language servers, and the `codex@openai-codex` Claude marketplace plugin (skipped on Codex-only systems alongside Chrome DevTools MCP and LSP plugins). +7. **Shell integration** — Auto-configures bash, fish, and zsh with the `pilot` admin alias. +8. **VS Code extensions** — Installs recommended extensions for your stack. +9. **Finalize** — Success message with next steps.
+### First Steps + +Run these commands once in each new project after installing Pilot Shell: + +```bash +# Claude Code # Codex CLI +claude codex +> /setup-rules > $setup-rules +``` + +`/setup-rules` reads your codebase, discovers your conventions, and generates project-specific rules and MCP server docs — this is how Pilot learns your project. Run it once to start, then again after major architectural changes. + +Once your rules are in place, use `/create-skill` to capture any repeatable workflow as a reusable skill, and `/benchmark` to measure whether your rules and skills are actually improving outputs. See [Additional Workflows](#additional-workflows) for full details on all three. + --- -

How It Works

-Just chat — no plan, no approval gate. [Quick mode](https://pilot-shell.com/docs/workflows/quick-mode) is the default: quality hooks and TDD enforcement still apply, best for small tasks and exploration. For anything that needs a plan, use `/spec` — not Claude Code's built-in plan mode. +

Core Workflows

+ +Three commands cover the full development cycle — from vague idea to shipped feature. Quality hooks and TDD enforcement run automatically on every task. + +### /prd — Brainstorm Ideas Into Product Requirements Documents + +[`/prd`](https://pilot-shell.com/docs/workflows/prd) is the brainstorming surface for ideas that aren't specs yet — vague problem statements and fuzzy shapes. It pitches directions, pressure-tests them with you, and converges on a PRD you can hand to `/spec`. PRDs are saved to `docs/prd/` and visible in the Console's **Requirements** tab. + +```bash +# Claude Code # Codex CLI +claude codex +> /prd "Add real-time notifications for team updates" > $prd "Add real-time notifications for team updates" +``` + +
+How /prd works + +**When to use `/prd` over `/spec`:** `/prd` is for **what** and **why**; `/spec` is for **how**. Reach for `/prd` first when you only have a problem statement, want to riff across multiple directions, or need scope boundaries defined before someone starts building. + +**Flow:** two modes, picked automatically from how fuzzy the idea is: + +1. **Ideate** — free-form prose, the agent pitches 3-5 directions, you react (only runs when the idea is vague) +2. **Clarify → Converge → Write** — structured multiple-choice questions once the shape is known, then the PRD is written + +**Research tiers** (picked at the start): + +| Tier | Behavior | +|------|----------| +| **Quick** | Skip research | +| **Standard** | Web search for competitors, prior art, best practices | +| **Deep** | Parallel research agents for comprehensive findings | + +The final PRD covers problem statement, core user flows, scope boundaries, and technical context — then offers to hand off directly to `/spec` for implementation. + +
### /spec — Spec-Driven Development -**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab) for new features, refactoring, and architectural work. It provides a complete planning workflow with TDD, verification, and code review. **[Collaborative spec review](https://pilot-shell.com/docs/features/spec-collaboration) shifts review left** — share a single link, teammates annotate inline, feedback flows back into the Console grouped by author. +[`/spec`](https://pilot-shell.com/docs/workflows/spec) is for new features, refactoring, and architectural work. It provides a complete planning workflow with TDD, verification, and code review (on Claude Code, it replaces the built-in plan mode at Shift+Tab). **[Collaborative spec review](https://pilot-shell.com/docs/features/spec-collaboration) shifts review left** — share a single link, teammates annotate inline, feedback flows back into the Console grouped by author. ```bash -pilot -> /spec "Add user authentication with OAuth and JWT tokens" -> /spec "Migrate the REST API to GraphQL" +# Claude Code # Codex CLI +claude codex +> /spec "Add user authentication with OAuth and JWT" > $spec "Add user authentication with OAuth and JWT" ``` ``` @@ -175,32 +220,16 @@ Discuss → Plan → Approve → Implement (TDD) → Verify → Done └── Loop──┘ ``` -Pilot Shell Console — Specifications -
How /spec works `/spec` auto-detects whether the request is a feature or a bugfix and routes to the right workflow. The three phases below apply to both — the verify step differs slightly (features get E2E scenarios; bugfixes get a Behavior Contract audit, see the `/fix` section below). -**Plan:** Explores codebase with semantic search → asks clarifying questions → writes detailed spec with scope, tasks, and definition of done → for UI features, writes **E2E test scenarios** (step-by-step, browser-executable) that become the verification contract → **spec-review sub-agent** validates completeness → waits for your approval. Optional **Codex adversarial review** provides an independent second opinion when enabled. +**Plan:** Explores codebase with semantic search → asks clarifying questions → writes detailed spec with scope, tasks, and definition of done → for UI features, writes **E2E test scenarios** (step-by-step, browser-executable) that become the verification contract → **spec-review sub-agent** validates completeness in Claude Code or Codex → waits for your approval. Optional **Codex Companion Reviewers** provide a Claude Code plugin second opinion when enabled. **Implement:** Creates an isolated git worktree → implements each task with strict TDD (RED → GREEN → REFACTOR) → quality hooks auto-lint, format, and type-check every edit → full test suite after each task. -**Verify:** Full test suite + actual program execution → **unified review sub-agent** (compliance + quality + goal) → for UI features, executes each E2E scenario step-by-step via browser automation (pass/fail tracked, results written to plan) → auto-fixes findings → squash merges to main on success. - -
- -
-Collaborative spec review — catch flaws in the spec, not the PR - -Send a single persistent share link to your teammates. They open it in any browser at `pilot-shell.com/s/` — no Pilot Shell install required — and annotate the spec inline. Submitted feedback flows back into your Console automatically (60-second poll), surfaces in the bell as `" left N annotations on "`, and lands in the annotation panel grouped by teammate. - -- **One link, many reviewers.** Click "Share with Teammates" once. The button is permanently replaced by a copyable link. Forward it to anyone. -- **Zero handoff.** Teammates click Submit on pilot-shell.com — no copy-URL-back step, no manual import. -- **Author-grouped feedback in the Console.** Keep what's useful, delete what isn't. Annotations land in `.annotations/.json` so the agent reads them at the next review checkpoint. -- **Shift left.** Wrong approach, missed edge case, unclear scope, weak architecture — spot it before code is written, where a one-sentence annotation is the entire fix. Not just bugs: anything you'd flag in PR review costs less to flag here. - -Links expire after 7 days. No accounts, no encryption — the unguessable URL is the access token. +**Verify:** Full test suite + actual program execution → **changes-review sub-agent** in Claude Code or Codex (compliance + quality + goal) → for UI features, executes each E2E scenario step-by-step via browser automation (pass/fail tracked, results written to plan) → auto-fixes findings → squash merges to main on success.
@@ -209,10 +238,9 @@ Links expire after 7 days. No accounts, no encryption — the unguessable URL is **[`/fix`](https://pilot-shell.com/docs/workflows/fix) is the bugfix command.** Investigate the bug, write the failing test, fix at the root cause, single-pass audit, done. No plan file, no approval mid-flow, no separate verify phase. ```bash -pilot -> /fix "annotation persistence drops fields between save and reload" -> /fix "off-by-one in pagination at boundary" -> /fix "wrong default for max_retries" +# Claude Code # Codex CLI +claude codex +> /fix "annotation persistence drops fields between save and reload" > $fix "annotation persistence drops fields" ``` ```text @@ -226,70 +254,34 @@ If investigation reveals the bug is multi-component or architectural, `/fix` sto For local bugs. Single file, obvious-once-traced root cause. No plan file, no approval mid-flow, no separate verify phase. TDD still enforced — bugfixes without a failing test don't ship. -- **Investigate:** Reproduce the bug → trace to root cause at `file:line` with `codegraph_context` + targeted reads → state confidence (High/Medium required to proceed). For UI / async / race bugs, add temporary `SPEC-DEBUG:`-marked logs at component boundaries before tracing. +- **Investigate:** Reproduce the bug → trace to root cause at `file:line` with `codegraph_context` (structure) + `semble search` (intent, cross-language) + targeted reads → state confidence (High/Medium required to proceed). For UI / async / race bugs, add temporary `SPEC-DEBUG:`-marked logs at component boundaries before tracing. - **RED:** Write the failing test via an existing public entry point → run, must fail with the documented symptom. - **Fix:** Minimal change at the root cause. Symptom patches are forbidden. Reproducing test must pass, then the targeted test module. Diff sanity check (root-cause file in diff, no unplanned files, < 20 lines, symptom-patching grep) catches issues with the fix itself. -- **Verify End-to-End:** The primary correctness signal. Run the actual program with the original input (Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser for UI; CLI / API / REPL / job trigger for non-UI) and capture concrete evidence. A passing unit test alone is never accepted as proof. +- **Verify End-to-End:** The primary correctness signal. Run the actual program with the original input (browser automation for UI — Claude Code uses its Chrome extension; Codex uses Chrome DevTools MCP; both fall back to playwright-cli / agent-browser. CLI / API / REPL / job trigger for non-UI) and capture concrete evidence. A passing unit test alone is never accepted as proof. - **Quality Gate:** Lint + types + build + full anti-regression suite, once. -- **Bail-out:** If investigation reveals the bug is multi-component, architectural, needs defense-in-depth at multiple layers, or two fix attempts have failed, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes. -
- -
-How /spec handles bugs - -When you type `/spec ""`, the full bugfix workflow runs — for bugs that warrant a written plan, approval, code review, and the full verify ceremony. - -- **Behavior Contract:** every plan pins down `Given / When / Currently / Expected / Anti-regression` — the invariant the fix must produce and the behavior it must not break -- **Three uniform tasks** (always, regardless of bug size): Write Reproducing Test (RED) → Implement Fix at Root Cause → Quality Gate -- **Verify audit:** always-on `cp`+`trap` revert-test (proves the reproducing test would genuinely fail without the fix — rules out retroactive rubber-stamp tests) + root-cause-at-source audit (flags symptom patches and caller-side workarounds) + original-symptom re-check — no sub-agents, tests carry the proof -- **Iteration cap at 3:** after three failed verify cycles, the workflow stops and asks if the bug is architectural rather than letting you loop forever +**When to use `/spec` for bugs instead:** bugs that span 3+ files, need a written plan and approval, warrant a Behavior Contract (`Given / When / Currently / Expected`), or have failed two fix attempts. `/spec` adds a revert-test proof in verify, a cap at 3 iterations, and a code review gate — use it when the complexity makes that structure worthwhile.
-### /prd — Brainstorm Ideas Into Product Requirements Documents - -[`/prd`](https://pilot-shell.com/docs/workflows/prd) is the brainstorming surface for ideas that aren't specs yet — vague problem statements and fuzzy shapes. It pitches directions, pressure-tests them with you, and converges on a PRD you can hand to `/spec`. PRDs are saved to `docs/prd/` and visible in the Console's **Requirements** tab. - -```bash -pilot -> /prd "Add real-time notifications for team updates" -> /prd "We need better onboarding — users drop off after signup" -``` - -
-What /prd Does - -**When to use `/prd` over `/spec`:** `/prd` is for **what** and **why**; `/spec` is for **how**. Reach for `/prd` first when you only have a problem statement, want to riff across multiple directions, or need scope boundaries defined before someone starts building. - -**Flow:** two modes, picked automatically from how fuzzy the idea is: - -1. **Ideate** — free-form prose, Claude pitches 3-5 directions, you react (only runs when the idea is vague) -2. **Clarify → Converge → Write** — structured multiple-choice questions once the shape is known, then the PRD is written - -**Research tiers** (picked at the start): - -| Tier | Behavior | -|------|----------| -| **Quick** | Skip research | -| **Standard** | Web search for competitors, prior art, best practices | -| **Deep** | Parallel research agents for comprehensive findings | +--- -The final PRD covers problem statement, core user flows, scope boundaries, and technical context — then offers to hand off directly to `/spec` for implementation. +## Additional Workflows -
+Run after installing Pilot Shell to configure your environment, then on demand as your project evolves. ### /setup-rules — Generate Modular Rules [`/setup-rules`](https://pilot-shell.com/docs/workflows/setup-rules) explores your codebase, discovers conventions, generates modular rules and documents MCP servers. Run once initially, then anytime your project changes significantly. ```bash -pilot -> /setup-rules +# Claude Code # Codex CLI +claude codex +> /setup-rules > $setup-rules ```
-What /setup-rules Does +How /setup-rules works 12 phases that read your codebase and produce comprehensive AI context: @@ -315,19 +307,20 @@ pilot [`/create-skill`](https://pilot-shell.com/docs/workflows/create-skill) builds a reusable skill from any topic — explores the codebase and creates it interactively with you. If no topic is given, evaluates the current session for extractable knowledge. ```bash -pilot -> /create-skill "Automate the review and triaging of our PR Bot comments" +# Claude Code # Codex CLI +claude codex +> /create-skill "Automate PR Bot comment review" > $create-skill "Automate PR Bot comment review" ```
-What /create-skill Does +How /create-skill works 6 phases that turn domain knowledge into a reusable skill: 1. **Reference** — load use case categories, complexity spectrum, file structure template, description formula, security restrictions 2. **Understand** — explore the codebase for relevant patterns, ask clarifying questions, or evaluate the current session for extractable knowledge 3. **Check existing** — search project and global skills to avoid duplicates -4. **Create** — write to `.claude/skills/` (project) or `~/.claude/skills/` (global), apply portability and determinism checklists +4. **Create** — write to the active agent's skills directory (`.claude/skills/` or `~/.claude/skills/` for Claude Code; `~/.agents/skills/` for Codex), apply portability and determinism checklists 5. **Quality gates** — structure checklist (SKILL.md naming, frontmatter fields), content checklist (error handling, examples, exclusions), triggering test (should/shouldn't trigger), iteration signals 6. **Test & iterate** — run test prompts with sub-agents, evaluate results, optimize description triggering @@ -339,7 +332,7 @@ pilot | **Workflow Automation** | Multi-step processes with validation gates and iterative refinement | | **MCP Enhancement** | Workflow guidance on top of MCP tool access, multi-MCP coordination | -**Skill structure:** Each skill is a folder with a `SKILL.md` file (case-sensitive), optional `scripts/`, `references/`, and `assets/` directories. The YAML frontmatter description determines when Claude loads the skill — it must include what the skill does, when to use it, and specific trigger phrases. Progressive disclosure keeps context lean: frontmatter loads always (~100 tokens), SKILL.md loads on activation, linked files load on demand. +**Skill structure:** Each skill is a folder with a `SKILL.md` file (case-sensitive), optional `scripts/`, `references/`, and `assets/` directories. The YAML frontmatter description determines when the agent loads the skill — it must include what the skill does, when to use it, and specific trigger phrases. Progressive disclosure keeps context lean: frontmatter loads always (~100 tokens), SKILL.md loads on activation, linked files load on demand.
@@ -348,13 +341,14 @@ pilot [`/benchmark`](https://pilot-shell.com/docs/workflows/benchmark) runs your prompts with and without the target, grades outputs against falsifiable assertions, and shows a structured report you can absorb in 30 seconds — labeled verdict, quadrant breakdown, and only the divergent assertions in the drill-down. Finishes with a concrete improvement plan so you know exactly what to change next. ```bash -pilot -> /benchmark pilot/skills/create-skill -> /benchmark pilot/rules/testing.md +# Claude Code # Codex CLI +claude codex +> /benchmark pilot/skills/create-skill > $benchmark pilot/skills/create-skill +> /benchmark pilot/rules/testing.md > $benchmark pilot/rules/testing.md ```
-What /benchmark Does +How /benchmark works Six phases turn a rule or skill into a before/after comparison with an actionable plan: @@ -372,302 +366,75 @@ Six phases turn a rule or skill into a before/after comparison with an actionabl 6. **Improvement plan** — ≤ 5 ranked proposals in a uniform format (`[TARGET]` or `[EVALS]` tag, location, current quote, replacement, "Lever" line). You pick: apply target edits, iterate on evals, both, or save the plan and stop. Re-runs land in a fresh `runs//` so iteration deltas stay legible. -**Isolation:** each run gets its own sandbox directory; a globally-installed copy of the target in `~/.claude/` is auto-hidden for the duration and restored afterward (with on-disk recovery manifest covering SIGKILL / power loss / segfault). Conditional-loading frontmatter (`path:` / `paths:`) is stripped from the copy installed into the `with` sandbox so the target loads unconditionally for every prompt — without that, rules scoped to e.g. `paths: ["**/*.py"]` would stay dormant in both configs and the delta would collapse to 0.00. The source file is never modified. +**Isolation:** each run gets its own sandbox directory; a globally-installed copy of the target in `~/.claude/` (or `~/.codex/` / `~/.agents/`) is auto-hidden for the duration and restored afterward (with on-disk recovery manifest covering SIGKILL / power loss / segfault). Conditional-loading frontmatter (`path:` / `paths:`) is stripped from the copy installed into the `with` sandbox so the target loads unconditionally for every prompt — without that, rules scoped to e.g. `paths: ["**/*.py"]` would stay dormant in both configs and the delta would collapse to 0.00. The source file is never modified. **Key flags:** `--runs N` (default 1), `--configs with,without`, `--workers N`, `--model`, `--no-isolate-global`, `--restore-hidden`.
-### Status Line - -Pilot shell ships with its own advanced status line with real-time session metrics and spec progress: - -Pilot Shell Status Line +## Pilot Shell Console -
-All fields explained - -**Line 1 — Session Metrics** (separated by `|`): - -| Widget | Description | -| ----------------- | ------------------------------------------------------------------------------- | -| **Model** | Active model in short form (`Opus 4.7 [1M]`, `Sonnet 4.6`) | -| **Context** | Effective context usage with progress bar and buffer indicator. Green < 80%, Yellow 80–95%, Red 95%+ | -| **Lines changed** | `+added -removed` in session (hidden when `rate_limits` is available) | -| **Git** | Branch with staged (`+N`) / unstaged (`~N`) counts (hidden when `rate_limits` is available) | -| **5h / 7d usage** | Rate-limit percentage with pacing arrow and reset countdown (`5h: 42% ⇡ 2h`). ⇡ red = over pace, ⇣ green = under pace. Read cross-platform from Claude Code's `rate_limits` stdin field (Pro/Max subscriptions on Claude Code 2.1.80+). Replaces lines+git when present. | -| **Cost** | Session cost in USD. Green < $1, Yellow $1–5, Red $5+. Hidden when `rate_limits` is available — on Pro/Max the subscription covers API usage, so the dollar figure is noise. | -| **Savings** | Token savings percentage from RTK proxy (`Savings: N%`). Always shown when RTK has data. | - -**Line 2 — Mode:** - -- **Quick Mode:** `Quick Mode` -- **Spec Mode:** Plan name, type (`feature`/`bugfix`), phase (`plan`/`implement`/`verify`), progress bar, task count, and iteration count - -**Line 3 — Version & Session Info:** - -`Pilot () · CC () · sessions · memories ` - -Pilot tier: Solo, Team, or Trial with time remaining. Claude subscription (Pro/Max/Team/Enterprise) detected via `claude auth status` and cached for 24 hours. +Local web dashboard at `localhost:41777` with real-time notifications and 10 views. -
- -### Pilot Shell Console +Console — Dashboard -A local web dashboard with different views and real-time notifications when Claude needs your input: +### Dashboard -Pilot Shell Console — Dashboard - -
-All views - -Each view with project-specific data has an inline **Project Filter** dropdown — switch projects without leaving the page. Dashboard stats tiles are clickable — navigate directly to the relevant view. - -| View | What it shows | -| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | -| **Dashboard** | Global command center — 8 clickable stat cards, 4 recent cards (Specifications, Requirements, Sessions, Memories) with "Show all" links. Active specs as pills in the top bar, notification bell in top right. | -| **Sessions** | Browse past sessions with search. Copy the session ID and use `/resume ` in Claude Code to jump back into any session. | -| **Memories** | Browsable observations — decisions, discoveries, bugfixes — with type filters and search. Each memory shows its session — click to navigate. | -| **Requirements** | PRD documents with view/annotate modes. Selected shown as a tab, others in a Previous dropdown. | -| **Specifications** | All spec plans with task progress, phase tracking, and iteration history. **Annotate mode** lets you mark up plans visually before approving — select text or click **+** on any block to write a note. **Share with Teammate** generates a compressed share link; **Receive Feedback** imports their annotations with accept/reject controls | -| **Extensions** | All extensions — local, plugin, and remote — with team sharing via git, diff view, push/pull, and color-coded categories. | -| **Changes** | Git diff viewer with staged/unstaged files, branch info, and worktree context. **Review mode** adds inline annotations on diff lines — the agent reads them directly before marking a spec as verified | -| **Usage** | Daily token costs, model routing breakdown, and usage trends | -| **Help** | Documentation, guides, and quick-start resources | -| **Settings** | One scrollable page (Spec Workflow / Console). Spec workflow toggles (branch isolation, ask questions, plan approval, **Model Switching**), reviewer toggles (spec review, changes review, optional Codex). Active model is set via `/model` — no model dropdowns here. | +Global command center with 8 clickable stat cards and 4 recent cards (Specifications, Requirements, Sessions, Memories). Active specs shown as pills in the top bar; notification bell in the top right. -
+### Sessions -
-Visual plan annotation +Browse past sessions with search for both Claude Code and Codex. For Claude Code, copy the session ID and use `/resume ` to directly jump back into any session. -When a spec plan is pending approval, the Specifications tab defaults to **Annotate mode**. Two ways to annotate: +### Memories -- **Select text** — highlight any passage and write a note via the floating toolbar -- **Click +** — hover over any block to add a note without selecting text +Browsable observations — decisions, discoveries, bugfixes — with type filters and search. Each memory shows its session — click to navigate directly to it. -Annotations save automatically. The agent reads them at the next checkpoint, revises the plan accordingly, and asks for approval again. This turns plan review into a conversation — you mark what needs changing, the agent addresses it, you review again. +### Requirements -
+Product requirement documents (PRDs) generated by `/prd`, with view and annotate modes. Access all previous documents and share them with your team for direct feedback and annotations. -
-Spec sharing +### Specifications -Share specs and requirements with teammates for async review — no cloud service required: - -1. **Share** — Click **Share with Teammate** to generate a compressed link. The entire spec and annotations are encoded into the URL fragment, so no data is transmitted to any server. -2. **Review** — Your colleague opens the link in their Console (or on pilot-shell.com), sees your annotations, and adds their own feedback. -3. **Import** — Click **Receive Feedback** to import their annotations. Each annotation has per-item **accept/reject** controls — accepted annotations merge into your plan, rejected ones are removed from both the sidebar and inline markers. - -Everything works locally. The link is self-contained — the recipient doesn't need access to your machine, repository, or any shared service. - -
- -
-Code review - -After a spec completes all automated checks, the agent prompts you to review the changes in the **Changes** tab: - -1. **Enable Review mode** — toggle it in the Changes tab header -2. **Annotate diffs** — click **+** on any diff line to add an inline note. Annotations save automatically. -3. **Agent addresses feedback** — the agent reads every annotation and resolves them before marking the spec as verified - -This gives you a final quality gate with direct, line-level feedback — the same workflow as a PR review, but before the code ever leaves your machine. - -
+All spec plans generated by `/spec` with task progress, phase tracking, and iteration history. Annotate mode lets you mark up plans visually before approving, share with teammates via a single link. ### Extensions -Rules, commands, skills, and agents — all plain markdown files in `.claude/` (project) or `~/.claude/` (global). The Console Extensions page lets you browse, edit, compare, and share everything from one place: - -Pilot Shell Console — Extensions - -
-Extension categories - -| Extension | Location | When it loads | -| ------------ | ------------------- | ------------------------------------------- | -| **Skills** | `.claude/skills/` | Automatically when relevant | -| **Rules** | `.claude/rules/` | Every session, or by file type | -| **Commands** | `.claude/commands/` | On demand via `/command-name` | -| **Agents** | `.claude/agents/` | Spawned as sub-agents for specialized tasks | - -Use `/setup-rules` to auto-generate rules from your codebase. Use `/create-skill` to capture workflows as reusable skills. - -
- -
-Scopes: Global, Project, Plugin - -**Project** extensions live in `.claude/` — commit them so teammates get them on `git clone`. **Global** extensions live in `~/.claude/` — personal and available across all projects. Move extensions between scopes with one click. - -**Plugin** extensions come from installed Claude Code plugins (`claude plugin install `). They appear as read-only items — visible but not editable. - -
- -
-Team sharing & APM (Team tier) - -Connect a git repository to share extensions across your team: - -- **Push** local extensions to the team remote -- **Pull** remote extensions to your machine (global or project scope) -- **Compare** local vs remote with a built-in side-by-side diff view -- **Conflict detection** — when local and remote differ, choose which version to keep - -**APM format** — check one box and your remote becomes an [APM package](https://microsoft.github.io/apm/introduction/key-concepts/), directly installable via `apm install owner/repo` by anyone using Copilot, Cursor, OpenCode, or Claude. Extensions are automatically converted to APM conventions on push: - -| Pilot Shell | APM Remote | -| --- | --- | -| `rules/my-rule.md` | `instructions/my-rule.instructions.md` | -| `commands/my-cmd.md` | `prompts/my-cmd.prompt.md` | -| `skills/my-skill/SKILL.md` | `skills/my-skill/SKILL.md` | -| `agents/my-agent.md` | `agents/my-agent.agent.md` | - -APM-compatible frontmatter is injected automatically. An `apm.yml` manifest is generated. Toggling APM on/off migrates existing extensions in a single commit. +Browse, edit, compare, and share your rules, commands, skills, and agents. Connect a git remote to push/pull extensions across your team, with optional APM-compatible export format. -
- -### Customization - -Customize everything Pilot auto-installs — tweak the built-in `/spec` workflow, modify existing rules, register additional hooks, add agents, and adjust the auto-applied `settings.json`, `claude.json`, and `.mcp.json`. Source can be a **git repo** (team-wide) or a **local directory** (personal, no git needed). Team and Enterprise plans. - -```bash -pilot customize install # Install and apply -pilot customize update # Pull latest (git) or re-read (local) -pilot customize status # Active source + drift warnings -pilot customize remove # Restore Pilot defaults -``` - -
-What you can customize - -| Target | How | -|--------|-----| -| **Skills** (built-in workflows like `/spec`, `/prd`) | Overlay ops in `customization.json`: `insert_after`, `insert_before`, `replace`, `disable` — or ship an entirely new skill folder | -| **Rules** | New rules are additive; same filename as a core rule overrides it | -| **Hooks** | Scripts copy to `~/.claude/hooks/`; ship `hooks.json` to register them alongside Pilot's built-ins | -| **Agents** | Drop `.md` files to add review/helper agents into `~/.claude/agents/` | -| **MCP servers** | Top-level `.mcp.json` deep-merges into `~/.claude.json` `mcpServers` (adds team servers alongside Pilot's; pack values win on conflict) | -| **Claude settings** | Top-level `settings.json` and `claude.json` deep-merge into the user's files — model prefs, permissions, env vars, etc. User state (oauth, projects) is preserved | - -Replaced skill fragments stay pinned to upstream by hash. `pilot customize status` surfaces drift when Pilot upgrades a replaced step; `pilot customize diff /` shows what changed so you can port improvements. See the [Customization guide](https://pilot-shell.com/docs/features/customization) for the full schema. - -
- -
-Repository / folder structure - -The layout is the same whether you publish it as a git repo or keep it as a local folder. Directory names map 1:1 to `~/.claude/`: - -``` -my-customization/ -├── customization.json # Required: metadata + optional skill overlays -├── settings.json # Optional: deep-merges into ~/.claude/settings.json -├── claude.json # Optional: deep-merges into ~/.claude.json -├── .mcp.json # Optional: deep-merges into ~/.claude.json `mcpServers` -├── skills/ # → ~/.claude/skills/ -│ ├── spec-plan/steps/ -│ │ └── security-review.md # New step injected into spec-plan -│ └── team-deploy/ # Brand-new skill -│ ├── manifest.json -│ ├── orchestrator.md -│ └── steps/01-stage.md -├── rules/ # → ~/.claude/rules/ -│ ├── team-standards.md # Additive -│ └── testing.md # Overrides core -├── hooks/ # → ~/.claude/hooks/ -│ ├── team-lint-check.sh -│ └── hooks.json # Registers team-lint-check.sh alongside Pilot's hooks -└── agents/ # → ~/.claude/agents/ - └── team-reviewer.md -``` - -Only ship the files and directories you need. A repo with just `rules/` is a valid customization; so is one with just `.mcp.json`. - -
- -### Pilot Bot - -Run Claude Code as a persistent 24/7 automation agent with scheduled tasks, background jobs and heartbeat monitoring: - -```bash -pilot bot # Launch automation session (auto-initializes on first run) -``` - -Pilot Bot defines scheduled jobs, automates recurring tasks, and monitor system health around the clock. If the [Telegram Channels plugin](https://github.com/anthropics/claude-plugins-official/tree/main/external_plugins/telegram) is installed, the bot auto-detects it and enables bidirectional messaging. Similar to OpenClaw, but without the added complexity and costs. - -
-Bot skills - -| Skill | Purpose | -|-------|---------| -| `bot-boot` | Boot sequence — health check, job registration, heartbeat setup | -| `bot-heartbeat` | Periodic health checks, notifies only when issues are detected | -| `bot-jobs` | Manage scheduled jobs — add, remove, pause, resume, edit, list | -| `bot-channel-task` | Channel task flow — acknowledge, execute, report (when Telegram is available) | -| `bot-defaults` | Standard bot behaviors (dedup, reporting, error handling) | +### Changes -
+Git diff viewer with staged/unstaged files, branch info, and worktree context. Review mode adds inline annotations on diff lines — the agent reads them directly before marking a spec as verified. -### Claude CLI Flag Passthrough +### Usage -All Claude Code CLI flags work directly with `pilot` — current and future. Pilot forwards any flag it doesn't recognize to the Claude CLI automatically. +Daily token costs, model routing breakdown, and usage trends across sessions for both Claude Code and Codex sessions. Correlates costs to commits and show savings via CLI proxy integration. -```bash -pilot --channels plugin:telegram@claude-plugins-official -pilot --model opus --verbose -pilot --resume -``` +### Settings -### Headless Mode +Configure spec workflow toggles, reviewer settings, and Console preferences. Toggle labels show which review agents run on Claude Code + Codex, and which Codex Companion Reviewers require the Claude Code Codex plugin. -Run Pilot non-interactively with `-p` for CI/CD pipelines, scripts, and automated workflows. All Claude Code CLI flags work — `--output-format`, `--allowedTools`, `--channels`, `--continue`, `--bare`, etc. +### Documentation -```bash -pilot -p "Run tests and fix failures" --allowedTools "Bash,Read,Edit" -pilot -p "Summarize this project" --output-format json -pilot --channels plugin:telegram@official -p "Check messages" -``` +Documentation, guides, and quick-start resources to explain the concepts in detail. --- -## Demo +## Documentation -A full-stack project — created from **scratch with a single prompt**, then extended with **3 features built in parallel** using `/spec` and Git worktrees. Every line of code tested and verified by Pilot, zero manual code edits. **[Check out the Demo Project here →](https://github.com/maxritter/pilot-shell-demo)** +For full details on every component, see the **[Documentation](https://pilot-shell.com/docs/)**. --- -## Under the Hood - -For full details on every component, see the **[Documentation](https://pilot-shell.com/docs/)**. +## Changelog -| Component | What it does | -| --- | --- | -| [**Pilot Console**](https://pilot-shell.com/docs/features/console) | Local web dashboard at `localhost:41777` — 10 views (Dashboard, Sessions, Memories, Requirements, Specifications, Extensions, Changes, Usage, Help, Settings). Port is configurable in Console Settings (`CLAUDE_PILOT_WORKER_PORT`). SQLite-backed, nothing leaves your machine | -| [**Pilot Bot**](https://pilot-shell.com/docs/features/bot) | Persistent 24/7 automation agent with scheduled jobs, background tasks, heartbeat monitoring, and optional Telegram integration for bidirectional messaging | -| [**Status Line**](https://pilot-shell.com/docs/features/statusline) | Real-time session dashboard below every response — model, context usage, git status, cost, spec progress, and savings metrics across 3 lines | -| [**Model Switching**](https://pilot-shell.com/docs/features/model-routing) | `/model` is the only switch. Recommended flow: Opus for planning (the `spec-mode-guard` hook hard-blocks `/spec` on non-Opus), then `/model sonnet[1m]` for implementation. Two resume paths: Option A — switch model and type any prompt (fast, carries planning context); Option B — `/clear` then `/spec ` (fresh session, lower cost). The opt-out **Model Switching** toggle handles the pause. | -| [**Rules & Standards**](https://pilot-shell.com/docs/features/rules) | 10 built-in rules for workflow, testing, verification, debugging, code review, documentation sync, tooling, and MCP routing + 5 coding standards activated by file type (Python, TypeScript, Go, Frontend, Backend) | -| [**Context Optimization**](https://pilot-shell.com/docs/features/context-optimization) | Lean context strategies — RTK output compression, Semble chunk-only code search, conditional rule loading, progressive skill disclosure, lazy MCP tool loading. Compaction resilience for 200K windows | -| [**Remote Control**](https://pilot-shell.com/docs/features/remote-control) | Control Pilot sessions from your phone, tablet, or any browser — send prompts, monitor progress, and receive notifications remotely | -| [**Hooks Pipeline**](https://pilot-shell.com/docs/features/hooks) | 15 hook registrations across 7 events — quality checks on every file edit (ruff, ESLint, go vet), TDD enforcement, token optimization via RTK (60–90% savings), session continuity, memory capture, and session lifecycle management | -| [**Extensions**](https://pilot-shell.com/docs/features/extensions) | Unified view of skills, rules, commands, and agents across global, project, plugin, and remote scopes. Team sharing via git with push, pull, diff, and APM-compatible export | -| [**Customization**](https://pilot-shell.com/docs/features/customization) | Customize what Pilot auto-installs — tweak built-in skills (spec, prd, etc.), modify rules, register additional hooks, add agents, and adjust auto-applied MCP / Claude settings. Source is a git repo (team-wide) or local directory (personal). Skill overlays (`insert_after` / `insert_before` / `replace` / `disable`) modify core workflows without full-file forks; fragments stay pinned to upstream by hash with drift detection. Team and Enterprise plans | -| [**Pilot CLI**](https://pilot-shell.com/docs/features/cli) | Session management, headless mode (`-p`) for CI/CD and scripts, worktree isolation, licensing, context monitoring. Run `pilot` or `ccp` to start | -| [**MCP Servers**](https://pilot-shell.com/docs/features/mcp-servers) | 7 preconfigured MCP servers for library docs, persistent memory, web search, GitHub code search, web page fetching, code knowledge graphs, and hybrid code search (Semble), plus the Chrome DevTools MCP plugin for browser automation | -| [**Language Servers**](https://pilot-shell.com/docs/features/language-servers) | Real-time diagnostics for Python (basedpyright), TypeScript (vtsls), Go (gopls). Auto-installed, auto-configured | -| [**Open Source Tools**](https://pilot-shell.com/docs/features/open-source-tools) | 20+ open-source tools installed alongside Pilot — Semble (hybrid code search), CodeGraph (code intelligence), RTK (token optimization), language servers, and system prerequisites | +See the full changelog at [pilot.openchangelog.com](https://pilot.openchangelog.com/). --- -## What Users Say - -> "Spec-driven development in Pilot Shell is incredible. I'm so impressed that I have to resist the urge to fix every issue all at once." - -> "Instead of just letting Claude Code run on its own, you've managed to make it work in a much more organized, consistent, and reliable way within a workflow, which I think is fantastic. What you've built is truly impressive." +## Contributing -> "I have fallen in love with Pilot and just can't stand the idea of having to go back to native Claude." +Found a bug or missing a feature? [Open an issue](https://github.com/maxritter/pilot-shell/issues) on GitHub. --- @@ -677,131 +444,8 @@ See [LICENSE](LICENSE). --- -## FAQ - -
-Does Pilot Shell send my code or data to external services? - -**No code, files, prompts, project data, or personal information ever leaves your machine through Pilot Shell.** All development tools — code search (Semble), code intelligence (CodeGraph), persistent memory (Pilot Shell Console), session state, and quality hooks — run entirely locally. No OS info, no version strings, no analytics, no telemetry, no heartbeats. Pilot Shell works fully offline between periodic license checks. If you enable the optional [Codex plugin](https://github.com/openai/codex-plugin-cc), adversarial reviews are sent to OpenAI's API — this is opt-in and disabled by default. - -
- -
-Do I need a separate Anthropic subscription? - -Yes. Pilot Shell enhances Claude Code — it doesn't replace it. You need an active Claude subscription — [Max 5x or 20x](https://claude.com/pricing) for solo developers, [Team Premium](https://claude.com/pricing) for teams, or [Enterprise](https://claude.com/pricing) for organizations with compliance or procurement requirements. Pilot Shell adds quality automation on top of whatever Claude Code access you already have. - -
- -
-Does Pilot Shell support AI models beyond Claude? - -Pilot Shell is built for Claude Code and uses Anthropic's Claude models (Opus, Sonnet) for all planning, implementation, and verification. The [Codex plugin](https://github.com/openai/codex-plugin-cc) is included and adds OpenAI-powered adversarial review during `/spec` — an independent second opinion on your plans and code changes. Run `/codex:setup` to authenticate, then enable the reviewers in Console Settings → Reviewers. - -
- -
-Does Pilot Shell work with existing projects? - -Yes — that's the primary use case. Pilot Shell doesn't scaffold or restructure your code. You install it, run `/setup-rules`, and it explores your codebase to discover your tech stack, conventions, and patterns. From there, every session has full context about your project. The more complex and established your codebase, the more value Pilot Shell adds — quality hooks catch regressions, persistent memory preserves decisions across sessions, and `/spec` plans features against your real architecture. - -
- -
-Does Pilot Shell work with any programming language? - -Pilot Shell's quality hooks (auto-formatting, linting, type checking) currently support Python, TypeScript/JavaScript, and Go out of the box. TDD enforcement, spec-driven development, persistent memory, context optimization, and all rules and standards work with any language that Claude Code supports. You can add custom hooks for additional languages. - -
- -
-Can I use Pilot Shell on multiple projects? - -Yes. Pilot Shell installs once globally and works across all your projects — you don't need to reinstall per project. All tools, rules, commands, and hooks live in `~/.pilot/` and `~/.claude/`, available everywhere. Just `cd` into any project and run `pilot`. Each project can optionally have its own `.claude/` rules, custom skills, and MCP servers for project-specific behavior. Run `/setup-rules` in each project to generate project-specific documentation and standards. - -
- -
-Why does Pilot Shell use bypass permissions mode? - -Pilot Shell sets Claude Code to `bypassPermissions` mode by default so the `/spec` workflow can run autonomously — planning, implementing, and verifying without pausing for permission prompts at every tool call. This is what enables the end-to-end spec-driven development experience. - -**In Quick Mode (regular chat), you have full control.** Press `Shift+Tab` at any time to cycle through [Claude Code's permission modes](https://pilot-shell.com/docs/features/permission-modes): - -| Mode | Behavior | -| ---------------- | ----------------------------------------------------- | -| **Plan** | Claude proposes changes, you approve before execution | -| **Accept Edits** | File edits auto-approved, other actions still prompt | -| **Normal** | Fine-grained permission prompts for each tool call | - -You can also set a persistent default in `~/.claude/settings.json` by changing the `defaultMode` field to `acceptEdits`, `default`, `plan`, or `dontAsk`. Pilot Shell preserves your choice across updates — the installer merges permissions additively and never overwrites user customizations. - -
- -
-Can I add my own rules, commands, skills, and agents? - -Yes. Create your own in your project's `.claude/` folder — rules, commands, skills, and agents are all plain markdown files. Your project-level assets load alongside Pilot Shell's built-in defaults and take precedence when they overlap. `/setup-rules` auto-discovers your codebase patterns and generates project-specific rules. `/create-skill` builds reusable skills from any topic interactively. View and manage all extensions on the Console Extensions page. - -For monorepos, organize rules in nested subdirectories by product and team (e.g. `.claude/rules/my-product/team-x/`). Team-level rules must use `paths` frontmatter so they only load when working on relevant files. `/setup-rules` validates this structure, enforces path-scoping, and generates a `README.md` to document the organization. - -
- -
-Can I customize Pilot's built-in workflows and defaults? - -Yes — the **Customization** feature (available on the Team plan) lets you modify what Pilot Shell auto-installs, not just add alongside it. Tweak the built-in `/spec` workflow (insert a security-review step, replace the planning template, disable a step you don't need), adjust existing rules, register additional hooks, add review agents, change which MCP servers get configured, and override the auto-applied `settings.json` and `claude.json`. Source is either a **git repo** for your team or a **local directory** for personal use — no git needed for a one-off tweak. Skill overlays stay pinned to Pilot's upstream by hash, so when Pilot ships an improvement to a step you replaced, `pilot customize status` flags the drift and `pilot customize diff` shows you what changed. See the [Customization guide](https://pilot-shell.com/docs/features/customization) for the full schema. - -
- -
-Can I use Pilot Shell inside a Dev Container? - -Yes. Copy the `.devcontainer` folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and install Pilot Shell inside the container. Everything works the same — hooks, rules, MCP servers, persistent memory, and the Console dashboard all run inside the container. This is a great option for teams that want a consistent, reproducible development environment. - -For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top — the Dockerfile pre-installs `bubblewrap`, `socat`, `iptables`, and `ipset` so it works out of the box. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for the hardening patterns. - -
- -
-Pilot feels slower after a few weeks — what should I do? - -Claude Code's session logs and Pilot's caches grow over time and can degrade performance. A periodic reset every few weeks restores a clean baseline: - -1. Run `/logout` inside Claude Code. -2. Back up `~/.claude.json`, `~/.claude/`, and `~/.pilot/` (`mv` them to `.bak` copies). -3. Reinstall Pilot Shell with the official installer (`curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash`). -4. Run `pilot`, sign in to Claude again, and re-activate with `pilot activate `. - -Once Pilot is running smoothly, delete the `.bak` copies. Forgot your license key? Recover it in the [Pilot members area](https://polar.sh/max-ritter/portal). Full step-by-step is in the [Reset & Refresh](#install) section above. - -
- -
-What's the difference between pilot and pilot bot? - -**`pilot`** is for interactive development — you chat with Claude, use `/spec` for planned work or quick mode for ad-hoc tasks, and drive every session yourself. It's your daily coding tool. - -**`pilot bot`** is for unattended automation — it launches Claude Code as a persistent background agent that runs 24/7. You define scheduled jobs (health checks, deployments, monitoring) and the bot executes them on a cron schedule with heartbeat monitoring. If the [Telegram plugin](https://github.com/anthropics/claude-plugins-official/tree/main/external_plugins/telegram) is installed, you can send tasks to the bot from your phone and receive results back. Think of `pilot` as your IDE and `pilot bot` as your ops assistant. - -
- ---- - -## Changelog - -See the full changelog at [pilot.openchangelog.com](https://pilot.openchangelog.com/). - ---- - -## Contributing - -Found a bug or missing a feature? [Open an issue](https://github.com/maxritter/pilot-shell/issues) on GitHub. - ---- -
-**How real engineers run Claude Code.** +**How real engineers run Claude Code and Codex**
diff --git a/console/src/cli/adapters/codex.ts b/console/src/cli/adapters/codex.ts new file mode 100644 index 000000000..b15a9f5be Binary files /dev/null and b/console/src/cli/adapters/codex.ts differ diff --git a/console/src/cli/adapters/index.ts b/console/src/cli/adapters/index.ts index 003ef85da..c3af781af 100644 Binary files a/console/src/cli/adapters/index.ts and b/console/src/cli/adapters/index.ts differ diff --git a/console/src/cli/handlers/context.ts b/console/src/cli/handlers/context.ts index 41c19b0c4..d52377c9a 100644 Binary files a/console/src/cli/handlers/context.ts and b/console/src/cli/handlers/context.ts differ diff --git a/console/src/cli/handlers/session-init.ts b/console/src/cli/handlers/session-init.ts index 05d1bbff1..26f8ba422 100644 Binary files a/console/src/cli/handlers/session-init.ts and b/console/src/cli/handlers/session-init.ts differ diff --git a/console/src/cli/hook-command.ts b/console/src/cli/hook-command.ts index a92fad6f1..3dfc5e716 100644 Binary files a/console/src/cli/hook-command.ts and b/console/src/cli/hook-command.ts differ diff --git a/console/src/services/sqlite/SessionStore.ts b/console/src/services/sqlite/SessionStore.ts index 341fab0a2..1ab625bbd 100644 Binary files a/console/src/services/sqlite/SessionStore.ts and b/console/src/services/sqlite/SessionStore.ts differ diff --git a/console/src/services/sqlite/migrations/runner.ts b/console/src/services/sqlite/migrations/runner.ts index f1b6380f9..679259493 100644 Binary files a/console/src/services/sqlite/migrations/runner.ts and b/console/src/services/sqlite/migrations/runner.ts differ diff --git a/console/src/services/sqlite/sessions/create.ts b/console/src/services/sqlite/sessions/create.ts index a5f534303..c1e663432 100644 Binary files a/console/src/services/sqlite/sessions/create.ts and b/console/src/services/sqlite/sessions/create.ts differ diff --git a/console/src/services/worker-types.ts b/console/src/services/worker-types.ts index 0cf886cb0..3a9c909e7 100644 Binary files a/console/src/services/worker-types.ts and b/console/src/services/worker-types.ts differ diff --git a/console/src/services/worker/CodexSessionReader.ts b/console/src/services/worker/CodexSessionReader.ts new file mode 100644 index 000000000..e44f89fd3 Binary files /dev/null and b/console/src/services/worker/CodexSessionReader.ts differ diff --git a/console/src/services/worker/PaginationHelper.ts b/console/src/services/worker/PaginationHelper.ts index 1b0fb8f39..caf087beb 100644 Binary files a/console/src/services/worker/PaginationHelper.ts and b/console/src/services/worker/PaginationHelper.ts differ diff --git a/console/src/services/worker/SessionJsonlService.ts b/console/src/services/worker/SessionJsonlService.ts index cb0728069..0c333b70f 100644 Binary files a/console/src/services/worker/SessionJsonlService.ts and b/console/src/services/worker/SessionJsonlService.ts differ diff --git a/console/src/services/worker/http/routes/DataRoutes.ts b/console/src/services/worker/http/routes/DataRoutes.ts index 388d3f6f2..297644b17 100644 Binary files a/console/src/services/worker/http/routes/DataRoutes.ts and b/console/src/services/worker/http/routes/DataRoutes.ts differ diff --git a/console/src/services/worker/http/routes/ExtensionRoutes.ts b/console/src/services/worker/http/routes/ExtensionRoutes.ts index 5e539f7e7..2507ec937 100644 Binary files a/console/src/services/worker/http/routes/ExtensionRoutes.ts and b/console/src/services/worker/http/routes/ExtensionRoutes.ts differ diff --git a/console/src/services/worker/http/routes/ExtensionTypes.ts b/console/src/services/worker/http/routes/ExtensionTypes.ts index f5f10ae68..773c1f2c4 100644 Binary files a/console/src/services/worker/http/routes/ExtensionTypes.ts and b/console/src/services/worker/http/routes/ExtensionTypes.ts differ diff --git a/console/src/services/worker/http/routes/SessionRoutes.ts b/console/src/services/worker/http/routes/SessionRoutes.ts index d2565eb84..32d39950d 100644 Binary files a/console/src/services/worker/http/routes/SessionRoutes.ts and b/console/src/services/worker/http/routes/SessionRoutes.ts differ diff --git a/console/src/services/worker/http/routes/SettingsRoutes.ts b/console/src/services/worker/http/routes/SettingsRoutes.ts index c68830000..e9180032f 100644 Binary files a/console/src/services/worker/http/routes/SettingsRoutes.ts and b/console/src/services/worker/http/routes/SettingsRoutes.ts differ diff --git a/console/src/services/worker/http/routes/UsageRoutes.ts b/console/src/services/worker/http/routes/UsageRoutes.ts index f092d1259..155ad1864 100644 Binary files a/console/src/services/worker/http/routes/UsageRoutes.ts and b/console/src/services/worker/http/routes/UsageRoutes.ts differ diff --git a/console/src/services/worker/http/routes/utils/planFileReader.ts b/console/src/services/worker/http/routes/utils/planFileReader.ts index 919f0accb..49aa69f93 100644 Binary files a/console/src/services/worker/http/routes/utils/planFileReader.ts and b/console/src/services/worker/http/routes/utils/planFileReader.ts differ diff --git a/console/src/services/worker/usage/aggregator.ts b/console/src/services/worker/usage/aggregator.ts index 1122f797d..dfda217b7 100644 Binary files a/console/src/services/worker/usage/aggregator.ts and b/console/src/services/worker/usage/aggregator.ts differ diff --git a/console/src/services/worker/usage/claude-parser.ts b/console/src/services/worker/usage/claude-parser.ts index 7e7051494..7d4ca41c5 100644 Binary files a/console/src/services/worker/usage/claude-parser.ts and b/console/src/services/worker/usage/claude-parser.ts differ diff --git a/console/src/services/worker/usage/codex-parser.ts b/console/src/services/worker/usage/codex-parser.ts new file mode 100644 index 000000000..ede7724f0 Binary files /dev/null and b/console/src/services/worker/usage/codex-parser.ts differ diff --git a/console/src/services/worker/usage/pricing.ts b/console/src/services/worker/usage/pricing.ts index 8210af884..c14bf6419 100644 Binary files a/console/src/services/worker/usage/pricing.ts and b/console/src/services/worker/usage/pricing.ts differ diff --git a/console/src/services/worker/usage/yield.ts b/console/src/services/worker/usage/yield.ts index ffcebde2f..017917e8e 100644 Binary files a/console/src/services/worker/usage/yield.ts and b/console/src/services/worker/usage/yield.ts differ diff --git a/console/src/shared/SettingsDefaultsManager.ts b/console/src/shared/SettingsDefaultsManager.ts index 0a45da56e..e02b30d00 100644 Binary files a/console/src/shared/SettingsDefaultsManager.ts and b/console/src/shared/SettingsDefaultsManager.ts differ diff --git a/console/src/shared/hook-constants.ts b/console/src/shared/hook-constants.ts index cc96e1db2..e884430f2 100644 Binary files a/console/src/shared/hook-constants.ts and b/console/src/shared/hook-constants.ts differ diff --git a/console/src/shared/transcript-parser.ts b/console/src/shared/transcript-parser.ts index 3b3cc782e..f9bea7594 100644 Binary files a/console/src/shared/transcript-parser.ts and b/console/src/shared/transcript-parser.ts differ diff --git a/console/src/ui/viewer/App.tsx b/console/src/ui/viewer/App.tsx index d0a2d0873..a6d857612 100644 Binary files a/console/src/ui/viewer/App.tsx and b/console/src/ui/viewer/App.tsx differ diff --git a/console/src/ui/viewer/components/CommandPalette.tsx b/console/src/ui/viewer/components/CommandPalette.tsx index d1cfb1ff1..abf9f2ba3 100644 Binary files a/console/src/ui/viewer/components/CommandPalette.tsx and b/console/src/ui/viewer/components/CommandPalette.tsx differ diff --git a/console/src/ui/viewer/components/ui/AgentBadge.tsx b/console/src/ui/viewer/components/ui/AgentBadge.tsx new file mode 100644 index 000000000..9f7aef82c Binary files /dev/null and b/console/src/ui/viewer/components/ui/AgentBadge.tsx differ diff --git a/console/src/ui/viewer/components/ui/index.ts b/console/src/ui/viewer/components/ui/index.ts index f9da47f61..50c02ad8e 100644 Binary files a/console/src/ui/viewer/components/ui/index.ts and b/console/src/ui/viewer/components/ui/index.ts differ diff --git a/console/src/ui/viewer/constants/shortcuts.ts b/console/src/ui/viewer/constants/shortcuts.ts index 089ed2362..541a2387f 100644 Binary files a/console/src/ui/viewer/constants/shortcuts.ts and b/console/src/ui/viewer/constants/shortcuts.ts differ diff --git a/console/src/ui/viewer/hooks/useExtensions.ts b/console/src/ui/viewer/hooks/useExtensions.ts index 10e5d4c1a..7a9cf2da5 100644 Binary files a/console/src/ui/viewer/hooks/useExtensions.ts and b/console/src/ui/viewer/hooks/useExtensions.ts differ diff --git a/console/src/ui/viewer/hooks/useHotkeys.ts b/console/src/ui/viewer/hooks/useHotkeys.ts index 025173b60..51b9b0b34 100644 Binary files a/console/src/ui/viewer/hooks/useHotkeys.ts and b/console/src/ui/viewer/hooks/useHotkeys.ts differ diff --git a/console/src/ui/viewer/hooks/useStats.ts b/console/src/ui/viewer/hooks/useStats.ts index 310195ae4..94cf40be6 100644 Binary files a/console/src/ui/viewer/hooks/useStats.ts and b/console/src/ui/viewer/hooks/useStats.ts differ diff --git a/console/src/ui/viewer/hooks/useUsage.ts b/console/src/ui/viewer/hooks/useUsage.ts index acf6aebf8..4ab0e0236 100644 Binary files a/console/src/ui/viewer/hooks/useUsage.ts and b/console/src/ui/viewer/hooks/useUsage.ts differ diff --git a/console/src/ui/viewer/layouts/Sidebar/SidebarNav.tsx b/console/src/ui/viewer/layouts/Sidebar/SidebarNav.tsx index 4b4f0eafb..5ffd1ebce 100644 Binary files a/console/src/ui/viewer/layouts/Sidebar/SidebarNav.tsx and b/console/src/ui/viewer/layouts/Sidebar/SidebarNav.tsx differ diff --git a/console/src/ui/viewer/views/Dashboard/RecentActivity.tsx b/console/src/ui/viewer/views/Dashboard/RecentActivity.tsx index b2756b709..4dbff7375 100644 Binary files a/console/src/ui/viewer/views/Dashboard/RecentActivity.tsx and b/console/src/ui/viewer/views/Dashboard/RecentActivity.tsx differ diff --git a/console/src/ui/viewer/views/Dashboard/RecentRequirements.tsx b/console/src/ui/viewer/views/Dashboard/RecentRequirements.tsx index ce5032287..ee1de1b31 100644 Binary files a/console/src/ui/viewer/views/Dashboard/RecentRequirements.tsx and b/console/src/ui/viewer/views/Dashboard/RecentRequirements.tsx differ diff --git a/console/src/ui/viewer/views/Dashboard/RecentSessions.tsx b/console/src/ui/viewer/views/Dashboard/RecentSessions.tsx index ff1ce2e6f..150025b9b 100644 Binary files a/console/src/ui/viewer/views/Dashboard/RecentSessions.tsx and b/console/src/ui/viewer/views/Dashboard/RecentSessions.tsx differ diff --git a/console/src/ui/viewer/views/Dashboard/RecentSpecs.tsx b/console/src/ui/viewer/views/Dashboard/RecentSpecs.tsx index 9761d1695..60d92b2d5 100644 Binary files a/console/src/ui/viewer/views/Dashboard/RecentSpecs.tsx and b/console/src/ui/viewer/views/Dashboard/RecentSpecs.tsx differ diff --git a/console/src/ui/viewer/views/Documentation/index.tsx b/console/src/ui/viewer/views/Documentation/index.tsx new file mode 100644 index 000000000..16a581895 Binary files /dev/null and b/console/src/ui/viewer/views/Documentation/index.tsx differ diff --git a/console/src/ui/viewer/views/Extensions/ExtensionDetail.tsx b/console/src/ui/viewer/views/Extensions/ExtensionDetail.tsx index 79418475f..ea408ce17 100644 Binary files a/console/src/ui/viewer/views/Extensions/ExtensionDetail.tsx and b/console/src/ui/viewer/views/Extensions/ExtensionDetail.tsx differ diff --git a/console/src/ui/viewer/views/Extensions/ExtensionsGrid.tsx b/console/src/ui/viewer/views/Extensions/ExtensionsGrid.tsx index 305366871..7f72747b7 100644 Binary files a/console/src/ui/viewer/views/Extensions/ExtensionsGrid.tsx and b/console/src/ui/viewer/views/Extensions/ExtensionsGrid.tsx differ diff --git a/console/src/ui/viewer/views/Extensions/ExtensionsView.tsx b/console/src/ui/viewer/views/Extensions/ExtensionsView.tsx index b358c2ffd..6e0e55e2e 100644 Binary files a/console/src/ui/viewer/views/Extensions/ExtensionsView.tsx and b/console/src/ui/viewer/views/Extensions/ExtensionsView.tsx differ diff --git a/console/src/ui/viewer/views/Help/index.tsx b/console/src/ui/viewer/views/Help/index.tsx deleted file mode 100644 index 4771f3d9f..000000000 Binary files a/console/src/ui/viewer/views/Help/index.tsx and /dev/null differ diff --git a/console/src/ui/viewer/views/Memories/MemoryCard.tsx b/console/src/ui/viewer/views/Memories/MemoryCard.tsx index ee1b80484..ff11b1bb4 100644 Binary files a/console/src/ui/viewer/views/Memories/MemoryCard.tsx and b/console/src/ui/viewer/views/Memories/MemoryCard.tsx differ diff --git a/console/src/ui/viewer/views/Memories/index.tsx b/console/src/ui/viewer/views/Memories/index.tsx index c750a8314..4cb46d9df 100644 Binary files a/console/src/ui/viewer/views/Memories/index.tsx and b/console/src/ui/viewer/views/Memories/index.tsx differ diff --git a/console/src/ui/viewer/views/Requirements/index.tsx b/console/src/ui/viewer/views/Requirements/index.tsx index d9b7a3b23..b6b21bf31 100644 Binary files a/console/src/ui/viewer/views/Requirements/index.tsx and b/console/src/ui/viewer/views/Requirements/index.tsx differ diff --git a/console/src/ui/viewer/views/Sessions/SessionCard.tsx b/console/src/ui/viewer/views/Sessions/SessionCard.tsx index d3ca203fd..5ab34d213 100644 Binary files a/console/src/ui/viewer/views/Sessions/SessionCard.tsx and b/console/src/ui/viewer/views/Sessions/SessionCard.tsx differ diff --git a/console/src/ui/viewer/views/Sessions/SessionDetail.tsx b/console/src/ui/viewer/views/Sessions/SessionDetail.tsx index 49b961929..0ad57a75b 100644 Binary files a/console/src/ui/viewer/views/Sessions/SessionDetail.tsx and b/console/src/ui/viewer/views/Sessions/SessionDetail.tsx differ diff --git a/console/src/ui/viewer/views/Sessions/SessionTimeline.tsx b/console/src/ui/viewer/views/Sessions/SessionTimeline.tsx index e1ed1f605..33bca93f6 100644 Binary files a/console/src/ui/viewer/views/Sessions/SessionTimeline.tsx and b/console/src/ui/viewer/views/Sessions/SessionTimeline.tsx differ diff --git a/console/src/ui/viewer/views/Sessions/index.tsx b/console/src/ui/viewer/views/Sessions/index.tsx index e521b3ce7..ac18964c8 100644 Binary files a/console/src/ui/viewer/views/Sessions/index.tsx and b/console/src/ui/viewer/views/Sessions/index.tsx differ diff --git a/console/src/ui/viewer/views/Settings/sections/SpecWorkflowSection.tsx b/console/src/ui/viewer/views/Settings/sections/SpecWorkflowSection.tsx index 7360d7b38..444c26271 100644 Binary files a/console/src/ui/viewer/views/Settings/sections/SpecWorkflowSection.tsx and b/console/src/ui/viewer/views/Settings/sections/SpecWorkflowSection.tsx differ diff --git a/console/src/ui/viewer/views/Settings/sections/_shared.tsx b/console/src/ui/viewer/views/Settings/sections/_shared.tsx index 7ea836715..97ead88e8 100644 Binary files a/console/src/ui/viewer/views/Settings/sections/_shared.tsx and b/console/src/ui/viewer/views/Settings/sections/_shared.tsx differ diff --git a/console/src/ui/viewer/views/Spec/SpecHeaderCard.tsx b/console/src/ui/viewer/views/Spec/SpecHeaderCard.tsx index d8fbd5b57..57c4563c7 100644 Binary files a/console/src/ui/viewer/views/Spec/SpecHeaderCard.tsx and b/console/src/ui/viewer/views/Spec/SpecHeaderCard.tsx differ diff --git a/console/src/ui/viewer/views/Spec/index.tsx b/console/src/ui/viewer/views/Spec/index.tsx index c2e8ffa2d..3a5b90f74 100644 Binary files a/console/src/ui/viewer/views/Spec/index.tsx and b/console/src/ui/viewer/views/Spec/index.tsx differ diff --git a/console/src/ui/viewer/views/Usage/ModelCostBreakdown.tsx b/console/src/ui/viewer/views/Usage/ModelCostBreakdown.tsx index ce1a1164f..aa419a42d 100644 Binary files a/console/src/ui/viewer/views/Usage/ModelCostBreakdown.tsx and b/console/src/ui/viewer/views/Usage/ModelCostBreakdown.tsx differ diff --git a/console/src/ui/viewer/views/Usage/index.tsx b/console/src/ui/viewer/views/Usage/index.tsx index ebd4dd28d..cbcba45ee 100644 Binary files a/console/src/ui/viewer/views/Usage/index.tsx and b/console/src/ui/viewer/views/Usage/index.tsx differ diff --git a/console/src/ui/viewer/views/index.ts b/console/src/ui/viewer/views/index.ts index ddee18b33..48860d4a8 100644 Binary files a/console/src/ui/viewer/views/index.ts and b/console/src/ui/viewer/views/index.ts differ diff --git a/console/src/utils/logger.ts b/console/src/utils/logger.ts index 5e3bad0a0..9f67ec708 100644 --- a/console/src/utils/logger.ts +++ b/console/src/utils/logger.ts @@ -49,7 +49,8 @@ export type Component = | "CLEANUP" | "DATA" | "VECTOR_DB_GUARD" - | "FEEDBACK_POLL"; + | "FEEDBACK_POLL" + | "CODEX_PARSER"; interface LogContext { sessionId?: number; diff --git a/console/tests/cli/codex-adapter.test.ts b/console/tests/cli/codex-adapter.test.ts new file mode 100644 index 000000000..5e7adaaa3 Binary files /dev/null and b/console/tests/cli/codex-adapter.test.ts differ diff --git a/console/tests/cli/codex-agent-filter.test.ts b/console/tests/cli/codex-agent-filter.test.ts new file mode 100644 index 000000000..2cfc09779 Binary files /dev/null and b/console/tests/cli/codex-agent-filter.test.ts differ diff --git a/console/tests/cli/context-handler-plan.test.ts b/console/tests/cli/context-handler-plan.test.ts index f39e5833b..470c0f395 100644 Binary files a/console/tests/cli/context-handler-plan.test.ts and b/console/tests/cli/context-handler-plan.test.ts differ diff --git a/console/tests/cli/hook-command-failopen.test.ts b/console/tests/cli/hook-command-failopen.test.ts index 0f1a2a1be..c4f712617 100644 Binary files a/console/tests/cli/hook-command-failopen.test.ts and b/console/tests/cli/hook-command-failopen.test.ts differ diff --git a/console/tests/shared/settings-defaults-manager-output.test.ts b/console/tests/shared/settings-defaults-manager-output.test.ts new file mode 100644 index 000000000..273ace0db Binary files /dev/null and b/console/tests/shared/settings-defaults-manager-output.test.ts differ diff --git a/console/tests/shared/transcript-parser.test.ts b/console/tests/shared/transcript-parser.test.ts new file mode 100644 index 000000000..f72a1ee79 Binary files /dev/null and b/console/tests/shared/transcript-parser.test.ts differ diff --git a/console/tests/ui/ChangesNavigation.test.ts b/console/tests/ui/ChangesNavigation.test.ts index 218ff97ac..705e02949 100644 Binary files a/console/tests/ui/ChangesNavigation.test.ts and b/console/tests/ui/ChangesNavigation.test.ts differ diff --git a/console/tests/ui/ExtensionsGrid.test.ts b/console/tests/ui/ExtensionsGrid.test.ts index aa9a894b2..71cbf948b 100644 Binary files a/console/tests/ui/ExtensionsGrid.test.ts and b/console/tests/ui/ExtensionsGrid.test.ts differ diff --git a/console/tests/ui/codex-session-detail.test.ts b/console/tests/ui/codex-session-detail.test.ts new file mode 100644 index 000000000..0cfbdcc69 Binary files /dev/null and b/console/tests/ui/codex-session-detail.test.ts differ diff --git a/console/tests/ui/dashboard-agent-badges.test.tsx b/console/tests/ui/dashboard-agent-badges.test.tsx new file mode 100644 index 000000000..fb907351d Binary files /dev/null and b/console/tests/ui/dashboard-agent-badges.test.tsx differ diff --git a/console/tests/ui/spec-delete-selection.test.ts b/console/tests/ui/spec-delete-selection.test.ts new file mode 100644 index 000000000..11ccdaef7 Binary files /dev/null and b/console/tests/ui/spec-delete-selection.test.ts differ diff --git a/console/tests/ui/spec-section-rendering.test.ts b/console/tests/ui/spec-section-rendering.test.ts index 4c25e95a8..bb30788bb 100644 Binary files a/console/tests/ui/spec-section-rendering.test.ts and b/console/tests/ui/spec-section-rendering.test.ts differ diff --git a/console/tests/ui/spec-workflow-settings-copy.test.ts b/console/tests/ui/spec-workflow-settings-copy.test.ts new file mode 100644 index 000000000..8ebb2fa9a Binary files /dev/null and b/console/tests/ui/spec-workflow-settings-copy.test.ts differ diff --git a/console/tests/ui/teams-navigation.test.ts b/console/tests/ui/teams-navigation.test.ts index 807d42c6a..2f6df2bb5 100644 Binary files a/console/tests/ui/teams-navigation.test.ts and b/console/tests/ui/teams-navigation.test.ts differ diff --git a/console/tests/unit/services/worker/SessionJsonlService.test.ts b/console/tests/unit/services/worker/SessionJsonlService.test.ts index 78a042f64..4a962e1c3 100644 Binary files a/console/tests/unit/services/worker/SessionJsonlService.test.ts and b/console/tests/unit/services/worker/SessionJsonlService.test.ts differ diff --git a/console/tests/unit/services/worker/codex-session-reader.test.ts b/console/tests/unit/services/worker/codex-session-reader.test.ts new file mode 100644 index 000000000..c9db010d0 Binary files /dev/null and b/console/tests/unit/services/worker/codex-session-reader.test.ts differ diff --git a/console/tests/unit/services/worker/usage/aggregator.test.ts b/console/tests/unit/services/worker/usage/aggregator.test.ts index 9039ea845..3aff56fa3 100644 Binary files a/console/tests/unit/services/worker/usage/aggregator.test.ts and b/console/tests/unit/services/worker/usage/aggregator.test.ts differ diff --git a/console/tests/unit/services/worker/usage/claude-parser.test.ts b/console/tests/unit/services/worker/usage/claude-parser.test.ts index 073934bad..ce9cf0b5b 100644 Binary files a/console/tests/unit/services/worker/usage/claude-parser.test.ts and b/console/tests/unit/services/worker/usage/claude-parser.test.ts differ diff --git a/console/tests/unit/services/worker/usage/codex-parser.test.ts b/console/tests/unit/services/worker/usage/codex-parser.test.ts new file mode 100644 index 000000000..0df978402 Binary files /dev/null and b/console/tests/unit/services/worker/usage/codex-parser.test.ts differ diff --git a/console/tests/worker/extension-routes.test.ts b/console/tests/worker/extension-routes.test.ts index a33ac387a..19dcac3d4 100644 Binary files a/console/tests/worker/extension-routes.test.ts and b/console/tests/worker/extension-routes.test.ts differ diff --git a/console/tests/worker/usage-routes.test.ts b/console/tests/worker/usage-routes.test.ts index f426654fe..f560891fc 100644 Binary files a/console/tests/worker/usage-routes.test.ts and b/console/tests/worker/usage-routes.test.ts differ diff --git a/console/tsconfig.json b/console/tsconfig.json index 3a52a1cae..3f70980af 100644 --- a/console/tsconfig.json +++ b/console/tsconfig.json @@ -12,14 +12,13 @@ "skipLibCheck": true, "noEmit": true, "jsx": "react-jsx", - "baseUrl": ".", "paths": { - "@/*": ["src/*"], - "@ui/*": ["src/ui/viewer/*"], - "@services/*": ["src/services/*"], - "@types/*": ["src/types/*"], - "@utils/*": ["src/utils/*"], - "@shared/*": ["src/shared/*"] + "@/*": ["./src/*"], + "@ui/*": ["./src/ui/viewer/*"], + "@services/*": ["./src/services/*"], + "@types/*": ["./src/types/*"], + "@utils/*": ["./src/utils/*"], + "@shared/*": ["./src/shared/*"] } }, "include": [ diff --git a/docs/docusaurus/docs/features/bot.md b/docs/docusaurus/docs/features/bot.md index f998dafe5..bae5d960b 100644 --- a/docs/docusaurus/docs/features/bot.md +++ b/docs/docusaurus/docs/features/bot.md @@ -1,11 +1,15 @@ --- sidebar_position: 2 title: Pilot Bot -description: Persistent Pilot Bot agent — scheduled tasks, background jobs, heartbeat monitoring, and 24/7 automation running on Sonnet for cost-effective Claude operations. +description: Persistent Claude Code automation agent — scheduled tasks, background jobs, heartbeat monitoring, and 24/7 operation running on Sonnet. --- # Pilot Bot +:::warning Claude Code only +Pilot Bot requires Claude Code. It is not available with Codex CLI. +::: + Persistent automation agent — scheduled tasks, background jobs, heartbeat monitoring, 24/7 operation. Always runs on Sonnet for cost-effective automation. ```bash diff --git a/docs/docusaurus/docs/features/cli.md b/docs/docusaurus/docs/features/cli.md index bac07c220..6949bfc47 100644 --- a/docs/docusaurus/docs/features/cli.md +++ b/docs/docusaurus/docs/features/cli.md @@ -1,48 +1,41 @@ --- sidebar_position: 4 title: Pilot CLI -description: Command reference for the pilot binary at ~/.pilot/bin/pilot — session management, license activation, statusline, worktree control, and updates. +description: Command reference for the pilot admin binary — license activation, updates, worktrees, bot mode, and customization. --- # Pilot CLI -Command reference for the `pilot` binary at `~/.pilot/bin/pilot`. +Admin command reference for the `pilot` binary at `~/.pilot/bin/pilot`. -Run `pilot` or `ccp` with no arguments to start Claude with Pilot enhancements. Most commands support `--json` for structured output. Multiple sessions can run in parallel on the same project. +:::note Pilot does not launch your agent +Pilot Shell loads automatically when you run `claude` or `codex` — there is no wrapper command. The `pilot` CLI is for admin tasks only: license management, updates, worktrees, bot mode, customization, and diagnostics. Most commands support `--json` for structured output. +::: -## Session & context +## License & auth | Command | Description | |---------|-------------| -| `pilot` | Start Claude with Pilot enhancements, license check, and a one-line update banner when a newer release exists | -| `pilot [claude-flags...]` | Start Claude with any Claude CLI flags passed through | -| `pilot -p "prompt" [flags...]` | Headless mode — non-interactive for CI/CD, scripts | -| `pilot run [flags...]` | Explicit alias for starting Claude | -| `pilot agents` | Open Claude's agent view (`claude agents`) to manage multiple background sessions | -| `ccp` | Alias for `pilot` | -| `pilot update [--yes] [--json] [--skip-claude]` | Update both Claude Code and Pilot Shell. `pilot upgrade` is an alias — both verbs run the same flow. | -| `pilot check-context --json` | Get current context usage percentage | -| `pilot register-plan ` | Associate a plan file with the current session | -| `pilot sessions [--json]` | Show count of active Pilot sessions | -| `pilot statusline` | Run the status line formatter (called by Claude Code) | -| `pilot notify <message> [--plan-path PATH] [--json]` | Send a notification to the Console dashboard (type: `info`, `plan_approval`, `attention_needed`, `verification_complete`) | -| `pilot skill-build <skill-dir> [--output <path>] [--dry-run] [--json]` | Build `SKILL.md` and `hashes.json` from a skill's manifest + fragments | -| `pilot --version` | Show Pilot Shell version | - -:::info Update flow -On launch, `pilot` checks both registries — GitHub releases for Pilot Shell and the npm registry for Claude Code (`@anthropic-ai/claude-code`) — and prints a one-line banner when either has a newer version. Startup never blocks on installation. - -Run `pilot update` (or its `pilot upgrade` alias — both verbs are interchangeable) when you're ready. The flow runs `claude update` first with a spinner, then downloads and installs the latest Pilot Shell. Pass `--skip-claude` to update only Pilot Shell, or `--yes` to skip the confirmation prompt. -::: +| `pilot activate <key>` | Activate a license key on this machine | +| `pilot deactivate` | Deactivate license on this machine | +| `pilot status [--json]` | Show current license status and tier | +| `pilot verify [--json]` | Verify license validity (used by hooks) | +| `pilot trial --check [--json]` | Check trial eligibility for this machine | +| `pilot trial --start [--json]` | Start a trial (one-time per machine) | -## Bot mode +## Updates | Command | Description | |---------|-------------| -| `pilot bot` | Launch Pilot Bot — persistent automation session with scheduled tasks, background jobs, and optional Telegram | +| `pilot update [--yes] [--json]` | Update Pilot Shell. `pilot upgrade` is an alias. Pass `--yes` to skip the confirmation prompt. | +| `pilot --version` | Show Pilot Shell version | + +Update Claude Code and Codex CLI through their own installers independently — `pilot update` only updates Pilot Shell itself. ## Worktree isolation +Used by the `/spec` workflow to keep work isolated until verification passes. All commands work with both Claude Code and Codex sessions. + | Command | Description | |---------|-------------| | `pilot worktree create --json <slug>` | Create isolated git worktree | @@ -53,19 +46,14 @@ Run `pilot update` (or its `pilot upgrade` alias — both verbs are interchangea | `pilot worktree status --json` | Show active worktree info for current session | :::info Slug format -The `<slug>` for worktree commands is the plan filename without the date prefix and `.md` extension. Example: `docs/plans/2026-02-22-add-auth.md` → `add-auth`. +The `<slug>` is the plan filename without the date prefix and `.md` extension. Example: `docs/plans/2026-02-22-add-auth.md` → `add-auth`. ::: -## License & auth +## Bot mode *(Claude Code only)* | Command | Description | |---------|-------------| -| `pilot activate <key>` | Activate a license key on this machine | -| `pilot deactivate` | Deactivate license on this machine | -| `pilot status [--json]` | Show current license status and tier | -| `pilot verify [--json]` | Verify license validity (used by hooks) | -| `pilot trial --check [--json]` | Check trial eligibility for this machine | -| `pilot trial --start [--json]` | Start a trial (one-time per machine) | +| `pilot bot` | Launch [Pilot Bot](/docs/features/bot) — persistent automation session with scheduled tasks, background jobs, and optional Telegram | ## Customization (Team / Enterprise) @@ -73,51 +61,21 @@ Compose custom steps into core workflow skills and ship team rules, hooks, and a | Command | Description | |---------|-------------| -| `pilot customize install <source> [--branch <b>] [--subfolder <p>] [--json]` | Install and apply. `<source>` = git URL or local directory path. `--branch` applies to git sources only. | +| `pilot customize install <source> [--branch <b>] [--subfolder <p>] [--json]` | Install and apply. `<source>` = git URL or local directory path. | | `pilot customize update [--json]` | Re-apply — pulls git sources, reads local sources in place | | `pilot customize status [--json]` | Show active source, file counts, and drift warnings | | `pilot customize diff <skill>/<step-id> [--json]` | Unified diff between pinned replacement and current upstream | | `pilot customize remove [--json]` | Delete pack files and regenerate pristine `SKILL.md` | -## Claude CLI flag passthrough - -Pilot forwards any unrecognized flags directly to the Claude CLI — all current and future Claude Code flags work out of the box, no Pilot update required. - -```bash -# Any Claude CLI flag works directly -pilot --channels plugin:telegram@claude-plugins-official -pilot --model opus --verbose -pilot --resume -pilot --continue - -# 'run' is an explicit alias — same behavior -pilot run --channels plugin:telegram@claude-plugins-official -``` - -Pilot only intercepts its own subcommands (`activate`, `status`, `worktree`, etc.) and flags (`--version`, `--skip-update-check`). Everything else passes through to `claude`. - -## Headless mode - -Run Pilot non-interactively with `-p` (or `--print`). Wraps `claude -p` with license validation and the Pilot plugin — use it in CI/CD pipelines, scripts, or automated workflows. - -```bash -# Basic usage -pilot -p "What does the auth module do?" +## Internal commands -# Structured JSON output -pilot -p "Summarize this project" --output-format json +Called by hooks and the Console — you rarely need to run these directly. -# Auto-approve specific tools -pilot -p "Run tests and fix failures" --allowedTools "Bash,Read,Edit" - -# With channels -pilot --channels plugin:telegram@claude-plugins-official -p "Check messages" - -# Continue a previous conversation -pilot -p "Now focus on the database queries" --continue - -# Minimal startup (skip hooks, plugins, MCP auto-discovery) -pilot -p "Summarize this file" --bare --allowedTools "Read" -``` - -All [Claude Code CLI flags](https://code.claude.com/docs/en/cli-reference) work with `-p`, including `--output-format`, `--allowedTools`, `--continue`, `--resume`, `--channels`, `--append-system-prompt`, `--json-schema`, and `--bare`. Pilot-specific flags like `--skip-update-check` are stripped automatically. +| Command | Description | +|---------|-------------| +| `pilot check-context --json` | Get current context usage percentage | +| `pilot register-plan <path> <status>` | Associate a plan file with the current session | +| `pilot sessions [--json]` | Show count of active Pilot sessions | +| `pilot statusline` | Status line formatter *(Claude Code only — called by Claude Code's statusLine hook)* | +| `pilot notify <type> <title> <message> [--plan-path PATH] [--json]` | Send a notification to the Console dashboard (type: `info`, `plan_approval`, `attention_needed`, `verification_complete`) | +| `pilot skill-build <skill-dir> [--output <path>] [--dry-run] [--json]` | Build `SKILL.md` and `hashes.json` from a skill's manifest + fragments | diff --git a/docs/docusaurus/docs/features/console.md b/docs/docusaurus/docs/features/console.md index d643f548c..9a288ddc3 100644 --- a/docs/docusaurus/docs/features/console.md +++ b/docs/docusaurus/docs/features/console.md @@ -15,7 +15,7 @@ $ open http://localhost:41777 ``` :::tip Custom port -The default port `41777` is configurable. Open the **Settings** tab and edit the **Console → Worker Port** field, then click **Save Port** (or edit `CLAUDE_PILOT_WORKER_PORT` in `~/.pilot/memory/settings.json` directly). Restart Pilot for the change to take effect — the launcher, status line, hooks (session_end, pre_compact, dashboard notifications), MCP server, and installer all read the same setting and will follow it. +The default port `41777` is configurable. Open the **Settings** tab and edit the **Console → Worker Port** field, then click **Save Port** (or edit `CLAUDE_PILOT_WORKER_PORT` in `~/.pilot/memory/settings.json` directly). Restart your `claude` or `codex` session for the change to take effect. ::: ## Views @@ -25,15 +25,15 @@ Each view that supports project filtering has an inline **Project Filter** dropd | View | Description | |------|-------------| | **Dashboard** | Global command center — 8 clickable stat cards (Projects, Sessions, Active, Memories, Extensions, Requirements, Specifications, Changes), 4 recent-item cards with "Show all" links, active specs as pills in the top bar, notification bell in the top right. | -| **Sessions** | Browse past sessions with search. Copy a session ID and run `/resume <session-id>` to jump back in — all context, files, and conversation history restored. | +| **Sessions** | Browse past sessions with search. Copy a session ID and run `/resume <session-id>` in Claude Code to jump back in (Claude Code only). | | **Memories** | Observations (decisions, discoveries, bugfixes) with type filters and search. Each memory links back to the session it came from. | | **Requirements** | PRD documents with view/annotate modes. Selected opens as a tab, others live in a Previous dropdown. | | **Specifications** | Spec plans with task progress, phase tracking (PENDING/COMPLETE/VERIFIED), and iteration history. Hosts Plan Annotation and Spec Sharing (below). | | **Extensions** | All extensions — local, plugin, remote — with team sharing via git (push, pull, diff), color-coded categories, and scope filtering. | | **Changes** | Git diff viewer with staged/unstaged files, branch info, worktree context. Hosts Code Review and Spec Task Correlation (below). | | **Usage** | Daily token costs, model routing breakdown (Opus vs Sonnet), and usage trends. | -| **Help** | Embedded pilot-shell.com documentation — full technical reference without leaving the Console. | | **Settings** | Spec workflow toggles (branch isolation, ask questions, plan approval, Model Switching), reviewer toggles. See [Settings](#settings) below. | +| **Documentation** | Embedded pilot-shell.com documentation — full technical reference without leaving the Console. | ## Plan Annotation @@ -94,36 +94,36 @@ The **+** button and text selection both work on the normal review page and on s ## Notifications -The Console sends real-time alerts via Server-Sent Events when Claude needs your input or a significant phase completes — no need to watch the terminal. +The Console sends real-time alerts via Server-Sent Events when your agent needs input or a significant phase completes — no need to watch the terminal. - Plan requires your approval — review and respond - Spec phase completed — implementation done, verification starting -- Clarification needed — Claude is waiting for design decisions +- Clarification needed — the agent is waiting for design decisions - Session ended — completion summary with observation count ## Settings -The Settings tab (`localhost:41777/#/settings`, or your custom port) is a single scrollable page with two stacked sections: **Spec Workflow** and **Console**. Toggle preferences save to `~/.pilot/config.json`. The **Console → Worker Port** field saves to `~/.pilot/memory/settings.json` and lets you move the Console off `41777` if it conflicts with another service. Both changes take effect after restarting Pilot. +The Settings tab (`localhost:41777/#/settings`, or your custom port) is a single scrollable page with two stacked sections: **Spec Workflow** and **Console**. Toggle preferences save to `~/.pilot/config.json`. The **Console → Worker Port** field saves to `~/.pilot/memory/settings.json` and lets you move the Console off `41777` if it conflicts with another service. Both changes take effect after restarting your session. -:::info Model selection lives in Claude Code -Pilot doesn't manage model preferences — pick your model with `/model sonnet[1m]`, `/model opus[1m]`, or an explicit Anthropic ID like `/model claude-opus-4-6`. See [Model Routing](./model-routing.md) for the recommended flow. +:::info Model selection lives in the agent +Pilot doesn't manage model preferences. Set the model with Claude Code's `/model` command or Codex's `codex --model <name>` / `~/.codex/config.toml`. See [Model Routing](./model-routing.md). ::: ### Spec Workflow → Review Agents -Two Claude sub-agents run in separate context windows during `/spec`. Toggle each on or off; the sub-agent model is hard-coded to Sonnet because sub-agents do not support 1M context. +Two review agents run during `/spec` on Claude Code and Codex. Toggle each on or off; Claude Code runs them as Claude sub-agents, and Codex runs them as managed custom agents installed under `~/.codex/agents/`. | Agent | Default | Role | |-------|---------|------| | **Spec Review** | On | Validates plans before implementation. Checks alignment with requirements, flags risky assumptions. | | **Changes Review** | On | Reviews code after implementation. Checks compliance, security, test coverage, goal achievement. | -**Codex adversarial reviewers (optional)** — OpenAI Codex agents that provide an independent second opinion. +**Codex Companion Reviewers (optional, Claude Code only)** — OpenAI Codex plugin reviewers that provide an independent second opinion while you are working inside Claude Code. | Agent | Default | Role | |-------|---------|------| -| **Codex Spec Review** | Off | Adversarial plan review — second opinion before implementation. | -| **Codex Changes Review** | Off | Adversarial code review — second opinion after implementation. | +| **Codex Companion Spec Review** | Off | Plugin plan review — second opinion before implementation. | +| **Codex Companion Changes Review** | Off | Plugin code review — second opinion after implementation. | ### Spec Workflow → Automation @@ -134,12 +134,12 @@ Four toggles control user interaction points during `/spec`. Disable all four fo | **Branch Isolation** | On | Asks how to isolate `/spec` changes (new branch or worktree) | Always works on the current branch | | **Ask Questions** | On | Asks clarifying questions during planning | Planning makes autonomous default choices | | **Plan Approval** | On | Requires your approval before implementation starts | Implementation begins automatically after planning | -| **Model Switching** | On | Pauses after plan approval. Option A: run `/model sonnet[1m]`, type any prompt — resumes with planning context in the new model (Claude Code will confirm; carries extra cost). Option B: run `/clear` then `/spec <plan path>` — fresh session, lower cost. | Plan → implement → verify runs continuously on whichever model is active | +| **Model Switching** *(Claude Code only)* | On | Pauses after plan approval. Option A: run `/model sonnet[1m]`, type any prompt — resumes with planning context in the new model. Option B: run `/clear` then `/spec <plan path>` — fresh session, lower cost. | Plan → implement → verify runs continuously on whichever model is active | With all four off, `/spec add user authentication` plans, implements, and verifies the feature end-to-end without checkpoints on whichever model is currently active. :::warning Token usage in autonomous mode -No checkpoints means Claude executes the entire workflow without asking. Make sure your prompt is specific enough to avoid misinterpretation. You can always interrupt with Escape. +No checkpoints means your agent executes the entire workflow without asking. Make sure your prompt is specific enough to avoid misinterpretation. You can always interrupt with Escape. ::: ### Config file @@ -165,4 +165,4 @@ All settings are stored in `~/.pilot/config.json`: } ``` -You can edit `~/.pilot/config.json` directly — the Settings UI is a convenience wrapper. Changes require a Claude Code restart. +You can edit `~/.pilot/config.json` directly — the Settings UI is a convenience wrapper. Changes take effect after restarting your session. diff --git a/docs/docusaurus/docs/features/context-optimization.md b/docs/docusaurus/docs/features/context-optimization.md index ac789d649..740b0da96 100644 --- a/docs/docusaurus/docs/features/context-optimization.md +++ b/docs/docusaurus/docs/features/context-optimization.md @@ -1,14 +1,14 @@ --- sidebar_position: 3 title: Context Optimization -description: Keep the Claude context window lean and recover cleanly when it fills up — auto-compaction hooks, memory persistence, and routing for token-heavy tasks. +description: Keep the context window lean and recover cleanly when it fills up — strategies and memory persistence for Claude Code and Codex. --- # Context Optimization -Two things matter for a long-running Claude session: keeping the context window lean so tokens go to your code, and handling the moments when it fills up anyway. +Two things matter for a long-running session: keeping the context window lean so tokens go to your code, and handling the moments when it fills up anyway. -With 1M context windows (API subscribers on Team and Enterprise get this on all models; Max plan users can opt into 1M per-row on Opus rows — see [Smart Model Routing](./model-routing.md)), compaction is rare — most sessions complete well within the available context. Pilot Shell's strategies focus on **staying lean**, and making **compaction and parallel work** painless when they happen. +These strategies apply to both **Claude Code** and **Codex CLI**. Compaction (automatic context summarization) applies to Claude Code; on Codex, context is managed per the model's native window. ## Keeping context lean @@ -21,7 +21,7 @@ With 1M context windows (API subscribers on Team and Enterprise get this on all | **Scoped MCP tools** | Variable | MCP tool schemas are lazy-loaded via `ToolSearch` — only fetched when needed, not preloaded | | **Routing hooks** | Variable | PreToolUse hooks block `curl`/`wget`/built-in `WebFetch` and redirect to the dedicated web-fetch MCP, so large pages don't dump into context | -## Status line display +## Status line display *(Claude Code only)* The status line shows context usage as a visual progress bar: @@ -31,7 +31,7 @@ Opus 4.7 [1M] | █████░▓ 60% | ... Claude Code reserves ~16.5% of the context window as a compaction buffer, triggering auto-compaction at ~83.5% raw usage. Pilot Shell rescales this to an **effective 0–100% range** so the bar fills naturally to 100% right before compaction fires. A `▓` indicator shows the reserved zone. The monitor warns at ~80% effective (informational) and ~90%+ effective (caution). -## When compaction fires +## When compaction fires *(Claude Code only)* On 200K windows, compaction happens more often. Pilot Shell preserves state automatically across the three lifecycle events: diff --git a/docs/docusaurus/docs/features/customization.md b/docs/docusaurus/docs/features/customization.md index 57916f87c..58bc8469b 100644 --- a/docs/docusaurus/docs/features/customization.md +++ b/docs/docusaurus/docs/features/customization.md @@ -1,12 +1,16 @@ --- sidebar_position: 10 title: Customization -description: Customize what Pilot Shell auto-installs — skills, rules, hooks, agents, MCP servers, LSP servers, and Claude settings on Team and Enterprise plans. +description: Customize what Pilot Shell auto-installs into ~/.claude/ — skills, rules, hooks, agents, MCP servers, and Claude Code settings. Team and Enterprise plans. --- # Customization -Customize everything Pilot Shell auto-installs on your machine. Tweak the built-in `/spec` workflow, modify existing rules, register additional hooks on top of Pilot's defaults, change which MCP or LSP servers get configured, adjust the auto-applied `settings.json` and `claude.json` — all without forking Pilot or hand-editing `~/.claude/` after every update. Available on **Team** and **Enterprise** plans. +Customize everything Pilot Shell auto-installs. Tweak the built-in `/spec` workflow, modify rules, register additional hooks, add MCP servers, adjust `settings.json` and `claude.json` — all without hand-editing `~/.claude/` after every update. Available on **Team** and **Enterprise** plans. + +:::note Claude Code focus +Customization currently targets Claude Code assets (`~/.claude/` — skills, rules, hooks, agents, settings). Codex assets (`~/.codex/`) are configured independently. Use `rules/*.md` and `agents/*.md` for content that applies to both agents. +::: - **Team-wide (git URL):** publish your customization as a git repo; every developer runs `pilot customize install <git-url>` once, and `pilot customize update` pulls your team's latest. - **Individual (local path):** drop the same files into a folder on your machine (e.g. `~/my-pilot-patch/`) and run `pilot customize install ~/my-pilot-patch`. `pilot customize update` re-applies directly from the same folder, so your edits take effect on the next update. @@ -167,7 +171,7 @@ Your step now appears in `~/.claude/skills/<skill>/SKILL.md` right after the anc ## Overriding top-level config -Three Pilot config files can be overridden from your repo root. All three use deep-merge — Pilot is no longer registered as a Claude Code plugin, so MCP servers live at the native user-scope location (`~/.claude.json` `mcpServers` key) instead of a plugin-local file. LSP servers are now installed via Anthropic's [`Piebald-AI/claude-code-lsps`](https://github.com/Piebald-AI/claude-code-lsps) marketplace and are no longer Pilot-managed; teams that need additional LSPs ship them as separate Claude Code plugins. +Three Pilot config files can be overridden from your repo root. All three use deep-merge. MCP servers live at the native user-scope location (`~/.claude.json` `mcpServers` key). Language servers are installed via Anthropic's [`Piebald-AI/claude-code-lsps`](https://github.com/Piebald-AI/claude-code-lsps) marketplace — to add a custom LSP, ship it as a separate Claude Code plugin. | File in repo | Destination | Strategy | Notes | |--------------|-------------|----------|-------| diff --git a/docs/docusaurus/docs/features/extensions.md b/docs/docusaurus/docs/features/extensions.md index 4f5203ba4..f79b0b964 100644 --- a/docs/docusaurus/docs/features/extensions.md +++ b/docs/docusaurus/docs/features/extensions.md @@ -1,21 +1,21 @@ --- sidebar_position: 1 title: Extensions -description: Manage all Claude Code extensions — skills, rules, commands, and agents — from a unified interface with team sharing and plugin visibility +description: Manage skills, rules, commands, and agents for both Claude Code and Codex — from a unified interface with team sharing and plugin visibility. --- # Extensions -Extensions are the things that customize Claude Code behavior. Pilot Shell provides a unified view of all extensions across multiple scopes: **global** (your personal `~/.claude/` directory), **project** (the `.claude/` directory in each project), **plugin** (installed Claude Code plugins), and **remote** (a connected team git repository). +Extensions customize how your AI agent behaves. Pilot Shell provides a unified Console view for both **Claude Code** and **Codex CLI** extensions, across multiple scopes: **global**, **project**, **plugin** (Claude Code plugins), and **remote** (connected team git repository). Use the **Agent toggle** (All / Claude Code / Codex) to filter by agent. ## Extension Categories -| Category | What it does | Location | -| ------------ | ----------------------------------------------------------- | -------------------------------- | -| **Skills** | Reusable workflows that load automatically when relevant | `.claude/skills/<name>/SKILL.md` | -| **Rules** | Instructions Claude follows every session (or by file type) | `.claude/rules/<name>.md` | -| **Commands** | Slash commands invoked on demand via `/<name>` | `.claude/commands/<name>.md` | -| **Agents** | Sub-agent definitions for specialized tasks | `.claude/agents/<name>.md` | +| Category | What it does | Claude Code Location | Codex Location | +| ------------ | ----------------------------------------------------------- | -------------------------------- | -------------------------------- | +| **Skills** | Reusable workflows that load automatically when relevant | `.claude/skills/<name>/SKILL.md` | `.agents/skills/<name>/SKILL.md` | +| **Rules** | Instructions the agent follows every session | `.claude/rules/<name>.md` | `~/.codex/rules/<name>.rules` | +| **Commands** | Slash commands invoked on demand via `/<name>` | `.claude/commands/<name>.md` | *(not available in Codex)* | +| **Agents** | Sub-agent definitions for specialized tasks | `.claude/agents/<name>.md` | `.codex/agents/<name>.toml` | ## Scope: Global vs Project @@ -34,9 +34,52 @@ Installed Claude Code plugins are automatically discovered and their extensions Install plugins via the Claude Code CLI: `claude plugin install <name>` +## Codex Extensions + +The Extensions page supports both Claude Code and Codex. Codex stores extensions in different locations and formats than Claude Code. + +### Agent Toggle + +Use the **All / Claude Code / Codex** toggle at the top of the Extensions page to filter by agent. When "All" is selected, each extension card shows a small agent badge (CC or Codex) to distinguish its source. + +### Format Differences + +| Format | Used by | File extension | Notes | +| --- | --- | --- | --- | +| **Markdown** | Claude Code rules/commands/agents, both agents' skills | `.md` | Full markdown editor with preview | +| **Starlark** | Codex rules | `.rules` | Sandbox permission rules in Python-like syntax | +| **TOML** | Codex agents | `.toml` | Custom agent definitions with model/instruction config | + +Starlark and TOML extensions are displayed as raw source in the detail modal. All formats support editing through the text editor. + +### Codex Skills + +Codex skills use the same `SKILL.md` format as Claude Code skills but live in a different directory (`~/.agents/skills/` instead of `~/.claude/skills/`). They are fully editable and support all operations (rename, delete, move between scopes). + +### Codex System Skills + +Built-in Codex skills (skill-creator, plugin-creator, etc.) are discovered from `~/.codex/skills/.system/` and displayed as read-only items with a "Codex System" badge, similar to Claude Code plugin extensions. + +### Codex Rules + +Codex rules use Starlark (`.rules`) format for sandbox permission control. They are global-only — there is no project-level equivalent. See [Codex Rules](https://developers.openai.com/codex/rules) for the format specification. + +### Codex Agents (Subagents) + +Custom Codex agents are TOML files defining specialized subagents with model selection, instructions, and optional MCP server configuration. They exist at both global (`~/.codex/agents/`) and project (`.codex/agents/`) scope. Pilot installs `spec-review` and `changes-review` as managed global Codex agents when Codex CLI is detected. See [Codex Subagents](https://developers.openai.com/codex/subagents) for the format specification. + +### Limitations + +- **Remote push/pull** currently targets Claude Code extensions only +- **Commands** are a Claude Code-only concept; they do not exist in Codex + ## Team Sharing (Team and Enterprise) -Share extensions with your team through a connected git repository. This feature is available on the Team and Enterprise plans. +:::note Claude Code extensions only +Team sharing (push/pull/diff) currently targets Claude Code extensions in `~/.claude/`. Codex extensions are not yet synced through team remotes — share them via the `customization` feature or commit `.codex/` rules and agents directly with your project. +::: + +Share extensions with your team through a connected git repository. Available on the Team and Enterprise plans. ### How It Works @@ -99,11 +142,13 @@ The [Pilot Console](/docs/features/console) provides a full management interface ### Viewing Extensions - All extensions from all scopes appear in a unified two-column grid +- **Agent toggle** (All / Claude Code / Codex) filters extensions by agent — when "All" is active, cards show agent badges - Each category has a distinct color: Skills (violet), Rules (amber), Commands (green), Agents (blue) - Filter by **scope** (All / Global / Project / Plugin / Remote) and **category** (Skills, Rules, Commands, Agents) - Search by name in the top-right search bar - Extensions that exist in both scopes show an "also in global/project" indicator so you can spot duplicates at a glance -- Plugin extensions display the plugin name as a badge +- Plugin extensions display the plugin name as a badge; Codex system skills show "Codex System" +- Non-markdown formats show format badges (`.rules` for Starlark, `.toml` for TOML) - Click any extension to see its full content ### Editing Extensions @@ -127,12 +172,12 @@ Clicking "→ Global" on a project extension physically moves the file from `.cl ## Creating Extensions -Create extensions manually or via Claude Code commands: +Create extensions manually or via workflows: -- **Rules:** `/setup-rules` — explores your codebase and generates project-specific rules -- **Skills:** `/create-skill` — builds a reusable skill interactively from any topic -- **Commands:** Create `.claude/commands/<name>.md` manually -- **Agents:** Create `.claude/agents/<name>.md` manually +- **Rules:** `/setup-rules` (Claude Code) or `$setup-rules` (Codex) — explores your codebase and generates project-specific rules +- **Skills:** `/create-skill` or `$create-skill` — builds a reusable skill interactively from any topic +- **Commands:** Claude Code only — create `.claude/commands/<name>.md` manually +- **Agents:** Create `.claude/agents/<name>.md` (Claude Code) or `~/.codex/agents/<name>.toml` (Codex) manually ## File Locations Reference @@ -168,6 +213,29 @@ Create extensions manually or via Claude Code commands: └── agents/ ← plugin agents ``` +### Codex Global Extensions + +```text +~/.agents/ +└── skills/ ← Codex user skills (SKILL.md format) + +~/.codex/ +├── rules/ ← Codex rules (.rules Starlark format) +├── agents/ ← Codex custom agents (.toml format) +└── skills/ + └── .system/ ← Codex system skills (read-only) +``` + +### Codex Project Extensions + +```text +<project>/ +├── .agents/ +│ └── skills/ ← Codex project skills +└── .codex/ + └── agents/ ← Codex project agents +``` + ## See Also - **[Customization](/docs/features/customization)** — for team-wide workflow modification. Extensions are project-scoped; customization is a git-hosted overlay that applies across every project via `pilot customize install`. It composes custom steps into core workflow skills and adds team rules, hooks, and agents. Available on Team and Enterprise plans. diff --git a/docs/docusaurus/docs/features/hooks.md b/docs/docusaurus/docs/features/hooks.md index cb71d51d4..df5d05926 100644 --- a/docs/docusaurus/docs/features/hooks.md +++ b/docs/docusaurus/docs/features/hooks.md @@ -1,81 +1,79 @@ --- sidebar_position: 2 title: Hooks Pipeline -description: 15 quality hooks across 7 Claude Code lifecycle events — auto-format, lint, type-check, and TDD enforcement that fire automatically at every stage of work. +description: Quality hooks across lifecycle events — auto-format, lint, type-check, and TDD enforcement that fire automatically at every stage of work. --- # Hooks Pipeline -15 hook registrations across 7 lifecycle events — quality enforcement on autopilot. +Lifecycle hooks enforce quality automatically on every file edit, prompt, and session event. Hooks run on both **Claude Code** and **Codex CLI** — registered in `~/.claude/settings.json` for Claude Code and `~/.codex/hooks.json` for Codex. -Blocking hooks reject actions or force fixes. Non-blocking hooks warn without interrupting. Async hooks run in the background. Two additional command-scoped Stop hooks run during `/spec` phases. +Blocking hooks reject actions or force fixes before they land. Non-blocking hooks warn without interrupting. Async hooks run in the background. ## SessionStart -*On startup, clear, or after compaction* +*On startup, after `/clear`, or after compaction* -| Hook | Type | Description | -|------|------|-------------| -| Worker context bootstrap | Blocking | Restores session context through the worker service on startup, `/clear`, and after compaction | -| `post_compact_restore.py` | Blocking | Re-injects the active plan and task state after compaction | -| `session_clear.py` | Blocking | Resets Pilot session state on `/clear` | -| Session tracker | Async | Starts background session activity tracking | +| Hook | Description | +|------|-------------| +| `license_check.py` | Verifies Pilot Shell license — blocks session if invalid | +| `codegraph_init.py` | Initializes CodeGraph for the current project (async) | +| `post_compact_restore.py` | Re-injects active plan and task state after compaction *(Claude Code only)* | +| `session_clear.py` | Resets Pilot session state on `/clear` *(Claude Code only)* | +| Worker context bootstrap | Restores session context through the Console worker *(Claude Code only)* | ## UserPromptSubmit -*When the user sends a message* +*When you send a message* -| Hook | Type | Description | -|------|------|-------------| -| `spec_mode_guard.py` | Blocking | Blocks `/spec` in plan mode, warns when not in bypassPermissions mode | -| Session initializer | Async | Registers the session with the Console worker daemon | +| Hook | Description | +|------|-------------| +| `spec_mode_guard.py` | Blocks `/spec` outside bypassPermissions mode | +| `spec_handoff_resume.py` | Detects model-switch handoff and resumes `/spec` *(Claude Code only)* | +| Session initializer | Registers session with the Console worker (async) | ## PreToolUse -*Before Bash, search, web, or agent tools run* +*Before Bash, search, or web tools run* -| Hook | Type | Description | -|------|------|-------------| -| `tool_redirect.py` | Blocking | Redirects to MCP alternatives, blocks unsupported web fetch paths, and enforces `/spec`-compatible tool usage | -| `tool_token_saver.py` | Blocking | Rewrites Bash commands via RTK for token savings (60-90% reduction) | +| Hook | Description | +|------|-------------| +| `tool_redirect.py` | Redirects to MCP alternatives; blocks unsupported web paths *(Claude Code only)* | +| `tool_token_saver.py` | Rewrites Bash commands via RTK for 60–90% token savings | ## PostToolUse -*After file edits, reads, searches, and task tools* +*After file edits, reads, and searches* -| Hook | Type | Description | -|------|------|-------------| -| `file_checker.py` | Blocking | Quality checks: Python (ruff), TypeScript (ESLint), Go (go vet + golangci-lint). Also warns when implementation files are edited without a failing test (TDD) | -| `context_monitor.py` | Non-blocking | Tracks context usage 0-100% with warnings as compaction approaches | -| Memory observer | Async | Captures decisions, discoveries, and bugfixes to persistent memory | +| Hook | Description | +|------|-------------| +| `file_checker.py` | Linting (ruff/ESLint/go vet) and TDD enforcement — warns when editing without a failing test | +| `context_monitor.py` | Tracks context usage 0–100%, warns as compaction approaches | +| Memory observer | Saves decisions, discoveries, and bugfixes to persistent memory (async) | -## PreCompact +## PreCompact *(Claude Code only)* -*Before auto-compaction fires* - -| Hook | Type | Description | -|------|------|-------------| -| `pre_compact.py` | Blocking | Snapshots active plan and task list to memory | +| Hook | Description | +|------|-------------| +| `pre_compact.py` | Snapshots active plan and task list before compaction | ## Stop -*When Claude tries to finish* - -| Hook | Type | Description | -|------|------|-------------| -| `spec_stop_guard.py` | Blocking | Blocks stopping if an active spec hasn't completed verification | -| Session summarizer | Async | Saves session observations to memory | +*When the agent finishes* -Additionally, `spec_plan_validator.py` and `spec_verify_validator.py` run as command-scoped Stop hooks during `/spec` phases — they verify plan creation and VERIFIED status respectively. +| Hook | Description | +|------|-------------| +| `spec_stop_guard.py` | Blocks stopping if an active spec hasn't completed verification | +| Session summarizer | Saves session observations to memory (async) | -## SessionEnd +`spec_plan_validator.py` and `spec_verify_validator.py` run as command-scoped Stop hooks during `/spec` phases. -*When the session closes* +## SessionEnd *(Claude Code only)* -| Hook | Type | Description | -|------|------|-------------| -| `session_end.py` | Blocking | Stops worker daemon if no other sessions active, sends dashboard notification | +| Hook | Description | +|------|-------------| +| `session_end.py` | Stops the Console worker when no sessions remain; sends dashboard notification | -:::info Closed loop -When compaction fires, **PreCompact** snapshots your active plan and task list to persistent memory. **SessionStart** restores the working state afterward through the worker service and `post_compact_restore.py` so progress survives compaction. +:::info Compaction resilience +When compaction fires (Claude Code): **PreCompact** captures plan state → compaction runs → **SessionStart** restores it via `post_compact_restore.py`. Work continues without data loss. ::: diff --git a/docs/docusaurus/docs/features/language-servers.md b/docs/docusaurus/docs/features/language-servers.md index 7cad1933b..af54b8d93 100644 --- a/docs/docusaurus/docs/features/language-servers.md +++ b/docs/docusaurus/docs/features/language-servers.md @@ -6,9 +6,13 @@ description: Real-time diagnostics, go-to-definition, and find-references via au # Language Servers -Real-time diagnostics and go-to-definition, auto-installed and configured. +:::warning Claude Code only +Language Server integration requires Claude Code's LSP support. Codex CLI does not have an equivalent LSP integration. On Codex, the `file_checker.py` hook still provides linting and type-checking via the underlying CLI tools (ruff, basedpyright, ESLint, go vet) — but without real-time editor-style diagnostics, hover docs, or go-to-definition. +::: + +Real-time diagnostics and go-to-definition for Claude Code, auto-installed and configured. -Language servers (LSP) give Claude real-time diagnostics, type information, and go-to-definition on every file edit. All three are auto-installed and configured via stdio transport — no manual setup. They work alongside the `file_checker.py` hook: hooks catch formatting and linting errors, LSP provides type-level intelligence. +Language servers give Claude Code real-time diagnostics, type information, and go-to-definition on every file edit. All three are auto-installed and configured via stdio transport — no manual setup. They work alongside the `file_checker.py` hook: hooks catch formatting and linting errors, LSP provides type-level intelligence. ## Python — basedpyright diff --git a/docs/docusaurus/docs/features/mcp-servers.md b/docs/docusaurus/docs/features/mcp-servers.md index 4d09ceb76..1ff57b977 100644 --- a/docs/docusaurus/docs/features/mcp-servers.md +++ b/docs/docusaurus/docs/features/mcp-servers.md @@ -8,7 +8,12 @@ description: Pre-configured MCP servers — context7 for library docs, mem-searc External context always available to every session. -Seven MCP servers are pre-configured in `.mcp.json` and lazy-loaded via `ToolSearch` to keep context lean. Pilot also installs the `chrome-devtools-mcp` Claude plugin alongside them. Add your own MCP entries in `.mcp.json`, then run `/setup-rules` to generate documentation. +Seven MCP servers are pre-configured and lazy-loaded to keep context lean. + +- **Claude Code:** configured in `~/.claude.json` (merged from `.mcp.json` during install). Add custom servers in `.mcp.json`. +- **Codex:** configured in `~/.codex/config.toml` under `[mcp_servers.*]`. + +Run `/setup-rules` (or `$setup-rules`) to generate documentation for your custom MCP servers. Pilot also installs the `chrome-devtools-mcp` plugin for browser automation. ## chrome-devtools-mcp plugin @@ -38,7 +43,7 @@ performance_start_trace(autoStop=true, reload=true) | `list_network_requests` | Inspect all network traffic with headers and bodies | | `list_console_messages` | Read console output filtered by type (error, warn, log) | -**4-tier browser priority:** Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser. See the `browser-automation.md` rule for detection and fallback logic. +**4-tier browser priority (Claude Code):** Claude Code Chrome extension → Chrome DevTools MCP → playwright-cli → agent-browser. On Codex, the Chrome extension is not available — Chrome DevTools MCP is the preferred tool. ## context7 @@ -121,22 +126,26 @@ codegraph_context(task="refactor authentication flow") | `codegraph_context` | Task-driven context retrieval — entry points, related symbols, and code | | `codegraph_node` | Get details and source code for a specific symbol | -**When to use Semble vs CodeGraph:** +**When to use CodeGraph vs Semble:** + +CodeGraph and Semble are **co-primary** — each excels at different query types. | Question | Best tool | |----------|-----------| -| "How does authentication work?" | **Semble** — natural-language hybrid search (BM25 + Model2Vec) | | "Who calls this function?" | **CodeGraph** — `codegraph_callers` with exact caller list | | "What's the blast radius of my changes?" | **CodeGraph** — `codegraph_impact` shows transitive affected symbols | | "Find functions matching a name" | **CodeGraph** — `codegraph_search` with kind filter | | "Get context for a task" | **CodeGraph** — `codegraph_context` returns entry points and related code | -| "Find code similar to a specific location" | **Semble** — `find_related` discovers parallel implementations and call sites | | "Get details and source for one symbol" | **CodeGraph** — `codegraph_node` (Semble does not extract by symbol name) | +| "How does authentication work?" | **Semble** — natural-language hybrid search (BM25 + Model2Vec) | +| "Where does settings.json get modified?" | **Semble** — finds mutation sites across languages, not just type definitions | +| "How does the notification system work?" | **Semble** — surfaces the full feature stack (UI hooks, routes, business logic) | +| "Find code similar to a specific location" | **Semble** — `find_related` discovers parallel implementations | :::info Tool selection -Rules specify the preferred order — Semble first for intent-based codebase questions, CodeGraph for structural queries (call tracing, impact analysis), context7 for library API lookups, grep-mcp for production code examples, web-search for current information. The `tool_redirect.py` hook blocks the built-in WebSearch/WebFetch and the Explore agent, redirecting to these alternatives. +`codegraph_context` first for structural orientation, then Semble for intent-based discovery. context7 for library API lookups, grep-mcp for production code examples, web-search for current information. -curl/wget and WebFetch are blocked entirely — use the dedicated web-fetch and web-search MCP servers instead. +On Claude Code, the `tool_redirect.py` hook blocks the built-in WebSearch/WebFetch and redirects to these MCP alternatives automatically. ::: ## Semble diff --git a/docs/docusaurus/docs/features/model-routing.md b/docs/docusaurus/docs/features/model-routing.md index 54bca0512..4a31fdb32 100644 --- a/docs/docusaurus/docs/features/model-routing.md +++ b/docs/docusaurus/docs/features/model-routing.md @@ -6,6 +6,10 @@ description: Plan with Opus, implement with Sonnet — without juggling config f # Model Routing +:::warning Claude Code only +Model switching uses Claude Code's `/model` command and is not available in Codex CLI. On Codex, set the model via `codex --model <name>` or in `~/.codex/config.toml`. +::: + There's one switch: Claude Code's `/model`. Whatever you set there is what every Pilot workflow uses. No per-skill table, no Pilot-side override. The interesting question is *when* to switch. Opus reasons better; Sonnet is faster and cheaper. The cost-saving move is to plan on Opus, then drop to Sonnet for the mechanical work that follows. diff --git a/docs/docusaurus/docs/features/open-source-tools.md b/docs/docusaurus/docs/features/open-source-tools.md index f3659c836..a874f6568 100644 --- a/docs/docusaurus/docs/features/open-source-tools.md +++ b/docs/docusaurus/docs/features/open-source-tools.md @@ -6,7 +6,7 @@ description: Open-source tools installed alongside Pilot Shell — Semble, ripgr # Open Source Tools -Pilot Shell installs the following open-source tools during setup. Each tool is installed only if not already present on your system. All tools retain their original licenses and are not modified or redistributed by Pilot Shell. **Claude Code** (proprietary, by Anthropic) is also installed automatically if missing — it is the foundation that Pilot Shell extends. +Pilot Shell installs the following open-source tools during setup. Each tool is installed only if not already present on your system. All tools retain their original licenses and are not modified or redistributed by Pilot Shell. **Claude Code** (proprietary, by Anthropic) is also installed automatically if missing. **Codex CLI** (OpenAI) is auto-detected and configured if present. ## System Prerequisites diff --git a/docs/docusaurus/docs/features/permission-modes.md b/docs/docusaurus/docs/features/permission-modes.md index 2c17a1143..d5cc6de82 100644 --- a/docs/docusaurus/docs/features/permission-modes.md +++ b/docs/docusaurus/docs/features/permission-modes.md @@ -1,40 +1,44 @@ --- sidebar_position: 10 title: Permission Modes -description: How Pilot Shell configures Claude Code permission modes — from default autonomy to manual approval — controlling when Claude asks before reading or writing. +description: How Pilot Shell configures Claude Code permission modes — controlling when Claude asks before reading or writing. --- # Permission Modes -Permission modes control whether Claude asks before acting. Different tasks call for different levels of autonomy: full oversight for sensitive work, minimal interruptions for a long refactor, or read-only access while exploring a codebase. +:::warning Claude Code only +Permission modes are a Claude Code concept. For Codex CLI, the equivalent is `approval_policy` in `~/.codex/config.toml` — Pilot sets it to `"never"` by default. +::: -## Default: Bypass Permissions +Permission modes control whether Claude asks before acting. -Pilot Shell sets Claude Code to `bypassPermissions` mode by default. This enables the `/spec` workflow to run autonomously — planning, implementing, and verifying without pausing for permission prompts. All permission checks are disabled and every tool call executes immediately. +## Default: Bypass Permissions -This is safe because Pilot Shell adds its own quality layer on top: [hooks](/docs/features/hooks) enforce linting, type checking, TDD, and other guardrails that catch issues before they land. +Pilot Shell sets Claude Code to `bypassPermissions` by default so `/spec` can run autonomously without permission prompts. Quality hooks (linting, TDD, type checking) act as the safety layer. -:::caution Only for trusted environments -`bypassPermissions` disables all safety checks. Use it in environments where Claude cannot cause damage to your host system — local development machines, containers, VMs, or devcontainers. If you work on production infrastructure directly, consider switching to a more restrictive mode. +:::caution Trusted environments only +Use `bypassPermissions` in local dev, containers, or VMs — not on production infrastructure. ::: -## Quick Mode Permissions +## Modes -**In Quick Mode (regular chat), you control the permission level.** Press `Shift+Tab` to cycle through modes. The current mode appears in the status bar. +| Mode | Claude can do without asking | Best for | +|------|------------------------------|----------| +| **Normal** | Read files | Sensitive or unfamiliar work | +| **Accept Edits** | Read and edit files | Daily coding | +| **Plan** | Read files (proposes changes, you approve) | Reviewing before `/spec` | +| **Auto** | Everything — classifier reviews each action | Long autonomous tasks | +| **Bypass Permissions** | Everything, no checks | `/spec` workflow, containers | -| Mode | What Claude can do without asking | Best for | -|------|-----------------------------------|----------| -| **Normal** (`default`) | Read files | Getting started, sensitive work | -| **Accept Edits** (`acceptEdits`) | Read and edit files | Iterating on code you're reviewing | -| **Plan** (`plan`) | Read files (proposes changes, you approve) | Exploring a codebase, planning a refactor | -| **Bypass Permissions** (`bypassPermissions`) | All actions, no checks | Isolated containers and VMs, `/spec` workflow | -| **Auto** (`auto`) | All actions, with background safety checks | Long-running tasks, reducing prompt fatigue | +Press `Shift+Tab` in Quick Mode to cycle through modes. -`bypassPermissions` appears in the cycle only if your session started with it (Pilot Shell's default). `auto` appears only when `--enable-auto-mode` is passed at startup — Pilot Shell does this automatically. +:::tip Use /spec instead of plan mode +Claude Code's built-in plan mode has no persistent format. `/spec` saves plans as markdown in `docs/plans/`, drives TDD, and runs full verification. Use `/spec`. +::: -## Setting a Persistent Default +## Set a Persistent Default -Change `defaultMode` in `~/.claude/settings.json`: +Edit `defaultMode` in `~/.claude/settings.json`: ```json { @@ -44,79 +48,14 @@ Change `defaultMode` in `~/.claude/settings.json`: } ``` -The Pilot Shell installer uses three-way merge — your customizations to `defaultMode` and other permission settings are preserved across updates. - -## Plan Mode - -Plan mode tells Claude to research and propose changes without making them. Claude reads files, runs shell commands to explore, and writes a plan — but does not edit your source code. Permission prompts work the same as Normal mode: you still approve Bash commands, network requests, and other actions that would normally prompt. - -:::tip Use /spec instead of plan mode -Claude Code's built-in plan mode (`Shift+Tab` → "plan") is unstructured — plans aren't saved as files, have no consistent format, and disappear when the session ends. Use `/spec` as a drop-in replacement: plans are saved as structured markdown in `docs/plans/`, persist across sessions, and drive a complete workflow with TDD and verification. See the [spec workflow guide](/docs/workflows/spec). -::: +Pilot preserves your `defaultMode` across updates. ## Auto Mode -Pilot Shell passes `--enable-auto-mode` to Claude Code at launch, making Auto Mode available in the `Shift+Tab` permission cycle. Auto Mode lets Claude execute actions without showing permission prompts — a separate classifier model reviews each action before it runs and blocks anything that escalates beyond the task scope. - -:::warning Availability -Auto Mode is available on **Max, Team, or Enterprise plans**, or with **API access**. It is **not available on the Pro plan**. On Team and Enterprise plans, an admin must enable it in [Claude Code admin settings](https://claude.ai/admin-settings/claude-code) before users can turn it on. It also requires **Claude Sonnet 4.6 or Opus 4.7** — older models and third-party providers (Bedrock, Vertex, Foundry) are not supported. -::: - -### How the Classifier Works - -The classifier runs on Claude Sonnet 4.6 (even if your main session uses a different model). Each action goes through a fixed decision order — the first matching step wins: - -1. Actions matching your allow/deny rules resolve immediately -2. Read-only actions and file edits in your working directory are auto-approved -3. Everything else goes to the classifier -4. If the classifier blocks an action, Claude receives the reason and tries an alternative approach - -The classifier receives user messages and tool calls as input, with Claude's own text and tool results stripped out. It also reads your `CLAUDE.md` content, so project instructions factor into allow/block decisions. Because tool results never reach the classifier, hostile content in files or web pages cannot manipulate it. - -**Cost:** Classifier calls count toward your token usage. The extra cost comes mainly from shell commands and network operations — read-only actions and file edits don't trigger a classifier call. - -### Blocked by Default - -- Downloading and executing code (`curl | bash`, scripts from cloned repos) -- Sending sensitive data to external endpoints -- Production deploys and migrations -- Mass deletion on cloud storage -- Granting IAM or repository permissions -- Modifying shared infrastructure -- Irreversibly destroying files that existed before the session started -- Destructive source control operations (force push, pushing directly to `main`) - -### Allowed by Default - -- Local file operations in your working directory -- Installing dependencies from your lock files or manifests -- Reading `.env` and sending credentials to their matching API -- Read-only HTTP requests -- Pushing to the branch you started on or one Claude created - -### Configuring Trusted Infrastructure - -The classifier trusts your working directory and your git repo's configured remotes. Everything else is treated as external. If Auto Mode blocks something routine for your team — like pushing to your org's repo or writing to a company bucket — an administrator can add trusted infrastructure via `autoMode.environment` in managed settings. See [Configure the auto mode classifier](https://code.claude.com/docs/en/permissions#configure-the-auto-mode-classifier) for the full guide. - -### Subagents in Auto Mode - -When Claude spawns a subagent, the classifier evaluates the delegated task before the subagent starts. Inside the subagent, Auto Mode runs with the same block and allow rules as the parent session. When the subagent finishes, the classifier reviews its full action history — if it flags a concern, a security warning is prepended to the results. - -### Fallback Behavior - -If the classifier blocks 3 consecutive actions or 20 total in one session, Auto Mode pauses and Claude Code resumes standard permission prompts. Approving a prompted action resets the counters so you can continue in Auto Mode. - -## Comparison +Auto Mode runs a classifier on each action before it executes, blocking anything outside the task scope. Available on **Max, Team, or Enterprise** plans (not Pro). Requires Claude Sonnet 4.6 or Opus 4.7. -| | Normal | Accept Edits | Plan | Auto | Bypass Permissions | -|---|---|---|---|---|---| -| **Permission prompts** | File edits and commands | Commands only | File edits and commands | None unless fallback triggers | None | -| **Safety checks** | You review each action | You review commands | You review each action | Classifier reviews commands | None | -| **Token usage** | Standard | Standard | Standard | Higher (classifier calls) | Standard | -| **Best with Pilot** | Exploring unfamiliar code | Quick Mode daily work | Reviewing before `/spec` | Long autonomous tasks | `/spec` workflow (default) | +Blocked by default: downloading and executing scripts, production deploys, mass deletion, IAM changes, force-push to main. Allowed: local file operations, installing from lock files, read-only HTTP. -## Further Reading +If the classifier blocks 3 consecutive or 20 total actions, Auto Mode pauses and standard prompts resume. -- [Claude Code permission modes](https://code.claude.com/docs/en/permission-modes) — full reference for all modes -- [Claude Code permissions](https://code.claude.com/docs/en/permissions) — allow, ask, and deny rules -- [Auto Mode announcement](https://claude.com/blog/auto-mode) — how the classifier is designed +See [Claude Code permission modes](https://code.claude.com/docs/en/permission-modes) for the full reference. diff --git a/docs/docusaurus/docs/features/remote-control.md b/docs/docusaurus/docs/features/remote-control.md index daea9a17e..6d6127ef5 100644 --- a/docs/docusaurus/docs/features/remote-control.md +++ b/docs/docusaurus/docs/features/remote-control.md @@ -6,6 +6,10 @@ description: Control Pilot Shell sessions from your phone, tablet, or any browse # [Remote Control](https://youtu.be/Ko7_tC1fMMM?si=kWDzYiQvxlkZTrRK) +:::warning Claude Code only +Remote Control and Channels require Claude Code. They are not available with Codex CLI. +::: + Control Pilot Shell sessions from your phone, tablet, or any browser. Start a `/spec` task at your desk, then monitor and steer it from the couch. Your full local environment stays available — filesystem, MCP servers, hooks, rules, and project configuration. Nothing moves to the cloud. @@ -23,13 +27,13 @@ You also need a **Pro, Max, Team, or Enterprise** Claude subscription. API keys ## Setup -### 1. Start Pilot Shell +### 1. Start Claude Code ```bash -pilot +claude ``` -Loads all hooks, rules, MCP servers, and project configuration. +Pilot Shell loads automatically with all hooks, rules, and MCP servers. ### 2. Activate Remote Control @@ -46,7 +50,7 @@ You can also connect from any browser at [claude.ai/code](https://claude.ai/code ## How it works -Sessions started via `pilot` carry over all rules, hooks, MCP servers, and project configuration. The Claude App and web interface are just a window into your local session — your machine does all the work. +Sessions carry over all rules, hooks, MCP servers, and project configuration. The Claude App and web interface are just a window into your local session — your machine does all the work. - **Full Pilot Shell experience** — hooks, rules, skills, MCP servers all stay active - **Outbound-only** — no ports open on your machine, all traffic over TLS @@ -55,11 +59,11 @@ Sessions started via `pilot` carry over all rules, hooks, MCP servers, and proje ## Starting sessions from your phone via SSH -The setup above assumes you start sessions via `pilot` on your computer first. To start new sessions from your phone instead: +The setup above assumes you start sessions on your computer first. To start new sessions from your phone instead: 1. Install [Termius](https://termius.com/) on your phone (not your computer) 2. SSH into your computer -3. Run `pilot` in any project directory +3. Run `claude` in any project directory ## Keeping your computer reachable @@ -76,9 +80,9 @@ For the Claude App or browser to stay connected, and for SSH to work when you're [Channels](https://code.claude.com/docs/en/channels) push messages from external platforms directly into your running Pilot session. Claude reads the message, acts on it with your full local environment, and replies through the same platform. ```bash -pilot --channels plugin:telegram@claude-plugins-official -pilot --channels plugin:discord@claude-plugins-official -pilot --channels plugin:imessage@claude-plugins-official +claude --channels plugin:telegram@claude-plugins-official +claude --channels plugin:discord@claude-plugins-official +claude --channels plugin:imessage@claude-plugins-official ``` Channels require [Bun](https://bun.sh/) and a one-time bot setup (Telegram/Discord) or macOS (iMessage). Each channel maintains a sender allowlist — only paired users can push messages. diff --git a/docs/docusaurus/docs/features/rules.md b/docs/docusaurus/docs/features/rules.md index 9f93c3846..69d1ee058 100644 --- a/docs/docusaurus/docs/features/rules.md +++ b/docs/docusaurus/docs/features/rules.md @@ -1,14 +1,19 @@ --- sidebar_position: 4 title: Rules & Standards -description: Production-tested rules and standards loaded into every Claude Code session — TDD enforcement, language standards, MCP routing, and project conventions. +description: Production-tested rules and standards loaded into every Claude Code and Codex session — TDD enforcement, language standards, MCP routing, and project conventions. --- # Rules & Standards Production-tested best practices loaded into every session. -Rules load automatically at session start — they're enforced standards, not suggestions. Pilot ships 10 built-in rules plus 5 coding standards. Coding standards load conditionally by file type to keep context lean. Your project-level rules in `.claude/rules/` take precedence over Pilot's built-ins. +Rules load automatically at session start — enforced standards, not suggestions. Pilot ships 10 built-in rules plus 5 coding standards activated by file type. + +- **Claude Code:** rules in `~/.claude/rules/` (global) and `.claude/rules/` (project). Project rules take precedence. +- **Codex:** rules delivered via `~/.codex/AGENTS.md`, adapted from the same source. Custom rules work the same way. + +Run `/setup-rules` (or `$setup-rules` on Codex) to generate project-specific rules from your codebase. ## Built-in Rule Categories @@ -57,10 +62,6 @@ The default `testing.md` rule defines the agent's testing posture. Pilot's defau - **TDD with documented escapes.** Red-green-refactor remains the default. The `Trivial:` plan-task field is the documented opt-out for changes ≤ 5 net new lines with no new branch, public symbol, or error path, and it must name an existing covering test or verification command. Bugfixes never qualify — a reproducing RED test is the bugfix lane's anti-regression guarantee. - **Post-implementation enforcement.** The `changes-review` reviewer agent and the `spec-verify` Step 5 audit ("Test Parsimony Audit" + "Trivial: claim audit") verify the doctrine against the actual diff — the planner's claim is not authoritative. -### Why parsimony - -User feedback (paraphrased): *"90% of the time, the agent tries to test too much, doing redundant tests that just add to the maintenance cost. It also tends to create one test class for each tiny part of the logic of a given class instead of 1 class = 1 unit test + 1 functional test (if needed)."* The default codifies that complaint as the rule. - ### Override the testing posture If your project wants strict TDD on every change, blanket 80 % coverage, or a different posture, create `.claude/rules/testing-project.md` in the repository — it shadows Pilot's global `testing.md`. The shadow file can be as small as a few overriding sections (the rest of the global rule still applies). Suggested structure: diff --git a/docs/docusaurus/docs/features/statusline.md b/docs/docusaurus/docs/features/statusline.md index 67e5d56b3..6068fdf2d 100644 --- a/docs/docusaurus/docs/features/statusline.md +++ b/docs/docusaurus/docs/features/statusline.md @@ -1,118 +1,46 @@ --- sidebar_position: 7 title: Status Line -description: Real-time session dashboard rendered below every Claude Code response — token usage, cost, model, branch, plan status, and active hooks at a glance. +description: Real-time session dashboard rendered below every Claude Code response — token usage, cost, model, branch, plan status, and savings at a glance. --- # Status Line -Real-time session dashboard rendered below every Claude Code response. - -Pilot Shell replaces the default Claude Code status line with a rich, three-line display. It receives JSON from Claude Code via stdin and renders model info, context usage, git state, costs, usage limits, spec progress, and version info — all color-coded and updated after every response. - -## Layout - -The status line has three lines: - -**Subscription users** (Pro / Max — Claude Code emits `rate_limits` on stdin): -``` -Line 1: Opus 4.7 [1M] | █████░▓ 60% | 5h: 42% ⇡ 2h | 7d: 18% ⇣ 4d | Savings: 65% -Line 2: Spec: my-feature feature [implement] ████░░░░ 3/6 -Line 3: Pilot 8.4.0 (Solo) · CC 2.1.80 (Max) · sessions 2 · memories 12 -``` - -**API / Enterprise users** (no `rate_limits` in stdin): -``` -Line 1: Opus 4.7 [1M] | █████░▓ 60% | +156 -23 | main +2 ~3 | $1.45 | Savings: 65% -Line 2: Spec: my-feature feature [implement] ████░░░░ 3/6 -Line 3: Pilot 8.4.0 (Solo) · CC 2.1.80 · sessions 2 · memories 12 -``` - -The layout is symmetric: slots 3 and 4 swap between `5h | 7d` and `lines | git` based on what Claude Code provides on stdin. Cost is shown only for API / Enterprise users — on Pro / Max the subscription covers usage, so a dollar figure is noise and is suppressed. Savings always anchors the right side. - -### Line 1 — Session Metrics - -Widgets separated by `|`, from left to right: - -| Widget | Description | Color coding | -|--------|-------------|--------------| -| **Model** | Active model in short form (`Opus 4.7 [1M]`, `Sonnet 4.6`). Legacy / pinned IDs such as `claude-opus-4-6`, `claude-sonnet-4-5-20250929`, or retired `claude-3-*` variants resolve to friendly labels (`Opus 4.6`, `Sonnet 4.5`, `Sonnet 3.7`, …). Unknown IDs display verbatim. | Cyan | -| **Context** | Effective context usage with progress bar and buffer indicator (`▓`). The session percentage alone is sufficient — no raw token count is shown. | Green < 80%, Yellow 80–95%, Red 95%+ | -| **Lines changed** | Session lines added/removed (`+156 -23`). Hidden when `rate_limits` is present. | Green for added, Red for removed | -| **Git** | Branch name with staged (`+N`) and unstaged (`~N`) counts. Shows worktree branch with `wt` suffix when in a spec worktree. Hidden when `rate_limits` is present. | Magenta branch, Green staged, Yellow unstaged | -| **Cost** | Session cost in USD. Hidden when `rate_limits` is present — on Pro / Max the subscription covers API usage, so the dollar figure is noise. | Green < $1, Yellow $1–5, Red $5+ | -| **5h usage** | 5-hour usage percentage with pacing arrow and reset countdown (`5h: 42% ⇡ 2h`). See pacing rules below. Only shown when `rate_limits` is available. | Green < 70%, Yellow 70–90%, Red 90%+ | -| **7d usage** | 7-day usage percentage with pacing arrow and reset countdown (`7d: 18% ⇣ 4d`). Only shown when `rate_limits` is available. | Same as 5h | -| **Savings** | Token savings percentage from RTK proxy (`Savings: N%`). Always shown when RTK has data, regardless of whether usage info is present. | Cyan | - -:::info Usage Limits — Cross-Platform -Claude Code 2.1.80+ emits `rate_limits` directly on stdin for **subscription plans** (Pro / Max). Pilot reads these values with no network calls, no OAuth credentials, and no platform restrictions — it works identically on macOS, Linux, and Windows. - -**API / Enterprise users** do not receive `rate_limits`, so the status line falls back to showing lines-changed, git branch, and RTK savings instead. No configuration needed — the display adapts automatically to whatever data Claude Code provides. +:::warning Claude Code only +The status line is not available with Codex CLI. It uses a Claude Code-specific stdin API that Codex does not support. ::: -#### Pacing Arrow - -When a usage widget is shown, Pilot compares the used percentage to the *expected* percentage based on how much of the window has elapsed: - -- **⇡** (red) — burning quota **faster** than the clock. On track to hit the limit before reset. -- **⇣** (green) — burning quota **slower** than the clock. Surplus budget available. -- *(no arrow)* — within ±3 percentage points of linear pace. On schedule. +Three-line session dashboard rendered below every Claude Code response. -Example: 150 minutes into the 5-hour window (half elapsed), used is 90% → `5h: 90% ⇡ 2h 30m` — clearly over pace and heading for the limit. Used 5% in the same situation → `5h: 5% ⇣ 2h 30m` — well under pace. - -### Line 2 — Mode - -**Quick Mode** (no active spec): ``` -Quick Mode · /spec for feature implementation and complex bugfixes +Opus 4.7 [1M] | █████░▓ 60% | 5h: 42% ⇡ 2h | 7d: 18% ⇣ 4d | Savings: 65% +Spec: my-feature feature [implement] ████░░░░ 3/6 +Pilot 8.4.0 (Solo) · CC 2.1.80 (Max) · sessions 2 · memories 12 ``` -**Spec Mode** (active `/spec` plan): -``` -Spec: my-feature feature [implement] ████░░░░ 3/6 iter:2 -``` +## Line 1 — Session Metrics -| Field | Description | -|-------|-------------| -| **Name** | Plan name (date prefix stripped) | -| **Type** | `feature` (cyan) or `bugfix` (yellow) | -| **Phase** | `plan` (blue), `implement` (yellow), or `verify` (cyan) | -| **Progress bar** | Visual task completion (filled █ / empty ░) | -| **Count** | Completed/total tasks | -| **Iterations** | `iter:N` shown when verification has looped back (N > 0) | +| Widget | What it shows | +|--------|---------------| +| **Model** | Active model (`Opus 4.7 [1M]`, `Sonnet 4.6`) | +| **Context** | Usage bar + percentage. Green < 80%, Yellow 80–95%, Red 95%+ | +| **5h / 7d usage** | Rate-limit percentage with pacing arrow and reset countdown. Shown on Pro/Max subscriptions. Replaces lines+git when present. ⇡ = over pace (red), ⇣ = under pace (green) | +| **Lines / Git** | `+added -removed` and branch with staged/unstaged counts. Shown on API/Enterprise (no rate limits). | +| **Cost** | Session cost in USD. Shown on API/Enterprise only — suppressed on subscription plans. | +| **Savings** | Token savings % from the RTK proxy. Always shown when RTK has data. | -### Line 3 — Version & Session Info +## Line 2 — Mode -``` -Pilot 8.4.0 (Solo) · CC 2.1.79 (Max) · sessions 2 · memories 12 -``` +**Quick Mode:** `Quick Mode · /spec for feature implementation and complex bugfixes` -| Field | Description | -|-------|-------------| -| **Pilot version** | Pilot Shell version with license tier in parentheses: `Solo` (green), `Team` (cyan), `Trial 5d` (yellow), or `pilot-shell.com/pricing` (dim) | -| **CC version** | Claude Code version with subscription type in parentheses: `Max`, `Pro`, `Team`, `Enterprise`. Detected via `claude auth status` (cached 24h) | -| **Sessions** | Number of active Pilot Shell sessions | -| **Memories** | Observations saved to Pilot Console memory during this session (only shown when > 0) | +**Spec Mode:** `Spec: my-feature feature [implement] ████░░░░ 3/6 iter:2` -## Effective Context Display +Shows plan name, type (feature/bugfix), phase (plan/implement/verify), task progress bar, and iteration count. -Claude Code reserves ~16.5% of the context window as a compaction buffer, triggering auto-compaction at ~83.5% raw usage. Pilot rescales this to an **effective 0–100% range** so the progress bar fills to 100% right before compaction fires. The `▓` character at the end of the bar represents the reserved buffer zone. +## Line 3 — Version Info -For a 200K context window, compaction fires at ~167K tokens (83.5%). For a 1M context window, it fires at ~967K tokens (96.7%). The effective percentage normalizes both to 0–100%. +`Pilot <version> (<tier>) · CC <version> (<subscription>) · sessions N · memories N` ## Configuration -The status line is configured automatically during installation in `~/.claude/settings.json`: - -```json -{ - "statusLine": { - "type": "command", - "command": "~/.pilot/bin/pilot statusline", - "padding": 0 - } -} -``` - -No manual setup required — the installer handles this. +Configured automatically during installation in `~/.claude/settings.json` — no manual setup required. diff --git a/docs/docusaurus/docs/getting-started/codex-cli.md b/docs/docusaurus/docs/getting-started/codex-cli.md new file mode 100644 index 000000000..42100b95a --- /dev/null +++ b/docs/docusaurus/docs/getting-started/codex-cli.md @@ -0,0 +1,31 @@ +--- +sidebar_label: Claude Code vs Codex +description: What works with both agents and what requires Claude Code. +--- + +# Claude Code vs Codex + +Pilot Shell supports both agents. **Claude Code is the preferred agent** — it has full feature coverage. Codex works for all daily development workflows. + +## Works on Both + +All core and additional workflows run on both agents. Use `/` on Claude Code and `$` on Codex: + +- `/spec`, `/fix`, `/prd`, `/benchmark`, `/setup-rules`, `/create-skill` +- Console — all 10 views, persistent memory, sessions, and memories shared between agents +- Quality hooks and TDD enforcement +- MCP servers (CodeGraph, Semble, mem-search, web-search, and more) +- Rules, standards, and context optimization +- Spec-review and changes-review agents + +## Claude Code Only + +- **Status line** — real-time session metrics below every response +- **Pilot Bot** — scheduled tasks and background automation +- **Remote control** — connect from the Claude app / browser, plus channels (Telegram, Discord, iMessage) +- **Language Server integration** — LSP-driven diagnostics, hover docs, and go-to-definition (Codex still gets the same linting via the file-checker hook) +- **Model switching** — `/model` command to change models mid-session (Codex sets model via `codex --model` or `config.toml`) +- **Permission modes** — `Shift+Tab` cycle and Auto Mode classifier (Codex uses `approval_policy` in `config.toml`) +- **Codex companion reviews** — OpenAI adversarial review launched from within Claude Code +- **Team-sharing of extensions** — push/pull of `~/.claude/` extensions through a git remote +- **Commands** — slash-command extensions in `.claude/commands/` (Codex has no command primitive) diff --git a/docs/docusaurus/docs/getting-started/installation.md b/docs/docusaurus/docs/getting-started/installation.md index 3d95ae6b2..c88b6815e 100644 --- a/docs/docusaurus/docs/getting-started/installation.md +++ b/docs/docusaurus/docs/getting-started/installation.md @@ -14,36 +14,39 @@ Works with any existing project — no scaffolding required. curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash ``` -Run from any directory — it installs globally to `~/.pilot/` and `~/.claude/`. After installation, `cd` into any project and run `pilot` or `ccp` to start. +Run from any directory — it installs globally to `~/.pilot/` and `~/.claude/` (and `~/.codex/` if Codex CLI is detected). After installation, just run `claude` or `codex` directly — Pilot Shell loads automatically. ## What the Installer Does -7 steps with progress tracking and rollback on failure: +9 steps with progress tracking and rollback on failure. Steps 3 and 4 are agent-conditional — they skip cleanly when the matching agent CLI is not detected on your system. The installer **does not install Claude Code or Codex CLI itself**; you install at least one of them yourself per [Prerequisites](./prerequisites). | Step | Title | Description | |------|-------|-------------| -| 1 | Prerequisites | Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq | -| 2 | Claude files | Sets up `~/.claude/` plugin — rules, commands, hooks, MCP servers | -| 3 | Config files | Creates `.nvmrc` and project config | -| 4 | Dependencies | Installs Semble, RTK, CodeGraph, Chrome DevTools MCP, playwright-cli, agent-browser, language servers | -| 5 | Shell integration | Auto-configures bash, fish, and zsh with the `pilot` alias. Add `# pilot-shell:managed-elsewhere` to a config file to opt out (for framework-managed shells) | -| 6 | VS Code extensions | Installs recommended extensions for your language stack | -| 7 | Finalize | Success message with next steps | +| 1 | Prerequisites | Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq. Verifies at least one supported agent (Claude Code or Codex CLI) is on the system; aborts with a clear error otherwise. | +| 2 | Pilot files | Installs agent-neutral Pilot Shell-managed assets: hooks to `~/.pilot/hooks/`, Console scripts/UI to `~/.pilot/`, raw rule sources to `~/.pilot/rules/` (read by Codex's adapter), and the canonical skill source to `~/.claude/skills/` (used by both agents — Claude reads it natively, Codex adapts it). Always runs. | +| 3 | Claude files | Installs Claude-specific assets to `~/.claude/`: rules, sub-agents, and `settings.json` (three-way merged); plus Claude post-install merges (hooks into settings, `~/.claude.json` MCP block, model config). **Skipped when Claude Code CLI is not detected.** | +| 4 | Codex files | Installs Codex-specific assets: adapted skills to `~/.agents/skills/`, review agents to `~/.codex/agents/`, `~/.codex/AGENTS.md`, `~/.codex/config.toml`, `~/.codex/hooks.json`. Per-category counts mirror the Claude section. **Skipped when Codex CLI is not detected.** | +| 5 | Config files | Creates `.nvmrc` and project config | +| 6 | Dependencies | Installs Semble, RTK, CodeGraph, Chrome DevTools MCP, playwright-cli, agent-browser, language servers, plus the `codex@openai-codex` Claude marketplace plugin. Claude-side plugins (Codex companion plugin, Chrome DevTools MCP plugin, LSP plugins) are skipped on Codex-only systems. | +| 7 | Shell integration | Auto-configures bash, fish, and zsh with the `pilot` alias. Add `# pilot-shell:managed-elsewhere` to a config file to opt out (for framework-managed shells) | +| 8 | VS Code extensions | Installs recommended extensions for your language stack | +| 9 | Finalize | Success message with next steps | ## Browser Automation -For the best browser automation and E2E testing experience, install the [Claude Code Chrome extension](https://code.claude.com/docs/en/chrome). It provides richer visual context and direct access to your existing browser sessions. +Pilot installs three browser tools automatically: **Chrome DevTools MCP**, **playwright-cli**, and **agent-browser**. -Pilot uses a 4-tier browser tool selection: **Chrome extension** (preferred) → **[Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp)** (enterprise fallback via CDP — Lighthouse, performance tracing, device emulation) → **playwright-cli** (thorough E2E with persistent sessions, tracing, network mocking) → **agent-browser** (lightweight, fast startup). The three CLI/MCP tools are installed automatically. The Chrome extension must be installed manually via the browser extension store. In environments where the Chrome extension can't be installed (enterprise restrictions, dev containers), Pilot falls back to Chrome DevTools MCP first, then to CLI tools. +- **Claude Code:** also install the [Claude Code Chrome extension](https://code.claude.com/docs/en/chrome) for the richest browser context. Tier order: Chrome extension → Chrome DevTools MCP → playwright-cli → agent-browser. +- **Codex CLI:** the Chrome extension is not available. Tier order: Chrome DevTools MCP → playwright-cli → agent-browser. -## Codex Plugin (Included) +## Codex Companion Plugin (Included) -The [Codex plugin](https://github.com/openai/codex-plugin-cc) is installed automatically by the Pilot installer. To activate it: +The [Codex companion plugin](https://github.com/openai/codex-plugin-cc) is installed automatically by the Pilot installer. It provides adversarial code review powered by OpenAI — an independent second opinion during Claude Code's `/spec` planning and verification. 1. Run `/codex:setup` in any Pilot session to authenticate with your OpenAI account -2. Enable the Codex reviewers in Console Settings → Reviewers +2. Enable the Codex Companion Reviewers in Console Settings → Reviewers -When enabled, Codex provides an independent adversarial review during `/spec` planning and verification phases. A [ChatGPT Plus](https://chatgpt.com/#pricing) subscription ($20/mo) covers the Codex API usage needed for code reviews. +This is separate from [Codex CLI support](/docs/getting-started/codex-cli) — the companion plugin runs from within Claude Code, while Codex CLI is a standalone agent. ## Dev Container @@ -71,27 +74,28 @@ See [releases](https://github.com/maxritter/pilot-shell/releases) for all availa curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstall.sh | bash ``` -Removes binary, plugin files, managed commands/rules, settings, and shell aliases. Your project's custom `.claude/` files are preserved. +Removes binary, plugin files, managed commands/rules, settings, and shell aliases. Your project's custom `.claude/` and `.codex/` files are preserved. ## Reset & Refresh -Claude Code's session logs and Pilot's caches grow over time and can degrade performance. A periodic reset every few weeks restores a clean baseline. +Accumulated session logs and Pilot's caches grow over time and can degrade performance. A periodic reset every few weeks restores a clean baseline. ```bash -# 1. Inside Claude Code, log out +# 1. If using Claude Code, log out first /logout # 2. Back up your current config (just in case) mv ~/.claude.json ~/.claude.json.bak mv ~/.claude ~/.claude.bak +mv ~/.codex ~/.codex.bak mv ~/.pilot ~/.pilot.bak # 3. Reinstall Pilot Shell from the official installer curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash -# 4. Start Pilot, sign in to Claude, and re-activate your license -pilot +# 4. Re-activate your license, then start your agent pilot activate <your-license-key> +claude # or: codex ``` Once Pilot Shell is running smoothly again, you can delete the `.bak` copies. Forgot your license key? Recover it in the [Pilot members area](https://polar.sh/max-ritter/portal). diff --git a/docs/docusaurus/docs/getting-started/prerequisites.md b/docs/docusaurus/docs/getting-started/prerequisites.md index b4737d3b2..09c16829b 100644 --- a/docs/docusaurus/docs/getting-started/prerequisites.md +++ b/docs/docusaurus/docs/getting-started/prerequisites.md @@ -1,20 +1,20 @@ --- sidebar_position: 1 title: Prerequisites -description: What you need before installing Pilot Shell — Claude Code installed via the native installer, an active Anthropic subscription, and a POSIX shell environment. +description: What you need before installing Pilot Shell — at least one AI agent (Claude Code or Codex CLI), a subscription or API key, and a POSIX shell environment. --- # Prerequisites What you need before installing Pilot Shell. -## Claude Code +## At Least One AI Agent -Install [Claude Code](https://code.claude.com/docs/en/quickstart) using the **native installer** before setting up Pilot Shell. If you have the `npm` or `brew` version installed, uninstall it first. If no Claude Code installation is detected, the Pilot installer will attempt to set it up for you. +Pilot Shell supports **Claude Code** (Anthropic, primary — full feature coverage) and **Codex CLI** (OpenAI — all workflows, fewer platform features). Install at least one. The Pilot installer auto-detects and configures both. -## Claude Subscription +### Claude Code -Pilot enhances Claude Code — it doesn't replace it. You need an active Claude subscription. Solo developers, teams, and enterprise organizations are all supported. +Install [Claude Code](https://code.claude.com/docs/en/quickstart) using the **native installer**. If you have the `npm` or `brew` version, uninstall it first. Requires a Claude subscription: | Plan | Audience | Notes | |------|----------|-------| @@ -23,13 +23,21 @@ Pilot enhances Claude Code — it doesn't replace it. You need an active Claude | **Team Premium** | Teams | 6.25x usage per member + SSO, admin tools, billing management | | **Enterprise** | Companies | For organizations with compliance, procurement, or security requirements | -## Codex Plugin (Included) +### Codex CLI -The [Codex plugin](https://github.com/openai/codex-plugin-cc) is installed automatically with Pilot. It provides adversarial code review powered by OpenAI Codex — an independent second opinion during `/spec` planning and verification phases. +Install [Codex CLI](https://developers.openai.com/codex/cli) using the **native installer**. If you have the `npm` or `brew` version, uninstall it first. Authenticate with your ChatGPT account or an API key on first run. Requires a ChatGPT subscription: -**Setup:** Run `/codex:setup` once to authenticate with your OpenAI account, then enable the reviewers in Console Settings → Reviewers. Pilot auto-detects the plugin — Codex reviewer toggles appear grayed out until setup is complete. +| Plan | Audience | Notes | +|------|----------|-------| +| **Plus** | Solo — moderate usage | Good for focused coding sessions each week | +| **Pro $100** | Solo — heavy usage | 5x Plus limits — recommended for full-time AI-assisted development | +| **Pro $200** | Solo — very heavy usage | 20x Plus limits — for the most intensive workflows | +| **Business** | Teams | Flexible credit-based pricing, cloud integrations | +| **Enterprise** | Companies | Compliance, admin tools, organization-wide access | + +**API key** is available for automation and CI but is pay-per-token and lacks cloud features (GitHub code review, Slack integration, delayed model access). See [pricing](https://developers.openai.com/codex/pricing) for details. -A [ChatGPT Plus](https://chatgpt.com/#pricing) subscription ($20/mo) covers the Codex API usage needed for code reviews. +See the [agent comparison](/docs/getting-started/codex-cli) for the full feature matrix. ## System Requirements diff --git a/docs/docusaurus/docs/intro.md b/docs/docusaurus/docs/intro.md index fec54a7a8..d66815562 100644 --- a/docs/docusaurus/docs/intro.md +++ b/docs/docusaurus/docs/intro.md @@ -2,12 +2,12 @@ slug: / sidebar_position: 0 title: Introduction -description: Complete technical reference for Pilot Shell — how real engineers run Claude Code, with spec-driven plans, enforced TDD, persistent memory, and quality hooks. +description: Complete technical reference for Pilot Shell — how real engineers run Claude Code and Codex CLI, with spec-driven plans, enforced TDD, persistent memory, and quality hooks. --- # Pilot Shell Documentation -**Pilot Shell** is how real engineers run Claude Code. You get plans you can review before a single line is written, tests that are enforced — not optional, knowledge that persists across sessions, and quality gates that run automatically on every edit. +**Pilot Shell** is how real engineers run Claude Code and Codex CLI. You get plans you can review before a single line is written, tests that are enforced — not optional, knowledge that persists across sessions, and quality gates that run automatically on every edit. No more re-explaining decisions, chasing skipped tests, or reviewing 15-file changes that were never scoped. Pilot adds the structure that turns fast AI output into reliable production code. @@ -24,31 +24,32 @@ No more re-explaining decisions, chasing skipped tests, or reviewing 15-file cha # Install curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash -# Start -cd your-project && pilot +# Start with Claude Code or Codex CLI (Pilot loads automatically) +cd your-project +claude # Claude Code — full feature set +codex # Codex CLI — all core workflows # Generate project rules -> /setup-rules +> /setup-rules # Codex: $setup-rules # Create a reusable skill -> /create-skill +> /create-skill # Codex: $create-skill # Brainstorm a vague idea into a PRD (with optional research) -> /prd "Add real-time notifications for team updates" +> /prd "Add real-time notifications for team updates" # Codex: $prd # Plan and build a feature -> /spec "Add user authentication with OAuth" +> /spec "Add user authentication with OAuth" # Codex: $spec ``` ## Architecture -Pilot enhances Claude Code with: +Pilot enhances Claude Code and Codex CLI with: -- **18 hook registrations** across 7 lifecycle events for automatic quality and security enforcement (includes a built-in credential scanner) -- **6 MCP servers** for library docs, memory, web search, code search, page fetching, and code intelligence -- **3 language servers** (Python, TypeScript, Go) for real-time diagnostics -- **Intelligent model routing** — Opus for planning, Sonnet for implementation -- **Persistent memory** via local SQLite — decisions and context survive across sessions -- **Pilot Console** — local web dashboard for monitoring, configuration, and skill sharing +- **Quality hooks** — auto-format, lint, type-check, and TDD enforcement on every file edit +- **7 MCP servers** — library docs, persistent memory, web search, code search, page fetching, code intelligence +- **3 language servers** *(Claude Code only)* — Python (basedpyright), TypeScript (vtsls), Go (gopls) +- **Persistent memory** — decisions and context survive across sessions in a local SQLite database +- **Pilot Console** — local web dashboard at `localhost:41777` for monitoring, configuration, and skill sharing -Explore the sidebar for [getting started](/docs/getting-started/prerequisites), [workflows](/docs/workflows/setup-rules), and [features](/docs/features/console). +Explore the sidebar for [getting started](/docs/getting-started/prerequisites), [workflows](/docs/workflows/prd), and [features](/docs/features/console). diff --git a/docs/docusaurus/docs/workflows/benchmark.md b/docs/docusaurus/docs/workflows/benchmark.md index c41b1fc07..ea512195f 100644 --- a/docs/docusaurus/docs/workflows/benchmark.md +++ b/docs/docusaurus/docs/workflows/benchmark.md @@ -9,9 +9,10 @@ description: Measure the real impact of rules and skills with quantitative befor Measure whether your rules and skills actually improve outputs — with numbers, not vibes. ```bash -pilot -> /benchmark pilot/skills/create-skill -> /benchmark pilot/rules/testing.md +# Claude Code # Codex CLI +claude codex +> /benchmark pilot/skills/create-skill > $benchmark pilot/skills/create-skill +> /benchmark pilot/rules/testing.md > $benchmark pilot/rules/testing.md ``` ## When to use @@ -22,12 +23,12 @@ pilot ## How it works -`/benchmark` runs your prompts twice — **with** and **without** the target — grades both against falsifiable assertions, and shows the results inline. +`/benchmark` runs your prompts twice — **with** and **without** the target — grades both against falsifiable assertions, and shows the results inline. Works with both Claude Code and Codex (Codex uses `codex exec` headless mode under the hood; pass `--agent codex` to benchmark against Codex). | Type | `with` | `without` | |------|--------|-----------| -| `skill` | Skill installed | Empty `.claude/` | -| `rule` | Rule loaded | Empty `.claude/` | +| `skill` | Skill installed in the active agent's directory | Empty agent directory | +| `rule` | Rule loaded by the active agent | No rule loaded | ## Writing good assertions @@ -142,7 +143,7 @@ Then `/benchmark` asks which path to take: ## Gotcha: global contamination -A globally-installed copy of your target in `~/.claude/` would leak into the `without` config. The runner moves it aside for the run and restores it automatically. +A globally-installed copy of your target in `~/.claude/` (or `~/.codex/` / `~/.agents/`) would leak into the `without` config. The runner moves it aside for the run and restores it automatically. Pass `--no-isolate-global` to measure "target + your day-to-day setup" instead. diff --git a/docs/docusaurus/docs/workflows/create-skill.md b/docs/docusaurus/docs/workflows/create-skill.md index daf00f767..5d9c701b5 100644 --- a/docs/docusaurus/docs/workflows/create-skill.md +++ b/docs/docusaurus/docs/workflows/create-skill.md @@ -1,7 +1,7 @@ --- sidebar_position: 2 title: /create-skill -description: Build reusable Claude Code skills from any topic — /create-skill explores the codebase, drafts the SKILL.md interactively, and ships a slash command. +description: Build reusable skills for Claude Code and Codex from any topic — /create-skill explores the codebase and drafts the SKILL.md interactively. --- # /create-skill @@ -11,10 +11,17 @@ Build a reusable skill from any topic. Provide a topic or workflow description, and `/create-skill` explores the codebase, gathers relevant patterns, and builds a well-structured skill interactively with you. If no topic is given, it evaluates the current session for extractable knowledge. ```bash -$ pilot +# Claude Code +claude > /create-skill > /create-skill "Vue 3 test migration pattern" > /create-skill "How we set up MFE local development" + +# Codex CLI +codex +> $create-skill +> $create-skill "Vue 3 test migration pattern" +> $create-skill "How we set up MFE local development" ``` ## What /create-skill Does @@ -26,7 +33,7 @@ $ pilot | 0 | Load reference — use case categories, complexity spectrum, file structure, template, frontmatter fields, description formula, security restrictions | | 1 | Understand the topic — explore codebase for relevant patterns, or evaluate session for extractable knowledge | | 2 | Check existing skills — avoid duplicates, identify update opportunities | -| 3 | Create skill — write to `.claude/skills/` (project) or `~/.claude/skills/` (global), apply portability and determinism checklists | +| 3 | Create skill — write to the active agent's skills directory (`.claude/skills/` or `~/.claude/skills/` for Claude Code; `.agents/skills/` or `~/.agents/skills/` for Codex), apply portability and determinism checklists | | 4 | Quality gates — structure checklist, content checklist, triggering test, iteration signals | | 5 | Test & iterate — run test prompts with sub-agents, evaluate results, optimize description triggering | @@ -68,5 +75,5 @@ your-skill-name/ - You discovered an undocumented tool or API integration pattern :::info -Skills are plain markdown files stored in `.claude/skills/`. They're loaded on-demand when relevant, created by `/create-skill`, and shareable across your team via the **Extensions page**. +Skills are plain markdown files using the same `SKILL.md` format on both agents. They're loaded on-demand when relevant and shareable across your team via the **Extensions page**. Claude Code uses `.claude/skills/` and `~/.claude/skills/`; Codex uses `.agents/skills/` and `~/.agents/skills/`. ::: diff --git a/docs/docusaurus/docs/workflows/fix.md b/docs/docusaurus/docs/workflows/fix.md index fdbfd3c52..5347a5043 100644 --- a/docs/docusaurus/docs/workflows/fix.md +++ b/docs/docusaurus/docs/workflows/fix.md @@ -11,10 +11,17 @@ Bugfix workflow with TDD. Investigates the bug, writes a failing test, fixes at Use `/fix` for bugs. Use [`/spec`](/docs/workflows/spec) for features and architectural changes — including bugfixes that warrant a full plan with approval and code review. ```bash -$ pilot +# Claude Code +claude > /fix "annotation persistence drops fields between save and reload" > /fix "off-by-one in pagination at boundary" > /fix "wrong default for max_retries" + +# Codex CLI +codex +> $fix "annotation persistence drops fields between save and reload" +> $fix "off-by-one in pagination at boundary" +> $fix "wrong default for max_retries" ``` `/fix` is **always quick**. If investigation reveals the bug is multi-component, architectural, or otherwise larger than a quick fix, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes. @@ -45,7 +52,7 @@ The primary correctness signal. Run the actual program with the original input a | Bug surface | Tool | Evidence | | --- | --- | --- | -| **UI / web** | 4-tier browser stack: **Claude Code Chrome** → **Chrome DevTools MCP** → **playwright-cli** → **agent-browser** | Page state, element values | +| **UI / web** | Browser automation — Claude Code prefers its Chrome extension; Codex uses Chrome DevTools MCP. Both fall back to playwright-cli / agent-browser. | Page state, element values | | **CLI** | The exact command the user ran | Stdout, exit code | | **HTTP API** | `curl` / HTTP client with the user's body | Status code, response field | | **Library / SDK / function** | `python -c '…'`, `node -e '…'`, REPL, scratch script | Returned value | diff --git a/docs/docusaurus/docs/workflows/prd.md b/docs/docusaurus/docs/workflows/prd.md index 04d13f971..88776e07e 100644 --- a/docs/docusaurus/docs/workflows/prd.md +++ b/docs/docusaurus/docs/workflows/prd.md @@ -6,13 +6,20 @@ description: Brainstorm vague ideas into Product Requirements Documents through # /prd -`/prd` is the **brainstorming surface** for ideas that aren't yet specs. Use it when you have a vague idea, a problem statement without a solution, or just want to think out loud and have Claude pressure-test directions before committing to a plan. The conversation produces a Product Requirements Document (PRD) you can hand directly to `/spec`. +`/prd` is the **brainstorming surface** for ideas that aren't yet specs. Use it when you have a vague idea, a problem statement without a solution, or just want to think out loud and have the agent pressure-test directions before committing to a plan. The conversation produces a Product Requirements Document (PRD) you can hand directly to `/spec`. ```bash -$ pilot +# Claude Code +claude > /prd "Add real-time notifications for team updates" > /prd "We need better onboarding — users drop off after signup" > /prd "Build an API for third-party integrations" + +# Codex CLI +codex +> $prd "Add real-time notifications for team updates" +> $prd "We need better onboarding — users drop off after signup" +> $prd "Build an API for third-party integrations" ``` ## When to Use @@ -34,8 +41,8 @@ $ pilot `/prd` has two distinct conversational modes — divergent for generating ideas, convergent for locking them down: -- **Divergent (Ideate):** Free-form prose. Claude pitches 3-5 distinct directions, you react ("yes that one, but…"), Claude pressure-tests viability and pitches the next round. No structured forms — this is where the riffing happens. -- **Convergent (Clarify → Converge → Write):** Structured `AskUserQuestion` forms with predefined options. Used once the shape is known and you're nailing down details. +- **Divergent (Ideate):** Free-form prose. The agent pitches 3-5 distinct directions, you react ("yes that one, but…"), it pressure-tests viability and pitches the next round. No structured forms — this is where the riffing happens. +- **Convergent (Clarify → Converge → Write):** Structured questions with predefined options (interactive forms on Claude Code; numbered plain-text on Codex). Used once the shape is known and you're nailing down details. The skill picks the mode automatically based on how concrete your input is. A vague problem statement triggers Ideate; a concrete request like "Add Google OAuth" skips it. @@ -49,7 +56,7 @@ The entire flow is conversational — one question at a time, no rushing to solu ### 1. Understand the Idea -Restates your idea, explores project context with CodeGraph, identifies the core problem, and **scope-checks** — if the request describes multiple independent subsystems (e.g., "build a platform with chat, billing, and analytics"), helps you decompose into multiple PRDs before continuing. Doesn't jump to solutions. +Restates your idea, explores project context with CodeGraph (structure) and Semble (intent), identifies the core problem, and **scope-checks** — if the request describes multiple independent subsystems (e.g., "build a platform with chat, billing, and analytics"), helps you decompose into multiple PRDs before continuing. Doesn't jump to solutions. ### 2. Research (Optional) @@ -65,7 +72,7 @@ Research findings are embedded in the PRD under a dedicated section. ### 2b. Ideate (Optional — Divergent Brainstorming) -When the idea is vague, this step kicks in **before** structured questions. Claude pitches 3-5 distinct directions in plain prose: +When the idea is vague, this step kicks in **before** structured questions. The agent pitches 3-5 distinct directions in plain prose: > A few directions for "better onboarding": > - **Reduce surface area** — cut the signup form to email-only, defer the rest @@ -75,13 +82,13 @@ When the idea is vague, this step kicks in **before** structured questions. Clau > > Which resonate, or where am I off? -You react in your own words. Claude pressure-tests your reaction (where does it break? what does it cost?), then pitches the next round shaped by your answer. Usually 1-3 rounds — the signal to converge is when you start saying "yes, and…" instead of "no, but…". +You react in your own words. The agent pressure-tests your reaction (where does it break? what does it cost?), then pitches the next round shaped by your answer. Usually 1-3 rounds — the signal to converge is when you start saying "yes, and…" instead of "no, but…". -This step is **skipped automatically** when your input is concrete (e.g., "Add Google OAuth" — Claude won't pitch alternatives you didn't ask for). +This step is **skipped automatically** when your input is concrete (e.g., "Add Google OAuth" — the agent won't pitch alternatives you didn't ask for). ### 3. Ask Clarifying Questions -Switches to structured `AskUserQuestion` forms. One question at a time, with 2-4 predefined options each. Focuses on purpose, users, constraints, success criteria, and scope boundaries. Challenges assumptions and surfaces trade-offs. Typically 3-6 questions. +One question at a time with 2-4 options. Focuses on purpose, users, constraints, success criteria, and scope boundaries. Challenges assumptions and surfaces trade-offs. Typically 3-6 questions. (Interactive forms on Claude Code; numbered plain-text on Codex.) ### 4. Propose Approaches @@ -104,11 +111,11 @@ Saves a PRD to `docs/prd/YYYY-MM-DD-<slug>.md` with structured metadata and thes | **Key Decisions** | Trade-offs made during the conversation with reasoning | | **Research Findings** | Embedded research results (when research tier was Standard or Deep) | -After writing, Claude runs a 4-point self-review (placeholders, consistency, scope, ambiguity), then **asks you to open the file in your editor and read it through** before you confirm. If you request changes, Claude edits the specific sections in place — it doesn't rewrite the whole document, so you don't lose your editor scroll position or have to re-read everything. +After writing, the agent runs a 4-point self-review (placeholders, consistency, scope, ambiguity), then **asks you to open the file in your editor and read it through** before you confirm. If you request changes, it edits the specific sections in place — no full rewrite, so you don't lose your editor scroll position. ### 7. Hand Off to /spec -After you confirm the PRD, asks whether to hand off to `/spec` immediately or save for later. If yes, `/spec` is invoked automatically with a reference to the PRD. +After you confirm the PRD, asks whether to hand off to `/spec` (or `$spec` on Codex) immediately or save for later. If yes, the spec workflow is invoked automatically with a reference to the PRD. ## PRD Output diff --git a/docs/docusaurus/docs/workflows/quick-mode.md b/docs/docusaurus/docs/workflows/quick-mode.md deleted file mode 100644 index 662c857ce..000000000 --- a/docs/docusaurus/docs/workflows/quick-mode.md +++ /dev/null @@ -1,32 +0,0 @@ ---- -sidebar_position: 5 -title: Quick Mode -description: Direct execution mode — no plan file, no approval gate. Default for trivial changes; Pilot bails out and recommends /spec when scope exceeds quick-lane. ---- - -# Quick Mode - -Direct execution — no plan file, no approval gate. - -Quick mode is the default interaction model. Just describe your task and Pilot gets it done — no spec file, no approval step, no directory scaffolding. Zero overhead on simple tasks. All quality guardrails still apply — hooks, TDD, type checking — but nothing slows down the interaction. When you need a plan, use `/spec` — not Claude Code's built-in plan mode (Shift+Tab). - -```bash -$ pilot -> Add a loading spinner to the submit button -> Write tests for the OrderService class -> Explain how the auth middleware works -> Rename the "products" table to "items" across the codebase -``` - -## Quality Guardrails Active in Quick Mode - -- Quality hooks — auto-format, lint, type-check on every file edit -- TDD enforcement — write failing tests before implementation -- Context preservation across auto-compaction cycles -- Persistent memory for recalling past decisions and context -- MCP servers (context7, mem-search, web-search, grep-mcp, web-fetch, CodeGraph) -- Language servers — real-time diagnostics and go-to-definition - -:::tip When to use /spec instead -Use `/spec` for bug fixes (root cause investigation with test-before-fix), complex features that need a plan before implementation, or refactors with many interdependent changes. -::: diff --git a/docs/docusaurus/docs/workflows/setup-rules.md b/docs/docusaurus/docs/workflows/setup-rules.md index d1eae76e6..0a3af170e 100644 --- a/docs/docusaurus/docs/workflows/setup-rules.md +++ b/docs/docusaurus/docs/workflows/setup-rules.md @@ -8,11 +8,14 @@ description: Generate project rules and document MCP servers — /setup-rules ex Generate project rules and document your development environment. -Run `/setup-rules` to explore your project structure, discover your conventions and undocumented patterns, update project documentation, and document your MCP servers. This is how Pilot adapts to your project — not the other way around. Run it once initially, then any time your codebase changes significantly. +Run `/setup-rules` (or `$setup-rules` on Codex) to explore your project structure, discover your conventions and undocumented patterns, update project documentation, and document your MCP servers. This is how Pilot adapts to your project — not the other way around. Run it once initially, then any time your codebase changes significantly. + +The skill writes Claude Code rules into `.claude/rules/` and `~/.claude/rules/`, and syncs them into `~/.codex/AGENTS.md` (and the project-level `AGENTS.md` if you keep one) so Codex sees the same context. ```bash -$ pilot -> /setup-rules +# Claude Code # Codex CLI +claude codex +> /setup-rules > $setup-rules ``` ## What /setup-rules Does diff --git a/docs/docusaurus/docs/workflows/spec.md b/docs/docusaurus/docs/workflows/spec.md index 56192da41..82e07a8a8 100644 --- a/docs/docusaurus/docs/workflows/spec.md +++ b/docs/docusaurus/docs/workflows/spec.md @@ -8,14 +8,20 @@ description: Plan, implement, and verify complex features with full automation Plan, implement, and verify complex features with full automation using Spec-Driven Development. -**Replaces Claude Code's built-in plan mode (Shift+Tab).** Best for new features, refactoring, architectural changes — work where a plan and a design discussion add value before code. The structured workflow prevents scope creep and ensures every task is tested and verified before being marked complete. +Best for new features, refactoring, and architectural changes — work where a plan and discussion add value before writing code. On Claude Code, `/spec` replaces the built-in plan mode (Shift+Tab). The structured workflow prevents scope creep and ensures every task is tested and verified before being marked complete. For bugfixes, use [`/fix`](/docs/workflows/fix). For vague ideas, use [`/prd`](/docs/workflows/prd) first to produce a PRD, then hand off to `/spec`. ```bash -$ pilot +# Claude Code +claude > /spec "Add user authentication with OAuth and JWT tokens" > /spec "Migrate the REST API to GraphQL" + +# Codex CLI +codex +> $spec "Add user authentication with OAuth and JWT tokens" +> $spec "Migrate the REST API to GraphQL" ``` ## Workflow @@ -48,7 +54,7 @@ For a bugfix workflow without a plan file, use [`/fix`](/docs/workflows/fix). Wh - Explores codebase with semantic search, asks clarifying questions - Writes detailed spec with scope, tasks, and definition of done - For UI/user-facing features: writes structured **E2E test scenarios** (TS-001, TS-002…) with step-by-step actions and expected results — these become the verification contract for the Verify phase -- Spec-review sub-agent validates completeness (optional, enabled by default) +- Spec-review agent validates completeness in Claude Code or Codex (optional, enabled by default) - Waits for your approval — edit the plan directly, or **annotate it visually** in the Console's Specifications tab (select any text, write a note — annotations save automatically). The agent reads your annotations at the approval checkpoint, revises the plan, and re-asks for approval ### Implement Phase @@ -61,9 +67,9 @@ For a bugfix workflow without a plan file, use [`/fix`](/docs/workflows/fix). Wh ### Verify Phase - Full test suite + type checking + lint + build verification -- Features: unified review sub-agent (optional, enabled by default) +- Features: changes-review agent in Claude Code or Codex (optional, enabled by default) - Bugfixes: regression test + full suite — no sub-agents needed -- For UI features: executes the plan's **E2E test scenarios** step-by-step via browser automation (Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser) — tracks pass/fail per scenario, auto-fixes failures (up to 2 attempts), escalates persistent failures to known issues; results written back to the plan file +- For UI features: executes the plan's **E2E test scenarios** step-by-step via browser automation — tracks pass/fail per scenario, auto-fixes failures (up to 2 attempts), escalates persistent failures to known issues; results written back to the plan file. Claude Code prefers its Chrome extension; Codex uses the Chrome DevTools MCP. Both fall back to playwright-cli / agent-browser. - Auto-fixes findings, loops back until all checks pass - After automated checks pass, prompts you to **review code changes** in the Console's Changes tab — each file shows a **T{N}** badge linking it to the spec task that changed it, and you can click **Spec** to group files by task for focused review. Enable Review mode to add inline annotations on any diff line (they save automatically), and the agent addresses them before marking the spec as verified @@ -88,7 +94,7 @@ When all three are disabled, `/spec` runs end-to-end without any user interactio | **Spec Review** | On | Validates the plan before implementation — checks alignment and flags risky assumptions | | **Changes Review** | On | Reviews code after implementation — compliance, security, test coverage, and goal achievement | -Both reviewers run in a separate context window and don't consume the main session's context budget. Optional **Codex adversarial reviewers** (off by default) provide an independent second opinion using OpenAI Codex. +Both reviewers run outside the main session context: Claude Code uses sub-agents, and Codex uses custom agents installed under `~/.codex/agents/`. Optional **Codex Companion Reviewers** (off by default) provide a Claude Code plugin second opinion using OpenAI Codex. **Codex runs at most once per `/spec` invocation.** Plan iterations (annotation feedback, verify re-runs, fixing prior findings) reuse the result of the first Codex review instead of re-launching — a sentinel file in the session directory enforces this. The bugfix planning phase no longer runs Codex at all; adversarial review is most valuable on real code, not on a plan. diff --git a/docs/docusaurus/docusaurus.config.ts b/docs/docusaurus/docusaurus.config.ts index a81ee5440..69a3b9b23 100644 --- a/docs/docusaurus/docusaurus.config.ts +++ b/docs/docusaurus/docusaurus.config.ts @@ -4,7 +4,7 @@ import type * as Preset from "@docusaurus/preset-classic"; const config: Config = { title: "Pilot Shell", - tagline: "How real engineers run Claude Code — spec-driven plans, enforced tests, persistent knowledge", + tagline: "How real engineers run Claude Code and Codex — spec-driven plans, enforced tests, persistent knowledge", favicon: "img/favicon.png", url: "https://pilot-shell.com", @@ -48,15 +48,14 @@ const config: Config = { docs: { routeBasePath: "docs", sidebarPath: "./sidebars.ts", - editUrl: - "https://github.com/maxritter/pilot-shell/tree/main/docs/docusaurus/", + editUrl: "https://github.com/maxritter/pilot-shell/tree/main/docs/docusaurus/", }, blog: { routeBasePath: "blog", path: "./blog", blogTitle: "Pilot Shell Blog", blogDescription: - "Insights on Claude Code, AI engineering, and the Pilot Shell platform — guides, tools, and model comparisons.", + "Insights on Claude Code, Codex CLI, AI engineering, and the Pilot Shell platform — guides, tools, and model comparisons.", blogSidebarTitle: "Recent posts", blogSidebarCount: "ALL", postsPerPage: 12, @@ -65,7 +64,7 @@ const config: Config = { type: ["rss", "atom"], title: "Pilot Shell Blog", description: - "Latest posts from the Pilot Shell blog — Claude Code engineering, AI tools, and model deep-dives.", + "Latest posts from the Pilot Shell blog — Claude Code and Codex CLI engineering, AI tools, and model deep-dives.", copyright: `Copyright © ${new Date().getFullYear()} Pilot Shell.`, }, }, @@ -84,7 +83,11 @@ const config: Config = { themeConfig: { image: "https://pilot-shell.com/logo.png", metadata: [ - { name: "keywords", content: "how real engineers run Claude Code, Claude Code engineering platform, Claude Code, Claude Code platform, Claude Code framework, spec-driven development, Pilot Shell, TDD enforcement, AI coding agent, MCP servers, Claude Code best practices" }, + { + name: "keywords", + content: + "how real engineers run Claude Code and Codex, Claude Code, Codex CLI, OpenAI Codex, Claude Code engineering platform, Codex engineering platform, Claude Code framework, Codex framework, spec-driven development, Pilot Shell, TDD enforcement, AI coding agent, GPT-5, GPT-5.5, MCP servers, Claude Code best practices, Codex best practices", + }, { name: "twitter:card", content: "summary_large_image" }, { name: "twitter:site", content: "@maxritter" }, { property: "og:type", content: "website" }, @@ -149,9 +152,7 @@ const config: Config = { }, { title: "More", - items: [ - { label: "Home", href: "https://pilot-shell.com" }, - ], + items: [{ label: "Home", href: "https://pilot-shell.com" }], }, ], copyright: `Copyright ${new Date().getFullYear()} Pilot Shell. Built with Docusaurus.`, diff --git a/docs/docusaurus/sidebars.ts b/docs/docusaurus/sidebars.ts index 32b80bb31..d243eb1b0 100644 --- a/docs/docusaurus/sidebars.ts +++ b/docs/docusaurus/sidebars.ts @@ -10,39 +10,74 @@ const sidebars: SidebarsConfig = { items: [ "getting-started/prerequisites", "getting-started/installation", + "getting-started/codex-cli", ], }, { type: "category", - label: "Workflows", + label: "Core Workflows", collapsed: false, items: [ - "workflows/setup-rules", - "workflows/create-skill", "workflows/prd", "workflows/spec", "workflows/fix", + ], + }, + { + type: "category", + label: "Additional Workflows", + collapsed: false, + items: [ + "workflows/setup-rules", + "workflows/create-skill", "workflows/benchmark", - "workflows/quick-mode", ], }, { type: "category", - label: "Features", + label: "Console", collapsed: false, items: [ - "features/spec-collaboration", "features/console", - "features/bot", + "features/spec-collaboration", + "features/extensions", + "features/customization", "features/statusline", - "features/model-routing", + ], + }, + { + type: "category", + label: "Quality", + collapsed: false, + items: [ + "features/hooks", "features/rules", "features/context-optimization", + ], + }, + { + type: "category", + label: "Automation", + collapsed: false, + items: [ + "features/bot", "features/remote-control", - "features/hooks", + ], + }, + { + type: "category", + label: "Configuration", + collapsed: false, + items: [ + "features/model-routing", "features/permission-modes", - "features/extensions", - "features/customization", + ], + }, + { + type: "category", + label: "Tools", + collapsed: false, + items: [ "features/cli", "features/mcp-servers", "features/language-servers", diff --git a/docs/img/console/dashboard.webp b/docs/img/console/dashboard.webp new file mode 100644 index 000000000..924cdc322 Binary files /dev/null and b/docs/img/console/dashboard.webp differ diff --git a/docs/img/dashboard.png b/docs/img/dashboard.png deleted file mode 100644 index d2ad9d68d..000000000 Binary files a/docs/img/dashboard.png and /dev/null differ diff --git a/docs/img/extensions.png b/docs/img/extensions.png deleted file mode 100644 index 8770c7ce1..000000000 Binary files a/docs/img/extensions.png and /dev/null differ diff --git a/docs/img/specifications.png b/docs/img/specifications.png deleted file mode 100644 index 25bcbe4e5..000000000 Binary files a/docs/img/specifications.png and /dev/null differ diff --git a/docs/img/statusline.png b/docs/img/statusline.png deleted file mode 100644 index 8683cc185..000000000 Binary files a/docs/img/statusline.png and /dev/null differ diff --git a/docs/site/index.html b/docs/site/index.html index cc07ce5ed..6d4a0b6ad 100644 --- a/docs/site/index.html +++ b/docs/site/index.html @@ -5,15 +5,15 @@ <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <!-- Primary Meta Tags --> - <title>Pilot Shell — How real engineers run Claude Code - + Pilot Shell — How real engineers run Claude Code and Codex + @@ -57,7 +57,10 @@ rel="preload" as="style" href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono:wght@400;700&display=swap" - onload="this.onload=null;this.rel='stylesheet'" + onload=" + this.onload = null; + this.rel = 'stylesheet'; + " />