A skill that teaches Claude, Codex, Copilot, and Gemini to author workflows: deterministic multi-agent orchestration scripts under plain JavaScript control flow.
Design principle: topology before coding. Pick the workflow shape first (fan-out, pipeline, or loop), capture it in a workflow spec, then write the workflow file.
A workflow is a JavaScript file. The loops, the conditionals, the fan-out are
ordinary code that you control. Only the leaf agent() calls spend model tokens,
and each one runs in its own clean context window. The result is multi-agent work
that behaves the same way every run and can be resumed if it stops partway.
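
As a rough illustration of "ordinary code around leaf calls", the sketch below fans out one call per review area and uses a plain `if` to decide whether to spend a second round of calls. It is not the polyflow file format: `fakeAgent` is a hypothetical stand-in, passed as a parameter only so the snippet runs outside the runtime, and the real `agent()` signature is whatever references/api-reference.md documents.

```javascript
// Stand-in for the runtime's agent(): it only echoes the prompt.
// ASSUMPTION: the real agent() signature and return type may differ.
async function fakeAgent(prompt) {
  return `result for: ${prompt}`;
}

// Fan-out plus an ordinary conditional: all control flow is plain JavaScript.
async function reviewAreas(agent, areas, { verify = false } = {}) {
  // Fan-out: one leaf call per area, each started independently.
  const findings = await Promise.all(
    areas.map((area) => agent(`Review the branch for ${area} issues`))
  );

  // Ordinary branch: only spend more leaf calls when asked to.
  if (!verify) return findings;

  return Promise.all(
    findings.map((finding) => agent(`Verify this finding: ${finding}`))
  );
}

reviewAreas(fakeAgent, ["bugs", "security", "tests"]).then((out) => {
  console.log(out.length); // one result per area
});
```

Because the branching lives in JavaScript rather than in a prompt, the number of leaf calls depends only on the inputs and the `verify` flag.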
This skill carries the file format, the judgement calls, and a tested authoring procedure, so you can just ask Claude to "create a workflow for X" and get a correct, runnable file back.
- Project-local runtime/plugin implementation for the portable cross-platform spec.
- Strict runtime contract validation for project-only, file-backed, and no-shared-state behavior.
- Eval benchmarking that records quality, determinism, and replay/resume gains against the baseline path.
- CI coverage for tooling validation, portable spec validation, and runtime plugin smoke tests.
| Path | What it is |
|---|---|
| SKILL.md | Skill entry point: the procedure Claude follows to design and write a workflow |
| references/api-reference.md | Complete manual: every global, every option, every cap and constant |
| references/patterns.md | Copy-paste orchestration patterns (fan-out, pipeline, loop-until-budget, judge panel, and more) |
| assets/templates/ | Starter files for the three core shapes: fan-out, pipeline, loop |
| assets/examples/ | Seven complete runnable example workflows |
| scripts/validate-workflow.mjs | Linter: checks a workflow file against the parser's hard rules before you run it |
| scripts/validate-workflow-spec.mjs | Spec validator: checks required workflow-spec sections and sign-off completeness |
| scripts/estimate-cost.mjs | Static estimator: projects agent count, fan-out/loop shape, and rough run cost |
| scripts/scaffold-evals.mjs | Generates eval scaffolding from evals/evals.json |
| examples/cross-platform/ | Portable canonical spec and per-platform adapter notes (Claude / Codex / Copilot / Gemini / Kilo / Gumloop / OpenCode) |
| evals/evals.json | Starter evaluation test cases |

```bash
git clone https://github.com/hiranp/polyflow.git
mkdir -p ~/.claude/skills
cp -R polyflow ~/.claude/skills/polyflow
```

The next time Claude Code starts, the skill is available.

The Workflow tool is off by default and requires an environment variable:

```bash
# per session
export CLAUDE_CODE_WORKFLOWS=1
claude
```

Or set it permanently in .claude/settings.local.json by adding CLAUDE_CODE_WORKFLOWS with the value "1" under the env key.

"Create a workflow that reviews my branch across bugs, security, and tests, then verifies each finding."
Claude (or any supported AI) uses this skill to design, write, and validate the file, then runs it.
Watch live progress with /workflows.
Use assets/templates/workflow-spec.template.md to decide topology/barrier/schema/verification before writing JavaScript.

```bash
node scripts/validate-workflow-spec.mjs <path-to-WORKFLOW-SPEC.md>
```

Fix every reported issue before writing the workflow file.

Lint the workflow file before running it:

```bash
node ~/.claude/skills/polyflow/scripts/validate-workflow.mjs <path-to-file.js>
```

Exit 0 means the file is clean. Fix any reported errors before invoking the workflow.

Estimate the run cost:

```bash
node ~/.claude/skills/polyflow/scripts/estimate-cost.mjs <path-to-file.js>
```

This gives a static estimate of model mix, fan-out/loop amplification, and rough per-run cost range.

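
To make "fan-out/loop amplification" concrete, here is a back-of-the-envelope model. It is a hypothetical illustration, not the formula scripts/estimate-cost.mjs uses: the stage shape (`calls`, `passes`) and the flat per-call price are assumptions made up for this sketch.

```javascript
// HYPOTHETICAL model, not the estimator's actual logic.
// Each stage contributes (agent calls per pass) x (number of passes).
// A fan-out stage has calls > 1; a loop stage has a [min, max] range of passes.
function estimateRun(stages, pricePerCallUsd) {
  let minCalls = 0;
  let maxCalls = 0;
  for (const stage of stages) {
    const [minPasses, maxPasses] = stage.passes ?? [1, 1];
    minCalls += stage.calls * minPasses;
    maxCalls += stage.calls * maxPasses;
  }
  return {
    minCalls,
    maxCalls,
    minCostUsd: minCalls * pricePerCallUsd,
    maxCostUsd: maxCalls * pricePerCallUsd,
  };
}

const estimate = estimateRun(
  [
    { name: "review", calls: 3 },                   // fan-out over 3 areas
    { name: "fix-loop", calls: 2, passes: [1, 4] }, // loop runs 1 to 4 times
    { name: "synthesize", calls: 1 },
  ],
  0.5 // made-up flat price per agent call, for illustration only
);
console.log(estimate);
```

The loop stage dominates the spread: its two calls count once in the best case and four times in the worst, which is why a loop bound matters more to the cost range than the fan-out width here.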
```
You --> AI (with polyflow skill)
          |
          +-- designs the topology (fan-out / pipeline / loop)
          +-- writes a .js workflow file
          +-- calls Workflow({ scriptPath })
                |
                +-- agent("task A")  <- fresh context, own token budget
                +-- agent("task B")  <- fresh context, own token budget
                +-- agent("task C")  <- fresh context, own token budget
```

Each agent() call runs in isolation -- no shared state, no context bleed. The
orchestration logic (loops, conditions, fan-out) is plain JavaScript you can read and audit.
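
One way to picture "no shared state" is a pipeline in which each stage sees only what the previous stage returned, passed explicitly through its prompt. The sketch below assumes a hypothetical recording stub in place of the runtime's `agent()`; the stub's name, signature, and return values are illustrative only.

```javascript
// HYPOTHETICAL stub: records every prompt so the data flow can be inspected.
// The real agent() is provided by the runtime and may have a different shape.
function makeRecordingAgent() {
  const prompts = [];
  const agent = async (prompt) => {
    prompts.push(prompt);
    return `output#${prompts.length}`;
  };
  return { agent, prompts };
}

// Pipeline: each stage gets the previous stage's output as an explicit input.
async function pipeline(agent, ticket) {
  const plan = await agent(`Plan a fix for: ${ticket}`);
  const patch = await agent(`Implement this plan: ${plan}`);
  const review = await agent(`Review this patch: ${patch}`);
  return { plan, patch, review };
}

const recorder = makeRecordingAgent();
pipeline(recorder.agent, "login times out").then((result) => {
  console.log(result);
  console.log(recorder.prompts); // everything each stage was given
});
```

Reading the recorded prompts is enough to audit what each stage knew, since nothing reaches a stage except through its arguments.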
Polyflow workflows are portable. The examples/cross-platform/ directory contains:
- A canonical portable spec (portable-skill-spec.json)
- A concrete project-local runtime/plugin implementation (runtime-file-backed-plugin.mjs)
- Per-platform runner guides for Claude Code, Codex, Copilot, Gemini, Kilo Code, Gumloop, and OpenCode
- An adapter conformance checklist
The current architect decision is to merge the runtime/plugin track now and carry broader memory/compression work as follow-up issues. See docs/RUNTIME-PLUGIN-DECISION.md.
The most useful early lesson is not "add hidden memory everywhere". It is to make context management intentional: keep one project thread of work alive, externalize state to stable artifacts, and let one layer own context reduction.
For polyflow, that translates into these portable rules:
- Keep one top-level session per project when possible, but run leaf work units from explicit artifacts rather than from ambient chat state.
- Pin a small set of orientation files up front: the workflow spec, the portable spec, and any architecture notes that define constraints.
- Use a read-only recall step before expensive work and a summarize/persist step after the run. Keep both project-local and file-backed.
- If a harness or plugin already manages compaction, do not stack a second compaction system on top of it unless responsibilities are clearly separated.
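
The recall-then-persist rule above can be sketched as a small wrapper around one unit of work. Everything here is an assumption for illustration: the `read`/`write` store interface is hypothetical, and the in-memory `Map` stands in for the project-local files a real setup would use.

```javascript
// HYPOTHETICAL store interface: read(key) and write(key, value).
// A project-local version would back these with files under the project root;
// a Map keeps the sketch runnable without touching disk.
function makeMemoryStore() {
  const data = new Map();
  return {
    read: (key) => data.get(key) ?? null,
    write: (key, value) => {
      data.set(key, value);
    },
  };
}

async function runUnit(store, unitId, work) {
  // Recall: read-only lookup of what an earlier run persisted.
  const recalled = store.read(`summary/${unitId}`);
  // Expensive step: receives recalled notes explicitly, not via ambient state.
  const result = await work({ unitId, recalled });
  // Persist: write a summary artifact for the next run to recall.
  store.write(`summary/${unitId}`, result.summary);
  return result;
}

const store = makeMemoryStore();
const work = async ({ unitId, recalled }) => ({
  summary: recalled ? `${recalled} + rerun` : `first pass on ${unitId}`,
});

runUnit(store, "auth", work)
  .then(() => runUnit(store, "auth", work))
  .then((second) => console.log(second.summary));
```

Keeping recall and persist in one wrapper gives a single layer that owns what context is carried forward, which is the separation of responsibilities the last rule asks for.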

For OpenCode:
- Use examples/cross-platform/runner-opencode-runbook.md as the harness-specific starting point.
- Keep the outer OpenCode session attached to one project so artifact paths and architectural context stay stable.
- If you use an external context manager, disable overlapping built-in compaction so cached prefixes and deferred reductions do not fight each other.
- Favor project-root files such as .polyflow/runs/ and .planning/ over session-only notes when you need replay, resume, or auditability.

For Copilot:
- Use examples/cross-platform/runner-copilot-runbook.md as the base runbook.
- Treat each review or verify unit as a fresh task with explicit JSON inputs and outputs.
- Keep prompts short and schema-first; use one final synthesis pass instead of repeatedly restating the whole plan.
- Re-read pinned files at stage boundaries instead of relying on long conversational carryover.

For Gemini:
- Use examples/cross-platform/runner-gemini-runbook.md as the base runbook.
- Prefer one clean session per unit of work and aggregate through files between stages.
- Validate schema after every stage, retry once on mismatch, and checkpoint outputs before starting the next pass.
- Keep stop conditions explicit (maxUnits, maxTokens, maxDuration) so long-running loops fail predictably.

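
A minimal sketch of the last two rules, under stated assumptions: the helper names are hypothetical, "schema" is reduced to a required-keys check, each stage output is assumed to report its own `tokens` count, and `maxDuration` is taken to be milliseconds. None of this is taken from the Gemini runbook itself.

```javascript
// HYPOTHETICAL helpers; names and shapes are illustrative only.
function matchesSchema(value, requiredKeys) {
  return (
    value !== null &&
    typeof value === "object" &&
    requiredKeys.every((key) => key in value)
  );
}

// Validate after the stage; on mismatch, retry exactly once, then fail loudly.
async function runStage(stage, input, requiredKeys) {
  for (let attempt = 1; attempt <= 2; attempt++) {
    const output = await stage(input, attempt);
    if (matchesSchema(output, requiredKeys)) return output;
  }
  throw new Error("stage output failed schema validation after one retry");
}

// Explicit stop conditions are checked before each unit starts.
async function runLoop(units, stage, limits, checkpoint) {
  const startedAt = Date.now();
  let tokens = 0;
  const done = [];
  for (const unit of units) {
    if (done.length >= limits.maxUnits) return { done, stopped: "maxUnits" };
    if (tokens >= limits.maxTokens) return { done, stopped: "maxTokens" };
    if (Date.now() - startedAt >= limits.maxDuration) {
      return { done, stopped: "maxDuration" };
    }
    const output = await runStage(stage, unit, ["id", "tokens"]);
    tokens += output.tokens;
    checkpoint(output); // persist before the next pass begins
    done.push(output.id);
  }
  return { done, stopped: "completed" };
}

// Demo stage: returns a malformed value for unit "b" on its first attempt.
const saved = [];
const flakyStage = async (unit, attempt) =>
  unit === "b" && attempt === 1 ? "not json" : { id: unit, tokens: 10 };

runLoop(
  ["a", "b", "c"],
  flakyStage,
  { maxUnits: 2, maxTokens: 1000, maxDuration: 60000 },
  (output) => saved.push(output)
).then((summary) => console.log(summary, saved.length));
```

Returning the reason the loop stopped, rather than just the results, is what makes a long-running loop "fail predictably": the caller can tell a finished run from one cut off by a budget.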
Seven production-quality examples live in assets/examples/:
| Workflow | Pattern |
|---|---|
| review-branch.js | pipeline + nested parallel |
| implement-and-review.js | do/while loop with schema-driven exit |
| triage-sentry.js | list to pipeline with MCP tool call |
| dead-code-sweep.js | loop-until-dry with dry-streak counter |
| api-contract-drift-detector.js | fan-out with deliberate barrier |
| customer-feedback-theme-extractor.js | parallel to barrier to cluster |
| software-dev-pipeline.js | spec-intake to recall to execute/verify with scoped memory persist |

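
For readers unfamiliar with the "loop-until-dry with dry-streak counter" pattern named in the table, here is a generic sketch. It is not the contents of dead-code-sweep.js: the function name, the `dryLimit` and `maxPasses` options, and the scripted batches are all assumptions for illustration.

```javascript
// Loop until several consecutive passes find nothing new ("dry" passes).
// HYPOTHETICAL names; findBatch stands in for whatever leaf work finds items.
async function sweepUntilDry(findBatch, { dryLimit = 2, maxPasses = 10 } = {}) {
  const found = [];
  let dryStreak = 0;
  let passes = 0;
  while (dryStreak < dryLimit && passes < maxPasses) {
    passes++;
    const batch = await findBatch(passes, found);
    if (batch.length === 0) {
      dryStreak++; // nothing new: one step closer to stopping
    } else {
      dryStreak = 0; // progress resets the streak
      found.push(...batch);
    }
  }
  return { found, passes };
}

// Scripted results per pass so the run is repeatable.
const scripted = [["a", "b"], [], ["c"], [], []];
const findScripted = async (pass) => scripted[pass - 1] ?? [];

sweepUntilDry(findScripted).then((result) => console.log(result));
```

Requiring a streak of empty passes, instead of stopping at the first one, guards against a single unlucky pass ending the sweep early, while `maxPasses` still bounds the total work.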
- Claude Code 2.1.149+ (or compatible Codex / Copilot / Gemini runtime)
- Node.js 18+ (for validation, estimation, and eval scaffold scripts)
Pull requests are welcome. See CONTRIBUTING.md for guidelines.
Inspired by Claude's workflow capabilities and Ray Amjad. Built with community feedback, testing, and ideas.

Settings fragment for .claude/settings.local.json that enables workflows permanently:

```json
{ "env": { "CLAUDE_CODE_WORKFLOWS": "1" } }
```