2-EVALUATE_PLAN.md — 2 changes: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ The PRD is multiple files. All files are very important. You will find the PRD f

This benchmark version also has a frozen canonical requirement catalog at `evaluator/requirements_catalog_v1.md`. That catalog is the scoring denominator for this PRD version. It freezes requirement IDs, functional areas, labels, source citations, and severity tiers while staying outside `docs/prd/` so Step 1 does not see evaluator-only material.

-If `evaluator/requirements_catalog_v1.md` is missing, stop immediately and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
+If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run shell commands or the fetch fails, stop and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
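The fallback the new wording describes — check for the catalog, try the fetch script, otherwise hand the exact command back to the user — can be sketched in Python. This is a minimal sketch only: `ensure_catalog`, its return values, and the `run` injection point are illustrative assumptions, not part of the repo; only the paths and the `python3 tools/fetch_evaluator.py` invocation come from the docs.

```python
# Sketch of the Step 2 preflight described above (illustrative, not repo code).
import pathlib
import subprocess

CATALOG = pathlib.Path("evaluator/requirements_catalog_v1.md")
FETCH_CMD = ["python3", "tools/fetch_evaluator.py"]  # the documented command


def ensure_catalog(run=subprocess.run) -> str:
    """Return 'present', 'fetched', or 'manual' (user must run the fetch)."""
    if CATALOG.exists():
        return "present"
    try:
        # Attempt the fetch ourselves, as the revised instruction asks.
        result = run(FETCH_CMD, check=False)
        fetched = result.returncode == 0
    except OSError:
        # No shell access or interpreter missing: fall through to manual.
        fetched = False
    if fetched and CATALOG.exists():
        return "fetched"
    # Could not fetch: surface the exact documented command to the user.
    return "manual"
```

When `ensure_catalog` returns `"manual"`, the agent stops and tells the user to run `python3 tools/fetch_evaluator.py` from the repo root; it never reconstructs the catalog.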

## Instructions

INSTRUCTIONS.md — 2 changes: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The user wants to audit a plan they already generated.

- Open and follow `2-EVALUATE_PLAN.md` exactly.
- This requires both the PRD (`docs/prd/`) and an existing `results/PLAN.md`.
-- If `evaluator/requirements_catalog_v1.md` is missing, tell the user to run `python3 tools/fetch_evaluator.py` from the repo root, then retry.
+- If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run it or it fails, tell the user exactly what command to run, then retry.
- Outputs: `results/PLAN_EVAL.md` and `results/PLAN_EVAL_REPORT.html`

### 3. Re-render the Evaluation Report (Optional Fallback)
README.md — 4 changes: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ Open a **new conversation** (fresh context). Tell the agent:

The agent will read both the PRD and the plan from Step 1, then audit the plan for coverage and alignment. It scores every requirement as full, partial, or missing, writes `PLAN_EVAL.md`, and then generates `PLAN_EVAL_REPORT.html` from that finished evaluation.
The denominator is frozen in `evaluator/requirements_catalog_v1.md`, so the evaluator scores against the same requirement list every run instead of re-deriving it from scratch.
-If the `evaluator/` folder is missing, run `python3 tools/fetch_evaluator.py` first.
+If the `evaluator/` folder is missing, the Step 2 agent should first attempt to run `python3 tools/fetch_evaluator.py`. If the agent cannot do that, run it manually and retry Step 2.

**Requires:** `results/PLAN.md` from Step 1
**Primary output:** `results/PLAN_EVAL.md`
@@ -54,7 +54,7 @@ Each step consumes significant context. Starting fresh ensures the agent has max
2-EVALUATE_PLAN.md # Step 2 prompt — evaluation
3-PLAN_EVAL_REPORT.md # Optional fallback prompt — HTML report rerender only
docs/prd/ # The product spec (PRD + supporting docs)
-evaluator/requirements_catalog_v1.md # Frozen Step 2 denominator hidden from Step 1
+evaluator/requirements_catalog_v1.md # Frozen Step 2 denominator, fetched on demand
tools/fetch_evaluator.py # Downloads the public evaluator bundle into evaluator/
results/ # All outputs land here
CLAUDE.md # Auto-loaded instructions for Claude Code