From 8e1f05f3cae945feb0d77eb567e35e8db7f1ed3a Mon Sep 17 00:00:00 2001
From: mmaher
Date: Sun, 8 Mar 2026 18:29:20 -0700
Subject: [PATCH] Defer evaluator fetch until evaluation

---
 2-EVALUATE_PLAN.md | 2 +-
 INSTRUCTIONS.md    | 2 +-
 README.md          | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/2-EVALUATE_PLAN.md b/2-EVALUATE_PLAN.md
index 11e5762..60a3b35 100644
--- a/2-EVALUATE_PLAN.md
+++ b/2-EVALUATE_PLAN.md
@@ -8,7 +8,7 @@ The PRD is multiple files. All files are very important. You will find the PRD f
 
 This benchmark version also has a frozen canonical requirement catalog at `evaluator/requirements_catalog_v1.md`. That catalog is the scoring denominator for this PRD version. It freezes requirement IDs, functional areas, labels, source citations, and severity tiers while staying outside `docs/prd/` so Step 1 does not see evaluator-only material.
 
-If `evaluator/requirements_catalog_v1.md` is missing, stop immediately and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
+If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run shell commands or the fetch fails, stop and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
 
 ## Instructions
 
diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md
index 4561e47..03578c2 100644
--- a/INSTRUCTIONS.md
+++ b/INSTRUCTIONS.md
@@ -21,7 +21,7 @@ The user wants to audit a plan they already generated.
 
 - Open and follow `2-EVALUATE_PLAN.md` exactly.
 - This requires both the PRD (`docs/prd/`) and an existing `results/PLAN.md`.
-- If `evaluator/requirements_catalog_v1.md` is missing, tell the user to run `python3 tools/fetch_evaluator.py` from the repo root, then retry.
+- If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run it or it fails, tell the user exactly what command to run, then retry.
 - Outputs: `results/PLAN_EVAL.md` and `results/PLAN_EVAL_REPORT.html`
 
 ### 3. Re-render the Evaluation Report (Optional Fallback)
diff --git a/README.md b/README.md
index 21caeaa..303af9f 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Open a **new conversation** (fresh context). Tell the agent:
 
 The agent will read both the PRD and the plan from Step 1, then audit the plan for coverage and alignment. It scores every requirement as full, partial, or missing, writes `PLAN_EVAL.md`, and then generates `PLAN_EVAL_REPORT.html` from that finished evaluation. The denominator is frozen in `evaluator/requirements_catalog_v1.md`, so the evaluator scores against the same requirement list every run instead of re-deriving it from scratch.
 
-If the `evaluator/` folder is missing, run `python3 tools/fetch_evaluator.py` first.
+If the `evaluator/` folder is missing, the Step 2 agent should first attempt to run `python3 tools/fetch_evaluator.py`. If the agent cannot do that, run it manually and retry Step 2.
 
 **Requires:** `results/PLAN.md` from Step 1
 **Primary output:** `results/PLAN_EVAL.md`
@@ -54,7 +54,7 @@ Each step consumes significant context. Starting fresh ensures the agent has max
 2-EVALUATE_PLAN.md                    # Step 2 prompt — evaluation
 3-PLAN_EVAL_REPORT.md                 # Optional fallback prompt — HTML report rerender only
 docs/prd/                             # The product spec (PRD + supporting docs)
-evaluator/requirements_catalog_v1.md  # Frozen Step 2 denominator hidden from Step 1
+evaluator/requirements_catalog_v1.md  # Frozen Step 2 denominator, fetched on demand
 tools/fetch_evaluator.py              # Downloads the public evaluator bundle into evaluator/
 results/                              # All outputs land here
 CLAUDE.md                             # Auto-loaded instructions for Claude Code
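
The deferred-fetch fallback this patch introduces (try the fetch yourself, then fall back to instructing the user) can be sketched in Python as follows. This is a minimal illustration, not part of the patch: the `ensure_catalog` helper name and its return convention are hypothetical; only the catalog path and the fetch command come from the patched files.

```python
import subprocess
import sys
from pathlib import Path

# Paths taken from the patch; run from the repo root.
CATALOG = Path("evaluator/requirements_catalog_v1.md")
FETCH_CMD = [sys.executable, "tools/fetch_evaluator.py"]


def ensure_catalog() -> bool:
    """Return True once the frozen catalog exists, fetching it on demand."""
    if CATALOG.exists():
        return True
    try:
        # Step 2 behavior after this patch: attempt the fetch first.
        subprocess.run(FETCH_CMD, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Fetch unavailable or failed: fall back to the exact user-facing
        # instruction the docs specify, and stop.
        print(
            "Missing evaluator catalog. Run `python3 tools/fetch_evaluator.py` "
            "from the repo root, then retry Step 2."
        )
        return False
    return CATALOG.exists()
```

Note the helper never tries to reconstruct the catalog itself, matching the "Do not try to reconstruct" rule in `2-EVALUATE_PLAN.md`.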