2-EVALUATE_PLAN.md — 2 changes: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ The PRD is multiple files. All files are very important. You will find the PRD f

This benchmark version also has a frozen canonical requirement catalog at `evaluator/requirements_catalog_v1.md`. That catalog is the scoring denominator for this PRD version. It freezes requirement IDs, functional areas, labels, source citations, and severity tiers while staying outside `docs/prd/` so Step 1 does not see evaluator-only material.

-If `evaluator/requirements_catalog_v1.md` is missing, stop immediately and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
+If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run shell commands or the fetch fails, stop and tell the user to run `python3 tools/fetch_evaluator.py` from the repo root before retrying Step 2. Do not try to reconstruct the catalog yourself.
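The fallback the new wording describes — check for the catalog, try the fetch script, otherwise hand the exact command back to the user — can be sketched in Python. This is a minimal sketch only: `ensure_catalog`, its return values, and the `run` injection point are illustrative assumptions, not part of the repo; only the paths and the `python3 tools/fetch_evaluator.py` invocation come from the docs.

```python
# Sketch of the Step 2 preflight described above (illustrative, not repo code).
import pathlib
import subprocess

CATALOG = pathlib.Path("evaluator/requirements_catalog_v1.md")
FETCH_CMD = ["python3", "tools/fetch_evaluator.py"]  # the documented command


def ensure_catalog(run=subprocess.run) -> str:
    """Return 'present', 'fetched', or 'manual' (user must run the fetch)."""
    if CATALOG.exists():
        return "present"
    try:
        # Attempt the fetch ourselves, as the revised instruction asks.
        result = run(FETCH_CMD, check=False)
        fetched = result.returncode == 0
    except OSError:
        # No shell access or interpreter missing: fall through to manual.
        fetched = False
    if fetched and CATALOG.exists():
        return "fetched"
    # Could not fetch: surface the exact documented command to the user.
    return "manual"
```

When `ensure_catalog` returns `"manual"`, the agent stops and tells the user to run `python3 tools/fetch_evaluator.py` from the repo root; it never reconstructs the catalog.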

## Instructions

INSTRUCTIONS.md — 2 changes: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The user wants to audit a plan they already generated.

- Open and follow `2-EVALUATE_PLAN.md` exactly.
- This requires both the PRD (`docs/prd/`) and an existing `results/PLAN.md`.
-- If `evaluator/requirements_catalog_v1.md` is missing, tell the user to run `python3 tools/fetch_evaluator.py` from the repo root, then retry.
+- If `evaluator/requirements_catalog_v1.md` is missing, first attempt to run `python3 tools/fetch_evaluator.py` from the repo root yourself. If you cannot run it or it fails, tell the user exactly what command to run, then retry.
- Outputs: `results/PLAN_EVAL.md` and `results/PLAN_EVAL_REPORT.html`

### 3. Re-render the Evaluation Report (Optional Fallback)
README.md — 4 changes: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ Open a **new conversation** (fresh context). Tell the agent:

The agent will read both the PRD and the plan from Step 1, then audit the plan for coverage and alignment. It scores every requirement as full, partial, or missing, writes `PLAN_EVAL.md`, and then generates `PLAN_EVAL_REPORT.html` from that finished evaluation.
The denominator is frozen in `evaluator/requirements_catalog_v1.md`, so the evaluator scores against the same requirement list every run instead of re-deriving it from scratch.
-If the `evaluator/` folder is missing, run `python3 tools/fetch_evaluator.py` first.
+If the `evaluator/` folder is missing, the Step 2 agent should first attempt to run `python3 tools/fetch_evaluator.py`. If the agent cannot do that, run it manually and retry Step 2.

**Requires:** `results/PLAN.md` from Step 1
**Primary output:** `results/PLAN_EVAL.md`
@@ -54,7 +54,7 @@ Each step consumes significant context. Starting fresh ensures the agent has max
2-EVALUATE_PLAN.md # Step 2 prompt — evaluation
3-PLAN_EVAL_REPORT.md # Optional fallback prompt — HTML report rerender only
docs/prd/ # The product spec (PRD + supporting docs)
-evaluator/requirements_catalog_v1.md # Frozen Step 2 denominator hidden from Step 1
+evaluator/requirements_catalog_v1.md # Frozen Step 2 denominator, fetched on demand
tools/fetch_evaluator.py # Downloads the public evaluator bundle into evaluator/
results/ # All outputs land here
CLAUDE.md # Auto-loaded instructions for Claude Code