NVIDIA-NeMo · andreatgretel · May 11, 2026 · May 13, 2026 · May 13, 2026 · May 13, 2026
@@ -25,7 +25,9 @@ A finding may be converted to a fix only if all hold:
   | `packages/data-designer-config` | `make test-config` |
   | `packages/data-designer-engine` | `make test-engine` |
   | `packages/data-designer` | `make test-interface` |
-- **Single concern**: one finding per PR.
+- **Single concern**: one finding per PR, except suite-declared batchable
+  mechanical fixes. A batch must share one suite/category and satisfy the
+  localized-fix bar as a single combined diff.
 - **Allowlisted paths**: matches the suite's path allowlist.
 
 If the top-ranked candidate fails the bar, try the next. If none of the top
@@ -79,6 +81,9 @@ Each daily recipe maintains two arrays in
 Also: `draft_until_proven` (boolean, per-suite, default `true` for
 code-quality and unset elsewhere) controls draft-PR mode.
 
+Batch PRs still record one `attempted_fixes` entry per finding. Multiple
+entries may point to the same `pr_number` and `branch`.
+
 ### `fix_backlog` rules (audit phase populates this)
 
 - Append every detected finding in an eligible category. If `id` is already
@@ -90,6 +95,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
 - Cap at 200 entries (drop oldest by `first_seen`).
 - Populated **before** the `known_issues` filter so fixable findings persist
   even when their report row is suppressed for being unchanged.
+- Batchable categories must include enough information in `data` to group
+  siblings safely. For package-scoped Python fixes, derive `test_target` from
+  the package containing the source file.
 
 ### `attempted_fixes` rules
 
@@ -101,9 +109,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
   `open` attempts that have a `pr_number`: query the PR and flip the
   attempt to `merged` or `closed` if it is no longer open. Then recover
   from crashes that left state un-updated: list open PRs (`gh pr list`)
-  whose bodies contain the
-  `<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out
-  each `<id>`, and back-fill any missing `attempted_fixes` entries with
+  whose bodies contain one or more
+  `<!-- agentic-ci finding=<id> suite=<suite> -->` markers, parse out
+  every `<id>`, and back-fill any missing `attempted_fixes` entries with
   `outcome: "open"` and the parsed `pr_number` and `branch`.
 - Prune: drop `merged` entries older than 90 days. Do **not** prune
   `closed` or `abandoned` entries by age — pruning a single-strike entry
@@ -175,7 +183,7 @@ Earlier criteria override later ones:
 
 4. **Recency** — newer findings rank above long-standing ones.
 
-Record the chosen finding's id, scores, and rationale at the top of
+Record the chosen finding id(s), scores, and rationale at the top of
 `/tmp/audit-{{suite}}.md`.
 
 ## Standard fix procedure
@@ -191,29 +199,38 @@ declare only the parts that vary (eligible categories, branch type,
    `merged`; surface two-strike entries in the report's
    `Repeatedly-failed fix attempts` section and drop them from selection.
 3. Rank the remainder per the Ranking section.
-4. For each candidate, top 5 max:
-   1. Re-verify the finding still applies (re-grep / re-read). If not,
-      remove from `fix_backlog` and continue.
-   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
-      a non-allowlisted path, abandon and continue.
-   3. If the category sets `test_required: true`, run the per-package
+4. For each primary candidate, top 5 max:
+   1. If the suite declares the category batchable, collect sibling
+      `fix_backlog` entries for the same suite/category that share the same
+      test target and branch type. Do not discover new findings; use only
+      existing backlog entries. Batch at most 3 entries to stay within the
+      localized-fix file cap.
+   2. Re-verify every finding still applies (re-grep / re-read). If a
+      sibling no longer applies, remove it from `fix_backlog`; if the
+      primary no longer applies, remove it from `fix_backlog` and continue
+      to the next primary candidate.
+   3. Apply the fix or batch. If the combined diff exceeds the
+      localized-fix bar or touches a non-allowlisted path, abandon and
+      continue.
+   4. If the category sets `test_required: true`, run the per-package
       test target (see the mapping table in "Localized fix bar" above)
-      for the package containing the change. On failure: abandon and
+      for the package containing the change(s). On failure: abandon and
       continue.
-   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
+   5. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
       `<type>(agentic-ci): <one-line>`. Push.
-   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
-      hidden metadata block:
+   6. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including one
+      hidden metadata block per fixed finding:
       `<!-- agentic-ci finding=<id> suite=<suite> -->`
-   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
+   7. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
       iff `draft_until_proven` is true for the suite.
-   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
-   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
+   8. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
+   9. Record one `attempted_fixes` entry per fixed finding with
+      `outcome: "open"` and exit.
 5. If all 5 candidates were abandoned, append a one-line note to the
    report and exit cleanly. The state already reflects the abandonments.
 
 On any failure mid-flow: record `outcome: "abandoned"` for the chosen
-finding (with `pr_number: null`), leave any pushed branch in place
+finding(s) (with `pr_number: null`), leave any pushed branch in place
 (`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
 to the next candidate.
 
@@ -223,6 +240,8 @@ to the next candidate.
   interactive-only and shells the body inline; CI needs determinism.
 - **Title**: conventional, `<type>(agentic-ci): <one-line>`.
 - **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
+- **Batch markers**: batch PRs include one hidden finding marker per fixed
+  finding so crash recovery can reconstruct every `attempted_fixes` entry.
 - **Draft PRs**: `code-quality` opens draft until a maintainer flips
   `draft_until_proven` to `false` in runner-state, after at least two
   non-draft PRs from that suite have landed clean. This flip is

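Reviewer sketch (not part of the patch): one way to read the sibling-collection rule in step 4.1 of the standard fix procedure. The `Finding` fields `test_target` and `branch_type` and the helper `collect_batch` are hypothetical names chosen for illustration; the policy only says that siblings must share suite, category, test target, and branch type, and that a batch holds at most 3 entries drawn from the existing backlog.

```python
from dataclasses import dataclass

MAX_BATCH = 3  # "Batch at most 3 entries" in the policy text


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape of a fix_backlog entry; the real schema may differ.
    id: str
    suite: str
    category: str
    test_target: str
    branch_type: str


def group_key(finding):
    """Fields that must match for two findings to share a batch."""
    return (finding.suite, finding.category, finding.test_target, finding.branch_type)


def collect_batch(primary, backlog, batchable_categories):
    """Return the primary plus matching siblings already present in the backlog."""
    batch = [primary]
    if primary.category not in batchable_categories:
        return batch  # non-batchable categories stay one finding per PR
    for entry in backlog:
        if len(batch) == MAX_BATCH:
            break
        if entry.id != primary.id and group_key(entry) == group_key(primary):
            batch.append(entry)
    return batch
```

The sketch never scans for new findings; it only filters the list it is given, which matches the "use only existing backlog entries" constraint. Re-verification and the localized-fix bar would still apply to the result.
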
@@ -16,9 +16,13 @@ This invocation runs the **FIX** phase only.
   codebase to discover new findings is forbidden.
 - Pick the highest-ranked eligible candidate from `fix_backlog`, apply
   the fix, run the package's tests if applicable, commit, push, and open
-  the PR using `gh pr create --body-file`.
+  the PR using `gh pr create --body-file`. If the recipe and
+  `_fix-policy.md` declare the category batchable, you may add sibling
+  entries from the existing `fix_backlog` after re-verifying each one.
+  Do not scan for findings that are not already in `fix_backlog`.
 - Record the attempt in `attempted_fixes` (whether successful, abandoned,
-  or failed through the top-5 fallback) before exiting.
+  or failed through the top-5 fallback) before exiting. Batch PRs record
+  one attempt per fixed finding, all pointing to the same PR and branch.
 - If no candidate qualifies after trying up to 5 of them, exit cleanly,
   append a short note to `/tmp/audit-{{suite}}.md` describing what was
   tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
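
Reviewer sketch (not part of the patch): how one marker per fixed finding lets crash recovery rebuild every `attempted_fixes` entry from a PR body. The marker format is taken from `_fix-policy.md`; the helper names and the exact entry schema (the `id` key in particular) are assumptions, and finding ids are assumed to contain no whitespace.

```python
import re

# Marker format from the policy: <!-- agentic-ci finding=<id> suite=<suite> -->
MARKER_RE = re.compile(r"<!-- agentic-ci finding=(\S+) suite=(\S+) -->")


def render_markers(finding_ids, suite):
    """Build one hidden marker line per fixed finding for the PR body."""
    return "\n".join(
        f"<!-- agentic-ci finding={fid} suite={suite} -->" for fid in finding_ids
    )


def recover_attempts(pr_body, pr_number, branch):
    """Back-fill one open attempt per marker; all share the same PR and branch."""
    return [
        {"id": fid, "outcome": "open", "pr_number": pr_number, "branch": branch}
        for fid, _suite in MARKER_RE.findall(pr_body)
    ]
```

A batch PR with three markers would therefore yield three entries with identical `pr_number` and `branch`, which is the shape the policy now allows.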

@@ -67,6 +67,9 @@ Rules:
   passwords) in your output, even if you encounter them in code.
 - **Stay in scope.** Only perform the task described in the recipe. Do not
   explore unrelated areas of the codebase.
+- **No subagents.** Do not use Task, Explore, or other delegated/local agents.
+  The CI key may not have access to their default models; do the work in the
+  main agent session.
 - **Cost awareness.** Minimize unnecessary file reads and tool calls. If you
   have the information you need, stop.
 

@@ -4,7 +4,7 @@ description: Audit code quality gaps not covered by ruff - complexity trends, ex
 trigger: schedule
 tool: claude-code
 timeout_minutes: 20
-max_turns: 30
+max_turns: 50
 permissions:
   contents: write
 ---

@@ -33,11 +33,31 @@ even when their report row is suppressed for being unchanged.
 
 ## Instructions
 
+### Turn budget
+
+This suite must finish before the `max_turns` limit. Do not attempt a
+repo-wide audit in one run.
+
+1. Read runner memory.
+2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
+   empty tables. If the run is interrupted later, the workflow must still have
+   a usable partial report.
+3. Use targeted searches to find candidates, then read only the files needed
+   to verify a specific finding.
+4. Stop as soon as any of these limits is hit:
+   - 20 tool calls made
+   - 2 new findings recorded in a section
+   - all sections sampled
+5. Finalize the report, update runner memory, and stop. If no new findings
+   were verified, replace the report with `NO_FINDINGS`.
+
 ### 1. Docstring vs signature drift
 
 This repo uses Google-style docstrings (`Args:`, `Returns:`, `Raises:`).
-Scan public functions and methods in `packages/` for mismatches between the
-docstring and the actual function signature:
+Sample public functions and methods in `packages/` for mismatches between the
+docstring and the actual function signature. Do not scan every source file.
+Use `rg "Args:|Returns:|Raises:" packages/*/src/ --glob '*.py'` to find
+candidates, then inspect at most 5 high-value files:
 
 - Parameters in the `Args:` section that no longer exist in the signature
 - Parameters in the signature that are missing from `Args:`
@@ -60,14 +80,17 @@ Check links in these locations:
 - `docs/` - MkDocs content links, code references, cross-page links
 - `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md` - relative links
 
-For each link, verify the target file or anchor exists. Report broken links
-with the source file, line number, and broken target.
+Use targeted link extraction and inspect at most 10 candidate links. Prefer
+high-value docs and links changed recently. For each sampled link, verify the
+target file or anchor exists. Report broken links with the source file, line
+number, and broken target.
 
 ### 3. Architecture doc references
 
 The 10 files in `architecture/` reference specific classes, functions, files,
 and registries by name. These are high-value docs that agents and developers
-rely on for orientation. For each code reference:
+rely on for orientation. Sample at most 3 architecture files per run,
+prioritizing files changed recently. For each code reference:
 - Verify the referenced class, function, or module still exists at the stated
   location
 - If renamed or moved, flag with the old and new location
@@ -105,7 +128,7 @@ Review for accuracy against the current code:
 - Check that autodoc module paths point to modules that still exist.
 
 **Prioritize by risk of drift**: pages with the most code symbols referenced
-are most likely to be stale. Don't read every page - sample 5-10 high-value
+are most likely to be stale. Don't read every page - sample 3-5 high-value
 pages and flag patterns.
 
 ## Output format

@@ -4,7 +4,7 @@ description: Audit structural integrity - import boundaries, lazy import complia
 trigger: schedule
 tool: claude-code
 timeout_minutes: 20
-max_turns: 30
+max_turns: 50
 permissions:
   contents: write
 ---
@@ -223,6 +223,13 @@ Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:
 | missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. |
 | lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. |
 
+`missing-future` is batchable: when the primary candidate is
+`missing-future`, include other `missing-future` backlog entries with the
+same `test_target` if each file still lacks the import and the combined
+diff remains within the localized-fix bar. Batch at most 3 files. Run the
+shared test target once. Use one hidden finding marker and one
+`attempted_fixes` entry per file.
+
 **Not eligible** — stays report-only:
 
 - Import boundary violations (architectural judgement).
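
Reviewer sketch (not part of the patch): deriving `test_target` from the package that contains a source file, so that `missing-future` siblings can be grouped safely. The three mappings come from the table in `_fix-policy.md` as visible in this diff; the full table may contain more rows, and `derive_test_target` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Package directory -> make target, copied from the rows visible in the policy table.
TEST_TARGETS = {
    "data-designer-config": "make test-config",
    "data-designer-engine": "make test-engine",
    "data-designer": "make test-interface",
}


def derive_test_target(source_path):
    """Return the make target for a repo-relative path, or None if it is unmapped."""
    parts = PurePosixPath(source_path).parts
    if len(parts) >= 2 and parts[0] == "packages":
        return TEST_TARGETS.get(parts[1])
    return None
```

Matching on the exact package directory name, rather than a string prefix, keeps `packages/data-designer` from swallowing `packages/data-designer-config` and `packages/data-designer-engine`.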

@@ -32,6 +32,24 @@ update `baselines` with current values and `known_issues` with new findings.
 
 ## Instructions
 
+### Turn budget
+
+This suite must finish before the `max_turns` limit. Do not attempt a
+repo-wide test audit in one run.
+
+1. Read runner memory.
+2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
+   empty tables. If the run is interrupted later, the workflow must still have
+   a usable partial report.
+3. Use targeted searches to find candidates, then read only the files needed
+   to verify a specific finding.
+4. Stop as soon as any of these limits is hit:
+   - 20 tool calls made
+   - 2 new findings recorded in a section
+   - all sections sampled
+5. Finalize the report, update runner memory, and stop. If no new findings
+   were verified, replace the report with `NO_FINDINGS`.
+
 ### 1. Test-to-source coverage mapping
 
 Map source files to their corresponding test files:

@@ -150,7 +150,7 @@ jobs:
 
           if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
             HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
-              --max-time 10 \
+              --max-time 30 \
               -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
               -H "Content-Type: application/json" \
               -H "x-api-key: ${ANTHROPIC_API_KEY}" \
@@ -187,6 +187,13 @@ jobs:
           RUNNER_CTX=$(cat .agents/recipes/_runner.md)
           FIX_POLICY=$(cat .agents/recipes/_fix-policy.md)
           RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md")
+          MAX_TURNS=$(awk -F': *' '
+            /^---$/ { section++; next }
+            section == 1 && $1 == "max_turns" { print $2; exit }
+            section == 2 { exit }
+          ' "${RECIPE_DIR}/recipe.md" | grep -oE '[0-9]+' | head -n1)
+          MAX_TURNS=${MAX_TURNS:-50}
+          echo "Using max turns: ${MAX_TURNS}"
 
           PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \
             | sed "s|{{suite}}|${SUITE}|g" \
@@ -196,7 +203,7 @@ jobs:
           stdbuf -oL -eL claude \
             --model "$AGENTIC_CI_MODEL" \
             -p "$PROMPT" \
-            --max-turns 50 \
+            --max-turns "$MAX_TURNS" \
             --output-format stream-json \
             --verbose \
             2>&1 | tee /tmp/claude-audit-log.txt
@@ -253,6 +260,8 @@ jobs:
             | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \
             | sed "s|{{memory_path}}|.agentic-ci-state|g")
 
+          # Keep fix-phase turns fixed at 50; the audit budget is the
+          # suite-tuned scan limit, while fixes are bounded by scope gates.
           stdbuf -oL -eL claude \
             --model "$AGENTIC_CI_MODEL" \
             -p "$PROMPT" \

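Reviewer sketch (not part of the patch): a Python mirror of the `awk | grep | head` pipeline that extracts `max_turns` from recipe frontmatter, handy for checking edge cases outside the workflow. `read_max_turns` is a hypothetical helper; the fallback of 50 matches `MAX_TURNS=${MAX_TURNS:-50}` above.

```python
import re

DEFAULT_MAX_TURNS = 50  # same fallback the workflow applies


def read_max_turns(recipe_text, default=DEFAULT_MAX_TURNS):
    """Read max_turns from the first frontmatter block, else return the default."""
    section = 0
    for line in recipe_text.splitlines():
        if line == "---":
            section += 1
            if section == 2:
                break  # frontmatter closed; ignore the recipe body
            continue
        if section == 1:
            key, sep, value = line.partition(":")
            if sep and key == "max_turns":
                # Like grep -oE '[0-9]+' | head -n1: take the first digit run.
                match = re.search(r"[0-9]+", value)
                return int(match.group()) if match else default
    return default
```

As in the shell version, a `max_turns:` line that appears only in the recipe body is ignored, and a missing or non-numeric value falls back to the default.
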
@@ -56,7 +56,7 @@ jobs:
 
           if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
             HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
-              --max-time 10 \
+              --max-time 30 \
               -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
               -H "Content-Type: application/json" \
               -H "x-api-key: ${ANTHROPIC_API_KEY}" \

@@ -193,7 +193,7 @@ jobs:
           # Quick API check (custom endpoint only)
           if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
             HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
-              --max-time 10 \
+              --max-time 30 \
               -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
               -H "Content-Type: application/json" \
               -H "x-api-key: ${ANTHROPIC_API_KEY}" \