From e6373e82f2ff45e7f6a9d4df454bdf8a9229c073 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Mon, 11 May 2026 13:04:36 -0300 Subject: [PATCH 1/5] fix(ci): limit docs audit turn usage --- .agents/recipes/docs-and-references/recipe.md | 35 +++++++++++++++---- .github/workflows/agentic-ci-daily.yml | 9 ++++- 2 files changed, 37 insertions(+), 7 deletions(-) diff --git a/.agents/recipes/docs-and-references/recipe.md b/.agents/recipes/docs-and-references/recipe.md index f8a7073dc..4d4b5edaa 100644 --- a/.agents/recipes/docs-and-references/recipe.md +++ b/.agents/recipes/docs-and-references/recipe.md @@ -28,11 +28,31 @@ that already appear in `known_issues`. ## Instructions +### Turn budget + +This suite must finish before the `max_turns` limit. Do not attempt a +repo-wide audit in one run. + +1. Read runner memory. +2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and + empty tables. If the run is interrupted later, the workflow must still have + a usable partial report. +3. Use targeted searches to find candidates, then read only the files needed + to verify a specific finding. +4. Stop after either: + - 20 tool calls + - 2 new findings in a section + - all sections have been sampled +5. Finalize the report, update runner memory, and stop. If no new findings + were verified, replace the report with `NO_FINDINGS`. + ### 1. Docstring vs signature drift This repo uses Google-style docstrings (`Args:`, `Returns:`, `Raises:`). -Scan public functions and methods in `packages/` for mismatches between the -docstring and the actual function signature: +Sample public functions and methods in `packages/` for mismatches between the +docstring and the actual function signature. Do not scan every source file. 
+Use `rg "Args:|Returns:|Raises:" packages/*/src/ --glob '*.py'` to find +candidates, then inspect at most 5 high-value files: - Parameters in the `Args:` section that no longer exist in the signature - Parameters in the signature that are missing from `Args:` @@ -55,14 +75,17 @@ Check links in these locations: - `docs/` - MkDocs content links, code references, cross-page links - `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md` - relative links -For each link, verify the target file or anchor exists. Report broken links -with the source file, line number, and broken target. +Use targeted link extraction and inspect at most 10 candidate links. Prefer +high-value docs and links changed recently. For each sampled link, verify the +target file or anchor exists. Report broken links with the source file, line +number, and broken target. ### 3. Architecture doc references The 10 files in `architecture/` reference specific classes, functions, files, and registries by name. These are high-value docs that agents and developers -rely on for orientation. For each code reference: +rely on for orientation. Sample at most 3 architecture files per run, +prioritizing files changed recently. For each code reference: - Verify the referenced class, function, or module still exists at the stated location - If renamed or moved, flag with the old and new location @@ -100,7 +123,7 @@ Review for accuracy against the current code: - Check that autodoc module paths point to modules that still exist. **Prioritize by risk of drift**: pages with the most code symbols referenced -are most likely to be stale. Don't read every page - sample 5-10 high-value +are most likely to be stale. Don't read every page - sample 3-5 high-value pages and flag patterns. 
## Output format diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index d04a5d839..9b09ce597 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -176,6 +176,13 @@ jobs: # Build prompt: _runner.md + recipe body (strip YAML frontmatter) RUNNER_CTX=$(cat .agents/recipes/_runner.md) RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md") + MAX_TURNS=$(awk -F': *' ' + /^---$/ { section++; next } + section == 1 && $1 == "max_turns" { print $2; exit } + section == 2 { exit } + ' "${RECIPE_DIR}/recipe.md") + MAX_TURNS=${MAX_TURNS:-50} + echo "Using max turns: ${MAX_TURNS}" PROMPT=$(printf '%s\n\n%s\n' "${RUNNER_CTX}" "${RECIPE_BODY}" \ | sed "s|{{suite}}|${SUITE}|g" \ @@ -185,7 +192,7 @@ jobs: stdbuf -oL -eL claude \ --model "$AGENTIC_CI_MODEL" \ -p "$PROMPT" \ - --max-turns 50 \ + --max-turns "$MAX_TURNS" \ --output-format stream-json \ --verbose \ 2>&1 | tee /tmp/claude-audit-log.txt From 527a33b62763e02aae3d35b2eac187e7cca293e8 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Wed, 13 May 2026 15:00:12 +0000 Subject: [PATCH 2/5] fix(agentic-ci): batch mechanical structure fixes --- .agents/recipes/_fix-policy.md | 52 ++++++++++++++++---------- .agents/recipes/_phase-fix.md | 8 +++- .agents/recipes/structure/recipe.md | 6 +++ .github/workflows/agentic-ci-daily.yml | 2 +- 4 files changed, 46 insertions(+), 22 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index e95ef8ae7..37147d8d4 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -25,7 +25,9 @@ A finding may be converted to a fix only if all hold: | `packages/data-designer-config` | `make test-config` | | `packages/data-designer-engine` | `make test-engine` | | `packages/data-designer` | `make test-interface` | -- **Single concern**: one finding per PR. 
+- **Single concern**: one finding per PR, except suite-declared batchable + mechanical fixes. A batch must share one suite/category and satisfy the + localized-fix bar as a single combined diff. - **Allowlisted paths**: matches the suite's path allowlist. If the top-ranked candidate fails the bar, try the next. If none of the top @@ -79,6 +81,9 @@ Each daily recipe maintains two arrays in Also: `draft_until_proven` (boolean, per-suite, default `true` for code-quality and unset elsewhere) controls draft-PR mode. +Batch PRs still record one `attempted_fixes` entry per finding. Multiple +entries may point to the same `pr_number` and `branch`. + ### `fix_backlog` rules (audit phase populates this) - Append every detected finding in an eligible category. If `id` is already @@ -101,9 +106,9 @@ code-quality and unset elsewhere) controls draft-PR mode. `open` attempts that have a `pr_number`: query the PR and flip the attempt to `merged` or `closed` if it is no longer open. Then recover from crashes that left state un-updated: list open PRs (`gh pr list`) - whose bodies contain the - `` marker, parse out - each ``, and back-fill any missing `attempted_fixes` entries with + whose bodies contain one or more + `` markers, parse out + every ``, and back-fill any missing `attempted_fixes` entries with `outcome: "open"` and the parsed `pr_number` and `branch`. - Prune: drop `merged` entries older than 90 days. Do **not** prune `closed` or `abandoned` entries by age — pruning a single-strike entry @@ -175,7 +180,7 @@ Earlier criteria override later ones: 4. **Recency** — newer findings rank above long-standing ones. -Record the chosen finding's id, scores, and rationale at the top of +Record the chosen finding id(s), scores, and rationale at the top of `/tmp/audit-{{suite}}.md`. 
## Standard fix procedure @@ -191,29 +196,36 @@ declare only the parts that vary (eligible categories, branch type, `merged`; surface two-strike entries in the report's `Repeatedly-failed fix attempts` section and drop them from selection. 3. Rank the remainder per the Ranking section. -4. For each candidate, top 5 max: - 1. Re-verify the finding still applies (re-grep / re-read). If not, - remove from `fix_backlog` and continue. - 2. Apply the fix. If the diff exceeds the localized-fix bar or touches - a non-allowlisted path, abandon and continue. - 3. If the category sets `test_required: true`, run the per-package +4. For each primary candidate, top 5 max: + 1. If the suite declares the category batchable, collect sibling + `fix_backlog` entries for the same suite/category that share the same + test target and branch type. Do not discover new findings; use only + existing backlog entries. + 2. Re-verify every finding still applies (re-grep / re-read). If a + sibling no longer applies, remove it from `fix_backlog`; if the + primary no longer applies, continue to the next primary candidate. + 3. Apply the fix or batch. If the combined diff exceeds the + localized-fix bar or touches a non-allowlisted path, abandon and + continue. + 4. If the category sets `test_required: true`, run the per-package test target (see the mapping table in "Localized fix bar" above) - for the package containing the change. On failure: abandon and + for the package containing the change(s). On failure: abandon and continue. - 4. Branch: `agentic-ci//-YYYYMMDD-`. Commit: + 5. Branch: `agentic-ci//-YYYYMMDD-`. Commit: `(agentic-ci): `. Push. - 5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the - hidden metadata block: + 6. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including one + hidden metadata block per fixed finding: `` - 6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft` + 7. 
`gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft` iff `draft_until_proven` is true for the suite. - 7. `gh pr edit --add-label agentic-ci --add-label agentic-ci/`. - 8. Record `attempted_fixes` entry with `outcome: "open"` and exit. + 8. `gh pr edit --add-label agentic-ci --add-label agentic-ci/`. + 9. Record one `attempted_fixes` entry per fixed finding with + `outcome: "open"` and exit. 5. If all 5 candidates were abandoned, append a one-line note to the report and exit cleanly. The state already reflects the abandonments. On any failure mid-flow: record `outcome: "abandoned"` for the chosen -finding (with `pr_number: null`), leave any pushed branch in place +finding(s) (with `pr_number: null`), leave any pushed branch in place (`pr-stale.yml` will reap it; branch deletion is forbidden), and continue to the next candidate. @@ -223,6 +235,8 @@ to the next candidate. interactive-only and shells the body inline; CI needs determinism. - **Title**: conventional, `(agentic-ci): `. - **Labels**: `agentic-ci`, `agentic-ci/`. +- **Batch markers**: batch PRs include one hidden finding marker per fixed + finding so crash recovery can reconstruct every `attempted_fixes` entry. - **Draft PRs**: `code-quality` opens draft until a maintainer flips `draft_until_proven` to `false` in runner-state, after at least two non-draft PRs from that suite have landed clean. This flip is diff --git a/.agents/recipes/_phase-fix.md b/.agents/recipes/_phase-fix.md index 44449dd8e..3d6fc06c1 100644 --- a/.agents/recipes/_phase-fix.md +++ b/.agents/recipes/_phase-fix.md @@ -16,9 +16,13 @@ This invocation runs the **FIX** phase only. codebase to discover new findings is forbidden. - Pick the highest-ranked eligible candidate from `fix_backlog`, apply the fix, run the package's tests if applicable, commit, push, and open - the PR using `gh pr create --body-file`. + the PR using `gh pr create --body-file`. 
If the recipe and + `_fix-policy.md` declare the category batchable, you may add sibling + entries from the existing `fix_backlog` after re-verifying each one. + Do not scan for findings that are not already in `fix_backlog`. - Record the attempt in `attempted_fixes` (whether successful, abandoned, - or failed through the top-5 fallback) before exiting. + or failed through the top-5 fallback) before exiting. Batch PRs record + one attempt per fixed finding, all pointing to the same PR and branch. - If no candidate qualifies after trying up to 5 of them, exit cleanly, append a short note to `/tmp/audit-{{suite}}.md` describing what was tried, and update `attempted_fixes` accordingly. Do NOT open a PR. diff --git a/.agents/recipes/structure/recipe.md b/.agents/recipes/structure/recipe.md index 3df885ecd..64b29da9c 100644 --- a/.agents/recipes/structure/recipe.md +++ b/.agents/recipes/structure/recipe.md @@ -223,6 +223,12 @@ Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: | missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. | | lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. | +`missing-future` is batchable: when the primary candidate is +`missing-future`, include other `missing-future` backlog entries with the +same `test_target` if each file still lacks the import and the combined +diff remains within the localized-fix bar. Run the shared test target once. +Use one hidden finding marker and one `attempted_fixes` entry per file. 
+ **Not eligible** — stays report-only: - Import boundary violations (architectural judgement). diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 4a91c31d5..69270d369 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -150,7 +150,7 @@ jobs: if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \ - --max-time 10 \ + --max-time 30 \ -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \ -H "Content-Type: application/json" \ -H "x-api-key: ${ANTHROPIC_API_KEY}" \ From 4acac75d97cb0fbb28d5e572defbb1eddc282e49 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Wed, 13 May 2026 16:07:25 +0000 Subject: [PATCH 3/5] fix(agentic-ci): harden CI recipes and preflights --- .agents/recipes/_runner.md | 3 +++ .agents/recipes/test-health/recipe.md | 18 ++++++++++++++++++ .github/workflows/agentic-ci-issue-triage.yml | 2 +- .github/workflows/agentic-ci-pr-review.yml | 2 +- 4 files changed, 23 insertions(+), 2 deletions(-) diff --git a/.agents/recipes/_runner.md b/.agents/recipes/_runner.md index 9705633d5..9480a31b0 100644 --- a/.agents/recipes/_runner.md +++ b/.agents/recipes/_runner.md @@ -67,6 +67,9 @@ Rules: passwords) in your output, even if you encounter them in code. - **Stay in scope.** Only perform the task described in the recipe. Do not explore unrelated areas of the codebase. +- **No subagents.** Do not use Task, Explore, or other delegated/local agents. + The CI key may not have access to their default models; do the work in the + main agent session. - **Cost awareness.** Minimize unnecessary file reads and tool calls. If you have the information you need, stop. 
diff --git a/.agents/recipes/test-health/recipe.md b/.agents/recipes/test-health/recipe.md index 2224684ff..64aacbad3 100644 --- a/.agents/recipes/test-health/recipe.md +++ b/.agents/recipes/test-health/recipe.md @@ -32,6 +32,24 @@ update `baselines` with current values and `known_issues` with new findings. ## Instructions +### Turn budget + +This suite must finish before the `max_turns` limit. Do not attempt a +repo-wide test audit in one run. + +1. Read runner memory. +2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and + empty tables. If the run is interrupted later, the workflow must still have + a usable partial report. +3. Use targeted searches to find candidates, then read only the files needed + to verify a specific finding. +4. Stop after either: + - 20 tool calls + - 2 new findings in a section + - all sections have been sampled +5. Finalize the report, update runner memory, and stop. If no new findings + were verified, replace the report with `NO_FINDINGS`. + ### 1. 
Test-to-source coverage mapping Map source files to their corresponding test files: diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index dd51153c2..a93acee78 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -56,7 +56,7 @@ jobs: if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \ - --max-time 10 \ + --max-time 30 \ -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \ -H "Content-Type: application/json" \ -H "x-api-key: ${ANTHROPIC_API_KEY}" \ diff --git a/.github/workflows/agentic-ci-pr-review.yml b/.github/workflows/agentic-ci-pr-review.yml index d2b1a8915..59e5da32d 100644 --- a/.github/workflows/agentic-ci-pr-review.yml +++ b/.github/workflows/agentic-ci-pr-review.yml @@ -193,7 +193,7 @@ jobs: # Quick API check (custom endpoint only) if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \ - --max-time 10 \ + --max-time 30 \ -X POST "${ANTHROPIC_BASE_URL}/v1/messages" \ -H "Content-Type: application/json" \ -H "x-api-key: ${ANTHROPIC_API_KEY}" \ From f650384074be905c9ef6445967d2d65c08cc2119 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Thu, 14 May 2026 18:06:01 +0000 Subject: [PATCH 4/5] fix(agentic-ci): align recipe turn budgets --- .agents/recipes/_fix-policy.md | 6 +++++- .agents/recipes/code-quality/recipe.md | 2 +- .agents/recipes/structure/recipe.md | 7 ++++--- .github/workflows/agentic-ci-daily.yml | 4 +++- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index 37147d8d4..dbb509e88 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -95,6 +95,9 @@ entries may point to the same `pr_number` and `branch`. - Cap at 200 entries (drop oldest by `first_seen`). 
- Populated **before** the `known_issues` filter so fixable findings persist even when their report row is suppressed for being unchanged. +- Batchable categories must include enough information in `data` to group + siblings safely. For package-scoped Python fixes, derive `test_target` from + the package containing the source file. ### `attempted_fixes` rules @@ -200,7 +203,8 @@ declare only the parts that vary (eligible categories, branch type, 1. If the suite declares the category batchable, collect sibling `fix_backlog` entries for the same suite/category that share the same test target and branch type. Do not discover new findings; use only - existing backlog entries. + existing backlog entries. Batch at most 3 entries to stay within the + localized-fix file cap. 2. Re-verify every finding still applies (re-grep / re-read). If a sibling no longer applies, remove it from `fix_backlog`; if the primary no longer applies, continue to the next primary candidate. diff --git a/.agents/recipes/code-quality/recipe.md b/.agents/recipes/code-quality/recipe.md index 268e0adff..764e34903 100644 --- a/.agents/recipes/code-quality/recipe.md +++ b/.agents/recipes/code-quality/recipe.md @@ -4,7 +4,7 @@ description: Audit code quality gaps not covered by ruff - complexity trends, ex trigger: schedule tool: claude-code timeout_minutes: 20 -max_turns: 30 +max_turns: 50 permissions: contents: write --- diff --git a/.agents/recipes/structure/recipe.md b/.agents/recipes/structure/recipe.md index 64b29da9c..1960974f5 100644 --- a/.agents/recipes/structure/recipe.md +++ b/.agents/recipes/structure/recipe.md @@ -4,7 +4,7 @@ description: Audit structural integrity - import boundaries, lazy import complia trigger: schedule tool: claude-code timeout_minutes: 20 -max_turns: 30 +max_turns: 50 permissions: contents: write --- @@ -226,8 +226,9 @@ Follow the standard fix procedure in `_fix-policy.md`. 
Suite-specific bits: `missing-future` is batchable: when the primary candidate is `missing-future`, include other `missing-future` backlog entries with the same `test_target` if each file still lacks the import and the combined -diff remains within the localized-fix bar. Run the shared test target once. -Use one hidden finding marker and one `attempted_fixes` entry per file. +diff remains within the localized-fix bar. Batch at most 3 files. Run the +shared test target once. Use one hidden finding marker and one +`attempted_fixes` entry per file. **Not eligible** — stays report-only: diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 69270d369..036585c83 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -191,7 +191,7 @@ jobs: /^---$/ { section++; next } section == 1 && $1 == "max_turns" { print $2; exit } section == 2 { exit } - ' "${RECIPE_DIR}/recipe.md") + ' "${RECIPE_DIR}/recipe.md" | grep -oE '[0-9]+' | head -n1) MAX_TURNS=${MAX_TURNS:-50} echo "Using max turns: ${MAX_TURNS}" @@ -260,6 +260,8 @@ jobs: | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \ | sed "s|{{memory_path}}|.agentic-ci-state|g") + # Keep fix-phase turns fixed at 50; the audit budget is the + # suite-tuned scan limit, while fixes are bounded by scope gates. stdbuf -oL -eL claude \ --model "$AGENTIC_CI_MODEL" \ -p "$PROMPT" \ From 4274a350a60eb4e15862206b36a1106743d669e3 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Thu, 14 May 2026 23:44:06 +0000 Subject: [PATCH 5/5] fix(agentic-ci): prune stale primary backlog entries --- .agents/recipes/_fix-policy.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index dbb509e88..81af53238 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -207,7 +207,8 @@ declare only the parts that vary (eligible categories, branch type, localized-fix file cap. 
2. Re-verify every finding still applies (re-grep / re-read). If a sibling no longer applies, remove it from `fix_backlog`; if the - primary no longer applies, continue to the next primary candidate. + primary no longer applies, remove it from `fix_backlog` and continue + to the next primary candidate. 3. Apply the fix or batch. If the combined diff exceeds the localized-fix bar or touches a non-allowlisted path, abandon and continue.
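
The batching and pruning rules introduced across patches 2, 4, and 5 can be sketched as below. This is a minimal illustration, not the runner's actual implementation: the patches name `fix_backlog`, `test_target`, and the cap of 3, but do not show the entry schema, so the dict keys `id`, `category`, and `test_target`, the function name `select_batch`, and the caller-supplied `still_applies` check (standing in for the re-grep / re-read step) are all assumptions.

```python
from typing import Callable

MAX_BATCH = 3  # "Batch at most 3 entries" from the fix policy


def select_batch(
    fix_backlog: list[dict],
    primary: dict,
    batchable: set[str],
    still_applies: Callable[[dict], bool],
) -> list[dict]:
    """Return the findings to fix together, pruning stale entries in place.

    An empty list means the primary was stale; the caller should move on
    to the next primary candidate. Entry keys here are hypothetical.
    """
    candidates = [primary]
    if primary["category"] in batchable:
        # Siblings come only from the existing backlog; nothing new is
        # discovered. Matching on category is assumed to imply the same
        # branch type, since the structure recipe lists it per category.
        candidates += [
            entry
            for entry in fix_backlog
            if entry is not primary
            and entry["category"] == primary["category"]
            and entry["test_target"] == primary["test_target"]
        ]

    batch: list[dict] = []
    for entry in candidates:
        if len(batch) == MAX_BATCH:
            break
        if still_applies(entry):
            batch.append(entry)
            continue
        # Stale finding: drop it from the backlog, primary or sibling alike.
        fix_backlog.remove(entry)
        if entry is primary:
            return []
    return batch
```

Checking the primary first means a stale primary is pruned before any sibling is touched, which matches the behavior patch 5 adds. Siblings beyond the cap are left in the backlog unverified for a later run.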