57 changes: 38 additions & 19 deletions .agents/recipes/_fix-policy.md
@@ -25,7 +25,9 @@ A finding may be converted to a fix only if all hold:
| `packages/data-designer-config` | `make test-config` |
| `packages/data-designer-engine` | `make test-engine` |
| `packages/data-designer` | `make test-interface` |
- **Single concern**: one finding per PR.
- **Single concern**: one finding per PR, except suite-declared batchable
mechanical fixes. A batch must share one suite/category and satisfy the
localized-fix bar as a single combined diff.
- **Allowlisted paths**: matches the suite's path allowlist.

If the top-ranked candidate fails the bar, try the next. If none of the top
@@ -79,6 +81,9 @@ Each daily recipe maintains two arrays in
Also: `draft_until_proven` (boolean, per-suite, default `true` for
code-quality and unset elsewhere) controls draft-PR mode.

Batch PRs still record one `attempted_fixes` entry per finding. Multiple
entries may point to the same `pr_number` and `branch`.
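
A minimal sketch of that bookkeeping, assuming each `attempted_fixes` entry is a plain dict. Only `outcome`, `pr_number`, and `branch` are named in the policy; the `id` key, the helper name, and the sample values are hypothetical.

```python
def record_batch_attempt(finding_ids, pr_number, branch):
    """Build one attempted_fixes entry per finding fixed by a single batch PR.

    Assumption: entries are dicts keyed by a finding `id`. The policy only
    names `outcome`, `pr_number`, and `branch`.
    """
    return [
        {"id": fid, "outcome": "open", "pr_number": pr_number, "branch": branch}
        for fid in finding_ids
    ]


# Illustrative values only: two findings fixed by the same (made-up) PR.
entries = record_batch_attempt(
    ["finding-a", "finding-b"],
    pr_number=101,
    branch="agentic-ci/chore/structure-20250101-example",
)
```

Every entry carries the same `pr_number` and `branch`, so reconciliation can flip all of them together when the PR merges or closes.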

### `fix_backlog` rules (audit phase populates this)

- Append every detected finding in an eligible category. If `id` is already
@@ -90,6 +95,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
- Cap at 200 entries (drop oldest by `first_seen`).
- Populated **before** the `known_issues` filter so fixable findings persist
even when their report row is suppressed for being unchanged.
- Batchable categories must include enough information in `data` to group
siblings safely. For package-scoped Python fixes, derive `test_target` from
the package containing the source file.
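
One way to derive `test_target` is to look up the package directory in the mapping table from the localized-fix bar. The sketch below uses only the three rows visible in this diff; the function name is hypothetical, and paths are assumed to be repo-relative with forward slashes.

```python
from pathlib import PurePosixPath

# Rows copied from the mapping table shown in the "Localized fix bar" hunk.
TEST_TARGETS = {
    "data-designer-config": "make test-config",
    "data-designer-engine": "make test-engine",
    "data-designer": "make test-interface",
}


def derive_test_target(source_path):
    """Return the make target for the package containing source_path, or None."""
    parts = PurePosixPath(source_path).parts
    # Match on the whole directory name so `data-designer` does not
    # accidentally swallow `data-designer-config` or `data-designer-engine`.
    if len(parts) >= 2 and parts[0] == "packages":
        return TEST_TARGETS.get(parts[1])
    return None
```

Returning `None` for files outside `packages/` leaves it to the caller to decide that such a finding cannot be grouped into a batch.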

### `attempted_fixes` rules

@@ -101,9 +109,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
`open` attempts that have a `pr_number`: query the PR and flip the
attempt to `merged` or `closed` if it is no longer open. Then recover
from crashes that left state un-updated: list open PRs (`gh pr list`)
whose bodies contain the
`<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out
each `<id>`, and back-fill any missing `attempted_fixes` entries with
whose bodies contain one or more
`<!-- agentic-ci finding=<id> suite=<suite> -->` markers, parse out
every `<id>`, and back-fill any missing `attempted_fixes` entries with
`outcome: "open"` and the parsed `pr_number` and `branch`.
- Prune: drop `merged` entries older than 90 days. Do **not** prune
`closed` or `abandoned` entries by age — pruning a single-strike entry
@@ -175,7 +183,7 @@ Earlier criteria override later ones:

4. **Recency** — newer findings rank above long-standing ones.

Record the chosen finding's id, scores, and rationale at the top of
Record the chosen finding id(s), scores, and rationale at the top of
`/tmp/audit-{{suite}}.md`.

## Standard fix procedure
@@ -191,29 +199,38 @@ declare only the parts that vary (eligible categories, branch type,
`merged`; surface two-strike entries in the report's
`Repeatedly-failed fix attempts` section and drop them from selection.
3. Rank the remainder per the Ranking section.
4. For each candidate, top 5 max:
1. Re-verify the finding still applies (re-grep / re-read). If not,
remove from `fix_backlog` and continue.
2. Apply the fix. If the diff exceeds the localized-fix bar or touches
a non-allowlisted path, abandon and continue.
3. If the category sets `test_required: true`, run the per-package
4. For each primary candidate, top 5 max:
1. If the suite declares the category batchable, collect sibling
`fix_backlog` entries for the same suite/category that share the same
test target and branch type. Do not discover new findings; use only
existing backlog entries. Batch at most 3 entries to stay within the
localized-fix file cap.
2. Re-verify every finding still applies (re-grep / re-read). If a
sibling no longer applies, remove it from `fix_backlog`; if the
primary no longer applies, remove it from `fix_backlog` and continue
to the next primary candidate.
3. Apply the fix or batch. If the combined diff exceeds the
localized-fix bar or touches a non-allowlisted path, abandon and
continue.
4. If the category sets `test_required: true`, run the per-package
test target (see the mapping table in "Localized fix bar" above)
for the package containing the change. On failure: abandon and
for the package containing the change(s). On failure: abandon and
continue.
4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
5. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
`<type>(agentic-ci): <one-line>`. Push.
5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
hidden metadata block:
6. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including one
hidden metadata block per fixed finding:
`<!-- agentic-ci finding=<id> suite=<suite> -->`
6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
7. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
iff `draft_until_proven` is true for the suite.
7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
8. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
9. Record one `attempted_fixes` entry per fixed finding with
`outcome: "open"` and exit.
5. If all 5 candidates were abandoned, append a one-line note to the
report and exit cleanly. The state already reflects the abandonments.
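
The sibling-collection step of the procedure can be sketched as below. This assumes backlog entries are dicts with flattened `suite`, `category`, `test_target`, and `branch_type` keys (in practice some of these may live under `data`), and that the cap of 3 counts the primary. All names are hypothetical.

```python
MAX_BATCH = 3  # "Batch at most 3 entries"; assumed to include the primary

# Fields a sibling must share with the primary (key names are assumptions).
GROUP_KEYS = ("suite", "category", "test_target", "branch_type")


def collect_batch(primary, backlog, batchable_categories):
    """Return the primary plus eligible siblings drawn only from the backlog."""
    if primary["category"] not in batchable_categories:
        return [primary]
    siblings = [
        entry
        for entry in backlog
        if entry["id"] != primary["id"]
        and all(entry.get(key) == primary.get(key) for key in GROUP_KEYS)
    ]
    # No new findings are discovered here: only existing entries are grouped.
    return [primary] + siblings[: MAX_BATCH - 1]
```

Each entry returned still has to be re-verified before the fix is applied. The sketch only narrows the candidates.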

On any failure mid-flow: record `outcome: "abandoned"` for the chosen
finding (with `pr_number: null`), leave any pushed branch in place
finding(s) (with `pr_number: null`), leave any pushed branch in place
(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
to the next candidate.
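
Crash recovery depends on the hidden markers surviving in the PR body, so rendering and parsing must round-trip. A minimal sketch, assuming finding ids and suite names contain no whitespace; the function names are hypothetical.

```python
import re

# Matches the marker format quoted in the policy:
#   <!-- agentic-ci finding=<id> suite=<suite> -->
MARKER_RE = re.compile(
    r"<!-- agentic-ci finding=(?P<id>\S+) suite=(?P<suite>\S+) -->"
)


def render_markers(finding_ids, suite):
    """Render one hidden marker line per fixed finding."""
    return "\n".join(
        f"<!-- agentic-ci finding={fid} suite={suite} -->" for fid in finding_ids
    )


def parse_markers(pr_body):
    """Return every (finding id, suite) pair found in a PR body."""
    return [(m["id"], m["suite"]) for m in MARKER_RE.finditer(pr_body)]
```

Using `finditer` rather than a single `search` is what lets a batch PR back-fill one `attempted_fixes` entry per marker.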

@@ -223,6 +240,8 @@ to the next candidate.
interactive-only and shells the body inline; CI needs determinism.
- **Title**: conventional, `<type>(agentic-ci): <one-line>`.
- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
- **Batch markers**: batch PRs include one hidden finding marker per fixed
finding so crash recovery can reconstruct every `attempted_fixes` entry.
- **Draft PRs**: `code-quality` opens draft until a maintainer flips
`draft_until_proven` to `false` in runner-state, after at least two
non-draft PRs from that suite have landed clean. This flip is
8 changes: 6 additions & 2 deletions .agents/recipes/_phase-fix.md
@@ -16,9 +16,13 @@ This invocation runs the **FIX** phase only.
codebase to discover new findings is forbidden.
- Pick the highest-ranked eligible candidate from `fix_backlog`, apply
the fix, run the package's tests if applicable, commit, push, and open
the PR using `gh pr create --body-file`.
the PR using `gh pr create --body-file`. If the recipe and
`_fix-policy.md` declare the category batchable, you may add sibling
entries from the existing `fix_backlog` after re-verifying each one.
Do not scan for findings that are not already in `fix_backlog`.
- Record the attempt in `attempted_fixes` (whether successful, abandoned,
or failed through the top-5 fallback) before exiting.
or failed through the top-5 fallback) before exiting. Batch PRs record
one attempt per fixed finding, all pointing to the same PR and branch.
- If no candidate qualifies after trying up to 5 of them, exit cleanly,
append a short note to `/tmp/audit-{{suite}}.md` describing what was
tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
3 changes: 3 additions & 0 deletions .agents/recipes/_runner.md
@@ -67,6 +67,9 @@ Rules:
passwords) in your output, even if you encounter them in code.
- **Stay in scope.** Only perform the task described in the recipe. Do not
explore unrelated areas of the codebase.
- **No subagents.** Do not use Task, Explore, or other delegated/local agents.
The CI key may not have access to their default models; do the work in the
main agent session.
- **Cost awareness.** Minimize unnecessary file reads and tool calls. If you
have the information you need, stop.

2 changes: 1 addition & 1 deletion .agents/recipes/code-quality/recipe.md
@@ -4,7 +4,7 @@ description: Audit code quality gaps not covered by ruff - complexity trends, ex
trigger: schedule
tool: claude-code
timeout_minutes: 20
max_turns: 30
max_turns: 50
permissions:
contents: write
---
35 changes: 29 additions & 6 deletions .agents/recipes/docs-and-references/recipe.md
@@ -33,11 +33,31 @@ even when their report row is suppressed for being unchanged.

## Instructions

### Turn budget

This suite must finish before the `max_turns` limit. Do not attempt a
repo-wide audit in one run.

1. Read runner memory.
2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
empty tables. If the run is interrupted later, the workflow must still have
a usable partial report.
3. Use targeted searches to find candidates, then read only the files needed
to verify a specific finding.
4. Stop as soon as any one of the following is reached:
- 20 tool calls
- 2 new findings in a section
- all sections have been sampled
5. Finalize the report, update runner memory, and stop. If no new findings
were verified, replace the report with `NO_FINDINGS`.
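
The stop conditions can be read as a small budget tracker. The sketch below takes the rule literally (the run stops once any single section has 2 new findings), which is one interpretation of the wording; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

MAX_TOOL_CALLS = 20
MAX_NEW_FINDINGS_PER_SECTION = 2


@dataclass
class TurnBudget:
    sections: tuple
    tool_calls: int = 0
    new_findings: dict = field(default_factory=dict)
    sampled: set = field(default_factory=set)

    def record_tool_call(self, section):
        """Count a tool call and mark its section as sampled."""
        self.tool_calls += 1
        self.sampled.add(section)

    def record_finding(self, section):
        """Count a newly verified finding for a section."""
        self.new_findings[section] = self.new_findings.get(section, 0) + 1

    def should_stop(self):
        """True once any of the three stop conditions holds."""
        if self.tool_calls >= MAX_TOOL_CALLS:
            return True
        if any(
            count >= MAX_NEW_FINDINGS_PER_SECTION
            for count in self.new_findings.values()
        ):
            return True
        # Superset check: every declared section has been sampled.
        return self.sampled >= set(self.sections)
```

If the intent is a per-section cap instead (move on to the next section after 2 findings), the second check would skip that section instead of ending the run.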

### 1. Docstring vs signature drift

This repo uses Google-style docstrings (`Args:`, `Returns:`, `Raises:`).
Scan public functions and methods in `packages/` for mismatches between the
docstring and the actual function signature:
Sample public functions and methods in `packages/` for mismatches between the
docstring and the actual function signature. Do not scan every source file.
Use `rg "Args:|Returns:|Raises:" packages/*/src/ --glob '*.py'` to find
candidates, then inspect at most 5 high-value files:

- Parameters in the `Args:` section that no longer exist in the signature
- Parameters in the signature that are missing from `Args:`
@@ -60,14 +80,17 @@ Check links in these locations:
- `docs/` - MkDocs content links, code references, cross-page links
- `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md` - relative links

For each link, verify the target file or anchor exists. Report broken links
with the source file, line number, and broken target.
Use targeted link extraction and inspect at most 10 candidate links. Prefer
high-value docs and links changed recently. For each sampled link, verify the
target file or anchor exists. Report broken links with the source file, line
number, and broken target.

### 3. Architecture doc references

The 10 files in `architecture/` reference specific classes, functions, files,
and registries by name. These are high-value docs that agents and developers
rely on for orientation. For each code reference:
rely on for orientation. Sample at most 3 architecture files per run,
prioritizing files changed recently. For each code reference:
- Verify the referenced class, function, or module still exists at the stated
location
- If renamed or moved, flag with the old and new location
@@ -105,7 +128,7 @@ Review for accuracy against the current code:
- Check that autodoc module paths point to modules that still exist.

**Prioritize by risk of drift**: pages with the most code symbols referenced
are most likely to be stale. Don't read every page - sample 5-10 high-value
are most likely to be stale. Don't read every page - sample 3-5 high-value
pages and flag patterns.

## Output format
9 changes: 8 additions & 1 deletion .agents/recipes/structure/recipe.md
@@ -4,7 +4,7 @@ description: Audit structural integrity - import boundaries, lazy import complia
trigger: schedule
tool: claude-code
timeout_minutes: 20
max_turns: 30
max_turns: 50
permissions:
contents: write
---
@@ -223,6 +223,13 @@ Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:
| missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. |
| lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. |

`missing-future` is batchable: when the primary candidate is
`missing-future`, include other `missing-future` backlog entries with the
same `test_target` if each file still lacks the import and the combined
diff remains within the localized-fix bar. Batch at most 3 files. Run the
shared test target once. Use one hidden finding marker and one
`attempted_fixes` entry per file.
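
For illustration, the per-file `missing-future` edit might look like the sketch below. It assumes the SPDX header is the contiguous run of `#` comment lines at the top of the file and does not handle a module docstring that precedes the imports; the function name is hypothetical.

```python
FUTURE_IMPORT = "from __future__ import annotations"


def add_future_import(source):
    """Insert the future import after the leading comment header, if missing."""
    lines = source.splitlines()
    if FUTURE_IMPORT in (line.strip() for line in lines):
        return source  # already present, so the finding no longer applies

    # Assumption: the SPDX header is the leading run of `#` comment lines.
    header_end = 0
    while header_end < len(lines) and lines[header_end].startswith("#"):
        header_end += 1
    header, body = lines[:header_end], lines[header_end:]

    # Drop blank lines after the header so spacing is normalized below.
    while body and not body[0].strip():
        body.pop(0)

    parts = []
    if header:
        parts += header + [""]
    parts.append(FUTURE_IMPORT)
    if body:
        parts += [""] + body
    return "\n".join(parts) + "\n"
```

The early return doubles as the re-verification step: a file that already has the import comes back unchanged and can be dropped from the batch.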

**Not eligible** — stays report-only:

- Import boundary violations (architectural judgement).
18 changes: 18 additions & 0 deletions .agents/recipes/test-health/recipe.md
@@ -32,6 +32,24 @@ update `baselines` with current values and `known_issues` with new findings.

## Instructions

### Turn budget

This suite must finish before the `max_turns` limit. Do not attempt a
repo-wide test audit in one run.

1. Read runner memory.
2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
empty tables. If the run is interrupted later, the workflow must still have
a usable partial report.
3. Use targeted searches to find candidates, then read only the files needed
to verify a specific finding.
4. Stop as soon as any one of the following is reached:
- 20 tool calls
- 2 new findings in a section
- all sections have been sampled
5. Finalize the report, update runner memory, and stop. If no new findings
were verified, replace the report with `NO_FINDINGS`.

### 1. Test-to-source coverage mapping

Map source files to their corresponding test files:
13 changes: 11 additions & 2 deletions .github/workflows/agentic-ci-daily.yml
@@ -150,7 +150,7 @@ jobs:

if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
--max-time 10 \
--max-time 30 \
-X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
-H "Content-Type: application/json" \
-H "x-api-key: ${ANTHROPIC_API_KEY}" \
@@ -187,6 +187,13 @@ jobs:
RUNNER_CTX=$(cat .agents/recipes/_runner.md)
FIX_POLICY=$(cat .agents/recipes/_fix-policy.md)
RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md")
MAX_TURNS=$(awk -F': *' '
/^---$/ { section++; next }
section == 1 && $1 == "max_turns" { print $2; exit }
section == 2 { exit }
' "${RECIPE_DIR}/recipe.md" | grep -oE '[0-9]+' | head -n1)
MAX_TURNS=${MAX_TURNS:-50}
echo "Using max turns: ${MAX_TURNS}"

PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \
| sed "s|{{suite}}|${SUITE}|g" \
@@ -196,7 +203,7 @@ jobs:
stdbuf -oL -eL claude \
--model "$AGENTIC_CI_MODEL" \
-p "$PROMPT" \
--max-turns 50 \
--max-turns "$MAX_TURNS" \
--output-format stream-json \
--verbose \
2>&1 | tee /tmp/claude-audit-log.txt
@@ -253,6 +260,8 @@ jobs:
| sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \
| sed "s|{{memory_path}}|.agentic-ci-state|g")

# Keep fix-phase turns fixed at 50; the audit budget is the
# suite-tuned scan limit, while fixes are bounded by scope gates.
stdbuf -oL -eL claude \
--model "$AGENTIC_CI_MODEL" \
-p "$PROMPT" \
2 changes: 1 addition & 1 deletion .github/workflows/agentic-ci-issue-triage.yml
@@ -56,7 +56,7 @@ jobs:

if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
--max-time 10 \
--max-time 30 \
-X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
-H "Content-Type: application/json" \
-H "x-api-key: ${ANTHROPIC_API_KEY}" \
2 changes: 1 addition & 1 deletion .github/workflows/agentic-ci-pr-review.yml
@@ -193,7 +193,7 @@ jobs:
# Quick API check (custom endpoint only)
if [ -n "$ANTHROPIC_BASE_URL" ] && [ -n "$ANTHROPIC_API_KEY" ]; then
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
--max-time 10 \
--max-time 30 \
-X POST "${ANTHROPIC_BASE_URL}/v1/messages" \
-H "Content-Type: application/json" \
-H "x-api-key: ${ANTHROPIC_API_KEY}" \