From d206bb33f8a2fc35164678f2899a95b0085e70a5 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 27 Apr 2026 10:12:50 -0600 Subject: [PATCH 1/2] feat: add bug-triage skill Adds a Claude skill that reads the project bug triage guide, classifies open requires-triage issues, and files a dated summary issue with recommended priority and area labels for a human to review and apply. --- .claude/skills/bug-triage/SKILL.md | 136 +++++++++++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 .claude/skills/bug-triage/SKILL.md diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md new file mode 100644 index 0000000000..20bc7f08a8 --- /dev/null +++ b/.claude/skills/bug-triage/SKILL.md @@ -0,0 +1,136 @@ +--- +name: bug-triage +description: Triage open Comet issues marked `requires-triage` per the project bug triage guide and file a GitHub issue summarizing recommended priority, area labels, and `good first issue` candidates. A human reviews the summary issue and applies the labels. +--- + +Run a bug triage pass for the `apache/datafusion-comet` repository. + +## Overview + +This skill produces a triage report as a new GitHub issue. It does NOT apply +labels, close issues, or comment on the triaged issues. A human reviewer reads +the summary issue, applies the recommended labels, and closes the summary issue +when satisfied. + +The triage criteria come from the project's own guide. Read it before doing any +classification work; do not rely on memory. + +## Step 1: Read the Triage Guide + +Read the canonical guide in this repository: + +``` +docs/source/contributor-guide/bug_triage.md +``` + +Use the priority decision tree, escalation triggers, area labels, and +prioritization principles from that guide. If the guide and this skill ever +disagree, the guide wins. Do not paraphrase the guide; quote the labels and +criteria verbatim when classifying. + +## Step 2: Gather Issues That Need Triage + +Fetch all open issues labeled `requires-triage`: + +```bash +gh issue list \ + --repo apache/datafusion-comet \ + --label requires-triage \ + --state open \ + --limit 200 \ + --json number,title,author,createdAt,labels,body,url +``` + +If the list is empty, stop and tell the user there is nothing to triage. Do not +file an empty summary issue. + +## Step 3: Classify Each Issue + +For each issue, review the title and body and determine: + +1. **Priority label** (exactly one): apply the decision tree from the guide. + - `priority:critical` if it could produce silent wrong results + - `priority:high` for crashes, panics, segfaults, NPEs on supported paths + - `priority:medium` for functional bugs / perf regressions with workarounds + - `priority:low` for test-only, CI flakes, tooling, cosmetic +2. **Area labels** (zero or more): from the area table in the guide + (`area:writer`, `area:shuffle`, `area:aggregation`, `area:scan`, + `area:expressions`, `area:ffi`, `area:ci`) plus the pre-existing area + indicators (`native_datafusion`, `native_iceberg_compat`, `spark 4`, + `spark sql tests`). +3. **`good first issue`**: only if the fix is likely straightforward AND + well-scoped. When in doubt, leave it off. +4. **Escalation note**: if the issue matches an escalation trigger from the + guide (e.g., a `priority:high` crash that may also produce wrong results), + call it out. +5. **Needs more info**: if the report has no reproduction steps, mark it as + needing a follow-up question to the reporter rather than guessing. + +If you cannot confidently classify an issue from its title and body, say so +explicitly in the summary instead of guessing. + +Do NOT edit, label, or comment on the triaged issues. Recommendations only. + +## Step 4: File the Summary Issue + +Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): + +```bash +TRIAGE_DATE=$(date -u +%Y-%m-%d) +``` + +Title: `Bug triage results: ${TRIAGE_DATE}` + +Body: a markdown table or per-issue section listing, for each triaged issue: + +- Issue number and title (linked) +- Recommended priority label +- Recommended area labels +- `good first issue`? (yes/no) +- Escalation / needs-more-info notes +- One-sentence rationale tying the recommendation to the guide + +Include a short header that: + +- States the date, the number of issues triaged, and the count per priority +- Reminds reviewers that nothing has been labeled yet, and asks them to apply + the recommended labels (and remove `requires-triage`) before closing this + summary issue +- Links to `docs/source/contributor-guide/bug_triage.md` + +File the issue with `gh`: + +```bash +gh issue create \ + --repo apache/datafusion-comet \ + --title "Bug triage results: ${TRIAGE_DATE}" \ + --body-file <(cat <<'EOF' +... summary markdown here ... +EOF +) +``` + +Do not add labels to the summary issue itself. The human reviewer decides +whether to label it (and will close it when done). + +After creating the issue, print its URL so the reviewer can find it. + +## Output to the User + +Report back: + +1. Number of `requires-triage` issues processed +2. Count per recommended priority +3. URL of the new summary issue + +Do not paste the full triage table back into the chat; it is in the issue. + +## What This Skill Must Not Do + +- Do not apply labels to the triaged issues +- Do not remove `requires-triage` from any issue +- Do not comment on the triaged issues +- Do not close any issue +- Do not file the summary issue if there were zero `requires-triage` issues +- Do not invent priority or area labels that are not in the guide +- Do not include AI/Claude attribution in the summary issue From 2b536edf529b0d0f4d71ef1065376596c7ec1496 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 27 Apr 2026 10:14:45 -0600 Subject: [PATCH 2/2] feat: have bug-triage skill apply labels directly Reverses the earlier read-only behavior. The skill now applies the recommended priority and area labels and removes requires-triage; the dated summary issue records what was done so a human reviewer can spot-check and close it. --- .claude/skills/bug-triage/SKILL.md | 123 +++++++++++++++++++---------- 1 file changed, 80 insertions(+), 43 deletions(-) diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md index 20bc7f08a8..eefe1934a4 100644 --- a/.claude/skills/bug-triage/SKILL.md +++ b/.claude/skills/bug-triage/SKILL.md @@ -1,16 +1,23 @@ --- name: bug-triage -description: Triage open Comet issues marked `requires-triage` per the project bug triage guide and file a GitHub issue summarizing recommended priority, area labels, and `good first issue` candidates. A human reviews the summary issue and applies the labels. +description: Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels (and `good first issue` where appropriate), removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied. --- Run a bug triage pass for the `apache/datafusion-comet` repository. ## Overview -This skill produces a triage report as a new GitHub issue. It does NOT apply -labels, close issues, or comment on the triaged issues. A human reviewer reads -the summary issue, applies the recommended labels, and closes the summary issue -when satisfied. +This skill triages every open issue carrying the `requires-triage` label. For +each one it: + +1. Decides a priority and area labels using the project's triage guide. +2. Applies those labels via `gh`. +3. Removes the `requires-triage` label. +4. Records the decision (with rationale) in a single dated summary issue. + +A human reviewer reads the summary issue, sanity-checks the calls, and closes +it when satisfied. Any label correction is done by the reviewer directly on the +affected issue. The triage criteria come from the project's own guide. Read it before doing any classification work; do not rely on memory. @@ -42,7 +49,7 @@ gh issue list \ ``` If the list is empty, stop and tell the user there is nothing to triage. Do not -file an empty summary issue. +file an empty summary issue and do not modify any labels. ## Step 3: Classify Each Issue @@ -62,75 +69,105 @@ For each issue, review the title and body and determine: well-scoped. When in doubt, leave it off. 4. **Escalation note**: if the issue matches an escalation trigger from the guide (e.g., a `priority:high` crash that may also produce wrong results), - call it out. -5. **Needs more info**: if the report has no reproduction steps, mark it as - needing a follow-up question to the reporter rather than guessing. + note it in the summary. -If you cannot confidently classify an issue from its title and body, say so -explicitly in the summary instead of guessing. +## Step 4: Skip Issues You Cannot Confidently Classify -Do NOT edit, label, or comment on the triaged issues. Recommendations only. +If an issue lacks reproduction steps or is otherwise too ambiguous to classify +with confidence: -## Step 4: File the Summary Issue +- **Do not** apply a priority label. +- **Do not** remove `requires-triage`. +- **Do not** comment on the issue or ask the reporter for more info from this + skill (that is the human reviewer's call). +- Record it in the summary under a "Skipped — needs more info" section so the + reviewer can follow up. -Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): +Guessing is worse than skipping. + +## Step 5: Apply Labels + +For each issue you classified in Step 3, apply the labels and remove +`requires-triage` in a single `gh` call: ```bash -TRIAGE_DATE=$(date -u +%Y-%m-%d) +gh issue edit \ + --repo apache/datafusion-comet \ + --add-label "priority:high,area:expressions" \ + --remove-label "requires-triage" ``` -Title: `Bug triage results: ${TRIAGE_DATE}` +Notes: -Body: a markdown table or per-issue section listing, for each triaged issue: +- Pass the labels as a single comma-separated string (no spaces around commas). +- Quote labels that contain spaces (e.g., `"spark 4"`). +- Only add labels that already exist in the repo. If a label from the guide is + missing in the repo, skip it for that issue and record a note in the summary + rather than creating new labels. +- Do not comment on the issue. -- Issue number and title (linked) -- Recommended priority label -- Recommended area labels -- `good first issue`? (yes/no) -- Escalation / needs-more-info notes -- One-sentence rationale tying the recommendation to the guide +If `gh issue edit` fails for any issue, leave that issue's `requires-triage` +label intact and record the failure in the summary under a "Failed to label" +section. -Include a short header that: +## Step 6: File the Summary Issue -- States the date, the number of issues triaged, and the count per priority -- Reminds reviewers that nothing has been labeled yet, and asks them to apply - the recommended labels (and remove `requires-triage`) before closing this - summary issue -- Links to `docs/source/contributor-guide/bug_triage.md` +Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): -File the issue with `gh`: +```bash +TRIAGE_DATE=$(date -u +%Y-%m-%d) +``` + +Title: `Bug triage results: ${TRIAGE_DATE}` + +Body: a markdown report with these sections, in this order: + +1. **Header** + - Date, total issues processed, and counts per priority + - Link to `docs/source/contributor-guide/bug_triage.md` + - Note that labels have already been applied; the reviewer should spot-check + and close this issue when satisfied +2. **Triaged** — table with one row per issue: + - Issue (linked, e.g., `#1234`) + - Priority applied + - Area labels applied + - `good first issue`? (yes/no) + - One-sentence rationale tying the call to the guide +3. **Escalations to consider** (omit section if empty) +4. **Skipped — needs more info** (omit if empty) +5. **Failed to label** (omit if empty) — issue number plus the `gh` error + +File the issue with `gh`. Use a temp file for the body to keep quoting sane: ```bash gh issue create \ --repo apache/datafusion-comet \ --title "Bug triage results: ${TRIAGE_DATE}" \ - --body-file <(cat <<'EOF' -... summary markdown here ... -EOF -) + --body-file /tmp/triage-summary-${TRIAGE_DATE}.md ``` -Do not add labels to the summary issue itself. The human reviewer decides -whether to label it (and will close it when done). +Do not add labels to the summary issue itself. -After creating the issue, print its URL so the reviewer can find it. +After creating the issue, print its URL. ## Output to the User Report back: 1. Number of `requires-triage` issues processed -2. Count per recommended priority -3. URL of the new summary issue +2. Counts per priority that were applied +3. Number skipped (needs more info) and number failed +4. URL of the new summary issue Do not paste the full triage table back into the chat; it is in the issue. ## What This Skill Must Not Do -- Do not apply labels to the triaged issues -- Do not remove `requires-triage` from any issue +- Do not invent priority or area labels that are not in the guide +- Do not create new labels in the repo - Do not comment on the triaged issues -- Do not close any issue +- Do not close any triaged issue - Do not file the summary issue if there were zero `requires-triage` issues -- Do not invent priority or area labels that are not in the guide +- Do not re-label issues that were skipped or failed (leave `requires-triage` + in place so they show up in the next pass) - Do not include AI/Claude attribution in the summary issue