feat(triage): freeform maintainer directives on reproduced issues #1306
Adds maintainer-reply.yml: when an issue is in triage/reproduced or triage/by-design, an authorized maintainer can comment `@emdashbot <directive>` to direct an implementation. A small Flue classifier maps the freeform intent (implement/close/takeover/unclear); an implement directive fires a maintainer-directive repository_dispatch at investigate.yml, which runs a directed investigation (overriding the judgment and fix gates) and routes the produced fix through the existing awaiting-reporter loop, where confirm/reject already lives. Closes the gap where the bot reproduced a bug but deferred the fix (e.g. needs-design-decision with options) and there was no way to reply.
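The routing described above can be sketched as follows. In the PR it lives in `maintainer-reply.yml` shell steps, so the `Action` type, `route()`, `buildDispatchBody()` and the `issue_number`/`directive` payload keys are hypothetical; only the four intents and the `maintainer-directive` event type come from the text.

```typescript
// Hypothetical sketch: intent names come from the PR; the Action shape and
// the client_payload field names are assumptions for illustration.
type MaintainerIntent = "implement" | "close" | "takeover" | "unclear";

type Action =
  | { kind: "dispatch"; body: string } // directed investigate.yml run
  | { kind: "relabel-by-design" } // close: relabel only, never auto-close
  | { kind: "disengage" } // takeover: the bot steps back
  | { kind: "clarify" }; // unclear: ask again, never act

// Request body for GitHub's `POST /repos/{owner}/{repo}/dispatches`.
// JSON.stringify escapes quotes, backslashes and newlines, so the freeform
// directive never has to be interpolated into a shell string.
function buildDispatchBody(issueNumber: number, directive: string): string {
  return JSON.stringify({
    event_type: "maintainer-directive",
    client_payload: { issue_number: issueNumber, directive },
  });
}

function route(intent: MaintainerIntent, issueNumber: number, directive: string): Action {
  switch (intent) {
    case "implement":
      return { kind: "dispatch", body: buildDispatchBody(issueNumber, directive) };
    case "close":
      return { kind: "relabel-by-design" };
    case "takeover":
      return { kind: "disengage" };
    case "unclear":
      return { kind: "clarify" };
  }
}
```

Only `implement` produces a side effect that starts new work; the other three intents stay on the issue itself, which keeps a misclassification cheap.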
PR template validation failed
Please fix the following issues by editing your PR description:
See CONTRIBUTING.md for the full contribution policy.

Scope check
This PR changes 640 lines across 6 files. Large PRs are harder to review and more likely to be closed without review. If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs. See CONTRIBUTING.md for contribution guidelines.
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | docs | cae0c6f | Jun 03 2026, 06:31 AM |
| ✅ Deployment successful! | emdash-playground | cae0c6f | Jun 03 2026, 06:31 AM |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
Pull request overview
Adds a maintainer-driven “freeform directive” path to the triage bot so maintainers can respond to reproduced / by-design investigations with authoritative guidance (e.g., choose an option), triggering a directed investigate run that can bypass judgment gates while still respecting capability gates.
Changes:
- Introduces a new `maintainer-reply` workflow that authorizes maintainers via the GitHub permission API, classifies intent, and dispatches a directed investigate run (or relabels/disengages/asks to clarify).
- Threads a maintainer `directive` through `investigate.yml` into the Flue investigate payload as `maintainerDirective`, and updates outcome comment wording for directed runs.
- Adds a new Flue classifier workflow + schema for maintainer intent classification, and documents the new workflow in `.flue/README.md`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/workflows/maintainer-reply.yml | New workflow to authorize + classify maintainer @emdashbot directives and act (dispatch/close/takeover/unclear). |
| .github/workflows/investigate.yml | Accepts directive input/dispatch payload; passes it into agent payload and adjusts directed-run messaging. |
| .flue/workflows/investigate.ts | Adds maintainerDirective to context; bypasses judgment/fix gates when directed while keeping capability gates. |
| .flue/workflows/classify-maintainer-reply.ts | New lightweight classifier to map maintainer replies to a fixed intent + extracted directive. |
| .flue/README.md | Documents the new workflow and classifier in the bot architecture overview. |
| .flue/lib/classifier.ts | Adds maintainerIntentSchema and associated type for structured maintainer intent output. |
```bash
# The directive is attacker-shaped multiline text like the body and
# retry context; same treatment -- write to /tmp, never to a step
# output. Empty unless a maintainer-directive dispatch set it.
printf '%s' "$DIRECTIVE" > /tmp/ctx-directive.txt
```
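Why a step output is the wrong home for this text: the sketch below uses a deliberately simplified, hypothetical model of the `$GITHUB_OUTPUT` file (`key=value` lines, plus `key<<DELIM` … `DELIM` blocks for multiline values), not GitHub's actual parser. A directive that contains the delimiter line ends the block early and can smuggle in another key.

```typescript
// Simplified model (an assumption, not GitHub's real parser) of the
// $GITHUB_OUTPUT format: `key=value` lines, or `key<<DELIM`, value lines,
// then a closing `DELIM` line.
function parseOutputs(file: string): Map<string, string> {
  const out = new Map<string, string>();
  const lines = file.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const heredoc = line.match(/^([^=<]+)<<(.+)$/);
    if (heredoc) {
      const [, key, delim] = heredoc;
      const value: string[] = [];
      // Collect lines until the first line equal to the delimiter.
      while (++i < lines.length && lines[i] !== delim) value.push(lines[i]);
      out.set(key, value.join("\n"));
    } else if (line.includes("=")) {
      const eq = line.indexOf("=");
      out.set(line.slice(0, eq), line.slice(eq + 1));
    }
  }
  return out;
}

// Naive writer: inlines an untrusted value under a fixed delimiter.
function writeHeredoc(key: string, value: string, delim = "EOF"): string {
  return `${key}<<${delim}\n${value}\n${delim}\n`;
}

// A "directive" carrying its own EOF line overrides an earlier intent.
const file = "intent=unclear\n" + writeHeredoc("directive", "go with A\nEOF\nintent=implement");
const outputs = parseOutputs(file);
```

Writing the directive to `/tmp/ctx-directive.txt` and loading it with `--rawfile` sidesteps this class of bug, because no delimiter is ever parsed.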
```typescript
// Triggered by .github/workflows/maintainer-reply.yml when a maintainer
// (OWNER/MEMBER/COLLABORATOR) addresses `@emdashbot` on an issue carrying
// a `triage/*` label. The workflow YAML reads the intent from this run's
// output and decides whether to dispatch a directed investigate run, flag the
// issue as by-design, disengage, or ask for clarification.
```
```bash
if [[ -n "$REASONING" ]]; then
  echo
  echo "> ${REASONING}"
```
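The fix later recorded in the commit log (block-quote every line of multi-line reasoning, not just the first) amounts to prefixing each line. A TypeScript sketch of that transformation; `blockQuote` is a hypothetical helper, since the workflow does this in shell.

```typescript
// Quote every line, not just the first. In Markdown an unprefixed line after
// a blank line ends the blockquote, so later paragraphs of multi-line
// reasoning would otherwise render outside the quote.
function blockQuote(text: string): string {
  return text
    .replace(/\r\n/g, "\n") // normalize CRLF from model output
    .split("\n")
    .map((line) => (line.length > 0 ? `> ${line}` : ">")) // bare ">" keeps blank lines inside
    .join("\n");
}
```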
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | emdash-demo-cache | cae0c6f | Jun 03 2026, 06:31 AM |
proceed and steer resolve to the same implement action, so the classifier shouldn't pick unclear when the only dilemma is between them. Reword the prompt: unclear is for no actionable instruction at all; the cost caution applies to close/takeover, not to choosing a fix.
This is a well-scoped, architecturally sound feature. The gap it fills is real — when the bot reproduces an issue but defers because of needs-design-decision, maintainers previously had no way to direct the next step. Routing the produced fix through the existing awaiting-reporter / reporter-reply.yml loop is the right reuse choice, and gating the two comment-triggered workflows on disjoint label states prevents double-fire.
I traced the security surface, the shell-safety discipline, and the investigate.ts gate logic against the PR description. The new workflow mirrors reporter-reply.yml's defensive patterns correctly: permission-API auth rather than spoofable author_association, --rawfile for attacker-shaped input, intent whitelisting before step outputs, and repository_dispatch scoped to contents:write. The directed override in investigate.ts correctly bypasses only the judgment gates (bug classification, intended-behavior verdict, needs-design-decision fix gate) while preserving the capability gates (reproduction, fix verification), exactly as claimed.
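The judgment-versus-capability split reads naturally as an ordered list of gates. A hypothetical TypeScript model; the `Findings` fields, the gate order and the outcome names are assumptions, not the actual `investigate.ts` code.

```typescript
interface Findings {
  isBug: boolean; // stage-0 bug classification (judgment)
  reproduced: boolean; // reproduction succeeded (capability)
  intendedBehavior: boolean; // verify-stage verdict (judgment)
  needsDesignDecision: boolean; // fix gate (judgment)
  fixVerified: boolean; // candidate fix verified (capability)
}

type Outcome = "skipped" | "not-reproduced" | "by-design" | "needs-decision" | "no-verified-fix" | "fix";

function decide(f: Findings, directed: boolean): Outcome {
  // Judgment gates are skipped when a maintainer directive is present...
  if (!directed && !f.isBug) return "skipped";
  // ...but capability gates always apply: no reproduction, no verifiable fix.
  if (!f.reproduced) return "not-reproduced";
  if (!directed && f.intendedBehavior) return "by-design";
  if (!directed && f.needsDesignDecision) return "needs-decision";
  if (!f.fixVerified) return "no-verified-fix";
  return "fix";
}
```

The design point is that a directive can overrule the bot's opinion but never its evidence: a maintainer can say "fix it anyway", not "pretend it reproduced".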
The implementation is clean, but I found one logic bug in the new classifier: an empty bot-context file passes "" through to payload.botContext, and ?? does not fall back, so the model can be fed an empty context section instead of the fallback instructions.
No changeset is required; neither .flue nor .github is a published package.
```typescript
payload.botContext ??
  "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)",
```
[needs fixing] The orchestrator writes "" to /tmp/bot-context.txt when there are no bot comments (e.g. a manually labelled issue or deleted comments), and --rawfile passes that empty string through as payload.botContext. The nullish-coalescing operator ?? only falls through on null/undefined, so the classifier prompt ends up with an empty ## Bot's investigation section instead of the intended fallback text. This degrades classification quality because the model loses the explicit cue to assume the bot reproduced the issue and proposed options.
Fix by using a truthiness check so empty/whitespace-only contexts still get the fallback:
```diff
- payload.botContext ??
-   "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)",
+ payload.botContext?.trim() ||
+   "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)",
```
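The difference between the two operators is easy to check in isolation. A minimal sketch; `withNullish` and `withTruthy` are illustrative names, and the fallback string is the one quoted above.

```typescript
const FALLBACK =
  "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)";

// `??` only replaces null/undefined, so an empty file's "" passes through.
function withNullish(botContext: string | undefined): string {
  return botContext ?? FALLBACK;
}

// Trimming and then testing truthiness also catches "" and whitespace-only input.
function withTruthy(botContext: string | undefined): string {
  return botContext?.trim() || FALLBACK;
}
```

One side effect of the suggested form: a non-empty context is also returned trimmed, which is harmless for prompt text.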
They resolved to the same action, so the split only gave the classifier a cosmetic decision to make (and a section of prompt apologising for it). The directive field carries the actual which-option/what-change information.
…iner' The commenter is the authorized maintainer, so deferring the close 'to a maintainer' is circular. Address them directly: the bot never auto-closes, close it when ready.
The reporter-reply classifier scraped its result out of `flue run`'s stdout, but flue interleaves build-log lines and pretty-prints the returned value -- both defeat the line-by-line and slurp parses, so every classification silently defaulted to `unclear`. Reporters confirming "yes, fixed" got re-asked forever and no PR ever opened from the AI path (only the deterministic @emdashbot confirm path worked). Seen live on #1242, #1250. investigate.ts already solved this by writing its result to a file (INVESTIGATE_RESULT_PATH) and reading that; the classifiers never did. Add a shared persistClassifierResult() helper that writes to CLASSIFY_RESULT_PATH, and read that file in both reporter-reply and maintainer-reply instead of scraping stdout. The new classify-maintainer-reply copied the same fragile scrape, so this fixes it before it ever shipped.
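The commit names a shared `persistClassifierResult()` helper and a `CLASSIFY_RESULT_PATH` variable but not their signatures. A minimal sketch of the file handoff, assuming Node's `fs` and a JSON-serializable result; the `resultPath()` helper and its fail-loud check (which a later commit applies on the workflow side) are illustrative.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical: resolve the handoff path from the environment and fail loud
// if it is missing, rather than silently dropping the result.
function resultPath(): string {
  const path = process.env.CLASSIFY_RESULT_PATH;
  if (!path) throw new Error("CLASSIFY_RESULT_PATH is not set");
  return path;
}

// Write exactly one compact JSON object to the file. The workflow then reads
// that file with jq instead of scraping flue's interleaved, pretty-printed
// stdout.
function persistClassifierResult(result: Record<string, unknown>): void {
  writeFileSync(resultPath(), JSON.stringify(result) + "\n", "utf8");
}
```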
```yaml
env:
  GH_TOKEN: ${{ steps.app-token.outputs.token }}
  ISSUE_NUMBER: ${{ steps.ctx.outputs.number }}
  DIRECTED: ${{ steps.ctx.outputs.directed }}
```
This is a re-review of PR #1306.
Approach
The feature is architecturally sound. It fills a real gap (maintainers directing fixes on deferred reproduced/by-design issues), routes produced fixes through the existing awaiting-reporter / reporter-reply.yml loop rather than duplicating PR-opening code, gates maintainer-reply.yml and reporter-reply.yml on disjoint label states to prevent double-fire, and reuses the proven file-handoff pattern from investigate.ts for classifier results. The security discipline (permission-API auth rather than spoofable author_association, --rawfile for attacker-shaped text, intent whitelisting before step outputs) mirrors the existing workflows correctly.
Prior finding status
My previous finding about the empty botContext fallback is still present and unfixed in the current code.
New findings
I found one additional logic bug during this pass: a directed investigate run whose fix agent abandons can be misrouted to the intended-behavior outcome branch in the orchestrator, incorrectly labeling the issue by-design even though a maintainer explicitly overrode that gate.
What’s clean
- The shared `persistClassifierResult` helper and the adoption of `CLASSIFY_RESULT_PATH` in both reply workflows correctly fix the stdout-scraping bug that was stranding reporter replies in production.
- Auth, shell-safety, dispatch wiring, and concurrency controls are all correct.
- No changeset is required (`.flue`/`.github` are not published packages).
- No AGENTS.md product-code conventions are violated.
Verdict
comment — two logic bugs need fixing, but the approach is right and the architecture is sound.
- botContext: empty string (no bot comments) now falls back to the prompt cue via a truthiness check; `??` only caught null/undefined.
- directed run with an abandoned fix + intended-behavior verdict no longer misroutes to the by-design branch; the parse step gates intended-behavior on not-directed so it falls through to the directed-aware reproduced wording.
- whitespace-only directive (possible via manual workflow_dispatch) normalizes to empty so `directed` and the payload reflect only a real instruction.
- close handler block-quotes every line of multi-line reasoning, not just the first.
- correct the classify-maintainer-reply header: permission-API admin/write/triage on reproduced/by-design, not OWNER/MEMBER/COLLABORATOR on any triage/*.
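The whitespace-only normalization in that list can be sketched as a single helper. `normalizeDirective` and `DirectiveInfo` are hypothetical names; the commit describes the behaviour, not this code. This version keeps a real directive verbatim and only collapses blank input.

```typescript
interface DirectiveInfo {
  directive: string; // the instruction passed into the agent payload
  directed: boolean; // true only when a real instruction is present
}

function normalizeDirective(raw: string | undefined): DirectiveInfo {
  // Whitespace-only input (e.g. a blank manual workflow_dispatch field)
  // collapses to "", so it cannot flip the run into directed mode.
  const directive = raw !== undefined && raw.trim() !== "" ? raw : "";
  return { directive, directed: directive !== "" };
}
```

Deriving `directed` from the normalized value, rather than from whether the input field was set, is what keeps the flag and the payload consistent.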
Thanks — all findings addressed in b4bcbaf. Logic bugs (both reviewers):
Copilot's correctness/doc finds:
```bash
set -o pipefail
PAYLOAD="$(cat /tmp/classify-payload.json)"
rm -f /tmp/classify-result.json
# ...
# A clean run writes a single JSON object to the result file. A
# non-zero exit, a missing file, or a non-object means the run did
# not finish -- default to unclear (which re-asks, never acts).
if [[ $EXIT -ne 0 ]] || [[ ! -s /tmp/classify-result.json ]] || ! jq -e 'type == "object"' /tmp/classify-result.json >/dev/null 2>&1; then
  echo "::warning::classifier exit=${EXIT} or no result file; defaulting to unclear"
  tail -n 50 /tmp/classify-stderr.log || true
  echo "intent=unclear" >> "$GITHUB_OUTPUT"
  exit 0
fi
# Whitelist the intent -- the handler gate must be a known enum or we
# treat it as unclear. Defends against an unexpected model value.
INTENT_RAW="$(jq -r '.intent // "unclear"' /tmp/classify-result.json | tr -d '\r\n')"
case "$INTENT_RAW" in
  implement|close|takeover|unclear) INTENT="$INTENT_RAW" ;;
  *) INTENT="unclear" ;;
esac
# ...
jq -r '.directive // ""' /tmp/classify-result.json > /tmp/directive.txt
jq -r '.reasoning // ""' /tmp/classify-result.json > /tmp/classify-reasoning.txt
```
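The same defensive read can be expressed in TypeScript. This is a hypothetical mirror of the shell step, not code from the PR; `readClassification`, `Classification` and the exit-code parameter are illustrative.

```typescript
import { readFileSync } from "node:fs";

const INTENTS = ["implement", "close", "takeover", "unclear"] as const;
type Intent = (typeof INTENTS)[number];

interface Classification {
  intent: Intent;
  directive: string;
  reasoning: string;
}

const UNCLEAR: Classification = { intent: "unclear", directive: "", reasoning: "" };

function readClassification(path: string, exitCode: number): Classification {
  // Non-zero exit, missing/empty file, bad JSON, or a non-object all mean the
  // run did not finish: fall back to "unclear", which re-asks and never acts.
  if (exitCode !== 0) return { ...UNCLEAR };
  let parsed: unknown;
  try {
    parsed = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return { ...UNCLEAR };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return { ...UNCLEAR };
  const obj = parsed as Record<string, unknown>;

  // Whitelist the intent after stripping CR/LF (as `tr -d '\r\n'` does):
  // any value outside the enum is treated as unclear.
  const raw = typeof obj.intent === "string" ? obj.intent.replace(/[\r\n]/g, "") : "unclear";
  const intent: Intent = (INTENTS as readonly string[]).includes(raw) ? (raw as Intent) : "unclear";

  const text = (v: unknown): string => (typeof v === "string" ? v : "");
  return { intent, directive: text(obj.directive), reasoning: text(obj.reasoning) };
}
```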
Approach
This PR is architecturally sound. It fills a real gap (maintainers previously had no way to direct the investigation bot on deferred triage/reproduced or triage/by-design issues), and it routes produced fixes through the existing awaiting-reporter / reporter-reply.yml loop rather than duplicating PR-opening code. The two workflows gate on disjoint label states (reproduced/by-design vs awaiting-reporter), so they never double-fire. The security discipline mirrors the existing workflows correctly: permission-API auth rather than spoofable author_association, --rawfile for attacker-shaped text, and intent whitelisting before step outputs.
Prior findings
- Directed run misrouted to `intended-behavior` on fix abandonment: FIXED. The new `DIRECTED` boolean derived from directive presence is now checked in both `investigate.ts` (skipping the early `intended-behavior` return when `directed`) and in the `investigate.yml` parse logic (`[[ "$VERDICT" == "intended-behavior" && "$DIRECTED" != "true" ]]`). A directed run whose fix agent abandons now correctly falls through to the `reproduced` outcome branch with directed-aware wording.
- Empty `botContext` fallback: still present, but it is an edge-case suggestion rather than a blocking bug. It only fires when there are no bot comments on an issue that is already in `triage/reproduced` or `triage/by-design` (an anomalous state, since those labels are bot-set). The fallback text assumes the bot reproduced the issue, which is misleading for `triage/by-design`, where the bot concluded intended behavior. I recommend neutral wording instead.
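The fixed routing rule is small enough to model directly. A hypothetical sketch; only the `intended-behavior` verdict and the `DIRECTED` check come from the review text, and the other verdict and branch names are assumptions.

```typescript
type Verdict = "intended-behavior" | "reproduced" | "not-reproduced";
type Branch = "by-design" | "not-reproduced" | "reproduced";

function outcomeBranch(verdict: Verdict, directed: boolean): Branch {
  // Mirrors [[ "$VERDICT" == "intended-behavior" && "$DIRECTED" != "true" ]]:
  // only an undirected run may be labelled by-design.
  if (verdict === "intended-behavior" && !directed) return "by-design";
  if (verdict === "not-reproduced") return "not-reproduced";
  // A directed run whose fix agent abandoned lands here and gets the
  // directed-aware "reproduced" wording instead.
  return "reproduced";
}
```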
What’s clean
- The shared `persistClassifierResult()` helper and the adoption of `CLASSIFY_RESULT_PATH` in both reply workflows correctly fix the live stdout-scraping bug that was stranding reporter replies in production.
- Auth, shell-safety, dispatch wiring (`repository_dispatch` with JSON-escaped `--rawfile`), concurrency serialization per issue, and capability-gate bail wording are all correct.
- No AGENTS.md product-code conventions are violated (this PR only touches `.flue`/`.github`).
- No changeset is required: neither `.flue` nor `.github` is a published package.
Verdict
comment — one minor suggestion on fallback wording; the approach is right and the prior logic bug is fixed.
```typescript
// Truthiness, not `??`: the orchestrator passes "" (not undefined) when
// there are no bot comments, and an empty section loses the model's cue.
payload.botContext?.trim() ||
  "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)",
```
[suggestion] The fallback text is misleading for triage/by-design issues. This workflow fires on both triage/reproduced and triage/by-design, but the fallback says to "assume the bot reproduced the issue and either proposed options or pushed a candidate fix." On a by-design issue the bot concluded the behavior is intended, not that it reproduced a bug or proposed a fix. The fallback only triggers when botContext is empty (an edge case, since those labels are bot-set), but when it does fire, it gives the model the wrong context for by-design.
```diff
  payload.botContext?.trim() ||
-   "(unavailable; assume the bot reproduced the issue and either proposed options or pushed a candidate fix)",
+   "(unavailable; assume the bot has already investigated this issue)",
```
The fallback fires for triage/by-design too, where the bot concluded intended behavior rather than reproducing a bug. Drop the reproduced/proposed-fix assumption for neutral wording.
The classify steps set CLASSIFY_RESULT_PATH but then hard-coded the literal path in every read. Derive RESULT_PATH from the env var (failing loud if unset) so the path has one source of truth.
What
Adds a freeform maintainer-directive path for the `triage/reproduced` and `triage/by-design` states: the gap where the investigation bot has reproduced a bug but deferred the fix (e.g. diagnose returned `needs-design-decision` with options, as in #1281) and there was previously no way for a maintainer to respond to the bot's comment.

How it works
A maintainer comments `@emdashbot go with option A+B` (or any freeform instruction). Then:

The produced fix routes into the existing `awaiting-reporter` loop, reusing reporter-reply's confirm/reject verbatim, with no new PR-opening code.

This complements `reporter-reply.yml` (which owns `awaiting-reporter`); the two gate on disjoint label states and never both fire on one comment.

Files
- `maintainer-reply.yml` (new): `issue_comment` trigger; gates on `reproduced`/`by-design` + non-bot + an `@emdashbot` wake word at line start; permission-API auth (not the spoof-prone `author_association`); freeform classify → `implement` (dispatch) / `close` (relabel `by-design`, no auto-close) / `takeover` (disengage) / `unclear` (clarify).
- `classify-maintainer-reply.ts` (new): cheap kimi classifier → `{implement|close|takeover|unclear}` + extracted directive.
- `classifier.ts`: new `maintainerIntentSchema`.
- `investigate.ts`: `maintainerDirective` payload; a directive overrides the bot's judgment gates (is-it-a-bug at stage 0, is-it-intended at verify) and the fix gate, but not the capability gates (can't reproduce → can't verify a fix → bails honestly).
- `investigate.yml`: `directive` input + `maintainer-directive` dispatch type, threaded into the agent payload via `--rawfile`; directed-aware wording on the skipped / not-reproduced / reproduced-no-fix comments.

Security
Mirrors the existing reporter-reply/investigate discipline: attacker-shaped text (directive, classifier reasoning, bot-context, issue title) goes to `/tmp` via `--rawfile`, never inlined into shell or an unescaped `$GITHUB_OUTPUT` heredoc; intent is enum-whitelisted before it reaches a step output; auth is an authoritative permission-API lookup; the directive travels JSON-escaped through `repository_dispatch` (the App token has `contents:write` but not `actions:write`); branch names derive from the validated integer issue number. Auth matches reporter-reply (admin/write/triage).

Review
Ran an adversarial review pass over the diff before opening. Cleared injection, double-fire, the `--slurp | jq` CI footgun (#1291), and the dispatch wiring. Fixed one finding: a directive was being silently dropped at the stage-0/verify judgment gates with a misleading "declined to reproduce" comment on the by-design path; it now proceeds to a fix. Capability-gate bails got directed-aware wording.

Testing notes
The triage bot runs only in CI (GitHub Actions + Flue agent against the live repo), so this can't be exercised locally. Validated: Flue typecheck + build (3 workflows discovered), all three workflow YAMLs parse, oxfmt clean. No changeset: neither `.flue` nor `.github` is a published package.

Try this PR
A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.
Tracks `feat/maintainer-reply`. Updated automatically when the playground redeploys.

Also fixes: AI classifier silently stuck (live bug on #1242, #1250)
While testing, found the existing `reporter-reply` AI classifier was broken in production: reporters confirming "yes, fixed" got re-asked forever and no PR ever opened from the classifier path (only the deterministic `@emdashbot confirm` path worked, which is how recent issues actually reached `triage/verified`).

Root cause: the classifier scraped its result out of `flue run`'s stdout, but flue interleaves build-log lines and pretty-prints the JSON result, defeating both the line-by-line and slurp parses, so every classification silently defaulted to `unclear`. `investigate.ts` had already hit this and moved to a file handoff (`INVESTIGATE_RESULT_PATH`) with a comment calling stdout-scraping fragile; the classifiers never got that treatment, and the new `classify-maintainer-reply` had copied the same fragile scrape, so this feature would have been broken on arrival.

Fix: a shared `persistClassifierResult()` helper writes the result to `CLASSIFY_RESULT_PATH`; both `reporter-reply` and `maintainer-reply` now read that file instead of scraping stdout. Mirrors the proven investigate handoff exactly.

After merge, the stuck issues (#1242, #1250) can be unstuck by the reporter re-commenting or a maintainer `@emdashbot confirm`.

Note: the classifier path can't be exercised locally (needs the CF AI Gateway + Actions); verification is by mirroring investigate's working pattern, and the next reporter reply on any awaiting-reporter issue will confirm it end-to-end.