enhance(daily): structured-section prompt with omit-empty rule + repo-aware queries (#423 §A)#458
enhance(daily): structured-section prompt with omit-empty rule + repo-aware queries (#423 §A)#458
Conversation
…po-aware queries The single-paragraph "3-6 bullets" daily prompt produced thin reports and padded when sources were silent. Treat the prompt as a fetch-and-summarize spec: explicit section order (Title / Shipped / In flight / Reviews / Issues / CI / Trend / Blockers), hard per-section line budgets, and an "omit the section entirely if empty" rule. Switch the suggested GitHub queries to per-repo endpoints when `repositories=` is provided so the LLM doesn't fold the allowlist back into a global /search/* (which doesn't filter by repo cleanly). Pin the new schema and the empty-section rule with two AgentBuilderToolTests checks so future copy edits don't silently regress the spec. Refs #423 §A. Streaming-edit delivery (§B Option 2), batched delivery (§B Option 1), and the failure-notification cross-tenant fallback (§C) remain follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8113c5dfd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| .AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped") | ||
| .AppendLine("- GET /repos/{owner}/{repo}/pulls?state=open&per_page=20 // filter author = {username} client-side for in flight") |
There was a problem hiding this comment.
Filter shipped PRs by the target user in repo mode
This per-repo shipped query currently pulls all closed PRs and only asks for a merged_at cutoff, which means active repos will include other contributors’ merges in a user-specific daily report. In repositories= mode this can systematically misattribute shipped work to the configured user. Please also filter by author (for example user.login == {username} client-side, or a query form that includes author:{username}) before summarizing the Shipped section.
Useful? React with 👍 / 👎.
| .AppendLine("- GET /search/issues?q=author:{username}+is:pr+is:merged+merged:>={iso_date} // shipped PRs") | ||
| .AppendLine("- GET /search/issues?q=author:{username}+is:pr+is:open // in flight") | ||
| .AppendLine("- GET /search/issues?q=reviewed-by:{username}+updated:>={iso_date} // reviews") | ||
| .AppendLine("- GET /search/issues?q=author:{username}+is:issue+created:>={iso_date} // issues opened") | ||
| .AppendLine("- GET /search/issues?q=author:{username}+is:issue+is:closed+closed:>={iso_date} // issues closed") |
There was a problem hiding this comment.
Provide a commit data source for the Shipped section
The new schema says Shipped includes “commits to the default branch,” but the suggested global calls now only cover PR/issues searches and omit any commit endpoint. When a user ships direct commits without merged PRs, the report has no instructed source for that activity and will undercount output. Add an explicit commit query/source (and the repo-scoped equivalent) or remove commit claims from the section contract.
Useful? React with 👍 / 👎.
| .AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped") | ||
| .AppendLine("- GET /repos/{owner}/{repo}/pulls?state=open&per_page=20 // filter author = {username} client-side for in flight") | ||
| .AppendLine("- GET /search/issues?q=reviewed-by:{username}+repo:{owner}/{repo}+updated:>={iso_date}") | ||
| .AppendLine("- GET /search/issues?q=author:{username}+repo:{owner}/{repo}+is:issue+updated:>={iso_date}") |
There was a problem hiding this comment.
Include commenter activity in repo-scoped issue guidance
In repo-allowlist mode, the Issues guidance only queries author:{username} updates, but the section definition requires issues “opened, closed, or commented on” by the user. This drops the common case where the user comments on issues they did not author, so repo-scoped reports can miss meaningful issue activity. Add a per-repo commenter:{username} query (and keep opened/closed splits as needed) to align with the stated section behavior.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## dev #458 +/- ##
==========================================
+ Coverage 70.59% 70.90% +0.30%
==========================================
Files 1208 1209 +1
Lines 86806 87045 +239
Branches 11369 11411 +42
==========================================
+ Hits 61284 61720 +436
+ Misses 21094 20877 -217
- Partials 4428 4448 +20
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
Reviewing §A changes. A few issues: 1. Test gap: Section 7 (Trend) not asserted despite claiming full coverage
If the omission is intentional (Trend is optional), the comment should say so. Otherwise, add: skillContent.Should().Contain("7. Trend");2. No CI query guidance when repo allowlist is emptyThe output schema defines §6 CI but the no-repo query suggestion list has zero GitHub Actions / check-run endpoints: The per-repo branch correctly includes 3.
|
eanzhao
left a comment
There was a problem hiding this comment.
Found one additional issue while reviewing PR 458.
| { | ||
| prompt | ||
| .AppendLine("Repository allowlist provided — run these per-repo (replace `{owner}/{repo}` with each entry, substitute `{iso_date}` with the start of the 24h window in ISO 8601 UTC). Do NOT collapse into one global query.") | ||
| .AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped") |
There was a problem hiding this comment.
P2: This endpoint is not a reliable 24h merge-time source. /repos/{owner}/{repo}/pulls?state=closed&per_page=20 returns only one page of closed PRs, and the list API is not keyed by merged_at; filtering that small page client-side can miss PRs merged in the last 24h when they were created earlier or fall past the first page in active repos. Since this feeds the Shipped section, repo-scoped reports can undercount real shipped work. Prefer a query source keyed by merge time, e.g. repo:{owner}/{repo} author:{username} is:pr is:merged merged:>={iso_date}, or explicitly paginate/sort enough to make the client-side merged_at filter complete.
|
Additional review — supplementary to the three issues already raised: 4.
|
- §6 CI section: explicit OMIT in no-repo mode + the no-repo branch now states the
CI gap inline; per-repo `/actions/runs` drops the `{default_branch}` placeholder
and filters `conclusion=failure` + `created_at` client-side.
- §8 Blockers: qualified as `always last unless the report is empty-day` and the
empty-day fallback now explicitly says `do not append a Blockers line`. Resolves
the contradiction a weaker model could synthesise both sides of.
- Per-repo Shipped PRs: switch from `/repos/{owner}/{repo}/pulls?state=closed` to
`/search/issues?q=repo:...+author:...+is:pr+is:merged+merged:>=...`. The list
endpoint returned all contributors' closed-but-unmerged PRs (codex P1) and was
unreliable on active repos with one-page pagination (eanzhao inline). Search
form keys on merge time + author server-side.
- §2 Shipped: add `/search/commits` queries in both modes so the "PRs merged AND
commits authored" claim has a real source (codex P2).
- Repo-mode Issues: add a `commenter:{username}` query so the "opened, closed, or
commented on" schema isn't dropped by author-only filtering (codex P2).
- Substitution variables: document `{username}`, `{iso_date}`, and `{owner}/{repo}`
with concrete examples + a "do not leave literal placeholders" footer. The LLM
was previously expected to infer the username substitution from "Primary GitHub
username:" alone.
- Tests: pin section 7 (Trend) slot in the order-pinning assertion (the schema
position is load-bearing even when the section is optional). Pin the new commit
query, the per-repo commenter query, the empty-day Blockers carve-out, the
substitution-variable docs, and assert the dropped `/repos/{owner}/{repo}/pulls`
+ `{default_branch}` shapes are gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
All six issues from the previous review round are addressed in this update. Verified:
Bonus fixes caught in this revision are also solid: shipped-commits query added for both modes, repo-mode issues now include the commenter case, final instruction blocks literal placeholder leakage. LGTM — no remaining issues from my side. |
eanzhao
left a comment
There was a problem hiding this comment.
The previous fix set looks mostly covered. I found one remaining issue around error handling semantics.
|
|
||
| prompt | ||
| .AppendLine() | ||
| .AppendLine("If a query returns 4xx, 5xx, or empty, treat that source as zero for the affected section and continue — do not retry, do not fall back to invented data. Do not leave any literal `{username}` / `{iso_date}` / `{owner}/{repo}` placeholders in outbound URLs."); |
There was a problem hiding this comment.
P1: Do not collapse transport/auth failures into zero activity. nyxid_proxy currently returns error-shaped JSON for upstream failures ({"error": true, "status": 401/5xx, ...}), and #439 is specifically about the runner/LLM masking those as a successful "no activity" daily. This instruction preserves that failure mode by telling the model to treat 4xx/5xx as zero and continue, so a revoked GitHub grant or proxy outage can still produce No measurable activity with the agent left healthy. Please distinguish successful empty responses from tool/proxy failures: 2xx empty can be zero; 4xx/5xx/error-shaped tool results should be surfaced as a source failure (or at least not allowed to trigger the empty-day fallback).
…back eanzhao P1: the previous "4xx/5xx/empty → treat as zero" instruction silently masked transport/auth failures as healthy empty-day reports. A revoked GitHub OAuth grant or proxy outage would produce `No measurable activity in the last 24h.` with the agent left in a Running state — which is exactly the bug referenced in #439. Replace the catch-all with a tiered classification: - 2xx with an empty list → genuine zero data; section is omitted as before. - 4xx / 5xx / tool error envelope (`{"error": true, ...}`, revoked grant, proxy timeout) → source UNAVAILABLE, append `Source health: ...` footer below Blockers naming the failed sources. - Empty-day fallback (`No measurable activity in the last 24h.`) is now ONLY valid when EVERY source returned 2xx. If ANY source failed, emit at least the title plus the `Source health:` footer. Pin this distinction with three new test assertions plus a `NotContain` on the old wording, so future copy edits can't quietly regress the failure-handling behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
left a comment
There was a problem hiding this comment.
The source-health fix addresses the P1 behavior, but it leaves one prompt-order conflict.
| .AppendLine("5. Issues (≤4 lines) — issues opened, closed, or commented on by the user.") | ||
| .AppendLine("6. CI (≤3 lines) — failing GitHub Actions runs on the tracked repos. Best-effort and only feasible in repo-allowlist mode; OMIT this section in no-repo mode (the global search endpoints do not expose Actions run conclusions).") | ||
| .AppendLine("7. Trend (1 line, optional) — running totals vs the prior 24h, e.g. `Trend: shipped 3 (+1), reviews 5 (-2)`. Omit when the prior-window data could not be fetched.") | ||
| .AppendLine("8. Blockers (1 line, always last unless the report is empty-day) — `Blockers: <short list>` or `No blockers.` Auto-detect from: PRs >24h waiting on a review, CI red >2h, issues with labels `blocked` or `needs-info`.") |
There was a problem hiding this comment.
P2: This still says Blockers is always last except for empty-day reports, but the new source-health rule later requires Source health: at the very bottom after Blockers when any source fails. That gives the model two incompatible ordering rules in the exact case we care about for #439, and a weaker run can either drop the source-health footer or place it before Blockers. Please qualify this as something like always last among normal report sections unless empty-day or Source health is emitted, or make Source health an explicit final section/footer in the section order and pin that in the test.
eanzhao P2: §8 Blockers said "always last unless empty-day" but the source-health rule required `Source health:` "at the very bottom after Blockers" — two incompatible ordering rules in exactly the failure case #439 cares about. A weaker model could drop the Source health footer or place it before Blockers. Make Source health an explicit §9 footer in the schema list: - §8 Blockers: drop the "always last" qualifier; pin as `Position-locked at slot 8; the only section that may sit below it is the §9 Source health footer.` - §9 Source health: 1-line conditional footer, emitted only when at least one source returned non-2xx. When emitted, always the final line. - Empty-day fallback: explicitly forbid emitting §8 Blockers AND §9 Source health on a genuine empty day (every source 2xx with no items). If any source failed, the empty-day path is invalid — emit at least title + §9. - The "Source health — distinguish empty results from source failures" detail block is reframed as the rule for emitting §9, not a parallel ordering claim. Test pins: assert `9. Source health` in the order-pinning list, pin the new "Position-locked at slot 8" + "only section that may sit below it" wording on §8, and pin the strict empty-day forbidding both §8 and §9. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements §A (content depth) of #423. The current
/dailyprompt is a singleparagraph that asks for "3-6 concise bullet points" — produces thin reports and
pads when sources are silent. This PR rewrites
TryBuildDailyReportSpecto treatthe system prompt as a fetch-and-summarize specification, not a freeform brief:
In flight (≤6) / Reviews (≤4) / Issues (≤4) / CI (≤3) / Trend (1, optional) /
Blockers (1).
we're guarding against is the LLM padding silent sources with
no activity in this areafiller. Empty-day fallback returns onlyTitle + No measurable activity in the last 24h.repositories=is provided, the promptsteers the LLM to per-repo
/repos/{owner}/{repo}/...calls and explicitly refusesthe
/search/*path (which doesn't filter to a repo allowlist cleanly). When theallowlist is empty, the suggestion list expands from 3 search queries to 6 (split
PRs / reviews / issues opened / issues closed / issues commented).
not invent fallback data.
Two new tests in
AgentBuilderToolTestspin the structured-section schema and theomit-empty rule so future copy edits don't silently regress the spec.
Out of scope (follow-ups for #423)
This PR addresses §A and the corresponding acceptance test. The remaining items
stay open under #423:
SkillRunnerStreamingReplySinkplus LarkPATCH /open-apis/im/v1/messages/{id}integration without a reply token. Larger surface; better as its own PR.
Option 2 hits a blocker.
TrySendFailureAsyncgoingthrough the same
s/api-lark-botproxy that just rejected. Tracked in enhance(daily): richer report content + progressive delivery (streaming-edit or batched) #423 §C andscorecard row C2; requires capturing a channel-bot fallback target at agent-create
time.
SkillRunnerInitializedfreezes
SkillContentat create time, so this PR only takes effect for newlycreated agents. The structural fix (versioned prompt templates) is tracked as refactor(daily-prompt): versioned daily report prompt templates (prerequisite for #423 enrichment) #450.
Migration path for users today:
/agents→ Delete →/daily.The smoke test in the issue's acceptance ("commits authored empty but reviews
non-empty → Reviews renders, Shipped omitted") is a manual run-against-real-GitHub
check; the automated pinning test guards the prompt itself, not the LLM's adherence.
Test plan
dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj— 475 passed (includes 2 new + 30 existing daily-report tests).bash tools/ci/architecture_guards.sh— all guards pass./dailyagent, trigger via/run-agent, verifythe output uses the new section structure and that empty sections are omitted.
Refs #423.