Skip to content

enhance(daily): structured-section prompt with omit-empty rule + repo-aware queries (#423 §A)#458

Open
eanzhao wants to merge 4 commits intodevfrom
feat/2026-04-27_daily-richer-content-prompt
Open

enhance(daily): structured-section prompt with omit-empty rule + repo-aware queries (#423 §A)#458
eanzhao wants to merge 4 commits intodevfrom
feat/2026-04-27_daily-richer-content-prompt

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 27, 2026

Summary

Implements §A (content depth) of #423. The current /daily prompt is a single
paragraph that asks for "3-6 concise bullet points" — produces thin reports and
pads when sources are silent. This PR rewrites TryBuildDailyReportSpec to treat
the system prompt as a fetch-and-summarize specification, not a freeform brief:

  • Explicit section order with per-section line budgets: Title (1) / Shipped (≤6) /
    In flight (≤6) / Reviews (≤4) / Issues (≤4) / CI (≤3) / Trend (1, optional) /
    Blockers (1).
  • "Omit the section entirely if empty" rule (header AND body) — the regression
    we're guarding against is the LLM padding silent sources with no activity in this area filler. Empty-day fallback returns only Title + No measurable activity in the last 24h.
  • Repo-aware query suggestions: when repositories= is provided, the prompt
    steers the LLM to per-repo /repos/{owner}/{repo}/... calls and explicitly refuses
    the /search/* path (which doesn't filter to a repo allowlist cleanly). When the
    allowlist is empty, the suggestion list expands from 3 search queries to 6 (split
    PRs / reviews / issues opened / issues closed / issues commented).
  • Honest empty-source handling: 4xx/5xx/empty → treat as zero, do not retry, do
    not invent fallback data.

Two new tests in AgentBuilderToolTests pin the structured-section schema and the
omit-empty rule so future copy edits don't silently regress the spec.

Out of scope (follow-ups for #423)

This PR addresses §A and the corresponding acceptance test. The remaining items
stay open under #423:

  • §B Option 2 (streaming-edit delivery) — preferred per the issue. Needs a new
    SkillRunnerStreamingReplySink plus Lark PATCH /open-apis/im/v1/messages/{id}
    integration without a reply token. Larger surface; better as its own PR.
  • §B Option 1 (batched delivery) — fallback shape; only worth landing if
    Option 2 hits a blocker.
  • §C failure-notification cross-tenant fallbackTrySendFailureAsync going
    through the same s/api-lark-bot proxy that just rejected. Tracked in enhance(daily): richer report content + progressive delivery (streaming-edit or batched) #423 §C and
    scorecard row C2; requires capturing a channel-bot fallback target at agent-create
    time.
  • Existing agents with frozen old prompt in actor stateSkillRunnerInitialized
    freezes SkillContent at create time, so this PR only takes effect for newly
    created agents. The structural fix (versioned prompt templates) is tracked as refactor(daily-prompt): versioned daily report prompt templates (prerequisite for #423 enrichment) #450.
    Migration path for users today: /agents → Delete → /daily.

The smoke test in the issue's acceptance ("commits authored empty but reviews
non-empty → Reviews renders, Shipped omitted") is a manual run-against-real-GitHub
check; the automated pinning test guards the prompt itself, not the LLM's adherence.

Test plan

  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj — 475 passed (includes 2 new + 30 existing daily-report tests).
  • bash tools/ci/architecture_guards.sh — all guards pass.
  • Manual smoke: create a fresh /daily agent, trigger via /run-agent, verify
    the output uses the new section structure and that empty sections are omitted.

Refs #423.

…po-aware queries

The single-paragraph "3-6 bullets" daily prompt produced thin reports and padded
when sources were silent. Treat the prompt as a fetch-and-summarize spec:
explicit section order (Title / Shipped / In flight / Reviews / Issues / CI /
Trend / Blockers), hard per-section line budgets, and an "omit the section
entirely if empty" rule. Switch the suggested GitHub queries to per-repo
endpoints when `repositories=` is provided so the LLM doesn't fold the
allowlist back into a global /search/* (which doesn't filter by repo cleanly).

Pin the new schema and the empty-section rule with two AgentBuilderToolTests
checks so future copy edits don't silently regress the spec.

Refs #423 §A. Streaming-edit delivery (§B Option 2), batched delivery (§B
Option 1), and the failure-notification cross-tenant fallback (§C) remain
follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8113c5dfd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +157 to +158
.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped")
.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=open&per_page=20 // filter author = {username} client-side for in flight")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Filter shipped PRs by the target user in repo mode

This per-repo shipped query currently pulls all closed PRs and only asks for a merged_at cutoff, which means active repos will include other contributors’ merges in a user-specific daily report. In repositories= mode this can systematically misattribute shipped work to the configured user. Please also filter by author (for example user.login == {username} client-side, or a query form that includes author:{username}) before summarizing the Shipped section.

Useful? React with 👍 / 👎.

Comment on lines +146 to +150
.AppendLine("- GET /search/issues?q=author:{username}+is:pr+is:merged+merged:>={iso_date} // shipped PRs")
.AppendLine("- GET /search/issues?q=author:{username}+is:pr+is:open // in flight")
.AppendLine("- GET /search/issues?q=reviewed-by:{username}+updated:>={iso_date} // reviews")
.AppendLine("- GET /search/issues?q=author:{username}+is:issue+created:>={iso_date} // issues opened")
.AppendLine("- GET /search/issues?q=author:{username}+is:issue+is:closed+closed:>={iso_date} // issues closed")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Provide a commit data source for the Shipped section

The new schema says Shipped includes “commits to the default branch,” but the suggested global calls now only cover PR/issues searches and omit any commit endpoint. When a user ships direct commits without merged PRs, the report has no instructed source for that activity and will undercount output. Add an explicit commit query/source (and the repo-scoped equivalent) or remove commit claims from the section contract.

Useful? React with 👍 / 👎.

.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped")
.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=open&per_page=20 // filter author = {username} client-side for in flight")
.AppendLine("- GET /search/issues?q=reviewed-by:{username}+repo:{owner}/{repo}+updated:>={iso_date}")
.AppendLine("- GET /search/issues?q=author:{username}+repo:{owner}/{repo}+is:issue+updated:>={iso_date}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include commenter activity in repo-scoped issue guidance

In repo-allowlist mode, the Issues guidance only queries author:{username} updates, but the section definition requires issues “opened, closed, or commented on” by the user. This drops the common case where the user comments on issues they did not author, so repo-scoped reports can miss meaningful issue activity. Add a per-repo commenter:{username} query (and keep opened/closed splits as needed) to align with the stated section behavior.

Useful? React with 👍 / 👎.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.90%. Comparing base (21f6112) to head (e408253).
⚠️ Report is 10 commits behind head on dev.

@@            Coverage Diff             @@
##              dev     #458      +/-   ##
==========================================
+ Coverage   70.59%   70.90%   +0.30%     
==========================================
  Files        1208     1209       +1     
  Lines       86806    87045     +239     
  Branches    11369    11411      +42     
==========================================
+ Hits        61284    61720     +436     
+ Misses      21094    20877     -217     
- Partials     4428     4448      +20     
Flag Coverage Δ
ci 70.90% <ø> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 27, 2026

Reviewing §A changes. A few issues:


1. Test gap: Section 7 (Trend) not asserted despite claiming full coverage

AgentBuilderToolTests.cs — the pinning test says "Structural sections must all be named in order" but only asserts sections 1–6 and 8, skipping section 7 entirely:

https://github.com/Aevatar-AI/aevatar/blob/feat/2026-04-27_daily-richer-content-prompt/test/Aevatar.GAgents.ChannelRuntime.Tests/AgentBuilderToolTests.cs#L75-L85

If the omission is intentional (Trend is optional), the comment should say so. Otherwise, add:

skillContent.Should().Contain("7. Trend");

2. No CI query guidance when repo allowlist is empty

The output schema defines §6 CI but the no-repo query suggestion list has zero GitHub Actions / check-run endpoints:

https://github.com/Aevatar-AI/aevatar/blob/feat/2026-04-27_daily-richer-content-prompt/agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs#L142-L149

The per-repo branch correctly includes /repos/{owner}/{repo}/actions/runs, but the global-search branch has nothing equivalent. The LLM has no guidance on how to populate the CI section in the no-repo case, so it will either skip it or hallucinate. Either add a search endpoint for CI (e.g. /search/commits?q=...+status:failure as a rough proxy) or explicitly note that CI is only available when repos are pinned.


3. {default_branch} substitution variable undocumented

In the per-repo query list, the CI query uses {default_branch} but it's never explained alongside {iso_date} and {username}:

https://github.com/Aevatar-AI/aevatar/blob/feat/2026-04-27_daily-richer-content-prompt/agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs#L154

The LLM may leave the literal {default_branch} in the URL or guess main/master. Consider either documenting the substitution (e.g. "replace {default_branch} with the repo's default branch — usually main or master") or switching to the unfiltered /actions/runs?per_page=10 and filtering client-side.


Otherwise the extraction into BuildDailyReportSkillPrompt and the omit-empty semantics look solid.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one additional issue while reviewing PR 458.

{
prompt
.AppendLine("Repository allowlist provided — run these per-repo (replace `{owner}/{repo}` with each entry, substitute `{iso_date}` with the start of the 24h window in ISO 8601 UTC). Do NOT collapse into one global query.")
.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20 // filter merged_at >= {iso_date} client-side for shipped")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: This endpoint is not a reliable 24h merge-time source. /repos/{owner}/{repo}/pulls?state=closed&per_page=20 returns only one page of closed PRs, and the list API is not keyed by merged_at; filtering that small page client-side can miss PRs merged in the last 24h when they were created earlier or fall past the first page in active repos. Since this feeds the Shipped section, repo-scoped reports can undercount real shipped work. Prefer a query source keyed by merge time, e.g. repo:{owner}/{repo} author:{username} is:pr is:merged merged:>={iso_date}, or explicitly paginate/sort enough to make the client-side merged_at filter complete.

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 27, 2026

Additional review — supplementary to the three issues already raised:


4. (always last) for §8 Blockers conflicts with empty-day fallback

Line 134 in AgentBuilderTemplates.cs:

.AppendLine("8. Blockers (1 line, always last) — ...")

But two lines later (136) the empty-day rule says:

return ONLY the title line followed by `No measurable activity in the last 24h.` and nothing else.

On an empty day, the LLM is told both (a) "Blockers is always last" and (b) "return nothing else". A weaker model could synthesise both and emit Blockers anyway. Consider qualifying Blockers as: (always last, unless the entire report is empty-day).


5. Per-repo ?state=closed shipped query pulls closed-but-not-merged PRs

Line 157:

.AppendLine("- GET /repos/{owner}/{repo}/pulls?state=closed&per_page=20               // filter merged_at >= {iso_date} client-side for shipped")

?state=closed returns PRs closed without merge (e.g. rejected). The instruction says "filter merged_at" but does not call out that a null merged_at means "not shipped". Without explicit guidance, the LLM may count closed-unmerged PRs in the Shipped section. Suggestion: add a note like "exclude PRs where merged_at is null" or switch to GET /repos/{owner}/{repo}/pulls?state=closed&sort=updated&direction=desc with "only count PRs whose merged_at falls in the window".


6. {username} substitution not documented alongside {iso_date} / {owner}/{repo}

In the no-repo query suggestions (lines 146–151), {username} appears in every URL but the substitution instruction (line 145) only documents {iso_date}. Given that issue #3 flags {default_branch} as undocumented, {username} has the same problem. Add a brief mention, e.g.:

(substitute `{username}` with {normalizedUser}, `{iso_date}` with the start of the 24h window in ISO 8601 UTC)

Otherwise the extraction into BuildDailyReportSkillPrompt and the omit-empty / section-budget structure are clean.

- §6 CI section: explicit OMIT in no-repo mode + the no-repo branch now states the
  CI gap inline; per-repo `/actions/runs` drops the `{default_branch}` placeholder
  and filters `conclusion=failure` + `created_at` client-side.
- §8 Blockers: qualified as `always last unless the report is empty-day` and the
  empty-day fallback now explicitly says `do not append a Blockers line`. Resolves
  the contradiction a weaker model could synthesise both sides of.
- Per-repo Shipped PRs: switch from `/repos/{owner}/{repo}/pulls?state=closed` to
  `/search/issues?q=repo:...+author:...+is:pr+is:merged+merged:>=...`. The list
  endpoint returned all contributors' closed-but-unmerged PRs (codex P1) and was
  unreliable on active repos with one-page pagination (eanzhao inline). Search
  form keys on merge time + author server-side.
- §2 Shipped: add `/search/commits` queries in both modes so the "PRs merged AND
  commits authored" claim has a real source (codex P2).
- Repo-mode Issues: add a `commenter:{username}` query so the "opened, closed, or
  commented on" schema isn't dropped by author-only filtering (codex P2).
- Substitution variables: document `{username}`, `{iso_date}`, and `{owner}/{repo}`
  with concrete examples + a "do not leave literal placeholders" footer. The LLM
  was previously expected to infer the username substitution from "Primary GitHub
  username:" alone.
- Tests: pin section 7 (Trend) slot in the order-pinning assertion (the schema
  position is load-bearing even when the section is optional). Pin the new commit
  query, the per-repo commenter query, the empty-day Blockers carve-out, the
  substitution-variable docs, and assert the dropped `/repos/{owner}/{repo}/pulls`
  + `{default_branch}` shapes are gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 27, 2026

All six issues from the previous review round are addressed in this update. Verified:

  1. Section 7 (Trend) test gap — now asserted, comment explains the rationale
  2. No-repo CI guidance — section 6 explicitly states "OMIT this section in no-repo mode", no-repo query list includes explicit skip comment
  3. {default_branch} removed — CI query now uses unfiltered /actions/runs?per_page=10 with client-side filtering; test asserts NotContain("{default_branch}")
  4. Blockers vs empty-day conflict — §8 now says "always last unless the report is empty-day"; empty-day rule says "do not append a Blockers line in this case"; test pins both phrases
  5. Closed-but-not-merged PRs — replaced /pulls?state=closed with search-API is:pr+is:merged+merged:>=; test asserts new query and NotContain on old one
  6. {username} substitution — new substitution-variable line documents {username}, {iso_date}, and {owner}/{repo}; test pins all three

Bonus fixes caught in this revision are also solid: shipped-commits query added for both modes, repo-mode issues now include the commenter case, final instruction blocks literal placeholder leakage.

LGTM — no remaining issues from my side.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous fix set looks mostly covered. I found one remaining issue around error handling semantics.


prompt
.AppendLine()
.AppendLine("If a query returns 4xx, 5xx, or empty, treat that source as zero for the affected section and continue — do not retry, do not fall back to invented data. Do not leave any literal `{username}` / `{iso_date}` / `{owner}/{repo}` placeholders in outbound URLs.");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Do not collapse transport/auth failures into zero activity. nyxid_proxy currently returns error-shaped JSON for upstream failures ({"error": true, "status": 401/5xx, ...}), and #439 is specifically about the runner/LLM masking those as a successful "no activity" daily. This instruction preserves that failure mode by telling the model to treat 4xx/5xx as zero and continue, so a revoked GitHub grant or proxy outage can still produce No measurable activity with the agent left healthy. Please distinguish successful empty responses from tool/proxy failures: 2xx empty can be zero; 4xx/5xx/error-shaped tool results should be surfaced as a source failure (or at least not allowed to trigger the empty-day fallback).

…back

eanzhao P1: the previous "4xx/5xx/empty → treat as zero" instruction silently
masked transport/auth failures as healthy empty-day reports. A revoked GitHub
OAuth grant or proxy outage would produce `No measurable activity in the last
24h.` with the agent left in a Running state — which is exactly the bug
referenced in #439.

Replace the catch-all with a tiered classification:
- 2xx with an empty list → genuine zero data; section is omitted as before.
- 4xx / 5xx / tool error envelope (`{"error": true, ...}`, revoked grant, proxy
  timeout) → source UNAVAILABLE, append `Source health: ...` footer below
  Blockers naming the failed sources.
- Empty-day fallback (`No measurable activity in the last 24h.`) is now ONLY
  valid when EVERY source returned 2xx. If ANY source failed, emit at least
  the title plus the `Source health:` footer.

Pin this distinction with three new test assertions plus a `NotContain` on the
old wording, so future copy edits can't quietly regress the failure-handling
behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source-health fix addresses the P1 behavior, but it leaves one prompt-order conflict.

.AppendLine("5. Issues (≤4 lines) — issues opened, closed, or commented on by the user.")
.AppendLine("6. CI (≤3 lines) — failing GitHub Actions runs on the tracked repos. Best-effort and only feasible in repo-allowlist mode; OMIT this section in no-repo mode (the global search endpoints do not expose Actions run conclusions).")
.AppendLine("7. Trend (1 line, optional) — running totals vs the prior 24h, e.g. `Trend: shipped 3 (+1), reviews 5 (-2)`. Omit when the prior-window data could not be fetched.")
.AppendLine("8. Blockers (1 line, always last unless the report is empty-day) — `Blockers: <short list>` or `No blockers.` Auto-detect from: PRs >24h waiting on a review, CI red >2h, issues with labels `blocked` or `needs-info`.")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: This still says Blockers is always last except for empty-day reports, but the new source-health rule later requires Source health: at the very bottom after Blockers when any source fails. That gives the model two incompatible ordering rules in the exact case we care about for #439, and a weaker run can either drop the source-health footer or place it before Blockers. Please qualify this as something like always last among normal report sections unless empty-day or Source health is emitted, or make Source health an explicit final section/footer in the section order and pin that in the test.

eanzhao P2: §8 Blockers said "always last unless empty-day" but the source-health
rule required `Source health:` "at the very bottom after Blockers" — two
incompatible ordering rules in exactly the failure case #439 cares about. A
weaker model could drop the Source health footer or place it before Blockers.

Make Source health an explicit §9 footer in the schema list:
- §8 Blockers: drop the "always last" qualifier; pin as `Position-locked at slot
  8; the only section that may sit below it is the §9 Source health footer.`
- §9 Source health: 1-line conditional footer, emitted only when at least one
  source returned non-2xx. When emitted, always the final line.
- Empty-day fallback: explicitly forbid emitting §8 Blockers AND §9 Source health
  on a genuine empty day (every source 2xx with no items). If any source failed,
  the empty-day path is invalid — emit at least title + §9.
- The "Source health — distinguish empty results from source failures" detail
  block is reframed as the rule for emitting §9, not a parallel ordering claim.

Test pins: assert `9. Source health` in the order-pinning list, pin the new
"Position-locked at slot 8" + "only section that may sit below it" wording on
§8, and pin the strict empty-day forbidding both §8 and §9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant