Symptom
After /daily eanzhao (binding my own GitHub username), the very first daily run came back with:
GitHub Daily Update — eanzhao
- No meaningful public GitHub activity found for eanzhao in the last 24 hours.
- No recent authored commits surfaced in the checked window.
- No recently updated authored issues or PRs surfaced in the checked window.
- No recent issue or PR comments surfaced in the checked window.
No blockers.
But eanzhao actually had 52 commits + 114 authored issues/PRs + 32 comments during the same 24h window. Verified by running the exact three search queries from the prompt directly against api.github.com:
| query | result |
| --- | --- |
| /search/commits?q=author:eanzhao+author-date:>=2026-04-26 | 52 hits (PR #289 merge, "Address review" series, Telegram coverage, etc.) |
| /search/issues?q=author:eanzhao+updated:>=2026-04-26 | 114 hits (PRs #427/#428/#289, issue #436 itself, charon work, …) |
| /search/issues?q=commenter:eanzhao+updated:>=2026-04-26 | 32 hits |
/agent-status skill-runner-de2c162c9d454cc8a4b64b190673d722 showed Status: running, error_count: 0, last_error: "" — i.e. the runner believed the execution had succeeded.
Re-running via /run-agent (Run Now button) ~15 minutes later returned the correct full report immediately, so the underlying GitHub access path was healthy. The first run was a transient infrastructure failure that surfaced as a fake-success in the agent's output, in /agent-status, and in the registry. Operator and end-user have no signal to investigate.
NyxID approval gate ruled out (no approval push received during the window).
Root cause
Three collaborating layers all default to "swallow":
1. Prompt — agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs:48-64
.AppendLine("Suggested GitHub proxy calls:")
.AppendLine("- GET /search/commits?q=author:{username}+author-date:>={iso_date}")
.AppendLine("- GET /search/issues?q=author:{username}+updated:>={iso_date}")
.AppendLine("- GET /search/issues?q=commenter:{username}+updated:>={iso_date}")
.AppendLine("If there is no meaningful activity, say so plainly instead of inventing progress.")
The "say so plainly" line is the only fallback the LLM has. It conflates two distinct outcomes:
- True negative — search returned 0 items (user was actually idle).
- Tool failure — proxy returned 4xx/5xx/7xxx, no items observable at all.
The LLM picks the same bullet template for both.
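One way to encode the missing split directly in the template — a sketch only, in the same AppendLine style as the snippet above; the builder variable and the exact instruction wording are assumptions, not the real file contents:

```csharp
using System;
using System.Text;

// Sketch: extend the existing template so the model gets distinct
// instructions for "search succeeded, zero items" vs. "call failed".
var prompt = new StringBuilder()
    .AppendLine("Suggested GitHub proxy calls:")
    .AppendLine("- GET /search/commits?q=author:{username}+author-date:>={iso_date}")
    .AppendLine("- GET /search/issues?q=author:{username}+updated:>={iso_date}")
    .AppendLine("- GET /search/issues?q=commenter:{username}+updated:>={iso_date}")
    // True negative: the call worked and returned zero items.
    .AppendLine("If a search succeeds with zero results, say there was no activity; do not invent progress.")
    // Tool failure: the call itself errored; the data is unknown, not empty.
    .AppendLine("If any call returns an error (error/status/code field), do NOT claim there was no activity.")
    .AppendLine("Instead, end the report with an 'Errors:' section naming each failed query and its status.")
    .ToString();

Console.WriteLine(prompt);
```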
2. Tool — src/Aevatar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs:111-120
var result = await _client.ProxyRequestAsync(effectiveToken, slug, path, method, body, headers, ct);
if (IsApprovalError(result, out var approvalCode, out var approvalRequestId))
{
_logger.LogInformation(
"[nyxid_proxy] Approval response: code={Code} requestId={RequestId}",
approvalCode, approvalRequestId);
}
return result;
When the proxy returns {"error": true, "status": 401, ...} or {"code": 7000, ...} or any other structured failure, the tool returns the JSON as-is. The LLM gets a string identical in shape to a normal response and has no schema-level way to know "this is an error, not an empty result."
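A minimal sketch of the schema-level signal the tool could add before returning, using the error shapes quoted above and the envelope fields proposed later in this issue; the helper name and classification thresholds are assumptions:

```csharp
using System;
using System.Text.Json;

// Sketch (names assumed): classify the raw proxy JSON before it reaches the
// LLM, so a failure is structurally distinct from an empty result set.
static string WrapForLlm(string rawProxyJson)
{
    using var doc = JsonDocument.Parse(rawProxyJson);
    var root = doc.RootElement;

    // Error shapes observed in this report: {"error": true, ...},
    // {"status": 401, ...}, {"code": 7000, ...}.
    bool isError =
        (root.TryGetProperty("error", out var e) && e.ValueKind == JsonValueKind.True) ||
        (root.TryGetProperty("status", out var s) && s.ValueKind == JsonValueKind.Number && s.GetInt32() >= 400) ||
        (root.TryGetProperty("code", out var c) && c.ValueKind == JsonValueKind.Number && c.GetInt32() >= 7000);

    // GitHub search responses carry total_count; zero is a true negative.
    bool isEmpty = !isError &&
        root.TryGetProperty("total_count", out var t) && t.GetInt32() == 0;

    var status = isError ? "error" : isEmpty ? "empty" : "ok";
    return JsonSerializer.Serialize(new
    {
        tool_status = status,
        error_detail = isError ? root.GetRawText() : null,
        data = isError ? null : root.GetRawText(),
    });
}

Console.WriteLine(WrapForLlm("{\"error\":true,\"status\":401}"));
```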
3. Runner — agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:141-157
var output = await ExecuteSkillAsync(now, command.Reason, CancellationToken.None);
await SendOutputAsync(output, CancellationToken.None);
await PersistDomainEventAsync(new SkillRunnerExecutionCompletedEvent
{
CompletedAt = Timestamp.FromDateTimeOffset(now),
Output = output,
});
Any non-empty output ⇒ SkillRunnerExecutionCompletedEvent, which clears LastError and resets ErrorCount to 0 (SkillRunnerGAgent.cs:532-540). The LLM's bullet fallback is non-empty, so the runner records the failure as a clean success. The catalog projection downstream (/agent-status) inherits the lie.
The end-to-end consequence: a transient nyxid_proxy failure → opaque error JSON to the LLM → bullet fallback text → Completed event → invisible to operator and user.
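If the tool layer did emit a structured marker like the envelope above, the runner-side safety net could be as small as a shape check before choosing which event to persist. This is a sketch under that assumed marker contract, not the real routing code:

```csharp
using System;

// Sketch (marker contract assumed): detect the tool layer's explicit error
// tag in the LLM output and route the run to Failed instead of Completed.
static bool LooksLikeToolFailure(string output) =>
    !string.IsNullOrEmpty(output) &&
    output.Contains("\"tool_status\":\"error\"", StringComparison.Ordinal);

// In HandleTriggerAsync (pseudologic, names from this report):
//   var output = await ExecuteSkillAsync(now, command.Reason, ct);
//   if (LooksLikeToolFailure(output))
//       await PersistDomainEventAsync(new SkillRunnerExecutionFailedEvent { ... });
//   else
//       await PersistDomainEventAsync(new SkillRunnerExecutionCompletedEvent { ... });

Console.WriteLine(LooksLikeToolFailure("{\"tool_status\":\"error\",\"error_detail\":\"401\"}")); // True
```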
How this slipped in
The original daily prompt was written to handle a legitimate idle case ("don't manufacture progress when the user has nothing to report"), and the wording is reasonable for that case alone. The blast radius widened when GitHub tool failures became indistinguishable from genuine emptiness. There was never a path in the prompt — or the tool, or the runner — for "the call failed, the data is unknown."
The default-to-swallow behavior also exists at the runner level (any LLM string is "success") and the tool level (any proxy response is forwarded verbatim), so the prompt fix alone won't be enough.
Suggested fix direction (not prescriptive)
A working fix probably needs at least two of the three layers, or the behavior will regress:
- Prompt layer: split "no activity" from "tool error" in the daily prompt. Require the LLM to call out tool failures explicitly (e.g., a final Errors: section listing which queries failed and with what status) instead of collapsing them into the bullet template.
- Tool layer: in nyxid_proxy, give the LLM a structured signal it can't accidentally ignore — e.g., wrap the response with { "tool_status": "error|empty|ok", "data": ..., "error_detail": "..." } for the daily-report skill, or fail the tool call with an exception (Aevatar's tool middleware can catch it and surface a tool-error message that the LLM is forced to acknowledge).
- Runner layer: in SkillRunnerGAgent.HandleTriggerAsync, if the LLM output is structurally consistent with the "tool failure" pattern (e.g., contains explicit error markers from the tool layer), downgrade the run to SkillRunnerExecutionFailedEvent. This is the safety net even if the prompt or tool drifts.
Pure prompt-level fixes are fragile — LLMs can and do drop format requirements under load. A defense-in-depth fix that also touches the tool and runner layers is the right shape.
Acceptance criteria
- When nyxid_proxy returns a 4xx/5xx/7xxx error, the daily report output explicitly names the failing endpoint(s) and surfaces the underlying status — never the silent "No X surfaced" template.
- SkillRunnerGAgent persists SkillRunnerExecutionFailedEvent (not Completed), so /agent-status shows error_count > 0 and a non-empty last_error.
Affected files
agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs:48-64 — prompt fallback
src/Aevatar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs:71-121 — error pass-through
agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:141-157 — output → completed/failed routing
agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:532-540 — ApplyCompleted clears LastError/ErrorCount
Related