bug(daily): SkillRunner masks GitHub tool failures as silent "no activity" successful runs #439

@eanzhao

Description

Symptom

After /daily eanzhao (binding my own GitHub username), the very first daily run came back with:

GitHub Daily Update — eanzhao

- No meaningful public GitHub activity found for eanzhao in the last 24 hours.
- No recent authored commits surfaced in the checked window.
- No recently updated authored issues or PRs surfaced in the checked window.
- No recent issue or PR comments surfaced in the checked window.

No blockers.

But eanzhao actually had 52 commits + 114 authored issues/PRs + 32 comments during the same 24h window. Verified by running the exact three search queries from the prompt directly against api.github.com:

| Query | Result |
| --- | --- |
| `/search/commits?q=author:eanzhao+author-date:>=2026-04-26` | 52 hits (PR #289 merge, "Address review" series, Telegram coverage, etc.) |
| `/search/issues?q=author:eanzhao+updated:>=2026-04-26` | 114 hits (PRs #427/#428/#289, issue #436 itself, charon work, …) |
| `/search/issues?q=commenter:eanzhao+updated:>=2026-04-26` | 32 hits |

/agent-status skill-runner-de2c162c9d454cc8a4b64b190673d722 showed Status: running, error_count: 0, last_error: "" — i.e. the runner believed the execution had succeeded.

Re-running via /run-agent (Run Now button) ~15 minutes later returned the correct full report immediately, so the underlying GitHub access path was healthy. The first run was a transient infrastructure failure that surfaced as a fake-success in the agent's output, in /agent-status, and in the registry. Operator and end-user have no signal to investigate.

NyxID approval gate ruled out (no approval push received during the window).

Root cause

Three collaborating layers all default to "swallow":

1. Prompt — agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs:48-64

.AppendLine("Suggested GitHub proxy calls:")
.AppendLine("- GET /search/commits?q=author:{username}+author-date:>={iso_date}")
.AppendLine("- GET /search/issues?q=author:{username}+updated:>={iso_date}")
.AppendLine("- GET /search/issues?q=commenter:{username}+updated:>={iso_date}")
.AppendLine("If there is no meaningful activity, say so plainly instead of inventing progress.")

The "say so plainly" line is the only fallback the LLM has. It conflates two distinct outcomes:

  • True negative — search returned 0 items (user was actually idle).
  • Tool failure — proxy returned 4xx/5xx/7xxx, no items observable at all.

The LLM picks the same bullet template for both.
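One way to split the two outcomes at the prompt layer is to make the "no activity" bullet legal only when a query verifiably succeeded with zero items. A minimal sketch against the builder excerpt above (the exact wording is illustrative, not a proposed final prompt):

```csharp
// Hypothetical revision of the AgentBuilderTemplates prompt (wording illustrative).
// Key change: the "no activity" fallback is only allowed for verified-empty results.
builder
    .AppendLine("Suggested GitHub proxy calls:")
    .AppendLine("- GET /search/commits?q=author:{username}+author-date:>={iso_date}")
    .AppendLine("- GET /search/issues?q=author:{username}+updated:>={iso_date}")
    .AppendLine("- GET /search/issues?q=commenter:{username}+updated:>={iso_date}")
    .AppendLine("For each call, distinguish two outcomes:")
    .AppendLine("- Call succeeded with 0 items: report 'no activity' for that query.")
    .AppendLine("- Call returned an error object or failed: do NOT report 'no activity'.")
    .AppendLine("  Add it to a final 'Errors:' section with the endpoint and status.")
    .AppendLine("Use the 'no meaningful activity' summary only if every call succeeded with 0 items.");
```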

2. Tool — src/Aevatar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs:111-120

var result = await _client.ProxyRequestAsync(effectiveToken, slug, path, method, body, headers, ct);

if (IsApprovalError(result, out var approvalCode, out var approvalRequestId))
{
    _logger.LogInformation(
        "[nyxid_proxy] Approval response: code={Code} requestId={RequestId}",
        approvalCode, approvalRequestId);
}

return result;

When the proxy returns {"error": true, "status": 401, ...} or {"code": 7000, ...} or any other structured failure, the tool returns the JSON as-is. The LLM gets a string identical in shape to a normal response and has no schema-level way to know "this is an error, not an empty result."

3. Runner — agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:141-157

var output = await ExecuteSkillAsync(now, command.Reason, CancellationToken.None);
await SendOutputAsync(output, CancellationToken.None);
await PersistDomainEventAsync(new SkillRunnerExecutionCompletedEvent
{
    CompletedAt = Timestamp.FromDateTimeOffset(now),
    Output = output,
});

Any non-empty output is persisted as a SkillRunnerExecutionCompletedEvent, which clears LastError and resets ErrorCount to 0 (SkillRunnerGAgent.cs:532-540). The LLM's bullet fallback is non-empty, so the runner records the failure as a clean success. The catalog projection downstream (/agent-status) inherits the lie.

The end-to-end consequence: a transient nyxid_proxy failure → opaque error JSON to the LLM → bullet fallback text → Completed event → invisible to operator and user.

How this slipped in

The original daily prompt was written to handle a legitimate idle case ("don't manufacture progress when the user has nothing to report"), and the wording is reasonable for that case alone. The blast radius widened when GitHub tool failures became indistinguishable from genuine emptiness. There was never a path in the prompt — or the tool, or the runner — for "the call failed, the data is unknown."

The default-to-swallow behavior also exists at the runner level (any LLM string is "success") and the tool level (any proxy response is forwarded verbatim), so the prompt fix alone won't be enough.

Suggested fix direction (not prescriptive)

A working fix probably needs at least two of the three layers; otherwise it will slide back:

  • Prompt layer: split "no activity" from "tool error" in the daily prompt. Require the LLM to call out tool failures explicitly (e.g., a final Errors: section listing which queries failed and with what status) instead of collapsing them into the bullet template.
  • Tool layer: in nyxid_proxy, give the LLM a structured signal it can't accidentally ignore — e.g., wrap the response with { "tool_status": "error|empty|ok", "data": ..., "error_detail": "..." } for the daily-report skill, or fail the tool call with an exception (Aevatar's tool middleware can catch and surface as a tool-error message that the LLM is forced to acknowledge).
  • Runner layer: in SkillRunnerGAgent.HandleTriggerAsync, if the LLM output is structurally consistent with the "tool failure" pattern (e.g., contains explicit error markers from the tool layer), downgrade the run to SkillRunnerExecutionFailedEvent. This is the safety net even if prompt/tool drift.

Pure prompt-level fixes are fragile — LLMs can and do drop format requirements under load. A defense-in-depth fix that also touches the tool and runner layers is the right shape.
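The runner-layer safety net could look roughly like this. HandleTriggerAsync and SkillRunnerExecutionFailedEvent come from the issue itself, but the marker convention, the FailedAt/Error field names, and the routing helper are hypothetical, and the sketch assumes the tool layer emits a detectable error marker:

```csharp
// Hypothetical safety net in SkillRunnerGAgent: downgrade runs whose output
// carries the tool-layer error marker to a failed event instead of completed.
private const string ToolErrorMarker = "\"tool_status\":\"error\""; // hypothetical convention

private async Task RouteOutputAsync(string output, DateTimeOffset now)
{
    if (string.IsNullOrWhiteSpace(output) || output.Contains(ToolErrorMarker))
    {
        // Field names on the failed event are illustrative.
        await PersistDomainEventAsync(new SkillRunnerExecutionFailedEvent
        {
            FailedAt = Timestamp.FromDateTimeOffset(now),
            Error = "Skill output indicates tool failure; see run output for details.",
        });
        return;
    }

    await SendOutputAsync(output, CancellationToken.None);
    await PersistDomainEventAsync(new SkillRunnerExecutionCompletedEvent
    {
        CompletedAt = Timestamp.FromDateTimeOffset(now),
        Output = output,
    });
}
```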

Acceptance criteria

  • When nyxid_proxy returns a 4xx/5xx/7xxx error, the daily report output explicitly names the failing endpoint(s) and surfaces the underlying status — never the silent "No X surfaced" template.
  • When all GitHub tool calls fail, SkillRunnerGAgent persists SkillRunnerExecutionFailedEvent (not Completed), so /agent-status shows error_count > 0 and a non-empty last_error.
  • When some GitHub calls succeed and some fail, the report includes the partial data and lists the failed queries.
  • Test fixtures cover: (a) all-fail; (b) mixed; (c) genuinely-empty (no activity but all tools returned 200 with empty arrays). The "All No-X" template is only allowed in case (c).
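The three fixture cases could be pinned down with payload-level tests along these lines (xUnit-style sketch; the Classify helper is a hypothetical stand-in for whatever error/empty/ok split the fix introduces):

```csharp
using Xunit;

public class DailyReportFixtureTests
{
    // Hypothetical classifier mirroring the error/empty/ok split on raw proxy payloads.
    private static string Classify(string proxyJson) =>
        proxyJson.Contains("\"error\"") || proxyJson.Contains("\"code\"")
            ? "error"
            : proxyJson.Contains("\"total_count\":0") ? "empty" : "ok";

    [Fact] // case (a): all tool calls fail -> must never read as "no activity"
    public void ProxyError_ClassifiesAsError() =>
        Assert.Equal("error", Classify("{\"error\":true,\"status\":502}"));

    [Fact] // case (c): genuinely empty -> the only case where the No-X template is allowed
    public void EmptySearch_ClassifiesAsEmpty() =>
        Assert.Equal("empty", Classify("{\"total_count\":0,\"items\":[]}"));

    [Fact] // real activity -> full report path
    public void PopulatedSearch_ClassifiesAsOk() =>
        Assert.Equal("ok", Classify("{\"total_count\":52,\"items\":[{}]}"));
}
```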

Affected files

  • agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs:48-64 — prompt fallback
  • src/Aevatar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs:71-121 — error pass-through
  • agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:141-157 — output → completed/failed routing
  • agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:532-540 — ApplyCompleted clears LastError/ErrorCount
