refactor(daily-catalog): make UserAgentCatalogGAgent pure set-membership; projector consumes SkillRunner committed events directly #444

@eanzhao

Architectural follow-up surfaced in docs/audit-scorecard/2026-04-27-daily-pipeline-architecture-review.md §B1 + §B6. This is the structural fix behind #440; #440's own "Suggested fix direction" lists only patches (coalesce / defer / watermark) that leave the underlying coupling in place.

Symptom (recap)

UserAgentCatalogGAgent is a well-known single actor that wears two hats:

  • set-membership owner (which agents exist) — long-lived fact owner ✅
  • per-entry execution-status hub — receives UserAgentCatalogExecutionUpdateCommand from every SkillRunnerGAgent after every run ❌

The early-return guard in HandleExecutionUpdateAsync (agents/Aevatar.GAgents.ChannelRuntime/UserAgentCatalogGAgent.cs:91-114):

```csharp
if (State.Entries.All(x => !string.Equals(x.AgentId, command.AgentId, StringComparison.Ordinal)))
{
    Logger.LogWarning("Cannot update execution state for missing user agent catalog entry: {AgentId}", command.AgentId);
    return;
}
```

drops the trigger-side execution update, with nothing more than a warning log, if it arrives before the init-side upsert has been applied to state. That is the architectural root cause of #440, "first-run not reflected in /agent-status".
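The ordering problem can be reproduced in isolation. The sketch below is a minimal model, not the real actor: `CatalogEntry`, `GuardedCatalog`, and `GuardRaceDemo` are hypothetical stand-ins that keep only the guard's lookup-then-return shape, and the `DroppedUpdates` counter is added purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a catalog entry; not the real Aevatar type.
public sealed class CatalogEntry
{
    public string AgentId { get; set; } = "";
    public string? LastRunStatus { get; set; }
}

// Hypothetical model of a catalog that guards execution updates on entry existence.
public sealed class GuardedCatalog
{
    public List<CatalogEntry> Entries { get; } = new();
    public int DroppedUpdates { get; private set; }

    public void HandleUpsert(string agentId)
    {
        if (Entries.Any(x => string.Equals(x.AgentId, agentId, StringComparison.Ordinal)))
        {
            return; // already registered
        }
        Entries.Add(new CatalogEntry { AgentId = agentId });
    }

    public void HandleExecutionUpdate(string agentId, string status)
    {
        var entry = Entries.FirstOrDefault(
            x => string.Equals(x.AgentId, agentId, StringComparison.Ordinal));
        if (entry is null)
        {
            // Same shape as the early-return guard: the update is lost for good.
            DroppedUpdates++;
            return;
        }
        entry.LastRunStatus = status;
    }
}

public static class GuardRaceDemo
{
    // Delivers the execution update before the upsert, as in the first-run race.
    public static (string? LastRunStatus, int DroppedUpdates) Run()
    {
        var catalog = new GuardedCatalog();
        catalog.HandleExecutionUpdate("agent-1", "completed");
        catalog.HandleUpsert("agent-1");
        return (catalog.Entries[0].LastRunStatus, catalog.DroppedUpdates);
    }
}
```

In this model `GuardRaceDemo.Run()` returns a `null` status and one dropped update: the entry exists after the upsert, but nothing replays the completion that arrived first.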

Architectural violations

  • CLAUDE.md, "A single-threaded actor must not act as a hot shared service": every agent's every execution funnels through one well-known actor.
  • CLAUDE.md, "An actor is a business entity": the catalog is an aggregate view, not a per-execution coordinator.
  • CLAUDE.md, "An actor is a business entity... data and methods live together": SkillRunnerGAgent shouldn't know the catalog's command schema; it does today (see §B6 in the audit doc).

Proposed direction

  1. Catalog actor handles only UserAgentCatalogUpsertCommand / UserAgentCatalogTombstoneCommand. Remove UserAgentCatalogExecutionUpdateCommand from its message set.
  2. SkillRunnerGAgent (or its split successors per #refactor-split-skill-runner) commits SkillRunnerExecutionCompletedEvent / SkillRunnerExecutionFailedEvent only.
  3. UserAgentCatalogProjector subscribes to both event streams (catalog upsert/tombstone + SkillRunner execution events), keyed by agent_id, and writes to UserAgentCatalogDocument with covering-write semantics.
  4. The early-return guard goes away: the projector covers by primary key, so late events still land correctly.
  5. SkillRunnerGAgent.UpdateRegistryExecutionAsync and the UserAgentCatalogExecutionUpdateCommand proto / handler are deleted (net code removal).
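The covering-write idea in steps 3 and 4 can be sketched as follows. This is an illustration under assumptions, not the real `UserAgentCatalogProjector`: the event records, `CatalogDocument`, and the in-memory dictionary are hypothetical, and the sketch assumes that "covering write" means creating or overwriting the document for an `agent_id` regardless of which event arrives first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event shapes; the real proto messages may carry different fields.
public interface ICatalogEvent { string AgentId { get; } }
public sealed record AgentUpserted(string AgentId, string Name) : ICatalogEvent;
public sealed record AgentTombstoned(string AgentId) : ICatalogEvent;
public sealed record ExecutionCompleted(string AgentId, DateTimeOffset At) : ICatalogEvent;
public sealed record ExecutionFailed(string AgentId, DateTimeOffset At, string Error) : ICatalogEvent;

// Hypothetical read-model document keyed by agent id.
public sealed class CatalogDocument
{
    public string AgentId { get; init; } = "";
    public string? Name { get; set; }
    public bool Tombstoned { get; set; }
    public DateTimeOffset? LastRunAt { get; set; }
    public string? LastRunStatus { get; set; }
    public string? LastError { get; set; }
}

public sealed class CatalogProjectorSketch
{
    private readonly Dictionary<string, CatalogDocument> _documents =
        new(StringComparer.Ordinal);

    public IReadOnlyDictionary<string, CatalogDocument> Documents => _documents;

    public void Apply(ICatalogEvent evt)
    {
        // Covering write: get or create the document by primary key, with no
        // "entry must already exist" guard, so event order stops mattering.
        if (!_documents.TryGetValue(evt.AgentId, out var doc))
        {
            doc = new CatalogDocument { AgentId = evt.AgentId };
            _documents[evt.AgentId] = doc;
        }

        switch (evt)
        {
            case AgentUpserted e:
                doc.Name = e.Name;
                doc.Tombstoned = false;
                break;
            case AgentTombstoned:
                doc.Tombstoned = true;
                break;
            case ExecutionCompleted e:
                doc.LastRunAt = e.At;
                doc.LastRunStatus = "completed";
                doc.LastError = null;
                break;
            case ExecutionFailed e:
                doc.LastRunAt = e.At;
                doc.LastRunStatus = "failed";
                doc.LastError = e.Error;
                break;
        }
    }
}

public static class ProjectorDemo
{
    // Applies the execution event before the upsert and returns the final document.
    public static CatalogDocument Run()
    {
        var projector = new CatalogProjectorSketch();
        projector.Apply(new ExecutionCompleted("agent-1", DateTimeOffset.UnixEpoch));
        projector.Apply(new AgentUpserted("agent-1", "daily-report"));
        return projector.Documents["agent-1"];
    }
}
```

The sketch only addresses the upsert-versus-execution ordering from #440. It does not order execution events among themselves, so a real projector would presumably still need a version or timestamp check before overwriting `LastRunAt`.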

Acceptance

  • After /daily ... with run_immediately, /agent-status reflects Last run / Next run within the projection-lag SLO (same acceptance criterion as #440, "bug(agent-status): SkillRunner first-run completion does not reflect in Last run / Next run").
  • No "Cannot update execution state for missing user agent catalog entry" warnings under any timing.
  • SkillRunnerGAgent no longer references any UserAgentCatalogXxx command type.
  • Test: spawn N concurrent /daily creates; every one of them shows its committed status in the catalog read model, with no races.
  • Net diff is negative (deletes the command path).
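One way to approximate the concurrency criterion in a deterministic unit test is to shuffle event delivery rather than spawn real threads. The sketch below is an assumption-laden model: `Evt`, `Project`, and `AllReflected` are hypothetical names, the read model is reduced to two booleans per agent, and seeded shuffling stands in for the nondeterministic arrival order of upsert and completion events.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderIndependenceCheck
{
    // Hypothetical flattened event: Kind is either "upsert" or "completed".
    public readonly record struct Evt(string AgentId, string Kind);

    // Key-based projection with no existence guard, mirroring the proposed projector.
    public static Dictionary<string, (bool Registered, bool HasRun)> Project(
        IEnumerable<Evt> events)
    {
        var docs = new Dictionary<string, (bool Registered, bool HasRun)>(
            StringComparer.Ordinal);
        foreach (var e in events)
        {
            // For a missing key, doc is the default tuple (false, false).
            docs.TryGetValue(e.AgentId, out var doc);
            docs[e.AgentId] = e.Kind == "upsert"
                ? (true, doc.HasRun)
                : (doc.Registered, true);
        }
        return docs;
    }

    // Builds one upsert and one completion per agent, shuffles them with a
    // seeded Fisher-Yates pass, and checks that every agent ends up fully reflected.
    public static bool AllReflected(int agentCount, int seed)
    {
        var events = Enumerable.Range(0, agentCount)
            .SelectMany(i => new[]
            {
                new Evt($"agent-{i}", "upsert"),
                new Evt($"agent-{i}", "completed"),
            })
            .ToList();

        var rng = new Random(seed);
        for (var i = events.Count - 1; i > 0; i--)
        {
            var j = rng.Next(i + 1);
            (events[i], events[j]) = (events[j], events[i]);
        }

        var docs = Project(events);
        return docs.Count == agentCount
            && docs.Values.All(d => d.Registered && d.HasRun);
    }
}
```

Running `AllReflected` across several seeds exercises many interleavings. It complements an end-to-end test with real concurrent /daily creates and does not replace one.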

Affected files

  • `agents/Aevatar.GAgents.ChannelRuntime/UserAgentCatalogGAgent.cs`
  • `agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs` (UpsertRegistryAsync / UpdateRegistryExecutionAsync sites)
  • `agents/Aevatar.GAgents.ChannelRuntime/UserAgentCatalogProjector.cs`
  • `agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTool.cs` (UserAgentCatalogUpsertCommand call sites)
  • `agents/Aevatar.GAgents.ChannelRuntime/channel_runtime_messages.proto` (UserAgentCatalogExecutionUpdateCommand removal)

Related
