refactor(daily-outbound): late-bind lark_receive_id at send time instead of freezing at agent creation #448

@eanzhao

Architectural follow-up surfaced in docs/audit-scorecard/2026-04-27-daily-pipeline-architecture-review.md §B5.

Symptom

AgentBuilderTool.cs:274 calls ResolveDeliveryTarget(conversationId, agentId) at agent creation time and freezes the resulting primary and fallback (receive_id, receive_id_type) pairs into SkillRunnerOutboundConfig.LarkReceiveId{*,Fallback} (proto fields 22-25 of UserAgentCatalogEntry and 14-17 of SkillRunnerOutboundConfig).

Implementation: AgentBuilderTool.cs:1881.

Once frozen, any of the following permanently breaks delivery:

  • A cross-app deployment, where the outbound bot is a different Lark app from the inbound one (the recorded chat_id is unknown to the outbound app)
  • The chat is renamed or the bot is removed from it
  • The user leaves the chat
  • The user's union_id changes
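A minimal sketch of the current shape, in Python for illustration rather than the project's C#. FrozenOutboundConfig, FakeLarkDirectory, create_agent and send are hypothetical stand-ins, not the real ResolveDeliveryTarget or any Lark API, and treating the fallback pair as the owner's union_id is an assumption made only for the example. It shows how ids captured once at creation keep being reused after the topology changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class FrozenOutboundConfig:
    # Hypothetical mirror of LarkReceiveId{*,Fallback}: a primary and a fallback
    # (receive_id, receive_id_type) pair, captured once when the agent is created.
    receive_id: str
    receive_id_type: str
    fallback_receive_id: Optional[str] = None
    fallback_receive_id_type: Optional[str] = None


@dataclass
class FakeLarkDirectory:
    # Hypothetical stand-in for what each bot app can currently reach:
    # receive_id -> set of bot app ids that are able to send to it.
    reachable: Dict[str, Set[str]] = field(default_factory=dict)

    def can_send(self, app_id: str, receive_id: str) -> bool:
        return app_id in self.reachable.get(receive_id, set())


def create_agent(chat_id: str, owner_union_id: str) -> FrozenOutboundConfig:
    # Resolution happens exactly once; the physical ids are copied into the config.
    # (Assumption for illustration: the fallback is the owner's union_id.)
    return FrozenOutboundConfig(chat_id, "chat_id", owner_union_id, "union_id")


def send(directory: FakeLarkDirectory, app_id: str, cfg: FrozenOutboundConfig) -> bool:
    # Every later run reuses the creation-time targets, whether or not they are still valid.
    for receive_id in (cfg.receive_id, cfg.fallback_receive_id):
        if receive_id is not None and directory.can_send(app_id, receive_id):
            return True
    return False
```

Once both frozen ids are unreachable for the sending app (a different outbound app, the bot removed, the user gone, a changed union_id), send returns False on every run, and because the config is immutable nothing short of recreating the agent can repair it.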

The current escape hatch is /delete-agent followed by recreating the agent, which outsources the architecture problem to the user.

Architectural violations

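  • CLAUDE.md: "Working locally is not the same as being correct in a distributed setting: an implementation that only holds because of incidental details of the local runtime is treated as an unfinished design."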

The "local accident" here is the assumption that the chat topology at creation time equals the chat topology at every future execution.

Proposed direction

  • OutboundConfig (per-subscription) holds only a logical reference: (platform, conversation_canonical_key, owner_lark_user_id).
  • A LarkDeliveryResolver (a per-platform adapter) resolves the current physical receive_id at send time by querying the current binding state, e.g. via LarkUserGAgent, following the channel-user-binding direction of #436 ("bug(daily): GitHub username binding shared across all Lark users of one bot, last writer wins").
  • A failed binding becomes an observable failure surfaced to the user ("Bot is no longer in this chat. Run /rebind to choose another."), not a silently failing agent.
  • This fits #refactor-split-skill-runner: DailyReportRunGAgent queries LarkDeliveryResolver once per run, and DailyReportSubscriptionGAgent carries no physical address.
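A minimal sketch of the proposed split, in Python for illustration (the codebase is C#). LogicalOutboundRef carries exactly the three fields listed above. LarkDeliveryResolver takes a hypothetical binding-lookup callable standing in for LarkUserGAgent, whose real API is not assumed here. DeliveryTarget, DeliveryStatus and run_daily_report are illustrative names, not existing types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class LogicalOutboundRef:
    # Only the logical reference from the proposal; no receive_id is stored.
    platform: str
    conversation_canonical_key: str
    owner_lark_user_id: str


class DeliveryStatus(Enum):
    RESOLVED = "resolved"
    BINDING_GONE = "binding_gone"


@dataclass(frozen=True)
class DeliveryTarget:
    status: DeliveryStatus
    receive_id: Optional[str] = None
    receive_id_type: Optional[str] = None
    user_message: Optional[str] = None


# Hypothetical binding lookup (e.g. backed by LarkUserGAgent in the real system):
# (conversation_canonical_key, owner_lark_user_id) -> current (receive_id, receive_id_type) or None.
BindingLookup = Callable[[str, str], Optional[Tuple[str, str]]]


class LarkDeliveryResolver:
    """Per-platform adapter that resolves the physical target at send time."""

    def __init__(self, lookup: BindingLookup) -> None:
        self._lookup = lookup

    def resolve(self, ref: LogicalOutboundRef) -> DeliveryTarget:
        if ref.platform != "lark":
            raise ValueError(f"unsupported platform: {ref.platform}")
        current = self._lookup(ref.conversation_canonical_key, ref.owner_lark_user_id)
        if current is None:
            # Binding failure is an explicit, observable result, not a silent no-op.
            return DeliveryTarget(
                DeliveryStatus.BINDING_GONE,
                user_message="Bot is no longer in this chat. Run /rebind to choose another.",
            )
        receive_id, receive_id_type = current
        return DeliveryTarget(DeliveryStatus.RESOLVED, receive_id, receive_id_type)


def run_daily_report(
    resolver: LarkDeliveryResolver,
    ref: LogicalOutboundRef,
    send_fn: Callable[[str, str], None],
) -> DeliveryStatus:
    # Hypothetical per-run entry point: resolve once, then send to whatever is current.
    target = resolver.resolve(ref)
    if target.status is DeliveryStatus.RESOLVED:
        send_fn(target.receive_id, target.receive_id_type)
    return target.status
```

Because the subscription-side type has no receive_id field, it has nothing that can go stale; the cost is one binding lookup per run, which lines up with DailyReportRunGAgent resolving once per run.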

Acceptance

  • No lark_receive_id field on subscription state or readmodel.
  • Every run resolves the current target at send time; cross-app and chat-renamed scenarios work without recreating the agent.
  • When the binding is gone, the user gets a structured "please /rebind" message and the subscription is paused (not thrashing).
  • Test: rename the chat between two scheduled runs; the second run still delivers.
  • Test: remove the outbound bot from the chat; the subscription pauses with an actionable error.
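The two acceptance tests could be expressed roughly as follows. This is a self-contained sketch against a hypothetical FakeLarkWorld test double and a simplified run_once, not the project's test harness. It models only a binding table and bot membership, and it reads "paused (not thrashing)" as: later ticks skip resolution entirely until the user rebinds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

REBIND_MESSAGE = "Bot is no longer in this chat. Run /rebind to choose another."


@dataclass
class FakeLarkWorld:
    # Hypothetical test double, not a real Lark client.
    bindings: Dict[str, str] = field(default_factory=dict)      # canonical key -> current receive_id
    members: Dict[str, Set[str]] = field(default_factory=dict)  # receive_id -> bot app ids in the chat
    names: Dict[str, str] = field(default_factory=dict)         # receive_id -> display name
    delivered: List[str] = field(default_factory=list)
    lookups: int = 0


@dataclass
class Subscription:
    # Logical reference only; no lark_receive_id is stored on the subscription.
    conversation_canonical_key: str
    app_id: str
    paused: bool = False
    last_error: Optional[str] = None


def run_once(world: FakeLarkWorld, sub: Subscription) -> bool:
    """One scheduled run: resolve the target at send time, then deliver."""
    if sub.paused:
        # A paused subscription does not retry the lookup on every tick.
        return False
    world.lookups += 1
    receive_id = world.bindings.get(sub.conversation_canonical_key)
    if receive_id is None or sub.app_id not in world.members.get(receive_id, set()):
        sub.paused = True
        sub.last_error = REBIND_MESSAGE
        return False
    world.delivered.append(receive_id)
    return True


def test_rename_between_runs_still_delivers() -> None:
    world = FakeLarkWorld({"team": "oc_1"}, {"oc_1": {"cli_out"}}, {"oc_1": "Daily"})
    sub = Subscription("team", "cli_out")
    assert run_once(world, sub)
    world.names["oc_1"] = "Daily (renamed)"
    assert run_once(world, sub)
    assert world.delivered == ["oc_1", "oc_1"]


def test_bot_removed_pauses_with_actionable_error() -> None:
    world = FakeLarkWorld({"team": "oc_1"}, {"oc_1": {"cli_out"}}, {"oc_1": "Daily"})
    sub = Subscription("team", "cli_out")
    world.members["oc_1"].discard("cli_out")
    assert not run_once(world, sub)
    assert sub.paused and "/rebind" in sub.last_error
    assert not run_once(world, sub)
    assert world.lookups == 1  # the second tick did not retry the lookup


test_rename_between_runs_still_delivers()
test_bot_removed_pauses_with_actionable_error()
```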
