Skip to content

fix(governance): reject body owner identity that conflicts with claims#433

Merged
eanzhao merged 5 commits intodevfrom
fix/2026-04-27_binding-owner-identity-mismatch
Apr 27, 2026
Merged

fix(governance): reject body owner identity that conflicts with claims#433
eanzhao merged 5 commits intodevfrom
fix/2026-04-27_binding-owner-identity-mismatch

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 27, 2026

Summary

Why this scope

TryResolveContext is shared by many endpoints; flipping it to be loud universally would change semantics on ServiceEndpoints, ServicePolicyEndpoints, ServiceEndpointCatalogEndpoints, and break the four existing WhenAuthenticatedIdentityConflictsWithBody_ShouldUseClaimIdentity tests in ServiceEndpointsTests.cs / GovernanceEndpointsTests.cs. That deserves its own discussion. This PR is the narrow fix for the asymmetry within binding endpoints — owner is loud now, parallel to bound-service.

Implementation

  • New TryValidateOwnerIdentity helper inside ServiceBindingEndpoints.cs, parallel to TryValidateBoundServiceIdentity. Returns 400 OWNER_SERVICE_IDENTITY_CONFLICT when authenticated context exists and body/query identity is non-empty and non-matching.
  • MatchesAuthenticatedValue semantics preserved: empty/whitespace request fields are accepted (claims fill them in).
  • Called from each of the four binding handlers before any state mutation.

Test plan

  • dotnet build test/Aevatar.GAgentService.Integration.Tests/... — clean (only pre-existing warnings)
  • dotnet test test/Aevatar.GAgentService.Integration.Tests/... — 266 / 266 pass
  • Existing BindingEndpoints_WhenAuthenticatedBoundServiceOmitsIdentity_ShouldUseClaimIdentityForOwnerAndBoundService updated to use claim-matching owner body (previously used spoof-* and relied on silent override). Test purpose preserved (bound-service omission still derives from claims).
  • New BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsWithClaims_ShouldReturnBadRequest — POST body mismatch → 400, no command dispatched
  • New BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsWithQuery_ShouldReturnBadRequest — GET query mismatch → 400, no query dispatched

Out of scope

🤖 Generated with Claude Code

Issue #362 residue C-1. Bound-service identity already returns
400 BOUND_SERVICE_IDENTITY_CONFLICT on body/claim mismatch
(commit 2ba2b2a), but the owner identity on the same binding
endpoints was still claim-silent-overrides-body. Make it
symmetric: when authenticated, reject body owner tenantId /
appId / namespace that disagree with claims with
400 OWNER_SERVICE_IDENTITY_CONFLICT.

Applies to all four ServiceBindingEndpoints handlers: Create,
Update, Retire, Get. Other identity-resolving endpoints are out
of scope; their tests still assert silent override.

Test changes:
- BindingEndpoints_WhenAuthenticatedBoundServiceOmitsIdentity_ShouldUseClaimIdentityForOwnerAndBoundService:
  body owner switched from "spoof-*" to claim-matching, since
  it now would 400 on the spoof. Test purpose unchanged
  (bound-service omission still derives from claims).
- New: BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsWithClaims_ShouldReturnBadRequest (POST)
- New: BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsWithQuery_ShouldReturnBadRequest (GET query)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 93.22034% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.59%. Comparing base (ce42415) to head (36c0f9b).

Files with missing lines Patch % Lines
...nance.Hosting/Endpoints/ServiceBindingEndpoints.cs 90.00% 1 Missing and 3 partials ⚠️
@@            Coverage Diff             @@
##              dev     #433      +/-   ##
==========================================
+ Coverage   70.57%   70.59%   +0.01%     
==========================================
  Files        1208     1208              
  Lines       86760    86806      +46     
  Branches    11363    11369       +6     
==========================================
+ Hits        61233    61277      +44     
- Misses      21098    21100       +2     
  Partials     4429     4429              
Flag Coverage Δ
ci 70.59% <93.22%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
....Hosting/Identity/ServiceIdentityEndpointAccess.cs 90.98% <100.00%> (+1.56%) ⬆️
...nance.Hosting/Endpoints/ServiceBindingEndpoints.cs 88.32% <90.00%> (+1.93%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体过了一遍,方向 OK,跟 #311BOUND_SERVICE_IDENTITY_CONFLICT 模式对得上。三个观察,按重要程度排:

1. 测试覆盖缺口(建议补)

PR 描述说 "Applies to all four ServiceBindingEndpoints handlers (Create, Update, Retire, Get)",但新增的 conflict 测试只覆盖了 Create (POST) 和 Get (GET)。HandleUpdateAsync(PUT)和 HandleRetireAsync(POST :retire)也都加了 TryValidateOwnerIdentity 调用,但没有对应的 conflict 集成测试锁定行为。建议补两个:

  • BindingEndpoints_WhenAuthenticatedOwnerIdentityOnUpdateConflictsWithBody_ShouldReturnBadRequest
  • BindingEndpoints_WhenAuthenticatedOwnerIdentityOnRetireConflictsWithBody_ShouldReturnBadRequest

否则未来重构 Update/Retire 路径不小心把这一步漏掉,CI 不会报。

2. 跨 endpoint 不对称(至少要 follow-up tracking)

PR 描述里也提到了,但值得显式 track。这次 PR 落地后:

  • POST /api/services/{id}/bindings 与 owner conflict → 400
  • POST /api/services/{id}/policies 与 owner conflict → 202 静默改写
  • POST /api/services/{id}/endpoint-catalog 与 owner conflict → 202 静默改写

三者共用 TryResolveContext,挂在同一段 /api/services/{serviceId}/... 下面,从 API 客户端看行为分裂会比较费解。CLAUDE.md 里 "API 字段单一语义" 倾向于全部 loud,但那要一次性翻 4 个 WhenAuthenticatedIdentityConflictsWithBody_ShouldUseClaimIdentity 旧测试,PR 选择不动可以接受。建议要么在 #362 上把 "C-1 done / C-3(policies + endpoint-catalog 拉齐)owed" 显式记录、要么开个新 follow-up issue,免得 #362 关掉后这条遗漏漂走。

3. 校验顺序的小不一致(nit)

四个 handler 分两种顺序:

  • Create / Update:先 TryResolveContextTryValidateOwnerIdentity
  • Retire / Get:先 TryValidateOwnerIdentityTryResolveIdentity

功能等价(malformed claims 时 Resolve() 返回 null,validation 自动 no-op),只是读起来略乱。统一到 "先 validate 再 resolve" 会更一致——具体冲突错误(400 OWNER_SERVICE_IDENTITY_CONFLICT)优先于通用 auth 失败(403 SERVICE_IDENTITY_ACCESS_DENIED)也更合理。

顺带:identityResolver.Resolve() 在每个 handler 里被调了两次(一次进入 TryResolveContext/TryResolveIdentity 内部,一次本地存 authenticatedContext)。可以让 helper 顺手 out 一个 authenticatedContext 出来(unauthenticated 时为 null),handler 直接复用,少一次 claim 解析。

如果只动一个,优先补 1。


response.StatusCode.Should().Be(HttpStatusCode.BadRequest);
host.QueryPort.LastBindingsIdentity.Should().BeNull();
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

两个新 conflict 测试都只断言了 status code 和 command/query 没被分发,但没断言 response body 里 code == "OWNER_SERVICE_IDENTITY_CONFLICT"。这个 code 是 API 契约里客户端会 switch 的字段——message 字符串可以变,code 不能变。建议加一条:

var body = await response.Content.ReadFromJsonAsync<JsonElement>();
body.GetProperty("code").GetString().Should().Be("OWNER_SERVICE_IDENTITY_CONFLICT");

(同样的缺口 BOUND_SERVICE_IDENTITY_CONFLICT 那条测试也有,但 forward-fix 在这里更便宜。)

另:这两个测试后面缺 PUT 和 :retire 的 conflict 场景,HandleUpdateAsync / HandleRetireAsync 里加的 TryValidateOwnerIdentity 实际上没被集成测试覆盖到。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6860b02. The two existing owner-conflict tests (POST body, GET query) now assert code == "OWNER_SERVICE_IDENTITY_CONFLICT", and the bound-service test gets the same forward-fix asserting BOUND_SERVICE_IDENTITY_CONFLICT. Added two new integration scenarios that previously had zero coverage:

  • BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsOnUpdate_ShouldReturnBadRequest (PUT /bindings/{id})
  • BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsOnRetire_ShouldReturnBadRequest (POST /bindings/{id}:retire)

Both assert 400 + OWNER_SERVICE_IDENTITY_CONFLICT + the corresponding command port stayed null. The shared setup is now collapsed into AddAuthenticatedClaims and ReadCodeAsync helpers.

[FromServices] IServiceGovernanceCommandPort commandPort,
CancellationToken ct)
{
var authenticatedContext = identityResolver.Resolve();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retire 和 Get 这里是先 validate 再 resolve,但 Create (line 43-44) 和 Update (line 78-79) 是反过来——先 TryResolveContext 再 validate。四个 handler 的顺序不统一,读起来要回头比对。

功能是等价的(malformed claims 时 Resolve() 返回 null,validation 自然 no-op),但建议四个都对齐到 "先 validate 再 resolve":具体的 400 OWNER_SERVICE_IDENTITY_CONFLICT 比通用的 403 SERVICE_IDENTITY_ACCESS_DENIED 更早返回,给客户端的诊断信息也更精确。

顺带:Resolve() 现在在每个 handler 里被调两次(一次进 helper 内部,一次本地拿 authenticatedContext)。可以让 TryResolveContext / TryResolveIdentity 顺手 out ServiceIdentityContext? authenticatedContext,handler 直接用,少一次 claim 解析。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6860b02. All four handlers now share the same shape: (1) resolve authenticatedContext once, (2) validate body identity → 400 specific-code on mismatch, (3) TryResolveContext / TryResolveIdentity passing the already-resolved auth context (avoids the double Resolve() call), (4) dispatch. So HandleCreateAsync and HandleUpdateAsync now match HandleRetireAsync and HandleGetAsync — validate-then-resolve, and 400 OWNER_SERVICE_IDENTITY_CONFLICT precedes 403 SERVICE_IDENTITY_ACCESS_DENIED for malformed-claims-with-spoofed-body.

Added overloads TryResolveContext(resolver, authenticatedContext, ...) and TryResolveIdentity(resolver, authenticatedContext, ...) to ServiceIdentityEndpointAccess. Old overloads delegate into the new ones, so other callers (ServiceServingEndpoints, ServiceEndpoints, ServicePolicyEndpoints, ServiceEndpointCatalogEndpoints, …) keep compiling unchanged.

…n conflict tests

Addresses two PR #433 review comments.

1. ServiceBindingEndpoints handler order is now consistent across all four
   handlers (Create / Update / Retire / Get): validate-then-resolve.

   Previously Create and Update did TryResolveContext (which internally
   calls Resolve()) before TryValidateOwnerIdentity, so a malformed-claims
   request that should have produced 400 OWNER_SERVICE_IDENTITY_CONFLICT
   would short-circuit on 403 SERVICE_IDENTITY_ACCESS_DENIED instead. The
   four handlers now share the same shape:
     a) Resolve authenticated context once.
     b) Validate body identity → 400 with specific code on mismatch.
     c) TryResolveContext / TryResolveIdentity, passing the already-resolved
        authenticated context (avoids the double Resolve() call).
     d) Dispatch.

   Added new overloads to ServiceIdentityEndpointAccess that accept a
   pre-resolved ServiceIdentityContext, so the handler does not pay for
   Resolve() twice. Old overloads delegate into the new ones, keeping
   every other caller (ServiceServingEndpoints, ServiceEndpoints,
   ServicePolicyEndpoints, ServiceEndpointCatalogEndpoints, etc.) compiling
   unchanged.

2. Conflict tests now assert the response body's `code` JSON property,
   which is the API contract clients switch on. Added forward-fix
   coverage to BindingEndpoints_WhenAuthenticatedBoundServiceIdentity-
   ConflictsWithClaims as well.

   Plus two missing integration scenarios for the new owner-identity
   guard:
   - BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsOnUpdate
     → 400 OWNER_SERVICE_IDENTITY_CONFLICT, no UpdateBindingCommand
   - BindingEndpoints_WhenAuthenticatedOwnerIdentityConflictsOnRetire
     → 400 OWNER_SERVICE_IDENTITY_CONFLICT, no RetireBindingCommand

   Test helpers AddAuthenticatedClaims and ReadCodeAsync collapse the
   shared setup that was duplicated across each scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

三条都跟上了,干净(我是 PR author 没法走 APPROVE 流程,但内容上 LGTM):

1. 测试覆盖 ✅ 补了 ...ConflictsOnUpdate......ConflictsOnRetire...,PUT/:retire 两条路径现在都有 conflict 测试锁定。

2. 跨 endpoint 不对称#362 上加了 follow-up comment 显式记录 policiesendpoint-catalog 还欠 alignment,避免 issue 关掉后漂走。

3. 顺序统一 + 去掉双 Resolve ✅ 四个 handler 现在都是 Resolve once → validate owner → validate bound-service → TryResolveContext → dispatchTryResolveContext / TryResolveIdentity 加了带 authenticatedContext 参数的 overload,旧调用站点保留为薄壳,handler 直接复用解析结果。Header 的 doc comment 把契约也讲清楚了(null → 走原始 403 路径,non-null → 复用),不会让后续读者误以为新 overload 跳过了 access-denied 分支。

附带:四个 conflict 测试现在都通过 ReadCodeAsync 断言 code 字段,连旧的 BOUND_SERVICE_IDENTITY_CONFLICT 也 forward-fix 了。AddAuthenticatedClaims 抽出来也减了不少重复。

LGTM,可以合。

@eanzhao eanzhao merged commit 298cec8 into dev Apr 27, 2026
12 checks passed
eanzhao added a commit that referenced this pull request Apr 27, 2026
- agents/Aevatar.GAgents.ChannelRuntime/ChannelUserConfigScope.cs (added on
  dev for issue #436 / PR #438) → moved to agents/Aevatar.GAgents.Channel.Runtime/
  with namespace updated to Aevatar.GAgents.Channel.Runtime, visibility lifted
  to public so Authoring's AgentBuilderTool / AgentBuilderCardFlow can compose
  per-Lark-user scopes across the package boundary.
- src/platform/Aevatar.GAgentService.Governance.Hosting/Endpoints/
  ServiceBindingEndpoints.cs: take origin/dev (PR #433 / 9e1737e / 6860b02
  refined the validation/handler-order shape).
- test/Aevatar.GAgentService.Integration.Tests/GovernanceEndpointsTests.cs:
  take origin/dev (matches the refined endpoint shape).

ChannelUserConfigScopeTests: add `using Aevatar.GAgents.Channel.Runtime;`
for the relocated type. 473/473 ChannelRuntime tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant