refactor(gateway): close GatewaySession proxy gap; ban reach-through (#570) #571

Merged: sytone merged 1 commit into main from refactor/570-gateway-session-facade on May 27, 2026

sytone (Owner) commented May 27, 2026

Summary

Closes #570. Refs #523.

Adds the missing ConversationId proxy to GatewaySession, migrates every production reach-through (session.Session.X) to the proxy (25 sites across 18 files), and adds an architecture fence preventing regression.

Root cause

GatewaySession proxied 14 fields of the inner Session record but ConversationId was missing. Every caller that needed conversation-id read/write had to dive through session.Session.ConversationId. Once a few sites did that, the same shape spread to fields that DID have proxies (laziness pattern).

The damage: reach-through writes bypass GatewaySessionRuntime — its lock, the stream replay buffer, the secret redactor. PR #540 added a ThreadSafeHistoryArchitectureTests fence for the History case; this PR generalises the defence to every other proxied field.

Changes

Domain — proxy facade

  • GatewaySession.cs: added ConversationId get/set proxy (the missing facade). Deleted dead ToSession() (zero callers). Rewrote FromSession() XML doc to be meaningful.
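For readers unfamiliar with the facade shape, here is a minimal sketch of what a pass-through ConversationId proxy could look like. The type shapes, the `SessionId` field, and the `_gate` lock are assumptions made for illustration; the real GatewaySession and Session carry more members, and the real synchronization lives in GatewaySessionRuntime.

```csharp
#nullable enable

// Hypothetical stand-in for the inner record; the real Session has more fields.
public sealed record Session
{
    public string SessionId { get; init; } = "";
    public string? ConversationId { get; set; }
}

// Hypothetical stand-in for the facade; only the new proxy is shown.
public sealed class GatewaySession
{
    // Assumption: stands in for whatever synchronization GatewaySessionRuntime provides.
    private readonly object _gate = new();

    public Session Session { get; init; } = new();

    // Pass-through proxy: no backing field of its own, so reads always
    // reflect the inner record and writes land on it directly.
    public string? ConversationId
    {
        get { lock (_gate) { return Session.ConversationId; } }
        set { lock (_gate) { Session.ConversationId = value; } }
    }
}
```

With this shape, a caller writes `session.ConversationId = id;` instead of `session.Session.ConversationId = id;`, so every access goes through one choke point.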

Production — reach-through migration (18 files, 25 sites)

All session.Session.X → session.X:

| Area | Files | Sites |
| --- | --- | --- |
| GatewayHost (auto-compaction, fan-out, ask-user, dispatch) | GatewayHost.cs | 8 |
| Channels | GatewayHub.cs | 4 |
| API controllers | ChannelHistoryController.cs, ConversationsController.cs, CrossWorldFederationController.cs, SessionsController.cs | 9 |
| Triggers | CronTrigger.cs, HeartbeatTrigger.cs | 4 |
| Routing / reset | DefaultConversationRouter.cs, DefaultConversationResetService.cs | 5 |
| Stores | FileSessionStore.cs, SessionStoreBase.cs, SqliteSessionStore.cs | 3 |
| Agent layer | AgentExchangeService.cs, DefaultSubAgentManager.cs, WorkspaceContextBuilder.cs | 4 |
| Isolation / tools | InProcessIsolationStrategy.cs, ConversationTool.cs | 4 |

False-positive renames (caught pre/at-impl)

  • Rubber-duck pre-impl review caught: ChannelHistoryController.HistorySlice.Session (type GatewaySession) — the record property name collided with the banned shape. Renamed to HistorySlice.GatewaySession (5 callsites migrated).
  • Verification grep caught (post-edits): CrossWorldFederationController.ResolveResult.Session had the same shape. Renamed to ResolveResult.GatewaySession (3 callsites migrated). Included in the same commit; called out here because it surfaced after the rubber-duck pass.

Architecture fence — GatewaySessionFacadeArchitectureTests (7 tests)

  • Main fence (1): source-scans src/**/*.cs for \.Session\b\s*!?\s*\.\s*\w+ outside a 2-entry allowlist (GatewaySession.cs, GatewaySessionRuntime.cs — the proxy and its runtime).

  • Vacuity guard (1): synthetic session.Session.ConversationId = ... MUST trip the fence.

  • Null-forgiving operator pin (1): session!.Session!.UpdatedAt MUST also trip. This is a realistic shape, and the regex permits it through the optional !? after Session.

  • False-positive guards (4):

    • .Session as bare argument (FlushAsync(..., session.Session, ...)) — not banned (receiver takes value record by design).
    • Object/record initializer (new GatewaySession { Session = inner }, gs with { Session = updated }) — not banned.
    • Sessions.X / SessionStore.X / SessionId.X: not banned, the \b word boundary discriminates.
    • Comment mention (XML doc / //) — comment stripper removes before regex.
  • Comment-stripping lexer: the same state machine as in SingleShotWireValueArchitectureTests (PR #569, chore(gateway): rename single-shot CompletionReason "objectiveMet" to "singleShot" (#552)). It handles regular strings, verbatim strings (@"…"), and char literals.
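The shapes listed above can be checked directly against the fence pattern. The sketch below applies the regex quoted in this description to one sample per case. The class name, helper, and sample strings are hypothetical, and comment stripping is left out because it is a separate pass that runs before the regex.

```csharp
using System.Text.RegularExpressions;

public static class FacadeFenceSketch
{
    // Pattern as quoted in the PR description.
    public static readonly Regex ReachThrough =
        new(@"\.Session\b\s*!?\s*\.\s*\w+", RegexOptions.Compiled);

    public static bool IsBanned(string code) => ReachThrough.IsMatch(code);

    // One illustrative sample per guard; the strings are made up for this sketch.
    public static readonly (string Code, bool Banned)[] Samples =
    {
        ("session.Session.ConversationId = id;", true),          // vacuity guard shape
        ("session!.Session!.UpdatedAt", true),                   // null-forgiving pin
        ("FlushAsync(a, session.Session, options, ct);", false), // bare argument
        ("new GatewaySession { Session = inner }", false),       // object initializer
        ("gs with { Session = updated }", false),                // with-expression
        ("store.Sessions.Count", false),                         // \b rejects "Sessions"
        ("ctx.SessionStore.Get(id)", false),                     // \b rejects "SessionStore"
        ("key.SessionId.Value", false),                          // \b rejects "SessionId"
    };
}
```

Looping over `Samples` and comparing `IsBanned(code)` with the expected flag reproduces the positive and negative cases in miniature.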

Behavior pins — GatewaySessionBehaviorSnapshotTests (3 tests)

  • ConversationId_RoundTripsThroughProxy_ReadingInnerRecord — set via proxy, assert visible on inner record (persistence sees it).
  • ConversationId_ProxyGetter_ReflectsInnerRecordChanges — set on inner, assert proxy reads through (defends against cached-field regression).
  • ConversationId_ProxyAcceptsNull_ForOrphanSessions — null tolerance for orphan / legacy sessions.
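To make the "cached-field regression" concrete, the sketch below contrasts a pass-through facade with a hypothetical regressed version that keeps its own copy of the value. All type and method names here are invented for illustration and are not the real test code.

```csharp
#nullable enable

public sealed class InnerSession
{
    public string? ConversationId { get; set; }
}

// Pass-through facade: the shape the pins are meant to lock in.
public sealed class PassThroughFacade
{
    public InnerSession Session { get; init; } = new();

    public string? ConversationId
    {
        get => Session.ConversationId;
        set => Session.ConversationId = value;
    }
}

// Regressed shape: caches the value in its own field, so it can drift
// from the inner record when the record is changed elsewhere.
public sealed class CachedFacade
{
    private string? _conversationId;

    public InnerSession Session { get; init; } = new();

    public string? ConversationId
    {
        get => _conversationId;
        set { _conversationId = value; Session.ConversationId = value; }
    }
}

public static class ProxyPinSketch
{
    // Mirrors the second pin: mutate the inner record, then read via the proxy.
    public static bool PassThroughReadsInner()
    {
        var facade = new PassThroughFacade();
        facade.Session.ConversationId = "c-1";
        return facade.ConversationId == "c-1";
    }

    public static bool CachedReadsInner()
    {
        var facade = new CachedFacade();
        facade.Session.ConversationId = "c-1";
        return facade.ConversationId == "c-1";
    }
}
```

The pass-through check succeeds while the cached variant fails, which is the drift the second pin is designed to catch.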

Intentional non-bans

Passing session.Session as a bare argument to a method that takes Session is not banned — the receiver gets the value record by design (e.g. _memoryFlusher.FlushAsync(..., session.Session, options, ct) at GatewayHost.cs:454, persistence-layer signatures). The fence regex requires .Session.<identifier> (an access chain after the dot), so bare-argument shapes don't match.

Out of scope (separate PRs)

  • Thinning the 8 stream-replay proxies (NextSequenceId, StreamEventLog, ReplayBuffer, AllocateSequenceId, AddStreamEvent, GetStreamEventsAfter, GetStreamEventSnapshot, SetStreamReplayState). Only FileSessionStore (3 sites) and tests use them — separate cleanup PR.
  • HeartbeatTrigger.ReplaceHistory → TryReplaceHistoryFromSnapshot migration. Behavior change (race-safety), separate PR.

Plan deviation note (plan-vs-impl critique LOW, accepted with rationale):
Original plan.md §4 Phase 7 proposed splitting GatewaySession into a thin core + a separate SessionStreamReplay type. This PR does NOT split — it adds the missing ConversationId proxy and enforces facade-only access via the fence. The split is the LARGER refactor; you cannot safely extract SessionStreamReplay until every caller already routes through the proxy. This PR is the structural prerequisite for that split — once the fence is in place, the 8 stream-replay proxies can be carved out into a new type with confidence that no production code reaches into the inner record. Filed as follow-up scope (above bullet) for a separate issue.

Known fence gaps (documented, not supported in v1)

  • Parenthesized: (session.Session).ConversationId — won't match. Realistic but unusual; future PR can extend if needed.
  • Local-var aliasing: var inner = session.Session; inner.X = … — won't match.

Neither pattern appears anywhere in current src/. Adding regex variants for them is scope creep until a real call site emerges.
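The sketch below shows both gaps against the current pattern, plus one hypothetical variant that tolerates closing parentheses. The variant is not part of this PR; it is included only to illustrate the trade-off, since it would also flag a call whose result is dereferenced, which is a shape the fence intentionally allows.

```csharp
using System.Text.RegularExpressions;

public static class FenceGapSketch
{
    // Pattern as quoted in the PR description.
    public static readonly Regex Current =
        new(@"\.Session\b\s*!?\s*\.\s*\w+");

    // Hypothetical extension (an assumption, not shipped): allow ")" before the dot.
    public static readonly Regex ParenTolerant =
        new(@"\.Session\b\s*!?\s*\)*\s*\.\s*\w+");

    public const string Parenthesized = "(session.Session).ConversationId";
    public const string Aliased = "var inner = session.Session; inner.ConversationId = id;";

    // A bare-argument call followed by member access on the call's result.
    public const string CallResult = "Wrap(session.Session).Name";
}
```

`Current` matches neither `Parenthesized` nor `Aliased`. `ParenTolerant` catches `Parenthesized` but also matches `CallResult`, and neither pattern can see through the alias, which would need data-flow analysis rather than a regex.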

Validation

  • Build clean (dotnet build BotNexus.slnx --nologo --tl:off): 0 warnings, 0 errors under TreatWarningsAsErrors.
  • Full test suite (dotnet test BotNexus.slnx --nologo --tl:off --no-build):
    • Gateway.Tests: 1886 passed / 0 failed / 1 skipped (+3 ConversationId proxy pins).
    • Architecture.Tests: 83 passed (76 baseline + 7 new = main fence + vacuity + null-forgiving + 4 false-positive guards).
    • All other projects green; only pre-existing E2E/conversation [Skip] flags unchanged.

Multi-model critique sweep (autopilot policy)

Per user directive, 3 critique agents reviewed this PR before merge:

| Agent | Model | Verdict |
| --- | --- | --- |
| security-review | gpt-5.5 | CLEAN: verified authz, federation invariants, XPIA, secret redaction, fence completeness, behavioural pins; no HIGH/MEDIUM/LOW findings |
| plan-vs-impl review | claude-opus-4.7 | PASS: migration verified complete, fence correct, renames wire-safe (both record types are file-local privates), ToSession() deletion verified, 1 LOW (plan-deviation note, folded into "Out of scope" above) |
| bug-hunt review | gpt-5.3-codex | NO_BUGS_FOUND: verified all 8 areas (proxy substitution correctness, behavioural pin coverage, thread-safety not widened, fence runtime/CI validity, no reflective bypass helper exists, build clean, F-6 fence still matches) |

No HIGH/MEDIUM findings to fold in.

Verification grep

After all edits, grep -nE '\.Session\b\s*!?\s*\.\s*\w+' src/**/*.cs returns exactly one hit:

  • GatewaySession.cs:84 — inside the XML doc on the new ConversationId proxy, documenting the banned shape. Stripped by the lexer; file is allowlisted anyway.
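For context on why a doc-comment hit does not count, here is a minimal sketch of a comment-stripping state machine of the kind described for the fence. It is not the lexer from SingleShotWireValueArchitectureTests; the names are hypothetical, and interpolated and raw string literals are deliberately not handled.

```csharp
using System.Text;

public static class CommentStripperSketch
{
    private enum State { Code, LineComment, BlockComment, String, Verbatim, Char }

    // Blanks out comments with spaces (newlines kept) and copies literals unchanged.
    public static string Strip(string src)
    {
        var sb = new StringBuilder(src.Length);
        var state = State.Code;
        for (int i = 0; i < src.Length; i++)
        {
            char c = src[i];
            char next = i + 1 < src.Length ? src[i + 1] : '\0';
            switch (state)
            {
                case State.Code:
                    if (c == '/' && next == '/') { state = State.LineComment; sb.Append("  "); i++; }
                    else if (c == '/' && next == '*') { state = State.BlockComment; sb.Append("  "); i++; }
                    else if (c == '@' && next == '"') { state = State.Verbatim; sb.Append("@\""); i++; }
                    else if (c == '"') { state = State.String; sb.Append(c); }
                    else if (c == '\'') { state = State.Char; sb.Append(c); }
                    else sb.Append(c);
                    break;
                case State.LineComment:
                    if (c == '\n') { state = State.Code; sb.Append(c); }
                    else sb.Append(' ');
                    break;
                case State.BlockComment:
                    if (c == '*' && next == '/') { state = State.Code; sb.Append("  "); i++; }
                    else sb.Append(c == '\n' ? '\n' : ' ');
                    break;
                case State.String:
                case State.Char:
                    sb.Append(c);
                    // A backslash escapes the next character in regular strings and chars.
                    if (c == '\\' && next != '\0') { sb.Append(next); i++; }
                    else if ((state == State.String && c == '"') ||
                             (state == State.Char && c == '\'')) state = State.Code;
                    break;
                case State.Verbatim:
                    sb.Append(c);
                    // In verbatim strings "" is an escaped quote; a lone " ends the literal.
                    if (c == '"' && next == '"') { sb.Append(next); i++; }
                    else if (c == '"') state = State.Code;
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Running the fence regex over `Strip(source)` instead of the raw source means a `/// session.Session.X` doc line no longer matches, while the same text inside a string literal is preserved.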

…570)

Adds ConversationId to the GatewaySession proxy surface, migrates every
production reach-through (~25 sites) to the proxy, and adds an architecture
fence preventing regression.

Root cause: GatewaySession proxied 14 fields of the inner Session record but
ConversationId was missing. Every caller that needed conversation-id read/
write had to dive through `session.Session.ConversationId`. Once a few sites
did that, the same shape spread to fields that DID have proxies (laziness).

The damage: reach-through writes bypass GatewaySessionRuntime — its lock, the
stream replay buffer, the secret redactor. PR #540 added a fence for the
History case; this PR generalises the defence.

Changes:
- src/domain/BotNexus.Domain/Gateway/Models/GatewaySession.cs:
  - Added ConversationId get/set proxy (the missing facade).
  - Deleted dead ToSession() (zero callers).
  - Rewrote XML doc on FromSession() to be meaningful.

- 18 production files migrated from `session.Session.X` to `session.X`:
  - src/extensions/BotNexus.Extensions.Channels.SignalR/GatewayHub.cs (4)
  - src/gateway/BotNexus.Gateway.Api/Controllers/ChannelHistoryController.cs (5)
  - src/gateway/BotNexus.Gateway.Api/Controllers/ConversationsController.cs (1)
  - src/gateway/BotNexus.Gateway.Api/Controllers/CrossWorldFederationController.cs (2)
  - src/gateway/BotNexus.Gateway.Api/Controllers/SessionsController.cs (1)
  - src/gateway/BotNexus.Gateway.Api/Triggers/CronTrigger.cs (3)
  - src/gateway/BotNexus.Gateway.Api/Triggers/HeartbeatTrigger.cs (1)
  - src/gateway/BotNexus.Gateway.Conversations/DefaultConversationRouter.cs (3)
  - src/gateway/BotNexus.Gateway.Sessions/FileSessionStore.cs (1)
  - src/gateway/BotNexus.Gateway.Sessions/SessionStoreBase.cs (1)
  - src/gateway/BotNexus.Gateway.Sessions/SqliteSessionStore.cs (1)
  - src/gateway/BotNexus.Gateway/Agents/AgentExchangeService.cs (2)
  - src/gateway/BotNexus.Gateway/Agents/DefaultSubAgentManager.cs (1)
  - src/gateway/BotNexus.Gateway/Agents/WorkspaceContextBuilder.cs (1)
  - src/gateway/BotNexus.Gateway/Conversations/DefaultConversationResetService.cs (2)
  - src/gateway/BotNexus.Gateway/GatewayHost.cs (8)
  - src/gateway/BotNexus.Gateway/Isolation/InProcessIsolationStrategy.cs (1)
  - src/gateway/BotNexus.Gateway/Tools/ConversationTool.cs (3)

- False-positive fix (rubber-duck pre-impl review):
  - ChannelHistoryController.HistorySlice.Session -> .GatewaySession
    (record property name collided with the banned shape; 5 callsites migrated)
  - CrossWorldFederationController.ResolveResult.Session -> .GatewaySession
    (same shape — caught at the verification grep, not pre-impl review;
    3 callsites migrated)

Tests:
- New tests/architecture/BotNexus.Architecture.Tests/GatewaySessionFacadeArchitectureTests.cs:
  - Main fence: 1 (no production file matches `\.Session\b\s*!?\s*\.\s*\w+`
    outside the allowlist of GatewaySession.cs and GatewaySessionRuntime.cs).
  - Vacuity guard: 1 (synthetic reach-through MUST trip).
  - Null-forgiving operator pin: 1 (session!.Session!.X must still match).
  - False-positive guards: 4 (.Session as bare argument, object initializer,
    Sessions/SessionStore/SessionId word-boundary, comment mentions).
  - Comment stripper: identical state-machine lexer from
    SingleShotWireValueArchitectureTests (PR #569) — preserves strings/chars.

- tests/gateway/BotNexus.Gateway.Tests/GatewaySessionBehaviorSnapshotTests.cs:
  - ConversationId_RoundTripsThroughProxy_ReadingInnerRecord (set via proxy,
    assert visible on inner record for persistence).
  - ConversationId_ProxyGetter_ReflectsInnerRecordChanges (set on inner,
    assert proxy getter reads through — defends against cached-field
    regression).
  - ConversationId_ProxyAcceptsNull_ForOrphanSessions (null tolerance).

Argument-passing of `session.Session` (e.g. _memoryFlusher.FlushAsync(...,
session.Session, ...)) is intentionally NOT banned — the receiver takes the
Session value record by design (persistence-layer signatures). The fence
regex requires `.Session.<identifier>`, so bare-argument shapes don't match.

Out of scope (separate PRs):
- Thinning the 8 stream-replay proxies (NextSequenceId, StreamEventLog,
  ReplayBuffer, AllocateSequenceId, AddStreamEvent, GetStreamEventsAfter,
  GetStreamEventSnapshot, SetStreamReplayState) — only FileSessionStore (3
  sites) + tests use them.
- Migrating HeartbeatTrigger.ReplaceHistory to TryReplaceHistoryFromSnapshot
  — behaviour change (race safety), separate PR.

Build clean: 0 warnings, 0 errors under TreatWarningsAsErrors.
Tests: Gateway.Tests 1886/0/1 (+3 ConversationId proxy pins),
       Architecture.Tests 76 -> 83 (+7 fence + guards).

Closes #570. Refs #523.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[Gateway] Phase 7 (F-9) — close GatewaySession proxy gap; ban reach-through
