Skip to content

feat(notebook): quiet connection/identity slot for cloud and desktop (#3421 PR 3)#4

Closed
quillaid wants to merge 16 commits into
local-first/02b-preview-hardeningfrom
local-first/03-slot
Closed

feat(notebook): quiet connection/identity slot for cloud and desktop (#3421 PR 3)#4
quillaid wants to merge 16 commits into
local-first/02b-preview-hardeningfrom
local-first/03-slot

Conversation

@quillaid

Copy link
Copy Markdown
Owner

PR 3 of the nteract#3421 stack (intra-fork: base is the nteract#3577 branch so the diff shows only this PR; promotes to nteract/nteract when nteract#3577 merges).

What this does

The first UI consumer of connectionStatus$ (nteract#3530): a quiet self-identity + connectivity slot in the upper-right, designed against the distilled rules from the three previously-pulled designs (nteract#3273/nteract#3290/nteract#3337/nteract#3349) — flat, icon-only, sr-only labels, state via dot tone/opacity, errors stay in the notices stack, nothing renders for a purely local desktop session (the nteract#3290 lesson, centralized as isRemoteNotebookContext).

  • Cloud: mounts in the empty identityControls slot, fed by a session-stable switching bridge (CloudConnectionStatusBridge) that follows replacement transports across initial-connect retries and escalation teardowns, masking a disposed transport's terminal offline as reconnecting while the session retries.
  • Desktop: mounts in trailingControls, driven by real daemon lifecycle events (ready→online, disconnected→reconnecting, unavailable→offline) with copy scoped to the measured link ("Daemon connection: …") — the review caught that the transport's own status was a constant that could never transition. daemon↔room link health for runtime-peer contexts is recorded as ADR future work.
  • Sustained-reconnecting notice: after 3s of reconnecting, one calm notices-stack line — "Reconnecting. Your edits are kept locally and will sync when the connection returns." — replacing the per-drop "Live room needs attention." for exact transport-minted link-loss shapes only (terminal auth/access failures keep the actionable diagnostic route). Flap-silent, single-fire, clears on recovery.
  • A11y: polite live-region announces status changes (mounts are silent), avatar initials are aria-hidden so the scoped sr-only copy is the accessible name.

Review

4-lens adversarial review (aesthetic conformance vs the pulled-design rules / bridge lifecycle / a11y+API / test rigor) + per-finding verification: 13 confirmed findings fixed; 12 refuted — notably the surface-split, kernel-dot-adjacency, and double-avatar concerns were all verified as consistent with the design authorities.

Verification

  • Full vp suite: 2133 passed, 0 failed (component: 17 tests incl. status×identity matrix, local-renders-nothing, live reconnect-loop dot; daemon source, toolbar flow, App mount pins)
  • notebook-cloud: 786 passed, 0 failed (bridge switching/dedup/teardown-masking, notice routing, refactor(cloud): quiet notebook shell chrome nteract/nteract#3337 quiet-chrome regressions intact)
  • tsc --noEmit (both apps + root) and cargo xtask lint clean

Deployed for field QA via draft nteract#3570 (preview.runt.run).

🤖 Generated with Claude Code

@github-actions github-actions Bot added documentation Improvements or additions to documentation frontend test labels Jun 11, 2026
quillaid and others added 15 commits June 11, 2026 06:34
…dvisories (nteract#3576)

Resolves the two high-severity pnpm audit findings — fast-uri path
traversal (GHSA-q3j6-qgpj-74h6) and host confusion (GHSA-v39h-62p7-jpjc)
— reached via @modelcontextprotocol/sdk -> ajv -> fast-uri. The SDK bump
plus a transitive update moves the lockfile to fast-uri 3.1.2.

pnpm audit --prod now reports 0 high (was 2 high).
…e connect, offline detection, liveness probe (nteract#3577)

* fix(notebook-cloud): preserve painted projection across live-effect re-runs

With IndexedDB seeding, the persisted snapshot paints cells before the
live-room effect settles. The effect re-runs for reasons that are not a
notebook switch (the mount-time session-status fetch replacing the session
object identity, OIDC refreshes, manual retries), and its cleanup cleared
the view-store projection unconditionally - a full -> empty -> full flicker
on every page load.

Reuse the desktop bootstrap-preservation gate: track the painted notebook
identity in a ref, and skip the cleanup clear when the same notebook with
visible cells is about to re-materialize in place. A real notebook switch
still clears, and true unmount still clears via the mount-scoped effect.

The helper is re-exported through notebook-surface so the cloud viewer
keeps importing desktop code only via the public surface.

* fix(notebook-cloud): keep app session identity stable across confirming fetches

The mount-time /api/auth/session fetch installed a fresh session object
even when it only confirmed the session the page was rendered with. That
new object identity rippled through resolveSyncAuth into the live-room
effect deps, tearing down and reconnecting the live room once per page
load - the second trigger behind the field-reported flicker.

Reduce the fetch result through nextCloudAppSessionReadyState: content-
equal sessions keep the current session object, and a fully-confirming
fetch returns the current state object unchanged so React bails out of
the update entirely.

* feat(notebook-cloud): detect offline and probe liveness on the cloud transport

An OS-level offline window never fires WS close/error: the socket stays
OPEN, sends buffer silently, and the only loss signal was the TCP
retransmit abort minutes later - connectionStatus$ never flipped before
connectivity returned, so the field report saw zero UI change. Two
detection paths close that hole:

- Browser offline event: a window 'offline' listener mirrors the online
  short-circuit and proactively recycles the current socket through
  connectionLost, flipping status to reconnecting immediately and fixing
  the zombie-socket-on-recovery hazard as well.
- App-level liveness probe: after each cloud_room_ready the transport
  sends a text LIVENESS_PING every 20s and treats a pong missed within
  10s as connection loss (covers upstream loss with the interface up,
  where 'offline' never fires). The room DO answers via CF-native
  setWebSocketAutoResponse - no DO wake, persists across hibernation -
  with a manual handleMessage fallback placed before the binary-only
  rejection so pings never count toward rejected-frame close. Probe
  cadence is injectable for tests, and the timers are unref'd so a
  connected transport cannot keep a Node test process alive.

* fix(notebook-cloud): preserve runtime/output stores in the flicker gate; make switch-clear reachable

Two review findings against the flicker gate:

- The gate preserved the cell store but still ran resetRuntimeState() and
  resetRuntimeStoresProjection() unconditionally. CodeCell renders outputs
  and execution counts exclusively through the execution/output stores, so
  output-bearing notebooks kept flickering their dominant visual mass.
  The gate now keeps or clears every projected store together, exactly
  matching desktop's preserve path (useAutomergeNotebook beforeBootstrap);
  resetPoolState stays unconditional as on desktop. Stale-data safety is
  unchanged: the next materialization replaces cells wholesale and the
  cloud-owned id sweep still removes stale output/execution entries.

- The cleanup's notebook-switch clause was structurally unreachable: the
  cleanup closes over its own run's config, so previous and next identity
  could never differ. Switch-clearing now lives where it is reachable —
  the NEXT effect run's body, before connecting, clears all projected
  stores iff the painted identity differs (desktop's beforeBootstrap
  placement) and drops the painted identity so later gates fail closed.
  The cleanup honestly reduces to "painted with visible cells => preserve",
  and true unmount clears everything via the mount-scoped effect.

The shared gate lives in notebook-view-store-bridge as
resetCloudProjectionUnlessPreserved, importing desktop code through a new
headless notebook-surface-stores entry (same ./lib implementations as
notebook-surface, no component/CSS imports) so the bridge stays loadable
under node test runners. Tests paint real cells with outputs through the
real projection path and assert cells, outputs, and execution pointers
survive a preserved gate and clear on a switch; source pins cover the two
session call sites that cannot run under node.

* fix(notebook-cloud): count any inbound frame as liveness evidence; skip pong deadline in hidden tabs

The probe cleared its pong deadline only on the exact LIVENESS_PONG text
frame. The pong is an ordinary frame serialized into the same in-order WS
stream as sync frames, so during a large post-reconnect sync over a slow
link it queues behind the backlog and the fixed 10s deadline killed
sockets that were demonstrably alive and actively delivering frames -
recycling exactly the connections least able to afford the resync churn.
handleMessage now treats ANY inbound message on the current socket as
liveness evidence; the strict pong match remains only the probe's reply
channel. A zombie socket delivers nothing, so detection is unchanged.

Second false-positive path: background tabs throttle and suspend timers,
so a frozen deadline could fire on resume before the queued pong
MessageEvent dispatches, declaring a healthy connection dead on every tab
foreground. The deadline is no longer armed while
document.visibilityState === "hidden" (the simpler shape) - pings still
flow, and the next visible-tab ping re-arms enforcement.

New coverage: inbound-frame evidence with a true-zombie follow-up, the
previously unreachable keep-original-deadline branch (deadline > interval
overlap), a pong landing one tick before the deadline, ping send failure
-> connectionLost, the livenessPingIntervalMs: 0 disable knob, a stale
pong from a superseded socket not satisfying the new probe, hidden-tab
deadline suppression with foreground re-arm, and the two uncovered
offline windows (parked connectTarget no-op + online supersession, and
pre-ready loss keeping status "connecting").

* test(notebook-cloud): pin hibernation re-arm and partial feature detection in the room

Two mutants the arm test could not kill: moving the auto-response arming
out of the constructor into a once-per-process path (hibernation wake IS
a constructor run against the same durable state - re-arming must be an
idempotent re-set of the same pair), and dropping the second conjunct of
the feature detection (global WebSocketRequestResponsePair present but
state.setWebSocketAutoResponse absent would TypeError in the constructor:
a total room outage, not a degraded probe).

* test(notebook-cloud): pin the session-status updater wiring via a rendered hook

The session-identity reducer was thoroughly unit-tested, but its single
consumer - the setState functional-updater callsite in
useCloudAppSessionStatus - was unpinned: reverting it to a plain object
setState kept every suite green while silently reintroducing the
once-per-page-load live-room reconnect. Render the real hook with a
stubbed readCloudAppSessionStatus and assert the session object reference
survives content-equal confirming fetches (mount and manual refresh),
renewed sessions are adopted, the no-initial-session path reaches ready,
and fetch failures keep the existing session.

Cloud viewer hooks need a DOM renderer, so the root vitest config gains
an apps/notebook-cloud/viewer/__tests__ include; the rest of the cloud
suite stays on node:test.
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix(cloud): apply connection actor to comms doc writes

* test(cloud): find widget smoke sliders across output frames
…eract#3421)

NotebookConnectionIdentity fills the reserved identity slot: the self
actor's flattened avatar (shared actor projection, initials/icon only)
paired with a connectivity dot — connectionStatus$'s first UI consumer.
Runtime-state stores are deliberately not blanked while a transport
reconnects, so the pulsing dot is what makes frozen kernel/execution
chrome interpretable during the offline window.

Hard quiet-chrome constraints from the pulled nteract#3273/nteract#3290/nteract#3337/nteract#3349
designs, enforced in component + tests:
- renders NOTHING for a purely local desktop session
  (isRemoteNotebookContext: access.source !== 'local' or
  runtime.target.kind === 'runtime_peer') — conditionality is why nteract#3290
  was pulled;
- flat rounded-md border-border/70 bg-muted/35, never
  rounded-full+shadow;
- icon-only at every width: no visible text pill; label and status live
  in sr-only copy and the title tooltip;
- state expresses as dot color (statusTone vocabulary: emerald online,
  amber pulse connecting/reconnecting, muted offline) and wrapper
  opacity, never copy;
- errors/reconnect prompts stay in the notices stack; connection state
  never rides the kernel status surface.

The component takes a structural NotebookConnectionStatusSource (an
rxjs-free Observable subset), so shared src/ gains no dependency;
NotebookActorAvatar is exported from NotebookIdentity with an optional
status-tone override to carry the connection dot.

Desktop mount: trailingControls on <NotebookToolbar> in App.tsx ->
identityControls at the right end of the command toolbar, driven by the
host transport's connectionStatus$ (stable for the app's lifetime). A
purely local session renders nothing, exactly as before nteract#3273.
… bridge (nteract#3421)

CloudConnectionStatusBridge (live-sync.ts): one BehaviorSubject-backed
connection-status source that stays stable for the session's lifetime.
The transport survives transport-level reconnects, but initial-connect
attempts and escalation teardowns still REPLACE it — the host facade's
delegation captures whichever transport exists at subscribe time and
never switches, so a subscriber could watch a dead object forever. The
bridge attaches to each replacement transport via onTransportCreated,
dedups across switches, and on escalation teardown detaches BEFORE the
dispose and reports 'reconnecting' — the disposed transport's terminal
'offline' (manual-disconnect vocabulary) never surfaces while the
session is retrying.

useCloudViewerSession exposes connectionStatus$ from the bridge;
notebook-viewer.tsx fills the formerly-null identityControls slot with
NotebookConnectionIdentity driven by it.

Tests: bridge replacement + dedup, teardown-retry masking, and the PR-2
reconnect loop reflected through one stable subscription; the
viewer-render-props quiet-chrome regression now pins the new mount
instead of identityControls={null}.
- Component name (NotebookConnectionIdentity), centralized
  isRemoteNotebookContext gate, structural rxjs-free status source.
- Desktop mount via host transport; cloud mount via
  CloudConnectionStatusBridge with teardown-retry masking semantics.
- The slot is what makes the PR-2 stale-chrome limitation interpretable:
  the dot stays live through 'reconnecting'.
- Tests paragraph updated to the landed coverage, including the
  identityControls={null} pin becoming a pin on the new mount.
…teract#3421)

Per the PR-3 adversarial review:

- The desktop dot was a constant: TauriTransport's connectionStatus$
  initializes 'online' and the app never disconnects it, so the slot
  could never transition — emerald through a daemon restart, the exact
  window its doc comment promised to make interpretable. The mount now
  feeds createDesktopConnectionStatusSource, derived from the host
  facade's daemon lifecycle events (daemon:ready -> online with
  ready-cache backfill for late mounts; daemon:disconnected ->
  reconnecting, since the host auto-reconnects; daemon:unavailable ->
  offline). The IPC transport's own subject stays untouched and honest
  about IPC.
- Scoped copy: connectionLabel names the link the source measures
  ('Daemon connection' on desktop) in the title/sr-only copy and the
  announcements, so the dot never asserts daemon<->room health it does
  not observe (recorded as future work).
- getCurrent(): optional synchronous snapshot on the source contract;
  first paint shows the real status instead of a one-frame
  'connecting' flash.
- Live region: status CHANGES (never the initial value — the first
  source delivery is the baseline) are announced via a polite sr-only
  region using the scoped copy; sources dedup so flaps don't spam.
- aria-hidden on the avatar so SR users hear the sr-only copy, not
  'AL Alice — Connected'.

Tests hardened per review: fixtures use valid runtime-target shapes
(attached/ready + label); no-visible-text now clone-and-strips so
wrapper-level text nodes are caught; flat-never-raised walks the whole
subtree banning shadow tokens (rounded-full stays wrapper-scoped —
Avatar internals are legitimately round) with classList token checks;
unmount-unsubscribe pinned via listenerCount; new tests for getCurrent
first paint, live-region change-only announcements, aria-hidden, and
scoped copy. Desktop coverage added: daemon-lifecycle source unit
tests, a composition test feeding REAL desktopNotebookShellCapabilities
output into isRemoteNotebookContext, toolbar trailingControls flow with
real projection fixtures, and a source-text pin on the App.tsx mount
(daemon source + scoped label, never the IPC transport).
…contract (nteract#3421)

- Auth-refresh gap: the live-room effect cleanup detached the bridge and
  the refreshing re-run early-returns without attaching, so the bridge
  served stale 'online' while the transport was fully disconnected. The
  cleanup now uses noteTeardownRetry() — the gap reads as 'reconnecting'
  (a transition), and the disposed transports' terminal 'offline' still
  never surfaces. Harmless at real unmount.
- CloudConnectionStatusBridge implements the slot's source contract
  directly (subscribe + getCurrent), and the session exposes the bridge
  itself so first paint reads the synchronous snapshot.
- Tests: plain detach() severance (no late emissions, no spurious
  status), the source contract (replay + snapshot + unsubscribe);
  viewer-render-props slot pin is now module-scoped via
  viewerFileContaining with bounded windows and pins
  capabilities={shellCapabilities} (no cross-corpus false-pass), plus
  source guardrails for the session wiring order:
  attach-on-transport-created and retry-before-dispose in all three
  teardown paths (scheduleReconnect, poison-pill discard, effect
  cleanup).
- Desktop is fed by the daemon-lifecycle source, not the IPC transport
  (whose status is constant in practice); copy scoped via
  connectionLabel; daemon<->room link health recorded as future work.
- getCurrent first-paint snapshot, change-only live-region
  announcements, aria-hidden avatar.
- Bridge cleanup semantics: teardown paths AND the effect cleanup
  report 'reconnecting' so the auth-refresh re-run gap never reads as
  stale 'online'.
- Tests paragraph updated to the landed coverage.
…tices stack

The connection/identity slot stays an 8px dot by design, so a sustained
outage was only legible to someone watching the far right of the header.
Surface ONE quiet notices-stack line ("Reconnecting." / "Your edits are
kept locally and will sync when the connection returns.") once the live
room status has stayed "reconnecting" for 3s, and clear it on online.

The debounce lives in SustainedReconnectingTracker: reconnecting arms one
timer, sub-debounce flaps cancel it silently, repeated reconnecting
deliveries never re-arm or duplicate, and a replacement transport's
pre-handshake "connecting" neither arms nor clears. Transport-level
connectionError strings (socket drops, handshake timeouts, missed
liveness pongs, browser offline) now route through this debounced line
instead of the immediate per-drop "Live room needs attention." warning;
access and sign-in diagnostics keep their dedicated notices. The chip and
dot visuals are untouched.
…t noise

The broad prefix match (/^cloud (sync|room) /) swallowed every
transport-wrapped failure into the debounced calm "Reconnecting." line -
including genuinely terminal ones. A mid-session auth/access revocation
surfaces as wrapped connect-target or socket failures, and the actionable
diagnostics path only runs pre-ready, so an affected user saw an infinite
quiet reconnect with no sign-in CTA.

The classifier now matches the transport's own link-loss reasons shape by
shape, anchored (socket close/failure, browser offline, handshake
timeout, liveness ping/pong misses). Connect-target resolution failures,
socket creation failures, protocol decode failures, and the session
escalation's "cloud room rejected frame: ..." keep the existing "Live
room needs attention." warning with its action. Tests pin both routes,
including the rendered notice for non-link failures.
…hook harness

The tracker had thorough unit coverage but the hook itself was never
executed, leaving its React wiring unpinned - dropping tracker.dispose()
from the effect cleanup passed every suite. Render the real hook against
a BehaviorSubject-shaped fake source: debounce boundary drives the
returned flag, sub-debounce flaps stay silent, unmount unsubscribes and
disposes the pending timer (timer count reaches zero), and a source
identity change mid-outage pins the intended clear-then-re-arm-via-replay
behavior.
@quillaid

Copy link
Copy Markdown
Owner Author

Promoted upstream (rebased onto main incl. nteract#3577/nteract#3579/nteract#3580): see the cross-fork PR on nteract/nteract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation frontend test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants