feat(notebook): quiet connection/identity slot for cloud and desktop (#3421 PR 3)#4
Closed
quillaid wants to merge 16 commits into
Closed
feat(notebook): quiet connection/identity slot for cloud and desktop (#3421 PR 3)#4quillaid wants to merge 16 commits into
quillaid wants to merge 16 commits into
Conversation
…dvisories (nteract#3576) Resolves the two high-severity pnpm audit findings — fast-uri path traversal (GHSA-q3j6-qgpj-74h6) and host confusion (GHSA-v39h-62p7-jpjc) — reached via @modelcontextprotocol/sdk -> ajv -> fast-uri. The SDK bump plus a transitive update moves the lockfile to fast-uri 3.1.2. pnpm audit --prod now reports 0 high (was 2 high).
…e connect, offline detection, liveness probe (nteract#3577) * fix(notebook-cloud): preserve painted projection across live-effect re-runs With IndexedDB seeding, the persisted snapshot paints cells before the live-room effect settles. The effect re-runs for reasons that are not a notebook switch (the mount-time session-status fetch replacing the session object identity, OIDC refreshes, manual retries), and its cleanup cleared the view-store projection unconditionally - a full -> empty -> full flicker on every page load. Reuse the desktop bootstrap-preservation gate: track the painted notebook identity in a ref, and skip the cleanup clear when the same notebook with visible cells is about to re-materialize in place. A real notebook switch still clears, and true unmount still clears via the mount-scoped effect. The helper is re-exported through notebook-surface so the cloud viewer keeps importing desktop code only via the public surface. * fix(notebook-cloud): keep app session identity stable across confirming fetches The mount-time /api/auth/session fetch installed a fresh session object even when it only confirmed the session the page was rendered with. That new object identity rippled through resolveSyncAuth into the live-room effect deps, tearing down and reconnecting the live room once per page load - the second trigger behind the field-reported flicker. Reduce the fetch result through nextCloudAppSessionReadyState: content- equal sessions keep the current session object, and a fully-confirming fetch returns the current state object unchanged so React bails out of the update entirely. * feat(notebook-cloud): detect offline and probe liveness on the cloud transport An OS-level offline window never fires WS close/error: the socket stays OPEN, sends buffer silently, and the only loss signal was the TCP retransmit abort minutes later - connectionStatus$ never flipped before connectivity returned, so the field report saw zero UI change. Two detection paths close that hole: - Browser offline event: a window 'offline' listener mirrors the online short-circuit and proactively recycles the current socket through connectionLost, flipping status to reconnecting immediately and fixing the zombie-socket-on-recovery hazard as well. - App-level liveness probe: after each cloud_room_ready the transport sends a text LIVENESS_PING every 20s and treats a pong missed within 10s as connection loss (covers upstream loss with the interface up, where 'offline' never fires). The room DO answers via CF-native setWebSocketAutoResponse - no DO wake, persists across hibernation - with a manual handleMessage fallback placed before the binary-only rejection so pings never count toward rejected-frame close. Probe cadence is injectable for tests, and the timers are unref'd so a connected transport cannot keep a Node test process alive. * fix(notebook-cloud): preserve runtime/output stores in the flicker gate; make switch-clear reachable Two review findings against the flicker gate: - The gate preserved the cell store but still ran resetRuntimeState() and resetRuntimeStoresProjection() unconditionally. CodeCell renders outputs and execution counts exclusively through the execution/output stores, so output-bearing notebooks kept flickering their dominant visual mass. The gate now keeps or clears every projected store together, exactly matching desktop's preserve path (useAutomergeNotebook beforeBootstrap); resetPoolState stays unconditional as on desktop. Stale-data safety is unchanged: the next materialization replaces cells wholesale and the cloud-owned id sweep still removes stale output/execution entries. - The cleanup's notebook-switch clause was structurally unreachable: the cleanup closes over its own run's config, so previous and next identity could never differ. Switch-clearing now lives where it is reachable — the NEXT effect run's body, before connecting, clears all projected stores iff the painted identity differs (desktop's beforeBootstrap placement) and drops the painted identity so later gates fail closed. The cleanup honestly reduces to "painted with visible cells => preserve", and true unmount clears everything via the mount-scoped effect. The shared gate lives in notebook-view-store-bridge as resetCloudProjectionUnlessPreserved, importing desktop code through a new headless notebook-surface-stores entry (same ./lib implementations as notebook-surface, no component/CSS imports) so the bridge stays loadable under node test runners. Tests paint real cells with outputs through the real projection path and assert cells, outputs, and execution pointers survive a preserved gate and clear on a switch; source pins cover the two session call sites that cannot run under node. * fix(notebook-cloud): count any inbound frame as liveness evidence; skip pong deadline in hidden tabs The probe cleared its pong deadline only on the exact LIVENESS_PONG text frame. The pong is an ordinary frame serialized into the same in-order WS stream as sync frames, so during a large post-reconnect sync over a slow link it queues behind the backlog and the fixed 10s deadline killed sockets that were demonstrably alive and actively delivering frames - recycling exactly the connections least able to afford the resync churn. handleMessage now treats ANY inbound message on the current socket as liveness evidence; the strict pong match remains only the probe's reply channel. A zombie socket delivers nothing, so detection is unchanged. Second false-positive path: background tabs throttle and suspend timers, so a frozen deadline could fire on resume before the queued pong MessageEvent dispatches, declaring a healthy connection dead on every tab foreground. The deadline is no longer armed while document.visibilityState === "hidden" (the simpler shape) - pings still flow, and the next visible-tab ping re-arms enforcement. New coverage: inbound-frame evidence with a true-zombie follow-up, the previously unreachable keep-original-deadline branch (deadline > interval overlap), a pong landing one tick before the deadline, ping send failure -> connectionLost, the livenessPingIntervalMs: 0 disable knob, a stale pong from a superseded socket not satisfying the new probe, hidden-tab deadline suppression with foreground re-arm, and the two uncovered offline windows (parked connectTarget no-op + online supersession, and pre-ready loss keeping status "connecting"). * test(notebook-cloud): pin hibernation re-arm and partial feature detection in the room Two mutants the arm test could not kill: moving the auto-response arming out of the constructor into a once-per-process path (hibernation wake IS a constructor run against the same durable state - re-arming must be an idempotent re-set of the same pair), and dropping the second conjunct of the feature detection (global WebSocketRequestResponsePair present but state.setWebSocketAutoResponse absent would TypeError in the constructor: a total room outage, not a degraded probe). * test(notebook-cloud): pin the session-status updater wiring via a rendered hook The session-identity reducer was thoroughly unit-tested, but its single consumer - the setState functional-updater callsite in useCloudAppSessionStatus - was unpinned: reverting it to a plain object setState kept every suite green while silently reintroducing the once-per-page-load live-room reconnect. Render the real hook with a stubbed readCloudAppSessionStatus and assert the session object reference survives content-equal confirming fetches (mount and manual refresh), renewed sessions are adopted, the no-initial-session path reaches ready, and fetch failures keep the existing session. Cloud viewer hooks need a DOM renderer, so the root vitest config gains an apps/notebook-cloud/viewer/__tests__ include; the rest of the cloud suite stays on node:test.
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix(cloud): apply connection actor to comms doc writes * test(cloud): find widget smoke sliders across output frames
…eract#3421) NotebookConnectionIdentity fills the reserved identity slot: the self actor's flattened avatar (shared actor projection, initials/icon only) paired with a connectivity dot — connectionStatus$'s first UI consumer. Runtime-state stores are deliberately not blanked while a transport reconnects, so the pulsing dot is what makes frozen kernel/execution chrome interpretable during the offline window. Hard quiet-chrome constraints from the pulled nteract#3273/nteract#3290/nteract#3337/nteract#3349 designs, enforced in component + tests: - renders NOTHING for a purely local desktop session (isRemoteNotebookContext: access.source !== 'local' or runtime.target.kind === 'runtime_peer') — conditionality is why nteract#3290 was pulled; - flat rounded-md border-border/70 bg-muted/35, never rounded-full+shadow; - icon-only at every width: no visible text pill; label and status live in sr-only copy and the title tooltip; - state expresses as dot color (statusTone vocabulary: emerald online, amber pulse connecting/reconnecting, muted offline) and wrapper opacity, never copy; - errors/reconnect prompts stay in the notices stack; connection state never rides the kernel status surface. The component takes a structural NotebookConnectionStatusSource (an rxjs-free Observable subset), so shared src/ gains no dependency; NotebookActorAvatar is exported from NotebookIdentity with an optional status-tone override to carry the connection dot. Desktop mount: trailingControls on <NotebookToolbar> in App.tsx -> identityControls at the right end of the command toolbar, driven by the host transport's connectionStatus$ (stable for the app's lifetime). A purely local session renders nothing, exactly as before nteract#3273.
… bridge (nteract#3421) CloudConnectionStatusBridge (live-sync.ts): one BehaviorSubject-backed connection-status source that stays stable for the session's lifetime. The transport survives transport-level reconnects, but initial-connect attempts and escalation teardowns still REPLACE it — the host facade's delegation captures whichever transport exists at subscribe time and never switches, so a subscriber could watch a dead object forever. The bridge attaches to each replacement transport via onTransportCreated, dedups across switches, and on escalation teardown detaches BEFORE the dispose and reports 'reconnecting' — the disposed transport's terminal 'offline' (manual-disconnect vocabulary) never surfaces while the session is retrying. useCloudViewerSession exposes connectionStatus$ from the bridge; notebook-viewer.tsx fills the formerly-null identityControls slot with NotebookConnectionIdentity driven by it. Tests: bridge replacement + dedup, teardown-retry masking, and the PR-2 reconnect loop reflected through one stable subscription; the viewer-render-props quiet-chrome regression now pins the new mount instead of identityControls={null}.
- Component name (NotebookConnectionIdentity), centralized
isRemoteNotebookContext gate, structural rxjs-free status source.
- Desktop mount via host transport; cloud mount via
CloudConnectionStatusBridge with teardown-retry masking semantics.
- The slot is what makes the PR-2 stale-chrome limitation interpretable:
the dot stays live through 'reconnecting'.
- Tests paragraph updated to the landed coverage, including the
identityControls={null} pin becoming a pin on the new mount.
…teract#3421) Per the PR-3 adversarial review: - The desktop dot was a constant: TauriTransport's connectionStatus$ initializes 'online' and the app never disconnects it, so the slot could never transition — emerald through a daemon restart, the exact window its doc comment promised to make interpretable. The mount now feeds createDesktopConnectionStatusSource, derived from the host facade's daemon lifecycle events (daemon:ready -> online with ready-cache backfill for late mounts; daemon:disconnected -> reconnecting, since the host auto-reconnects; daemon:unavailable -> offline). The IPC transport's own subject stays untouched and honest about IPC. - Scoped copy: connectionLabel names the link the source measures ('Daemon connection' on desktop) in the title/sr-only copy and the announcements, so the dot never asserts daemon<->room health it does not observe (recorded as future work). - getCurrent(): optional synchronous snapshot on the source contract; first paint shows the real status instead of a one-frame 'connecting' flash. - Live region: status CHANGES (never the initial value — the first source delivery is the baseline) are announced via a polite sr-only region using the scoped copy; sources dedup so flaps don't spam. - aria-hidden on the avatar so SR users hear the sr-only copy, not 'AL Alice — Connected'. Tests hardened per review: fixtures use valid runtime-target shapes (attached/ready + label); no-visible-text now clone-and-strips so wrapper-level text nodes are caught; flat-never-raised walks the whole subtree banning shadow tokens (rounded-full stays wrapper-scoped — Avatar internals are legitimately round) with classList token checks; unmount-unsubscribe pinned via listenerCount; new tests for getCurrent first paint, live-region change-only announcements, aria-hidden, and scoped copy. Desktop coverage added: daemon-lifecycle source unit tests, a composition test feeding REAL desktopNotebookShellCapabilities output into isRemoteNotebookContext, toolbar trailingControls flow with real projection fixtures, and a source-text pin on the App.tsx mount (daemon source + scoped label, never the IPC transport).
…contract (nteract#3421) - Auth-refresh gap: the live-room effect cleanup detached the bridge and the refreshing re-run early-returns without attaching, so the bridge served stale 'online' while the transport was fully disconnected. The cleanup now uses noteTeardownRetry() — the gap reads as 'reconnecting' (a transition), and the disposed transports' terminal 'offline' still never surfaces. Harmless at real unmount. - CloudConnectionStatusBridge implements the slot's source contract directly (subscribe + getCurrent), and the session exposes the bridge itself so first paint reads the synchronous snapshot. - Tests: plain detach() severance (no late emissions, no spurious status), the source contract (replay + snapshot + unsubscribe); viewer-render-props slot pin is now module-scoped via viewerFileContaining with bounded windows and pins capabilities={shellCapabilities} (no cross-corpus false-pass), plus source guardrails for the session wiring order: attach-on-transport-created and retry-before-dispose in all three teardown paths (scheduleReconnect, poison-pill discard, effect cleanup).
- Desktop is fed by the daemon-lifecycle source, not the IPC transport (whose status is constant in practice); copy scoped via connectionLabel; daemon<->room link health recorded as future work. - getCurrent first-paint snapshot, change-only live-region announcements, aria-hidden avatar. - Bridge cleanup semantics: teardown paths AND the effect cleanup report 'reconnecting' so the auth-refresh re-run gap never reads as stale 'online'. - Tests paragraph updated to the landed coverage.
…tices stack
The connection/identity slot stays an 8px dot by design, so a sustained
outage was only legible to someone watching the far right of the header.
Surface ONE quiet notices-stack line ("Reconnecting." / "Your edits are
kept locally and will sync when the connection returns.") once the live
room status has stayed "reconnecting" for 3s, and clear it on online.
The debounce lives in SustainedReconnectingTracker: reconnecting arms one
timer, sub-debounce flaps cancel it silently, repeated reconnecting
deliveries never re-arm or duplicate, and a replacement transport's
pre-handshake "connecting" neither arms nor clears. Transport-level
connectionError strings (socket drops, handshake timeouts, missed
liveness pongs, browser offline) now route through this debounced line
instead of the immediate per-drop "Live room needs attention." warning;
access and sign-in diagnostics keep their dedicated notices. The chip and
dot visuals are untouched.
…t noise The broad prefix match (/^cloud (sync|room) /) swallowed every transport-wrapped failure into the debounced calm "Reconnecting." line - including genuinely terminal ones. A mid-session auth/access revocation surfaces as wrapped connect-target or socket failures, and the actionable diagnostics path only runs pre-ready, so an affected user saw an infinite quiet reconnect with no sign-in CTA. The classifier now matches the transport's own link-loss reasons shape by shape, anchored (socket close/failure, browser offline, handshake timeout, liveness ping/pong misses). Connect-target resolution failures, socket creation failures, protocol decode failures, and the session escalation's "cloud room rejected frame: ..." keep the existing "Live room needs attention." warning with its action. Tests pin both routes, including the rendered notice for non-link failures.
…hook harness The tracker had thorough unit coverage but the hook itself was never executed, leaving its React wiring unpinned - dropping tracker.dispose() from the effect cleanup passed every suite. Render the real hook against a BehaviorSubject-shaped fake source: debounce boundary drives the returned flag, sub-debounce flaps stay silent, unmount unsubscribes and disposes the pending timer (timer count reaches zero), and a source identity change mid-outage pins the intended clear-then-re-arm-via-replay behavior.
Owner
Author
|
Promoted upstream (rebased onto main incl. nteract#3577/nteract#3579/nteract#3580): see the cross-fork PR on nteract/nteract. |
73eb1a2 to
65577bc
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR 3 of the nteract#3421 stack (intra-fork: base is the nteract#3577 branch so the diff shows only this PR; promotes to nteract/nteract when nteract#3577 merges).
What this does
The first UI consumer of
connectionStatus$(nteract#3530): a quiet self-identity + connectivity slot in the upper-right, designed against the distilled rules from the three previously-pulled designs (nteract#3273/nteract#3290/nteract#3337/nteract#3349) — flat, icon-only, sr-only labels, state via dot tone/opacity, errors stay in the notices stack, nothing renders for a purely local desktop session (the nteract#3290 lesson, centralized asisRemoteNotebookContext).identityControlsslot, fed by a session-stable switching bridge (CloudConnectionStatusBridge) that follows replacement transports across initial-connect retries and escalation teardowns, masking a disposed transport's terminalofflineasreconnectingwhile the session retries.trailingControls, driven by real daemon lifecycle events (ready→online,disconnected→reconnecting,unavailable→offline) with copy scoped to the measured link ("Daemon connection: …") — the review caught that the transport's own status was a constant that could never transition. daemon↔room link health for runtime-peer contexts is recorded as ADR future work.reconnecting, one calm notices-stack line — "Reconnecting. Your edits are kept locally and will sync when the connection returns." — replacing the per-drop "Live room needs attention." for exact transport-minted link-loss shapes only (terminal auth/access failures keep the actionable diagnostic route). Flap-silent, single-fire, clears on recovery.Review
4-lens adversarial review (aesthetic conformance vs the pulled-design rules / bridge lifecycle / a11y+API / test rigor) + per-finding verification: 13 confirmed findings fixed; 12 refuted — notably the surface-split, kernel-dot-adjacency, and double-avatar concerns were all verified as consistent with the design authorities.
Verification
tsc --noEmit(both apps + root) andcargo xtask lintcleanDeployed for field QA via draft nteract#3570 (preview.runt.run).
🤖 Generated with Claude Code