fix(terminal): deterministic sessionId + orphan reaper (closes #217)#1375
Open
fix(terminal): deterministic sessionId + orphan reaper (closes #217)#1375
Conversation
5 tasks
…fId) Why: Issue #217 loses terminal scrollback across unclean shutdowns because the minted sessionId is random — the persist-sessionId write races SIGKILL, so the post-restart PTY maps to a fresh history dir instead of the previous one. mintPtySessionId now hashes (tabId, leafId) when both are present so a crash before persistence still lands on the same history dir after restart. Falls back to randomUUID() when callers don't thread pane identity (non-daemon-host spawns and legacy paths). The origin is surfaced via mintPtySessionIdWithOrigin so ipc/pty can log random-fallback at warn — a daemon-host spawn that hits the fallback indicates a plumbing regression in the renderer. Thread tabId/leafId from renderer through PtyTransport → preload → pty:spawn IPC → daemon-pty-adapter fallback so all daemon-host spawns produce deterministic ids. See DESIGN_DOC_TERMINAL_HISTORY_FIX.md §Deterministic sessionId. Co-authored-by: Orca <help@stably.ai>
…eplay Why: PaneManager.nextPaneId is a per-instance monotonic counter that resets to 1 when the app restarts. replayTerminalLayout rewrites the persisted pane:N leafIds verbatim, so the next user-triggered split would mint a leafId colliding with an existing one. With the new deterministic sessionId path that collision matters materially: both panes would derive the same (worktreeId, tabId, leafId) triple and share one history directory, interleaving their scrollback. Adds PaneManager.advanceNextPaneIdTo(min) (forward-only) and a strict /^pane:(\d+)$/ parser that walks the replayed snapshot. Non-matching leafIds (legacy or corrupted) are skipped rather than thrown on — the counter advance is best-effort safety. Co-authored-by: Orca <help@stably.ai>
…gates
Why: deterministic sessionIds make cold-restore work, but the history
tree no longer self-cleans — a pane closed permanently before its
workspace-session write landed leaks its dir forever. Without a reaper,
`terminal-history/` grows unboundedly.
The reaper is intentionally conservative. Three stacked guards protect
against erroneous deletion:
1. Keep set: union of live daemon session ids, verbatim ptyIds on
tabs / ptyIdsByLeafId / remoteSessionIdsByTabId, AND deterministic
ids re-derived from every (worktreeId, tabId, leafId) triple
reachable through the persisted layout snapshots. The last entry
is the critical one — it protects history whose ptyId never got
persisted due to a SIGKILL-before-persist.
2. mtime gate (30 days): any dir whose on-disk mtime is newer than
the grace window is kept regardless of keep-set membership.
3. Two-launch quarantine: every orphan candidate is recorded in
`reaper-seen.json` (sibling to `terminal-history/` under userData,
so clearing history for recovery doesn't also erase the quarantine
memory). Only reaped after >= 2 elapsed launches. A corrupt sidecar
is treated as "nothing seen yet" — extra quarantine, never
premature reap.
Wired as a sibling pass after initDaemonPtyProvider() in main/index.ts;
best-effort — failures are logged, never crash startup.
Co-authored-by: Orca <help@stably.ai>
…eaper Cross-restart collision prevention: - mintPtySessionId: same (worktreeId, tabId, leafId) produces same suffix; different tabIds or leafIds never collide; missing either field falls back to random (no silent "empty-input" collision). - mintPtySessionIdWithOrigin: reports deterministic / explicit / random-fallback so the ipc/pty log-level split is testable. Pane-counter advance (replayTerminalLayout): - Bumps the counter past max(persisted N). - Skips non-matching leafIds (legacy/corrupt snapshots) without throwing. - No-ops when the snapshot has no root. Orphan reaper: - Keep-set coverage: live daemon ids, verbatim ptyIds, ptyIdsByLeafId, deterministic derivation from (worktreeId, tabId, leafId), and arbitrary SSH remote ids. - mtime gate: dirs newer than REAP_MTIME_GRACE_MS never reaped. - Two-launch quarantine: reap only after >= 2 elapsed launches after first orphan sighting; keep-set re-hit resets the clock; corrupt sidecar => extra quarantine, never premature reap. - No-ops on fresh install (no history dir); ignores malformed directory names. Co-authored-by: Orca <help@stably.ai>
Why: a plain NUL separator is insufficient. `hash("a\0b" + "\0" + "c")`
produces the same digest as `hash("a" + "\0" + "b\0c")` because
createHash.update is a streaming append — the NUL byte is not a
boundary, just another byte. Today's inputs never contain NUL (tabIds
are UUIDs, leafIds are `pane:<N>`), so no real collision exists, but
the design doc §7.1 calls this property out explicitly and
length-prefixing is the cheapest unambiguous encoding.
Tests added: distinguish (tabId="a\0b", leafId="c") from
(tabId="a", leafId="b\0c"); distinguish ("ab", "cd") from ("a", "bcd");
1000-pair no-collision fixture to catch broken hashing impls.
Co-authored-by: Orca <help@stably.ai>
Why (design doc §9): surface the signal that matters after shipping the deterministic-sessionId fix. Two log sites added: - daemon-pty-adapter.ts: warn when a spawn with an explicit sessionId produced a new daemon session AND no disk history was found — that means either the reaper ate the dir, or the persisted mapping pointed at a session that never wrote a checkpoint. Shouldn't happen under the defensive keep-set derivation; worth watching. - main/index.ts: log the reaper's ReapStats at each startup pass so we can tell whether it's firing at all and whether the three gates (keep set, mtime grace, quarantine) are in expected ratios. E2E test added (§7.2): Issue #217's load-bearing scenario — scrollback survives a quit where the persisted orca-data.json lacks `ptyId`. We simulate the SIGKILL-before-persist race by mutating orca-data.json between launches (delete ptyId, leave terminal-history/ dir intact). Pre-fix code couldn't reach the dir; post-fix code re-derives the id via mintPtySessionId(worktreeId, tabId, leafId) and cold-restores. Co-authored-by: Orca <help@stably.ai>
Why: the 2-pane e2e locks down the distinct-leafId derivation arm (single-pane alone would miss a regression where every pane collapses to the tab-level random fallback). The cold-restore-miss warn (design doc §9) is verified at unit level against DaemonPtyAdapter.doSpawn — Playwright's _electron.launch only reliably captures a subset of main-process console output, which made an e2e-scale repro flaky even though the production code path fires correctly. - tests/e2e/terminal-restart-persistence.spec.ts: new "2-pane split" test; bootstrapRestoredLaunch now accepts expectedPaneCount. - src/main/daemon/daemon-pty-adapter.test.ts: two new unit tests — warn fires when explicit sessionId lands on a fresh session with no disk history; warn does NOT fire on the happy path (history present). Co-authored-by: Orca <help@stably.ai>
Why: discoverActivePtyId broadcasts a marker to every candidate pty and reads the active pane's xterm buffer. Right after a split the new pane's PTY isn't yet registered in ptyIdsByTabId on slower machines (CI), so the probe times out. Poll the store for ptyIdsByTabId[tabId].length >= 2 instead and identify the second ptyId as the one not equal to the first. Passes locally in 4.9s and no longer depends on xterm-buffer timing after a layout change. Co-authored-by: Orca <help@stably.ai>
a6697fc to
e159aaf
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ptyIddidn't make it to disk before shutdown. The renderer now derives the sessionId from(worktreeId, tabId, leafId)via SHA-256, so cold-restore reaches the sameterminal-history/<encodedId>/directory even whenorca-data.json'sptyIdslot is empty. Closes the SIGKILL-between-spawn-and-persist race.reaper-seen.jsonsidecar. Safe by construction: a history dir whose owning pane still exists is never reaped, even after a transient keep-set miss.PaneManager.nextPaneIdpastmax(N)in persistedpane:NleafIds on replay so post-restart splits don't mint a paneId that collides with an existing deterministic-history dir.Design
See
DESIGN_DOC_TERMINAL_HISTORY_FIX.md(worktree-local) — Option B (deterministic sessionId) + separable reaper, approved.Test plan
pnpm typecheckcleanpnpm vitest run src/main/daemon src/renderer/src/components/terminal-pane/layout-serialization.test.ts— 391 pass, 3 pre-existingshell-ready.test.tsfailures unrelated to this PR (confirmed by stash test on HEAD before my changes)pty-session-id(collision fixture, length-prefix boundary, fallback origins),orphan-history-reaper(keep-set / mtime / quarantine / edge-case matrix),layout-serialization(counter-advance via mock PaneManager)tests/e2e/terminal-restart-persistence.spec.ts— "scrollback survives quit with missing ptyId (Issue fix: Windows compatibility — CRLF line endings and cross-drive path traversal #217 regression)" — simulates the SIGKILL-before-persist race by mutating orca-data.json between two real Electron launchesMade with Orca 🐋