Skip to content

fix(terminal): deterministic sessionId + orphan reaper (closes #217)#1375

Open
Jinwoo-H wants to merge 8 commits intomainfrom
Jinwoo-H/Jinwoo-H-investigate-terminal-history-loss
Open

fix(terminal): deterministic sessionId + orphan reaper (closes #217)#1375
Jinwoo-H wants to merge 8 commits intomainfrom
Jinwoo-H/Jinwoo-H-investigate-terminal-history-loss

Conversation

@Jinwoo-H
Copy link
Copy Markdown
Contributor

@Jinwoo-H Jinwoo-H commented May 3, 2026

Summary

  • Fixes Issue fix: Windows compatibility — CRLF line endings and cross-drive path traversal #217: terminal scrollback lost on restart when the persisted ptyId didn't make it to disk before shutdown. The renderer now derives the sessionId from (worktreeId, tabId, leafId) via SHA-256, so cold-restore reaches the same terminal-history/<encodedId>/ directory even when orca-data.json's ptyId slot is empty. Closes the SIGKILL-between-spawn-and-persist race.
  • Adds a startup orphan reaper with three stacked guards — keep set (deterministic re-derivation + verbatim ptyIds + remote session ids + live daemon ids), 30-day mtime grace, two-launch quarantine via reaper-seen.json sidecar. Safe by construction: a history dir whose owning pane still exists is never reaped, even after a transient keep-set miss.
  • Bumps PaneManager.nextPaneId past max(N) in persisted pane:N leafIds on replay so post-restart splits don't mint a paneId that collides with an existing deterministic-history dir.

Design

See DESIGN_DOC_TERMINAL_HISTORY_FIX.md (worktree-local) — Option B (deterministic sessionId) + separable reaper, approved.

Test plan

  • pnpm typecheck clean
  • pnpm vitest run src/main/daemon src/renderer/src/components/terminal-pane/layout-serialization.test.ts — 391 pass, 3 pre-existing shell-ready.test.ts failures unrelated to this PR (confirmed by stash test on HEAD before my changes)
  • New vitest coverage: pty-session-id (collision fixture, length-prefix boundary, fallback origins), orphan-history-reaper (keep-set / mtime / quarantine / edge-case matrix), layout-serialization (counter-advance via mock PaneManager)
  • E2E added: tests/e2e/terminal-restart-persistence.spec.ts — "scrollback survives quit with missing ptyId (Issue fix: Windows compatibility — CRLF line endings and cross-drive path traversal #217 regression)" — simulates the SIGKILL-before-persist race by mutating orca-data.json between two real Electron launches
  • Post-release monitoring: warn log on cold-restore-miss for explicit sessionId; reaper stats logged every startup
  • Manual smoke: quit → relaunch → scrollback present (expected default-path behavior)
  • Manual smoke: split 2 panes → quit → relaunch → both panes show their pre-quit scrollback

Made with Orca 🐋

Jinwoo-H and others added 8 commits May 4, 2026 16:24
…fId)

Why: Issue #217 loses terminal scrollback across unclean shutdowns
because the minted sessionId is random — the persist-sessionId write
races SIGKILL, so the post-restart PTY maps to a fresh history dir
instead of the previous one.

mintPtySessionId now hashes (tabId, leafId) when both are present so a
crash before persistence still lands on the same history dir after
restart. Falls back to randomUUID() when callers don't thread pane
identity (non-daemon-host spawns and legacy paths). The origin is
surfaced via mintPtySessionIdWithOrigin so ipc/pty can log
random-fallback at warn — a daemon-host spawn that hits the fallback
indicates a plumbing regression in the renderer.

Thread tabId/leafId from renderer through PtyTransport → preload →
pty:spawn IPC → daemon-pty-adapter fallback so all daemon-host spawns
produce deterministic ids.

See DESIGN_DOC_TERMINAL_HISTORY_FIX.md §Deterministic sessionId.

Co-authored-by: Orca <help@stably.ai>
…eplay

Why: PaneManager.nextPaneId is a per-instance monotonic counter that
resets to 1 when the app restarts. replayTerminalLayout rewrites the
persisted pane:N leafIds verbatim, so the next user-triggered split
would mint a leafId colliding with an existing one.

With the new deterministic sessionId path that collision matters
materially: both panes would derive the same (worktreeId, tabId, leafId)
triple and share one history directory, interleaving their scrollback.

Adds PaneManager.advanceNextPaneIdTo(min) (forward-only) and a strict
/^pane:(\d+)$/ parser that walks the replayed snapshot. Non-matching
leafIds (legacy or corrupted) are skipped rather than thrown on —
the counter advance is best-effort safety.

Co-authored-by: Orca <help@stably.ai>
…gates

Why: deterministic sessionIds make cold-restore work, but the history
tree no longer self-cleans — a pane closed permanently before its
workspace-session write landed leaks its dir forever. Without a reaper,
`terminal-history/` grows unboundedly.

The reaper is intentionally conservative. Three stacked guards protect
against erroneous deletion:

  1. Keep set: union of live daemon session ids, verbatim ptyIds on
     tabs / ptyIdsByLeafId / remoteSessionIdsByTabId, AND deterministic
     ids re-derived from every (worktreeId, tabId, leafId) triple
     reachable through the persisted layout snapshots. The last entry
     is the critical one — it protects history whose ptyId never got
     persisted due to a SIGKILL-before-persist.

  2. mtime gate (30 days): any dir whose on-disk mtime is newer than
     the grace window is kept regardless of keep-set membership.

  3. Two-launch quarantine: every orphan candidate is recorded in
     `reaper-seen.json` (sibling to `terminal-history/` under userData,
     so clearing history for recovery doesn't also erase the quarantine
     memory). Only reaped after >= 2 elapsed launches. A corrupt sidecar
     is treated as "nothing seen yet" — extra quarantine, never
     premature reap.

Wired as a sibling pass after initDaemonPtyProvider() in main/index.ts;
best-effort — failures are logged, never crash startup.

Co-authored-by: Orca <help@stably.ai>
…eaper

Cross-restart collision prevention:
- mintPtySessionId: same (worktreeId, tabId, leafId) produces same
  suffix; different tabIds or leafIds never collide; missing either
  field falls back to random (no silent "empty-input" collision).
- mintPtySessionIdWithOrigin: reports deterministic / explicit /
  random-fallback so the ipc/pty log-level split is testable.

Pane-counter advance (replayTerminalLayout):
- Bumps the counter past max(persisted N).
- Skips non-matching leafIds (legacy/corrupt snapshots) without
  throwing.
- No-ops when the snapshot has no root.

Orphan reaper:
- Keep-set coverage: live daemon ids, verbatim ptyIds,
  ptyIdsByLeafId, deterministic derivation from
  (worktreeId, tabId, leafId), and arbitrary SSH remote ids.
- mtime gate: dirs newer than REAP_MTIME_GRACE_MS never reaped.
- Two-launch quarantine: reap only after >= 2 elapsed launches after
  first orphan sighting; keep-set re-hit resets the clock; corrupt
  sidecar => extra quarantine, never premature reap.
- No-ops on fresh install (no history dir); ignores malformed
  directory names.

Co-authored-by: Orca <help@stably.ai>
Why: a plain NUL separator is insufficient. `hash("a\0b" + "\0" + "c")`
produces the same digest as `hash("a" + "\0" + "b\0c")` because
createHash.update is a streaming append — the NUL byte is not a
boundary, just another byte. Today's inputs never contain NUL (tabIds
are UUIDs, leafIds are `pane:<N>`), so no real collision exists, but
the design doc §7.1 calls this property out explicitly and
length-prefixing is the cheapest unambiguous encoding.

Tests added: distinguish (tabId="a\0b", leafId="c") from
(tabId="a", leafId="b\0c"); distinguish ("ab", "cd") from ("a", "bcd");
1000-pair no-collision fixture to catch broken hashing impls.

Co-authored-by: Orca <help@stably.ai>
Why (design doc §9): surface the signal that matters after shipping
the deterministic-sessionId fix. Two log sites added:

- daemon-pty-adapter.ts: warn when a spawn with an explicit sessionId
  produced a new daemon session AND no disk history was found — that
  means either the reaper ate the dir, or the persisted mapping
  pointed at a session that never wrote a checkpoint. Shouldn't happen
  under the defensive keep-set derivation; worth watching.

- main/index.ts: log the reaper's ReapStats at each startup pass so
  we can tell whether it's firing at all and whether the three gates
  (keep set, mtime grace, quarantine) are in expected ratios.

E2E test added (§7.2): Issue #217's load-bearing scenario — scrollback
survives a quit where the persisted orca-data.json lacks `ptyId`. We
simulate the SIGKILL-before-persist race by mutating orca-data.json
between launches (delete ptyId, leave terminal-history/ dir intact).
Pre-fix code couldn't reach the dir; post-fix code re-derives the id
via mintPtySessionId(worktreeId, tabId, leafId) and cold-restores.

Co-authored-by: Orca <help@stably.ai>
Why: the 2-pane e2e locks down the distinct-leafId derivation arm
(single-pane alone would miss a regression where every pane collapses
to the tab-level random fallback). The cold-restore-miss warn (design
doc §9) is verified at unit level against DaemonPtyAdapter.doSpawn —
Playwright's _electron.launch only reliably captures a subset of
main-process console output, which made an e2e-scale repro flaky even
though the production code path fires correctly.

- tests/e2e/terminal-restart-persistence.spec.ts: new "2-pane split"
  test; bootstrapRestoredLaunch now accepts expectedPaneCount.
- src/main/daemon/daemon-pty-adapter.test.ts: two new unit tests —
  warn fires when explicit sessionId lands on a fresh session with no
  disk history; warn does NOT fire on the happy path (history present).

Co-authored-by: Orca <help@stably.ai>
Why: discoverActivePtyId broadcasts a marker to every candidate pty and
reads the active pane's xterm buffer. Right after a split the new pane's
PTY isn't yet registered in ptyIdsByTabId on slower machines (CI), so
the probe times out. Poll the store for ptyIdsByTabId[tabId].length >= 2
instead and identify the second ptyId as the one not equal to the first.
Passes locally in 4.9s and no longer depends on xterm-buffer timing after
a layout change.

Co-authored-by: Orca <help@stably.ai>
@Jinwoo-H Jinwoo-H force-pushed the Jinwoo-H/Jinwoo-H-investigate-terminal-history-loss branch from a6697fc to e159aaf Compare May 4, 2026 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant