
fix(tui): cap live step trace to prevent OOM on long sprint runs #104

Merged
lukas-grigis merged 1 commit into main from fix/tui-oom-steps-cap
May 6, 2026


@lukas-grigis
Owner

Summary

  • Cap `steps` state in `execute-view.tsx` at 200 entries (FIFO eviction).
  • Slice the `StepTrace` render to the last 50 entries, with an elision row above for the rest.
  • Add regression tests covering both the cap and the under-cap path.
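
The two limits above can be sketched as a pair of pure helpers. This is a minimal illustration, not the code from the PR: `StepEntry`, `appendCapped`, `visibleWindow`, and the constant names are hypothetical, and only the numbers 200 and 50 come from the summary.

```typescript
// Hypothetical names throughout; the PR only states the limits (200 kept, 50 rendered).
const MAX_STEPS = 200;
const VISIBLE_STEPS = 50;

interface StepEntry {
  id: number;
  label: string;
}

/** Append one step and drop the oldest entries once the cap is exceeded (FIFO). */
function appendCapped(
  steps: readonly StepEntry[],
  next: StepEntry,
  cap: number = MAX_STEPS,
): StepEntry[] {
  const grown = [...steps, next];
  return grown.length > cap ? grown.slice(grown.length - cap) : grown;
}

/** Split the kept steps into a count of elided rows and the tail to render. */
function visibleWindow(
  steps: readonly StepEntry[],
  limit: number = VISIBLE_STEPS,
): { elided: number; visible: StepEntry[] } {
  const elided = Math.max(0, steps.length - limit);
  return { elided, visible: steps.slice(elided) };
}
```

In a React state setter this would presumably be used as `setSteps((prev) => appendCapped(prev, step))`, with the elision row rendered only when `elided` is greater than zero.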

Why

After ~15h of `ralphctl sprint start`, Node's V8 heap hit `Ineffective mark-compacts near heap limit Allocation failed` and the process aborted at ~4 GB. The smoking gun in the native stack trace was `Builtins_ArrayPrototypeReverse`, reached from `uv__run_check` via `Environment::CheckImmediate`. The only `.reverse()` reachable from React's commit phase is Ink's `[...node.childNodes].reverse()` in `render-node-to-output.js:42`.

Tracing the React tree pointed at one consumer: the `steps` state subscriber in `execute-view.tsx` appended every step event without a cap, and `StepTrace` rendered `steps.map(...)` without a slice. A 200-task sprint × ~10 inner leaves per task yielded thousands of entries. Combined with the 80–120 ms spinner heartbeats in `spinner.tsx`, `header-heartbeat.tsx`, `task-execution-list.tsx`, and `step-trace.tsx`, this made Ink reconcile the giant child list and copy and reverse it on every tick.
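
A rough back-of-envelope estimate shows why the row count matters so much here. The sketch below assumes one render per heartbeat tick and one array slot copied per step row, which simplifies what Ink actually does. The function name is hypothetical, and the 100 ms tick is just the midpoint of the 80–120 ms range quoted above.

```typescript
// Back-of-envelope estimate only: assumes one render per heartbeat tick and
// one array slot copied per step row. Real renderer behavior is more complex.
function slotsCopiedPerHour(rowCount: number, tickMs: number): number {
  const rendersPerHour = (60 * 60 * 1000) / tickMs;
  return rowCount * rendersPerHour;
}

// 200 tasks x ~10 leaves = ~2000 rows, redrawn on a 100 ms tick.
const uncappedSlots = slotsCopiedPerHour(2000, 100);

// The same tick with the render sliced to 50 rows (ignoring the elision row).
const cappedSlots = slotsCopiedPerHour(50, 100);

// How much less short-lived array space the sliced render churns through.
const reductionFactor = uncappedSlots / cappedSlots;
```

Under these assumptions the uncapped list copies 72,000,000 slots per hour against 1,800,000 for the sliced one, a 40-fold reduction in short-lived allocation.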

Capping at the consumer (not in `kernel/`) preserves the canonical chain trace asserted by every `<flow>-flow.test.ts` step-order test. The leak is renderer allocation churn, not data retention.
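
The split between producer and consumer can be illustrated as follows. This is a sketch under assumptions: `ChainTrace` and `attachBoundedView` are hypothetical stand-ins for the kernel-side trace and the view-side subscriber, not names from the repository.

```typescript
type StepListener = (step: string) => void;

// Hypothetical stand-in for the kernel-side chain: it keeps the full trace.
class ChainTrace {
  readonly steps: string[] = [];
  private readonly listeners: StepListener[] = [];

  subscribe(listener: StepListener): void {
    this.listeners.push(listener);
  }

  record(step: string): void {
    this.steps.push(step);
    for (const listener of this.listeners) listener(step);
  }
}

// Hypothetical stand-in for the view-side subscriber: it keeps a bounded tail.
function attachBoundedView(trace: ChainTrace, cap: number): () => readonly string[] {
  let live: string[] = [];
  trace.subscribe((step) => {
    const grown = [...live, step];
    live = grown.length > cap ? grown.slice(grown.length - cap) : grown;
  });
  // Return a getter so callers always read the latest bounded snapshot.
  return () => live;
}
```

With this shape, step-order assertions can still read the complete `trace.steps`, while the renderer only ever sees the last `cap` entries.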

Test plan

  • `pnpm typecheck`
  • `pnpm lint`
  • `pnpm test`: 232 files / 2240 tests pass
  • New tests in `step-trace.test.tsx`: 5000-entry cap and 10-entry under-cap path
  • Manual: run `pnpm dev sprint start` against a long sprint and watch RSS plateau instead of growing

- Cap `steps` state in execute-view at 200 entries (FIFO eviction).
- Slice StepTrace render to last 50 with an elision row for the rest.
- Add regression tests covering both the cap and the under-cap path.

Long `sprint start` runs (200+ tasks × ~10 leaves each) grew the live
step list into the thousands. Combined with 80-120ms spinner heartbeats,
Ink's per-render `[...childNodes].reverse()` allocated a fresh array
per tick and OOMed Node after ~15h.
lukas-grigis merged commit 0b6eece into main on May 6, 2026
1 check passed
lukas-grigis deleted the fix/tui-oom-steps-cap branch on May 6, 2026 04:23