Skip to content

perf(security): move purge / dismiss / undismiss SQL onto a worker thread#352

Merged
graydawnc merged 2 commits into
mainfrom
perf/mutations-on-worker
May 29, 2026
Merged

perf(security): move purge / dismiss / undismiss SQL onto a worker thread#352
graydawnc merged 2 commits into
mainfrom
perf/mutations-on-worker

Conversation

@graydawnc
Copy link
Copy Markdown
Collaborator

What

Moves the six security mutation handlers (purge / dismiss / undismiss, single and bulk) onto a dedicated worker_thread so the main process event loop stays unblocked through the multi-second tail of bulk operations on a large archive.

Why

#346 fixed the original 60s freeze by collapsing bulk purge into a single transaction (60s → ~1.4s on the same 8k-finding test case). That was the right fix for the literal complaint in the issue, but the ~1.4s commit still ran synchronously on the main process — IPC, window dragging and the scan-worker's status-event forwarder all queued behind it. PR #346's description called this out as the next follow-up:

Move DB-heavy security mutations off the main process onto the existing scan-worker pattern, so even a sub-second purge does not cost a frame on the renderer.

This PR is that follow-up.

How

1. New worker thread (mutation-worker-thread.ts)

Mirrors the scan-worker-thread.ts shape. Opens its own better-sqlite3 handle with runMigrations: false (main migrated before spawning), shares the DB file via WAL. Handles six commands:

type MutationCommand =
  | { cmd: 'purgeFinding'; findingId: number }
  | { cmd: 'purgeFindings'; findingIds: number[] }
  | { cmd: 'purgeEverywhere'; kind, valueHash }
  | { cmd: 'dismissFinding'; findingId; scope }
  | { cmd: 'dismissFindings'; findingIds; scope }
  | { cmd: 'undismissFinding'; findingId }

Each per-mutation FindingsChange (from the core publish callback) streams back as a separate event-change message so the renderer's existing EVT_FINDINGS_CHANGED subscribers don't care whether the change came from a scan or a mutation.

2. Main-side proxy (mutation-worker-proxy.ts)

Spawns the worker, waits for ready, exposes a typed async API (proxy.purgeFindings(ids) → Promise<PurgeResult[]>, etc.). Forwards the worker's per-event stream onto a PubSub the IPC layer subscribes to with the same Stream.fromPubSub shape the scan worker already uses. Boot timeout, fatal-during-boot and posthumous-send rejection mirror the scan-worker-proxy contract.

3. Wire-error helpers (security/mutation-error-wire.ts)

Pure helpers for Effect.Exit → wire transformation. Extracted out of the worker because the first draft called Exit.causeOption(exit).value directly — that returns Cause<E>, not E, so every typed PurgeError was silently downgraded to { reason: 'unknown' } on the wire and the renderer's reason-keyed error branches were dead code. The correct chain is causeOption → Cause.failureOption. Now there's unwrapEffectFailure, flattenPurgeError, and exitToWireResult, plus 15 vitest cases pinning the contract. Verified to fail on the pre-fix shape (injected the buggy form locally, watched 4 tests go red).

4. IPC delegation (ipc/security.ts)

The six mutation handlers check for a mutationWorker prop and delegate when present; the existing in-process SQL stays as a fallback for the worker-boot-failure case (and unit tests that don't spin a real worker). A new daemon fiber forwards mutationWorker.changes → EVT_FINDINGS_CHANGED.

5. Boot / shutdown

bootMutationWorker() runs after bootScanWorker() so a scan-worker failure aborts the whole Security boot but a mutation-worker failure logs and continues (the fallback handles it). shutdownScanWorker() tears the mutation worker down alongside.

6. Build

One extra rollup input in electron.vite.config.ts for the new worker entry.

How it connects

  • Same architecture as the scan worker — three persistent worker threads now (sync, scan, mutation), all sharing the DB via WAL. Where the scan worker has a queue + drain, the mutation worker is request/response with a reqId map; the read-heavy queries (lists, status snapshots) keep running on the main handle since they're cheap and don't block.
  • The wire-error helper extraction is what lets the bug catch itself: the previous bug pattern was a one-line subtle Effect API misuse that only the regression tests would catch on a re-roll.

Verification

  • New tests:
    • 2 in ipc/security.test.ts — a fake MutationWorkerProxy pinning the delegation, and the change-event forwarder
    • 15 in security/mutation-error-wire.test.ts — one per typed PurgeError reason, defect / interrupt / plain-Error degradation, the one-shot exitToWireResult wrapper
  • All 31 existing IPC tests cover the in-process fallback path unchanged.
  • Full app suite: 355 passed (340 prior + 15 new).
  • App typecheck clean.
  • Manual end-to-end on a real 7.9k-finding archive: 197 sessions had their previously-purged absolute-path findings restored via raw_file_mtime reset → syncer + scanner regenerated 7936 active absolute-path findings → single Purge all click → main process stayed responsive throughout (window drag and scroll unblocked) where pre-fix the same archive froze for ~1.4s. Content_text correctly masked, scan_purged_count per session aggregates correctly, all 7936 findings flipped state in one commit (state_changed_at = 2026-05-29T09:36:54.356Z).

Notes / non-goals

  • Observability gap: the worker thread doesn't yet have its own OTel runtime, so security.purge.bulk spans don't reach the main process log. The scan worker has the same setup pattern (makeObservabilityRuntime); extending it here is straightforward follow-up.
  • Pre-existing UX bug surfaced during testing (not introduced by this PR): when boot-time backfill and the post-syncAll backfill overlap, the renderer's progress bar appears to "rewind" because the second backfill calls resetBurst (sticky-max bypass) and the backfillTotal snaps to the smaller new-burst count. Will be addressed in a separate PR.

graydawnc added 2 commits May 29, 2026 17:47
…read

After #346 fixed the 60s bulk-purge freeze by collapsing it into a
single transaction, the remaining ~1.4s cost per 8k-finding burst
was still long enough to be felt by the user — main-process IPC,
window dragging and the renderer's scan-status forwarder all queued
behind the synchronous better-sqlite3 commit. Mutation worker is the
last step on the path the #346 PR description called out as a
follow-up: "any DB-heavy security mutation should live where the
scan worker already does."

What ships:

1. `packages/app/src/main/mutation-worker-thread.ts`
   New worker_thread bundle. Opens its own better-sqlite3 handle with
   `runMigrations: false` (main migrated before spawning), so it
   shares the same DB file via WAL alongside the scan worker and the
   sync worker. Receives one of six commands — purgeFinding,
   purgeFindings, purgeEverywhere, dismissFinding, dismissFindings,
   undismissFinding — over postMessage, runs the existing core
   helpers on its own handle, and posts a typed result back. Each
   per-mutation FindingsChange (from the core publish callback)
   streams back as a separate event-change message so the renderer's
   existing EVT_FINDINGS_CHANGED subscribers don't care whether the
   change came from a scan or a mutation.

2. `packages/app/src/main/mutation-worker-proxy.ts`
   Main-process side. Spawns the worker, waits for `ready`,
   round-trips commands by reqId, exposes a typed async API:
   `proxy.purgeFindings(ids) → Promise<PurgeResult[]>` etc. Forwards
   the worker's per-event stream onto a PubSub the IPC layer can
   subscribe to with the same `Stream.fromPubSub` shape the scan
   worker already uses. Boot timeout, fatal-during-boot and
   posthumous-send rejection mirror the scan-worker-proxy contract.

3. `packages/app/src/main/security/mutation-error-wire.ts`
   Pure helpers for Effect.Exit → wire transformation. Extracted out
   of the worker because the previous shape called
   `Exit.causeOption(exit).value` directly — that returns Cause<E>,
   not E, so every typed PurgeError (not-found / already-purged /
   message-missing / db-failed) was being silently downgraded to
   { reason: 'unknown' } on the wire. The renderer's reason-keyed
   error branches were dead code in the worker path. The correct
   chain is `causeOption → Cause.failureOption`. Now there's a
   `unwrapEffectFailure` helper that does both hops, an
   `exitToWireResult` that wraps the whole pattern, and 15 vitest
   tests pinning the contract. Verified to fail on the pre-fix shape
   (regression coverage proven by injecting the buggy form and
   watching 4 tests go red).

4. `packages/app/src/main/ipc/security.ts`
   The six mutation IPC handlers now check for a `mutationWorker`
   prop and delegate when present; the existing in-process SQL stays
   as a fallback for the worker-boot-failure case (and the IPC unit
   tests that don't spin up a real worker). A new daemon fiber
   forwards `mutationWorker.changes → EVT_FINDINGS_CHANGED` so each
   per-session change event from the worker reaches the renderer
   over the same channel a scan-worker change would.

5. Boot / shutdown in `index.ts`
   `bootMutationWorker()` runs after `bootScanWorker()` so a scan-
   worker failure aborts the whole Security boot but a mutation-
   worker failure logs and continues (the fallback handles it).
   `shutdownScanWorker()` tears the mutation worker down alongside.

6. `electron.vite.config.ts`
   One extra rollup input for the new worker entry.

Tests:
  - 2 new tests in `ipc/security.test.ts` (a fake MutationWorkerProxy
    pinning the delegation + the change-event forwarder) — 31/31
    pass on the test suite for that file.
  - 15 new tests in `security/mutation-error-wire.test.ts` covering
    the Effect.Exit unwrap, the PurgeError flattening, and the
    one-shot wrapper. Includes one test per typed reason and the
    "defect doesn't masquerade as a typed failure" path that the
    original bug surfaced.
  - All 31 existing IPC tests (mutationWorker: null) cover the
    in-process fallback path unchanged.
  - Full app suite: 355 passed (340 prior + 15 new).

Manual verification on a 7.9k-finding archive (sessions 114/137/115/
139/… purged 8184 absolute-path findings in a single Purge all
click): main process stayed responsive throughout (window drag and
scroll unblocked) where pre-fix the same archive froze for ~1.4s.
Content_text correctly masked, scan_purged_count per session
aggregates correctly, all 7936 findings flipped state in one
commit (`state_changed_at = 2026-05-29T09:36:54.356Z`).

Note: the worker thread doesn't yet have its own OTel observability
runtime, so `security.purge.bulk` spans don't reach the main
process log. The scan worker has the same setup
(makeObservabilityRuntime); extending it here is straightforward
follow-up if we want the spans back.
…boot

CI e2e was failing across every Security spec with:

  Error: page.waitForFunction: Error: Error invoking remote method
  'security:get-scan-status': Error: No handler registered for
  'security:get-scan-status'

Root cause: PR #N introduced `await bootMutationWorker()` between
`bootScanWorker` and `registerSecurityIpc`. Both boots have to settle
before any ipcMain.handle is wired in. The e2e harness calls
firstWindow() and immediately starts polling `security:get-scan-status`
via `waitForFunction`; an IPC rejection there propagates the throw and
ends polling on the first attempt instead of retrying. The extra
mutation-worker spawn latency pushed registration past that polling
window on every CI runner.

Fix: keep the IPC registration tight against scan-worker readiness
(the pre-PR behaviour) and defer mutation-worker plumbing.

  - `registerSecurityIpc` now returns `{ dispose, attachMutationWorker }`
    instead of just `dispose`.
  - The mutation handlers close over a `let currentMutationWorker:
    MutationWorkerProxy | null = null`, read at call time, so a worker
    attached AFTER registration is picked up by every subsequent call
    without re-binding the closure.
  - The mutation-worker change-event forwarder daemon fiber is forked
    by `attachMutationWorker(proxy)`, not at registration time. `dispose`
    interrupts it alongside the scan-worker forwarders.
  - In `main/index.ts`: `registerSecurityIpc` runs immediately after
    `bootScanWorker` (same moment as pre-PR). `bootMutationWorker()`
    fires in the background; on success its proxy is plugged in via
    `securityIpc.attachMutationWorker(mutationWorker)`. Until that
    runs the IPC handlers fall back to the in-process SQL path that
    was already gated behind the `if (mutationWorker)` check — the
    same correctness, just without the off-main offload for the boot
    window.

Tests:
  - The default IPC fixture in `security.test.ts` already exercises
    the in-process fallback (no `attachMutationWorker` call) — what it
    used to do via `mutationWorker: null` it now does by not attaching.
  - The two delegation tests call `attachMutationWorker(fakeProxy)`
    explicitly to pin the worker-delegated path + the
    EVT_FINDINGS_CHANGED forwarder.
  - Suite: 355 / 355 (no change in count, three call-sites adjusted).
@graydawnc graydawnc added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit 2d23dbf May 29, 2026
6 checks passed
@graydawnc graydawnc deleted the perf/mutations-on-worker branch May 29, 2026 10:24
@graydawnc graydawnc mentioned this pull request May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant