fix(screenshot): wait for paint via Page.lifecycleEvent#7
Conversation
RapierCraft
left a comment
There was a problem hiding this comment.
PR Review: #7 — fix(screenshot): wait for paint via Page.lifecycleEvent
Verdict: CHANGES REQUESTED
Reviewed commit: HEAD | Scope: src/cdp-client.ts | Depends on: PR #6
Findings
| # | Finding | Severity | Confidence |
|---|---|---|---|
| 1 | Listener leak in waitForLifecycle: (this.client as any)?.removeListener?.('Page.lifecycleEvent', listener) is not the correct API for chrome-remote-interface. The CRI library uses .off() or the event-specific deregistration pattern, not .removeListener(). The listener registered via .on('Page.lifecycleEvent', listener) is never actually removed, leaking a listener on every screenshot that waits for FCP. |
HIGH | CONFIRMED |
| 2 | Race condition in waitForLifecycle: The synchronous cache check (for...of frameLifecycle.values()) happens before the listener is registered. If FCP fires in that gap, neither the cache check nor the listener catches it, causing a needless 2s timeout. Fix: register listener first, then check cache. |
MEDIUM | CONFIRMED |
| 3 | No cleanup in disconnect(): frameLifecycle map and lifecycleListener are not cleared or deregistered when disconnecting. Map grows unbounded across reconnections to different tabs. |
MEDIUM | CONFIRMED |
| 4 | waitForLifecycle is public with no documented contract — should be private since it's an internal implementation detail |
LOW | CONFIRMED |
| 5 | Inherits PR #6's || vs && viewport check issue (see PR #6 review) |
HIGH | CONFIRMED |
Recommendation
Do not merge until findings #1-3 are resolved:
- #1: Use
(this.client as any).off('Page.lifecycleEvent', listener)or store a reference and use CRI's deregistration API - #2: Register the
.on()listener before checking the cache, or use a single code path - #3: Clear
frameLifecycleand removelifecycleListenerin thedisconnect()method
This PR depends on #6 — merge #6 first (after its fix), then rebase this.
Review complete. 1 agent (security + infrastructure). 3 blocking findings.
|
Hey @madisonrickert — love the lifecycle-based approach, it's how Lighthouse/Puppeteer do it and is much better than readyState polling. Three issues to fix before merging: 1. Listener leak in try { (this.client as any)?.removeListener?.('Page.lifecycleEvent', listener); } catch { /* ignore */ }
2. Race condition (MEDIUM) The synchronous cache check runs before the listener is registered: // 1. check cache — FCP not found
for (const { events } of this.frameLifecycle.values()) { ... }
// ← FCP fires HERE, updates frameLifecycle, but listener isn't registered yet
// 2. register listener — misses FCP, waits full 2s timeout
return new Promise<boolean>((resolve) => { ... });Fix: register the listener first, then check the cache inside the promise. 3. No cleanup in
this.frameLifecycle.clear();
this.lifecycleListener = null;Once these are addressed, this is ready to merge (after #6 lands first since this depends on it). |
Page.captureScreenshot can stall for ~2 minutes when the connected target reports a 0x0 layout viewport — observed when the MCP attaches to the Perplexity sidecar panel (visibilityState='hidden') in some Comet window states (cold launch, no real browsing tabs in front). Without captureBeyondViewport, captureScreenshot captures the layout viewport and waits for compositor frames that never arrive. When Page.getLayoutMetrics reports a zero-size viewport, supply an explicit clip + captureBeyondViewport=true so the renderer produces a frame immediately. No-op on real foreground tabs (where the layout viewport is non-zero, so the clip path doesn't activate). Closes RapierCraft#2.
…bility Pulls the screenshot-with-fallback logic out of the CometCDPClient method body and into an exported free function backed by a minimal ScreenshotPageAPI interface — same seam pattern as CometAIClient, so unit tests can substitute a hand-written fake page without a live CDP connection. Also promotes the inline 1280x800 fallback clip to a named SCREENSHOT_FALLBACK_CLIP constant, with the explanatory comment attached to the constant rather than buried mid-method. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers: - normal viewport (1024x768) -> no clip; format forwarded as-is - cssLayoutViewport missing -> falls back to layoutViewport - jpeg format forwarded - bringToFront is invoked before capture (best-effort) - 0x0, 0xN, Nx0, both-fields-absent viewports -> fallback clip + captureBeyondViewport=true - getLayoutMetrics throws -> fallback clip - bringToFront throws -> swallowed; capture still runs - captureScreenshot returns empty data -> throws descriptive error The 0xN / Nx0 cases document the wrapper's defensive || semantics: any zero dimension triggers the fallback, on the basis that a 0-pixel screenshot is useless regardless of whether the same compositor stall reproduces. The hang itself is only confirmed for 0x0 in live testing; the wrapper-logic tests here don't claim otherwise. Adds tests/unit/fakes/fake-screenshot-page.ts following the existing FakeCdpClient style (records calls, programmable responses). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h dims zero Switches the screenshot-fallback condition from `||` to `&&` so the fallback clip only activates on a strict 0x0 layout viewport — the only viewport state we have evidence for actually stalling Page.captureScreenshot. Live CDP testing against a running Comet instance: - Sidecar (visibilityState=hidden, cold launch): natural 0x0 viewport; naked Page.captureScreenshot timed out at 15s; the wrapper completed in ~1s with valid PNG data. - Forced 0x0 via Emulation.setDeviceMetricsOverride on a normal page: Chromium silently rejected the zero values and reported the natural viewport (1024x768) instead of 0x0. - Forced 0xN and Nx0 via Emulation.setDeviceMetricsOverride: Chromium rejected the single-zero dimension the same way, substituting the natural value for the missing axis. So a 0xN or Nx0 layout viewport is not a state we can observe in practice. The `||` form was defensive against a hypothetical that doesn't appear to be reachable; `&&` matches the predicate to the evidence and avoids guessing on inputs we can't generate. Unit-test expectations for the 0xN / Nx0 boundary cases move from "fallback clip is applied" to "fallback clip is not applied", documented inline. The 0x0, missing-viewport-fields, and getLayoutMetrics-throws cases still trigger the fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests
The 0×N and N×0 boundary tests previously asserted only the predicate
outcome ("no fallback clip applied") while letting the fake return a
canned valid-PNG byte string — which doesn't match what a real
encoder would do. PNG IHDR and JPEG SOF both require both dimensions
to be >= 1, so a zero-dim viewport can't produce a valid image
regardless of which encoder runs.
Update those two tests to:
1. Configure the fake's captureScreenshot to return empty data
(modeling the real encoder's behavior on a zero-dim viewport).
2. Assert that the wrapper surfaces the failure via the existing
empty-data guard rather than masking it with a synthetic 1280×800
capture.
The call shape assertion (`{ format: "png" }`, i.e. no clip /
captureBeyondViewport) is preserved so the predicate decision is still
covered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous catch block applied the fallback clip for any throw from Page.getLayoutMetrics, which masked transport-level failures (websocket dropped, target detached) behind a synthetic 1280x800 capture. Tighten the discriminator to only recover from Chrome-side rejections, and propagate everything else so the caller sees the real failure. Mechanics: - Bump chrome-remote-interface from ^0.33.2 to ^0.34.0 to pick up the newly-exposed ProtocolError class (added in 82d1d60b upstream). - @types/chrome-remote-interface hasn't declared ProtocolError yet (DefinitelyTyped/DefinitelyTyped#74992 in flight), so bridge the gap with a small local cast at import time. Comment marks the cast for removal once the upstream types land. - Switch the catch block to `if (err instanceof ProtocolError) { clip = SCREENSHOT_FALLBACK_CLIP } else { throw err }`. Tests: - Fake gains `setMetricsToThrowProtocolError()` and `setMetricsToThrowGeneric()` (replaces the single `setMetricsToThrow`). - Splits the existing "getLayoutMetrics throws -> fallback" test into: - ProtocolError throw -> fallback clip applied (recovery path) - generic Error throw -> propagates, no captureScreenshot attempted Safety analysis vs. the broad catch we're replacing: - getLayoutMetrics returns 0x0: fallback (unchanged) - ProtocolError on unsupported: fallback (unchanged) - 0x0 + ProtocolError on probe: fallback (unchanged — captureScreenshot still gets the clip) - websocket dropped mid-probe: propagates (was: silently fell back, then captureScreenshot also failed with the same error one call later — same end state, cleaner provenance) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review feedback on RapierCraft#7: - Capture the unsubscribe function returned by client.Page.lifecycleEvent() in connect(), store on this.lifecycleUnsubscribe. CRI's domain-event callbacks return `() => chrome.removeListener(rawEventName, handler)` (api.js:49), which is the idiomatic deregistration path. Replaces the prior `(client as any).removeListener(...)` call, which worked (Chrome extends EventEmitter) but bypassed the typed API. - In waitForLifecycle, use the same unsubscribe-function pattern instead of `client.on(...)` + `client.removeListener(...)`. Register the listener before scanning the cache: single-threaded JS makes the synchronous-only path safe today, but ordering matters if anything upstream (CRI internals, scheduler) ever inserts a microtask between the two operations. - Mark waitForLifecycle private; only screenshot() uses it. - Clear frameLifecycle, lifecycleListener, and lifecycleUnsubscribe in disconnect() so stale state from the prior tab doesn't carry across a disconnect/reconnect cycle.
Previously screenshot() proceeded as soon as the connection was up, which meant a screenshot taken mid-navigation could capture an unrendered or stale frame. Lighthouse and Puppeteer use the CDP Page.lifecycleEvent stream to know when the renderer has actually painted; we now do the same. - At connect(): enable Page.setLifecycleEventsEnabled and accumulate events per (frameId, loaderId). - New waitForLifecycle(name, timeoutMs) returns immediately if the event has already fired for the current document, otherwise waits for the next occurrence. - screenshot() awaits firstContentfulPaint up to 2s before capture. Returns immediately for steady-state pages; bounds blast radius if the page never paints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review feedback on RapierCraft#7: - Capture the unsubscribe function returned by client.Page.lifecycleEvent() in connect(), store on this.lifecycleUnsubscribe. CRI's domain-event callbacks return `() => chrome.removeListener(rawEventName, handler)` (api.js:49), which is the idiomatic deregistration path. Replaces the prior `(client as any).removeListener(...)` call, which worked (Chrome extends EventEmitter) but bypassed the typed API. - In waitForLifecycle, use the same unsubscribe-function pattern instead of `client.on(...)` + `client.removeListener(...)`. Register the listener before scanning the cache: single-threaded JS makes the synchronous-only path safe today, but ordering matters if anything upstream (CRI internals, scheduler) ever inserts a microtask between the two operations. - Mark waitForLifecycle private; only screenshot() uses it. - Clear frameLifecycle, lifecycleListener, and lifecycleUnsubscribe in disconnect() so stale state from the prior tab doesn't carry across a disconnect/reconnect cycle.
…estability Mirrors the captureScreenshotWithFallback pattern: move the lifecycle-wait logic into a free function that takes a `LifecyclePageAPI` slice and a `FrameLifecycleMap`, so unit tests can drive it with a hand-written fake instead of a live CDP connection. Drops the private wrapper method on CometCDPClient (no other callers) and inlines the call site in screenshot(). No behavior change.
Covers the three exit paths (cache hit, listener fires, timeout) plus error handling when the CRI lifecycleEvent registration throws. Pins the review-fix invariants: - listener is unsubscribed on every exit path (no leaked handlers across calls — guards the listener-leak finding) - listener is registered before the cache scan (defensive ordering against any future microtask insertion between cache check and registration) - unrelated lifecycle events on the same connection do not resolve the Promise — only the requested eventName matters Uses a FakeLifecyclePage that satisfies LifecyclePageAPI and records registrations/unsubscribes/live-handler-count so the leak invariants can be asserted directly.
e05add6 to
191ea5c
Compare
|
Branch is rebased onto the current head of #6 so it picks up the unit-test infra from #10, and the lifecycle code now lives behind the same free-function pattern as Notes on the analysis behind each finding: Listener leak. Race condition. In single-threaded Node, the cache scan, Promise construction, and listener registration execute in one synchronous block with no event-loop turn between them. CRI events arrive via WebSocket I/O callbacks, which only fire on event-loop ticks. So there isn't an actual race today. But listener-first ordering is defensive against any future microtask insertion (CRI internals, scheduler changes), and it costs nothing. Swapped, with a test that pins the ordering. Disconnect cleanup.
The new tests cover the three exit paths (cache hit, listener fires, timeout) and pin the listener-leak invariant (unsubscribe count on every path) plus the defensive ordering. Just pending #6 landing still. |
Code Review: fix(screenshot): wait for paint via Page.lifecycleEventReviewed commit: Overall Assessment: APPROVE (with minor observations)This is a well-engineered PR. The lifecycle waiting logic is correct, the cleanup paths are complete, error handling is defensive without masking real failures, and the test coverage is thorough. The PR is safe to merge. Detailed Findings1. Lifecycle Event Handling — Correct
Cleanup: The 2. Timeout Handling — SoundThe 2-second timeout is well-chosen: short enough to avoid user-visible stalls, long enough for typical FCP on SPAs. On timeout, 3. Lifecycle Map Management — Correct
4.
|
| Category | Status |
|---|---|
| Correctness | Pass — no race conditions, proper TOCTOU ordering |
| Cleanup/Leaks | Pass — all exit paths clean up listener + timer |
| Timeout behavior | Pass — bounded, graceful degradation |
| Error handling | Pass — defensive without masking transport failures |
| Test coverage | Pass — all paths exercised with fake timers |
| Breaking changes | None — steady-state pages resolve synchronously |
Verdict: Clean. No blocking issues found. Safe to merge after PR #6 lands (as noted in the PR description).
Summary
Page.lifecycleEventat connect time and accumulates events per(frameId, loaderId)for the current documentwaitForLifecycle(name, timeoutMs)— returns synchronously if the event already fired for the current document, otherwise waits for the next occurrence (or times out)screenshot()awaitsfirstContentfulPaintup to 2s before captureProblem
screenshot()proceeds as soon as the CDP connection is up. On a freshly navigated tab the renderer may not have painted yet, so the captured frame is blank or shows the previous document.document.readyState === 'complete'is too weak a signal — for SPAs it fires when the bundle loads, often before answer content has rendered.Fix
Lighthouse and Puppeteer use
Page.lifecycleEventto know when the renderer has actually painted. This PR does the same:Properties:
waitForLifecyclereturns synchronously.setLifecycleEventsEnabledfails (older protocol versions, etc.), the lifecycle map stays empty andwaitForLifecyclereturnsfalseimmediately.The new
waitForLifecyclehelper is also reusable for future ops that want to gate onnetworkAlmostIdle,load, etc.Test Plan
npm run buildpassesDependency
Builds on #6 (the 0×0 viewport fix). The first commit in the diff is #6's commit; once #6 merges, this PR will rebase down to a single commit. Happy to rebase manually if preferred.