fix(screenshot): wait for paint via Page.lifecycleEvent by madisonrickert · Pull Request #7 · RapierCraft/Perplexity-Comet-MCP

madisonrickert · 2026-05-03T19:02:14Z

Summary

Subscribes to Page.lifecycleEvent at connect time and accumulates events per (frameId, loaderId) for the current document
Adds waitForLifecycle(name, timeoutMs) — returns synchronously if the event already fired for the current document, otherwise waits for the next occurrence (or times out)
screenshot() awaits firstContentfulPaint up to 2s before capture

Problem

screenshot() proceeds as soon as the CDP connection is up. On a freshly navigated tab the renderer may not have painted yet, so the captured frame is blank or shows the previous document. document.readyState === 'complete' is too weak a signal — for SPAs it fires when the bundle loads, often before answer content has rendered.

Fix

Lighthouse and Puppeteer use Page.lifecycleEvent to know when the renderer has actually painted. This PR does the same:

// At connect():
await this.client.Page.setLifecycleEventsEnabled({ enabled: true });
this.client.Page.lifecycleEvent((params) => {
  // accumulate (frameId, loaderId) → Set<eventName>
});

// In screenshot():
await this.waitForLifecycle('firstContentfulPaint', 2000);

Properties:

No regression on steady-state pages — FCP has already fired, so waitForLifecycle returns synchronously.
Bounded blast radius — 2s timeout means a non-painting page degrades to the prior behavior, never hangs longer.
Best-effort — if setLifecycleEventsEnabled fails (older protocol versions, etc.), the lifecycle map stays empty and waitForLifecycle returns false immediately.

The new waitForLifecycle helper is also reusable for future ops that want to gate on networkAlmostIdle, load, etc.

Test Plan

npm run build passes
Verified screenshot on steady-state page returns immediately (FCP cached)
Verified screenshot post-navigation waits briefly then captures the new document, not the old one

Dependency

Builds on #6 (the 0×0 viewport fix). The first commit in the diff is #6's commit; once #6 merges, this PR will rebase down to a single commit. Happy to rebase manually if preferred.

RapierCraft

PR Review: #7 — fix(screenshot): wait for paint via Page.lifecycleEvent

Verdict: CHANGES REQUESTED

Reviewed commit: HEAD | Scope: src/cdp-client.ts | Depends on: PR #6

Findings

#	Finding	Severity	Confidence
1	Listener leak in `waitForLifecycle`: `(this.client as any)?.removeListener?.('Page.lifecycleEvent', listener)` is not the correct API for `chrome-remote-interface`. The CRI library uses `.off()` or the event-specific deregistration pattern, not `.removeListener()`. The listener registered via `.on('Page.lifecycleEvent', listener)` is never actually removed, leaking a listener on every screenshot that waits for FCP.	HIGH	CONFIRMED
2	Race condition in `waitForLifecycle`: The synchronous cache check (`for...of frameLifecycle.values()`) happens before the listener is registered. If FCP fires in that gap, neither the cache check nor the listener catches it, causing a needless 2s timeout. Fix: register listener first, then check cache.	MEDIUM	CONFIRMED
3	No cleanup in `disconnect()`: `frameLifecycle` map and `lifecycleListener` are not cleared or deregistered when disconnecting. Map grows unbounded across reconnections to different tabs.	MEDIUM	CONFIRMED
4	`waitForLifecycle` is `public` with no documented contract — should be `private` since it's an internal implementation detail	LOW	CONFIRMED
5	Inherits PR #6's `\|\|` vs `&&` viewport check issue (see PR #6 review)	HIGH	CONFIRMED

Recommendation

Do not merge until findings #1-3 are resolved:

#1: Use (this.client as any).off('Page.lifecycleEvent', listener) or store a reference and use CRI's deregistration API
#2: Register the .on() listener before checking the cache, or use a single code path
#3: Clear frameLifecycle and remove lifecycleListener in the disconnect() method

This PR depends on #6 — merge #6 first (after its fix), then rebase this.

Review complete. 1 agent (security + infrastructure). 3 blocking findings.

RapierCraft · 2026-05-12T14:57:58Z

Hey @madisonrickert — love the lifecycle-based approach, it's how Lighthouse/Puppeteer do it and is much better than readyState polling. Three issues to fix before merging:

1. Listener leak in waitForLifecycle (HIGH)

try { (this.client as any)?.removeListener?.('Page.lifecycleEvent', listener); } catch { /* ignore */ }

chrome-remote-interface doesn't expose removeListener on the client object — this is a no-op, so every waitForLifecycle call that waits (doesn't return synchronously) permanently leaks a listener. Use (this.client as any).off('Page.lifecycleEvent', listener) or store a reference from the CRI registration API.

2. Race condition (MEDIUM)

The synchronous cache check runs before the listener is registered:

// 1. check cache — FCP not found
for (const { events } of this.frameLifecycle.values()) { ... }
// ← FCP fires HERE, updates frameLifecycle, but listener isn't registered yet
// 2. register listener — misses FCP, waits full 2s timeout
return new Promise<boolean>((resolve) => { ... });

Fix: register the listener first, then check the cache inside the promise.

3. No cleanup in disconnect() (MEDIUM)

frameLifecycle map and lifecycleListener are never cleared or deregistered on disconnect. The map grows unbounded across reconnections. Add to disconnect():

this.frameLifecycle.clear();
this.lifecycleListener = null;

Once these are addressed, this is ready to merge (after #6 lands first since this depends on it).

Page.captureScreenshot can stall for ~2 minutes when the connected target reports a 0x0 layout viewport — observed when the MCP attaches to the Perplexity sidecar panel (visibilityState='hidden') in some Comet window states (cold launch, no real browsing tabs in front). Without captureBeyondViewport, captureScreenshot captures the layout viewport and waits for compositor frames that never arrive. When Page.getLayoutMetrics reports a zero-size viewport, supply an explicit clip + captureBeyondViewport=true so the renderer produces a frame immediately. No-op on real foreground tabs (where the layout viewport is non-zero, so the clip path doesn't activate). Closes RapierCraft#2.

…bility Pulls the screenshot-with-fallback logic out of the CometCDPClient method body and into an exported free function backed by a minimal ScreenshotPageAPI interface — same seam pattern as CometAIClient, so unit tests can substitute a hand-written fake page without a live CDP connection. Also promotes the inline 1280x800 fallback clip to a named SCREENSHOT_FALLBACK_CLIP constant, with the explanatory comment attached to the constant rather than buried mid-method. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Covers: - normal viewport (1024x768) -> no clip; format forwarded as-is - cssLayoutViewport missing -> falls back to layoutViewport - jpeg format forwarded - bringToFront is invoked before capture (best-effort) - 0x0, 0xN, Nx0, both-fields-absent viewports -> fallback clip + captureBeyondViewport=true - getLayoutMetrics throws -> fallback clip - bringToFront throws -> swallowed; capture still runs - captureScreenshot returns empty data -> throws descriptive error The 0xN / Nx0 cases document the wrapper's defensive || semantics: any zero dimension triggers the fallback, on the basis that a 0-pixel screenshot is useless regardless of whether the same compositor stall reproduces. The hang itself is only confirmed for 0x0 in live testing; the wrapper-logic tests here don't claim otherwise. Adds tests/unit/fakes/fake-screenshot-page.ts following the existing FakeCdpClient style (records calls, programmable responses). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…h dims zero Switches the screenshot-fallback condition from `||` to `&&` so the fallback clip only activates on a strict 0x0 layout viewport — the only viewport state we have evidence for actually stalling Page.captureScreenshot. Live CDP testing against a running Comet instance: - Sidecar (visibilityState=hidden, cold launch): natural 0x0 viewport; naked Page.captureScreenshot timed out at 15s; the wrapper completed in ~1s with valid PNG data. - Forced 0x0 via Emulation.setDeviceMetricsOverride on a normal page: Chromium silently rejected the zero values and reported the natural viewport (1024x768) instead of 0x0. - Forced 0xN and Nx0 via Emulation.setDeviceMetricsOverride: Chromium rejected the single-zero dimension the same way, substituting the natural value for the missing axis. So a 0xN or Nx0 layout viewport is not a state we can observe in practice. The `||` form was defensive against a hypothetical that doesn't appear to be reachable; `&&` matches the predicate to the evidence and avoids guessing on inputs we can't generate. Unit-test expectations for the 0xN / Nx0 boundary cases move from "fallback clip is applied" to "fallback clip is not applied", documented inline. The 0x0, missing-viewport-fields, and getLayoutMetrics-throws cases still trigger the fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… tests The 0×N and N×0 boundary tests previously asserted only the predicate outcome ("no fallback clip applied") while letting the fake return a canned valid-PNG byte string — which doesn't match what a real encoder would do. PNG IHDR and JPEG SOF both require both dimensions to be >= 1, so a zero-dim viewport can't produce a valid image regardless of which encoder runs. Update those two tests to: 1. Configure the fake's captureScreenshot to return empty data (modeling the real encoder's behavior on a zero-dim viewport). 2. Assert that the wrapper surfaces the failure via the existing empty-data guard rather than masking it with a synthetic 1280×800 capture. The call shape assertion (`{ format: "png" }`, i.e. no clip / captureBeyondViewport) is preserved so the predicate decision is still covered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous catch block applied the fallback clip for any throw from Page.getLayoutMetrics, which masked transport-level failures (websocket dropped, target detached) behind a synthetic 1280x800 capture. Tighten the discriminator to only recover from Chrome-side rejections, and propagate everything else so the caller sees the real failure. Mechanics: - Bump chrome-remote-interface from ^0.33.2 to ^0.34.0 to pick up the newly-exposed ProtocolError class (added in 82d1d60b upstream). - @types/chrome-remote-interface hasn't declared ProtocolError yet (DefinitelyTyped/DefinitelyTyped#74992 in flight), so bridge the gap with a small local cast at import time. Comment marks the cast for removal once the upstream types land. - Switch the catch block to `if (err instanceof ProtocolError) { clip = SCREENSHOT_FALLBACK_CLIP } else { throw err }`. Tests: - Fake gains `setMetricsToThrowProtocolError()` and `setMetricsToThrowGeneric()` (replaces the single `setMetricsToThrow`). - Splits the existing "getLayoutMetrics throws -> fallback" test into: - ProtocolError throw -> fallback clip applied (recovery path) - generic Error throw -> propagates, no captureScreenshot attempted Safety analysis vs. the broad catch we're replacing: - getLayoutMetrics returns 0x0: fallback (unchanged) - ProtocolError on unsupported: fallback (unchanged) - 0x0 + ProtocolError on probe: fallback (unchanged — captureScreenshot still gets the clip) - websocket dropped mid-probe: propagates (was: silently fell back, then captureScreenshot also failed with the same error one call later — same end state, cleaner provenance) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses review feedback on RapierCraft#7: - Capture the unsubscribe function returned by client.Page.lifecycleEvent() in connect(), store on this.lifecycleUnsubscribe. CRI's domain-event callbacks return `() => chrome.removeListener(rawEventName, handler)` (api.js:49), which is the idiomatic deregistration path. Replaces the prior `(client as any).removeListener(...)` call, which worked (Chrome extends EventEmitter) but bypassed the typed API. - In waitForLifecycle, use the same unsubscribe-function pattern instead of `client.on(...)` + `client.removeListener(...)`. Register the listener before scanning the cache: single-threaded JS makes the synchronous-only path safe today, but ordering matters if anything upstream (CRI internals, scheduler) ever inserts a microtask between the two operations. - Mark waitForLifecycle private; only screenshot() uses it. - Clear frameLifecycle, lifecycleListener, and lifecycleUnsubscribe in disconnect() so stale state from the prior tab doesn't carry across a disconnect/reconnect cycle.

Previously screenshot() proceeded as soon as the connection was up, which meant a screenshot taken mid-navigation could capture an unrendered or stale frame. Lighthouse and Puppeteer use the CDP Page.lifecycleEvent stream to know when the renderer has actually painted; we now do the same. - At connect(): enable Page.setLifecycleEventsEnabled and accumulate events per (frameId, loaderId). - New waitForLifecycle(name, timeoutMs) returns immediately if the event has already fired for the current document, otherwise waits for the next occurrence. - screenshot() awaits firstContentfulPaint up to 2s before capture. Returns immediately for steady-state pages; bounds blast radius if the page never paints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses review feedback on RapierCraft#7: - Capture the unsubscribe function returned by client.Page.lifecycleEvent() in connect(), store on this.lifecycleUnsubscribe. CRI's domain-event callbacks return `() => chrome.removeListener(rawEventName, handler)` (api.js:49), which is the idiomatic deregistration path. Replaces the prior `(client as any).removeListener(...)` call, which worked (Chrome extends EventEmitter) but bypassed the typed API. - In waitForLifecycle, use the same unsubscribe-function pattern instead of `client.on(...)` + `client.removeListener(...)`. Register the listener before scanning the cache: single-threaded JS makes the synchronous-only path safe today, but ordering matters if anything upstream (CRI internals, scheduler) ever inserts a microtask between the two operations. - Mark waitForLifecycle private; only screenshot() uses it. - Clear frameLifecycle, lifecycleListener, and lifecycleUnsubscribe in disconnect() so stale state from the prior tab doesn't carry across a disconnect/reconnect cycle.

…estability Mirrors the captureScreenshotWithFallback pattern: move the lifecycle-wait logic into a free function that takes a `LifecyclePageAPI` slice and a `FrameLifecycleMap`, so unit tests can drive it with a hand-written fake instead of a live CDP connection. Drops the private wrapper method on CometCDPClient (no other callers) and inlines the call site in screenshot(). No behavior change.

Covers the three exit paths (cache hit, listener fires, timeout) plus error handling when the CRI lifecycleEvent registration throws. Pins the review-fix invariants: - listener is unsubscribed on every exit path (no leaked handlers across calls — guards the listener-leak finding) - listener is registered before the cache scan (defensive ordering against any future microtask insertion between cache check and registration) - unrelated lifecycle events on the same connection do not resolve the Promise — only the requested eventName matters Uses a FakeLifecyclePage that satisfies LifecyclePageAPI and records registrations/unsubscribes/live-handler-count so the leak invariants can be asserted directly.

madisonrickert · 2026-05-12T23:33:58Z

Branch is rebased onto the current head of #6 so it picks up the unit-test infra from #10, and the lifecycle code now lives behind the same free-function pattern as captureScreenshotWithFallback.

Notes on the analysis behind each finding:

Listener leak. chrome-remote-interface's Chrome class extends Node's EventEmitter (chrome.js:15), so the prior removeListener call wasn't actually a no-op. That said, the idiomatic CRI pattern is to keep the unsubscribe function CRI itself returns from client.Page.lifecycleEvent(handler): api.js:49 returns exactly () => chrome.removeListener(rawEventName, handler). Switched to that. Cleaner, drops the as any cast, and disconnect() uses it too.

Race condition. In single-threaded Node, the cache scan, Promise construction, and listener registration execute in one synchronous block with no event-loop turn between them. CRI events arrive via WebSocket I/O callbacks, which only fire on event-loop ticks. So there isn't an actual race today. But listener-first ordering is defensive against any future microtask insertion (CRI internals, scheduler changes), and it costs nothing. Swapped, with a test that pins the ordering.

Disconnect cleanup. connect() already calls this.frameLifecycle.clear(), so the map doesn't grow unbounded across reconnections. The real residue is stale events between disconnect() and the next connect() plus a dangling listener reference. Added the cleanup as you suggested.

waitForLifecycle privacy. Removed the class method entirely. The logic now lives as a top-level waitForLifecycle(page, frameLifecycle, eventName, timeoutMs) function (callable from outside the class for future ops on networkAlmostIdle / load, per the PR description), and screenshot() calls it directly.

The new tests cover the three exit paths (cache hit, listener fires, timeout) and pin the listener-leak invariant (unsubscribe count on every path) plus the defensive ordering. npx vitest run shows 51/51.

Just pending #6 landing still.

RapierCraft · 2026-05-16T23:49:35Z

Code Review: fix(screenshot): wait for paint via Page.lifecycleEvent

Reviewed commit: 191ea5c

Overall Assessment: APPROVE (with minor observations)

This is a well-engineered PR. The lifecycle waiting logic is correct, the cleanup paths are complete, error handling is defensive without masking real failures, and the test coverage is thorough. The PR is safe to merge.

Detailed Findings

1. Lifecycle Event Handling — Correct

waitForLifecycle race condition analysis: The implementation is safe. The listener is registered before scanning the cache (line ~193 vs ~195 in the diff), which closes the classic TOCTOU window. Since JavaScript is single-threaded, no event can arrive between listener registration and the cache scan that would be lost. The comment at line 165-167 correctly documents this.

Cleanup: The finish() guard (if (done) return) prevents double-resolution if the timer fires after the event arrives (or vice versa). Both the listener and timer are cleaned up on every exit path (cache hit, event arrival, timeout, registration failure). No leaks.

2. Timeout Handling — Sound

The 2-second timeout is well-chosen: short enough to avoid user-visible stalls, long enough for typical FCP on SPAs. On timeout, waitForLifecycle returns false (not an exception), so screenshot() degrades gracefully to the prior behavior — captures whatever is on screen. This matches the "bounded blast radius" claim in the PR description.

3. Lifecycle Map Management — Correct

connect(): Clears the map and unsubscribes any prior listener before re-subscribing. This prevents stale events from a previous document leaking into the new connection.
disconnect(): Cleans up listener, nulls references, clears map. Correct order (unsubscribe before client.close()).
LoaderId rotation: When a new loaderId arrives for a frameId, the entry is replaced (new Set([name])), which correctly invalidates stale events from the prior navigation. This is the key correctness property for SPA navigations.

4. `captureScreenshotWithFallback` — Well-Separated Concerns

The 0x0 viewport fallback (from PR #6) is cleanly extracted into a testable free function. The ProtocolError vs generic error discrimination is correct — only Chrome-side rejections trigger the fallback; transport failures propagate.

5. Minor Observations (non-blocking)

5a. ProtocolError import pattern: The cast (CDP as unknown as { ProtocolError: ... }).ProtocolError works at runtime but relies on an undocumented export. The comment notes this is a workaround pending DefinitelyTyped updates — acceptable.

5b. Lifecycle listener type: this.lifecycleListener is typed as ((params: any) => void) | null. The any is pragmatic given CRI's typing limitations, but the actual params are well-validated inside the handler (if (!frameId || !loaderId || !name) return).

5c. waitForLifecycle passes this.client!.Page directly: This works because LifecyclePageAPI only requires lifecycleEvent(handler) which CRI's Page domain provides. The type narrowing via the interface is a good pattern for testability.

5d. The !v?.clientWidth && !v?.clientHeight check: This uses falsy coercion, so 0 triggers the fallback. Since viewport dimensions are numeric, this is correct — width=0 AND height=0 means degenerate. A viewport with width=0 but height=800 would NOT trigger the fallback (intentional, per the comments and tests).

6. Test Quality

The tests cover all meaningful paths:

Cache-hit (synchronous resolution)
Listener path (async resolution)
Timeout path (with fake timers)
Error handling (registration failure)
Cleanup verification (no leaked handlers)
Defensive ordering assertion

The fakes are minimal and well-typed. The FakeScreenshotPage correctly mirrors the ProtocolError construction pattern from the source.

7. Dependency Bump

chrome-remote-interface 0.33.2 -> 0.34.0: Required for the ProtocolError export. The @types/chrome-remote-interface stays at ^0.33.0 — no runtime risk from the types lag since the cast pattern handles it.

Summary

Category	Status
Correctness	Pass — no race conditions, proper TOCTOU ordering
Cleanup/Leaks	Pass — all exit paths clean up listener + timer
Timeout behavior	Pass — bounded, graceful degradation
Error handling	Pass — defensive without masking transport failures
Test coverage	Pass — all paths exercised with fake timers
Breaking changes	None — steady-state pages resolve synchronously

Verdict: Clean. No blocking issues found. Safe to merge after PR #6 lands (as noted in the PR description).

RapierCraft reviewed May 12, 2026

View reviewed changes

madisonrickert and others added 6 commits May 12, 2026 14:53

madisonrickert and others added 4 commits May 12, 2026 16:24

madisonrickert force-pushed the upstream-pr/screenshot-paint-wait branch from e05add6 to 191ea5c Compare May 12, 2026 23:33

RapierCraft merged commit be984af into RapierCraft:main May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(screenshot): wait for paint via Page.lifecycleEvent#7

fix(screenshot): wait for paint via Page.lifecycleEvent#7
RapierCraft merged 10 commits into
RapierCraft:mainfrom
madisonrickert:upstream-pr/screenshot-paint-wait

madisonrickert commented May 3, 2026

Uh oh!

RapierCraft left a comment

Uh oh!

RapierCraft commented May 12, 2026

Uh oh!

madisonrickert commented May 12, 2026 •

edited

Loading

Uh oh!

RapierCraft commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

madisonrickert commented May 3, 2026

Summary

Problem

Fix

Test Plan

Dependency

Uh oh!

RapierCraft left a comment

Choose a reason for hiding this comment

PR Review: #7 — fix(screenshot): wait for paint via Page.lifecycleEvent

Verdict: CHANGES REQUESTED

Findings

Recommendation

Uh oh!

RapierCraft commented May 12, 2026

Uh oh!

madisonrickert commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RapierCraft commented May 16, 2026

Code Review: fix(screenshot): wait for paint via Page.lifecycleEvent

Overall Assessment: APPROVE (with minor observations)

Detailed Findings

1. Lifecycle Event Handling — Correct

2. Timeout Handling — Sound

3. Lifecycle Map Management — Correct

4. captureScreenshotWithFallback — Well-Separated Concerns

5. Minor Observations (non-blocking)

6. Test Quality

7. Dependency Bump

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

madisonrickert commented May 12, 2026 •

edited

Loading

4. `captureScreenshotWithFallback` — Well-Separated Concerns