perf(engine): faster shader transitions via page-side WebGL compositing #832
vanceingalls wants to merge 9 commits into
miguel-heygen
left a comment
Review: Page-side compositing for shader transitions
Why it's 6.5x faster
The architectural change is elegant. Before: each transition frame required TWO transparent-alpha screenshots (one per scene layer), pixel data transfer from Chrome → Node, color space conversion (rgb48le/rgba8), then CPU-bound GLSL shader blending in a Node worker_threads pool. Two round-trips through the screenshot pipeline plus CPU pixel math per frame.
After: a WebGL canvas overlay inside Chrome captures both scenes via drawElementImage, uploads them as textures, and runs the GLSL fragment shader on Chrome's GPU — all in-process. The engine takes ONE opaque RGB screenshot of the already-composited result. This eliminates: (1) two per-layer transparent screenshots, (2) Node-side color space conversion, (3) the entire Node-side shader-blend worker pool. The GPU work happens where the GPU already is.
One issue to resolve
Default mismatch: PR description says "default OFF" (opt-in), but the code sets default: true in the CLI flag and enablePageSideCompositing: true in DEFAULT_CONFIG. These need to agree — if ON is intended, existing CI fixtures may need PSNR-based pinning since f32 (WebGL) vs f64 (Node) precision means output is not bit-identical.
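The precision point above can be made concrete with a small sketch. It is not taken from the PR: `psnr`, `mixF64`, and `mixF32` are hypothetical helpers, `Math.fround` is used only as a stand-in for WebGL's f32 arithmetic, and any pass threshold for a PSNR pin would be a project decision.

```typescript
// Peak signal-to-noise ratio (dB) between two 8-bit frames of equal size.
// Identical frames return Infinity.
function psnr(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must be non-empty and the same size");
  }
  let squaredError = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    squaredError += d * d;
  }
  const mse = squaredError / a.length;
  return mse === 0 ? Infinity : 10 * Math.log10((255 * 255) / mse);
}

// Linear crossfade of one channel in f64, as plain JavaScript math would do it.
function mixF64(from: number, to: number, t: number): number {
  return from * (1 - t) + to * t;
}

// The same crossfade with every intermediate rounded to f32 (assumption:
// this approximates shader precision; real GPU rounding may differ).
function mixF32(from: number, to: number, t: number): number {
  const f = Math.fround;
  return f(f(f(from) * f(1 - f(t))) + f(f(to) * f(t)));
}
```

A PSNR-pinned fixture would then compare a rendered frame against a reference with `psnr(rendered, reference) >= threshold` instead of requiring byte equality.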
Notes (non-blocking)
- composition.videos.length === 0 gate silently disables page-side compositing for any composition with embedded video. Worth documenting.
- No PSNR-pinned visual regression test yet (documented as follow-up): acceptable given the opt-in framing, less so if it defaults ON.
- Error handling and fallback are solid: probes for drawElementImage + WebGL, falls back to opacity-flip with warning.
miguel-heygen
left a comment
Description updated — default ON matches the code now. LGTM.
Two correctness fixes from PR #821 self-review:
1. Cache priority order. Previous order was hyperframes-managed cache → puppeteer cache. HF cache is pinned to CHROME_VERSION (131-era) which lags 17+ releases behind upstream; if a user separately installed a newer chrome-headless-shell via @puppeteer/browsers install, the CLI would silently hand engine the older HF-cache binary while engine's own resolveHeadlessShellPath would have picked the newer one. Flip the priority so puppeteer cache wins, matching engine semantics.
2. Numeric (not lexicographic) version sort. `readdirSync.sort().reverse()` over names like `linux-148.0.7778.97` and `linux-99.0.6533.123` would return `linux-99...` first because character '9' outranks '1'. Parse each name into integer segments and compare them numerically.
Tests: add both-caches-populated and linux-148-beats-linux-99 cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
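The numeric sort described in fix 2 could look roughly like the sketch below. The helper names (`parseVersion`, `compareVersions`, `newestFirst`) are hypothetical and not the names used in the PR; the directory names are the two examples from the commit message.

```typescript
// Parse "linux-148.0.7778.97" into [148, 0, 7778, 97].
// Returns null when the name does not end in a dotted version.
function parseVersion(dirName: string): number[] | null {
  const match = /(\d+(?:\.\d+)*)$/.exec(dirName);
  if (!match) return null;
  return match[1].split(".").map(Number);
}

// Compare segment by segment as integers; missing segments count as 0.
function compareVersions(a: number[], b: number[]): number {
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Sort directory names so the highest version comes first,
// dropping names that carry no parseable version.
function newestFirst(dirNames: string[]): string[] {
  return dirNames
    .map((name) => ({ name, version: parseVersion(name) }))
    .filter((e): e is { name: string; version: number[] } => e.version !== null)
    .sort((x, y) => compareVersions(y.version, x.version))
    .map((e) => e.name);
}
```

With the two names from the commit message, a plain `sort().reverse()` puts `linux-99.0.6533.123` first, while `newestFirst` puts `linux-148.0.7778.97` first.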
…ike)
Add an opt-in `--page-side-compositing` flag (CLI) backed by a new engine config field `enablePageSideCompositing` and env var `HF_PAGE_SIDE_COMPOSITING`. When set, SDR shader-transition compositions skip the Node-side layered blend (the hf#677 chain) and instead run the shader inside Chrome via a page-side WebGL canvas; the engine then captures ONE opaque RGB frame per output frame via the existing streaming capture path.
This is the strongest non-beginFrame perf lever for Mac users, who cannot take the beginFrame `~5×` path (Chromium structural limit, crbug.com/40656275). Stacks on top of the hf#677 1.95× baseline.
Default OFF: existing fixture pins (byte-exact MP4 output) are preserved. Opt-in path is intentionally PSNR-pinned, not byte-equal (WebGL is f32; Node is f64). HDR content forces the existing layered path regardless.
Implementation:
- engine: new `EngineConfig.enablePageSideCompositing` (default false).
- producer/fileServer: new `HF_PAGE_SIDE_COMPOSITING_STUB` early-page script injected into the served HTML head when the flag is on.
- producer/renderOrchestrator: when the flag + no HDR + no png-sequence, route SDR transitions through the streaming path instead of the layered HDR stage.
- shader-transitions: new `engineModePageComposite.ts` installs a fullscreen WebGL compositor overlay and wraps `window.__hf.seek` so each seek inside a transition window captures both scenes via the Chromium `drawElementImage` API to GL textures, runs the fragment shader, and displays the composited result on the overlay canvas. The engine takes one screenshot per frame and sees the composited overlay.
- cli: new `--page-side-compositing` flag sets `HF_PAGE_SIDE_COMPOSITING=true` before producer load.
- scripts/page-side-compositing-smoke: bundled-CLI smoke that renders a representative fixture with and without the flag, validates the canary strings are in the shipped bundles, and writes a wall-time pair.
Determinism trade documented in the engine config doc-comment. The smoke script enforces the bundled-CLI validation discipline from prior perf work (see internal feedback note `validate_bundled_cli_not_dev_path`).
Runtime requirement: Chromium's `CanvasDrawElement` feature (already enabled by the engine's `--enable-features=CanvasDrawElement` launch flag). When the runtime feature is unavailable, the page-side installer logs a warning and falls back to opacity-flip mode; the engine still takes the streaming path, and the transition window degrades to a hard scene swap.
Vance will validate on Mac Chrome where the feature is supported.
Co-Authored-By: Vai <vai@heygen.com>
…ture
The original drawElementImage approach fails in engine render mode because the virtual-time shim prevents Chromium from generating paint records for cloned elements. drawElementImage requires a cached paint record from the browser's compositor; clones created at capture time never receive one because (a) shimmed rAFs deadlock inside the seek wrapper, (b) original rAFs don't produce real paints under virtual-time control, and (c) layoutsubtree canvases don't apply CSS stylesheet rules to children.
Switch scene capture to html2canvas (foreignObjectRendering: false), the same JS-based renderer already used by the preview-mode fallback path in capture.ts. html2canvas reads computed styles and renders via its own canvas drawing pipeline with no dependency on the browser paint cycle.
Also fixes:
- Engine seek must return the result so Puppeteer awaits async seek promises (frameCapture.ts).
- GSAP opacity cache: compositor must restore scene opacity before seek, not after, because GSAP caches inline values and skips re-writes.
- Support check gates on WebGL availability, not drawElementImage.
Perf: 15-scene shader-perf fixture (28s, 14 transitions, 30fps)
Baseline (Node-side layered): 137s
Page-side (html2canvas+WebGL): 33s → 4.1× speedup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…positor
- Use uploadTexture (zeroes canvas backing store after upload) to prevent ~2.2GB transient memory pressure across 280 html2canvas calls per render
- Add ignoreElements + stabilizeTransformedBoxShadows to html2canvas call, matching the preview-path capture.ts behavior
- Parallelize from/to scene captures with Promise.all
- Wrap post-capture render in try/finally so opacity is always restored
- Fix WebGL context leak in isPageSideCompositingSupported probe
- Remove dead ResolvedTransition.index field
- Export stabilizeTransformedBoxShadows from capture.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
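The parallel-capture and try/finally items above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SceneHandle` stands in for a scene element's inline opacity, and `capture` and `render` are hypothetical callbacks, not the compositor's real functions.

```typescript
// Stand-in for a scene element; only the inline opacity matters here.
interface SceneHandle {
  opacity: string;
}

// Capture all scenes concurrently, render, and always put opacity back,
// even when capture or render throws.
async function compositeWithRestore<T>(
  scenes: SceneHandle[],
  capture: (scene: SceneHandle) => Promise<T>,
  render: (textures: T[]) => void,
): Promise<void> {
  const saved = scenes.map((s) => s.opacity);
  try {
    // from/to captures run in parallel rather than one after the other
    const textures = await Promise.all(scenes.map(capture));
    render(textures);
  } finally {
    // runs on success and on failure, so no scene is left hidden or forced visible
    scenes.forEach((s, i) => {
      s.opacity = saved[i];
    });
  }
}
```

The error still propagates to the caller; the `finally` block only guarantees that scene state is not left modified.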
Addresses three issues from staff review:
1. ignoreElements filter stripped all in-scene canvases (Chart.js, D3,
p5.js) — narrowed to data-no-capture only since the compositor canvas
is a body sibling never in the scene subtree.
2. Docker mode silently dropped --page-side-compositing — thread
pageSideCompositing through DockerRenderOptions/buildDockerRunArgs
with regression tests.
3. Fragmented gating across 4 independent sites could disagree:
- Stub injection gated only on cfg flag (leaked into HDR/alpha)
- Probe-created fileServer never got the stub
- needsAlpha (WebM/MOV) not excluded from the gate
- WebGL-unavailable fallback claimed layered path would run but
orchestrator had already disabled it
Fix: compute stub injection at the same site as the layered-bypass
decision (after hasHdrContent is known), using addPreHeadScript on
the already-running fileServer. Single predicate now gates both
decisions, including !needsAlpha.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iting
Replace html2canvas with native drawElementImage for scene capture in
the page-side compositor. drawElementImage reads from the browser's own
paint cache, giving pixel-identical output to the preview path.
The blocker was that cloned elements inside layoutsubtree canvases have
no cached paint record under virtual time — the compositor only paints
when explicitly triggered. Fix: split the seek+composite into two phases
with an engine-forced paint between them.
Phase 1 (seek wrapper, page-side):
- GSAP seek positions the timeline
- Clone FROM/TO scenes into visible layoutsubtree staging canvases
- Set window.__hf_page_composite_pending flag
Engine paint force (frameCapture.ts):
- Detect pending flag after seek returns
- Fire micro Page.captureScreenshot (1x1 clip) via CDP to force the
browser compositor to paint all visible elements including staging
canvas children
Phase 2 (page.evaluate, page-side):
- drawElementImage reads the now-valid paint records
- Upload textures to WebGL, run shader, show GL overlay
Key insight: staging canvases must be visible (not opacity:0) for the
browser to paint their children. They sit at z-index:-9998, behind
the main DOM and covered by the GL overlay during transitions.
Perf: 15-scene fixture (28s, 14 transitions, 30fps):
Baseline (Node-side layered): 137s
html2canvas + WebGL: 33s (4.1×)
drawElementImage + WebGL: 21s (6.6×)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
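The ordering of the two phases can be sketched from the engine's side as follows. This is an illustration under stated assumptions, not the code in frameCapture.ts: `FramePage` and its method names are hypothetical, and in the real engine the paint force is a 1x1 `Page.captureScreenshot` sent over CDP.

```typescript
// Hypothetical abstraction over the page; each method maps to one step
// of the protocol described in the commit message above.
interface FramePage {
  // Phase 1: seek the timeline; resolves to true when a composite is pending.
  seek(timeSeconds: number): Promise<boolean>;
  // Engine-forced paint so staging canvas children get paint records.
  forcePaint(): Promise<void>;
  // Phase 2: read paint records, upload textures, run the shader.
  resolveComposite(): Promise<void>;
  // Final opaque screenshot of the composited result.
  captureFrame(): Promise<Uint8Array>;
}

async function captureOutputFrame(
  page: FramePage,
  timeSeconds: number,
): Promise<Uint8Array> {
  const pending = await page.seek(timeSeconds);
  if (pending) {
    // Only transition frames pay for the extra paint and the second phase.
    await page.forcePaint();
    await page.resolveComposite();
  }
  return page.captureFrame();
}
```

Frames outside a transition window skip straight from seek to capture, so the extra steps are only paid on transition frames.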
- uploadTextureSource instead of uploadTexture: eliminates ~2.3GB of canvas buffer alloc/dealloc churn (persistent staging canvases don't need the one-shot zeroing behavior)
- Fold hasPending check into seek page.evaluate: eliminates one CDP round-trip per frame (~700 unnecessary IPC calls on non-transition frames)
- Fix renderShader error handling: on failure, leave source scenes visible as fallback instead of hiding both scenes + GL overlay (which produced black frames)
- Move mutable state declarations above resolveComposite to prevent TDZ risk on refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… guard
- Clear staging canvas children when leaving transition window (prevents visible clone bleed-through on transparent compositions)
- Clear __hf_page_composite_pending on all resolveComposite exit paths
- Guard micro-screenshot paint force against beginFrame mode (CDP Page.captureScreenshot conflicts with beginFrame compositor control)
- Update CLI flag description: document video/canvas limitation, remove stale PSNR claim
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The base branch was changed.
…ions
Page-side compositing is now enabled by default for SDR shader-transition renders without video content. The 6.6× speedup applies automatically; no flag is needed.
Auto-disables when:
- HDR content detected
- Alpha output (WebM/MOV/PNG-sequence)
- Composition contains <video> elements (cloneNode loses playback state)
- beginFrame capture mode (Linux headless)
Use --no-page-side-compositing to force the Node-side layered path.
Changes:
- Engine config: enablePageSideCompositing defaults to true
- CLI: flag default flipped to true; --no-page-side-compositing disables
- Orchestrator: added composition.videos.length === 0 gate
- Docker: forwards --no-page-side-compositing when explicitly disabled
- Config tests updated for new default
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
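The auto-disable conditions listed above amount to a single boolean predicate. The sketch below shows one possible shape for it; the function name comes from the PR description, but the `CompositingGateInputs` interface, its field names, and the `"screenshot"` capture-mode value are assumptions made for illustration.

```typescript
// Hypothetical input shape; the real orchestrator reads these from
// several places (config, compiled composition, output format).
interface CompositingGateInputs {
  enablePageSideCompositing: boolean; // config / CLI flag, default true
  hasShaderTransitions: boolean;
  hasHdrContent: boolean;
  needsAlpha: boolean; // WebM / MOV / PNG-sequence output
  videoCount: number; // number of <video> elements in the composition
  captureMode: string; // "beginframe" disables the page-side path
}

// One predicate so stub injection and the layered-path bypass cannot disagree.
function usePageSideCompositingForTransitions(
  inputs: CompositingGateInputs,
): boolean {
  return (
    inputs.enablePageSideCompositing &&
    inputs.hasShaderTransitions &&
    !inputs.hasHdrContent &&
    !inputs.needsAlpha &&
    inputs.videoCount === 0 &&
    inputs.captureMode !== "beginframe"
  );
}

// Example inputs: an SDR render with shader transitions and no video.
const sdrTransitionRender: CompositingGateInputs = {
  enablePageSideCompositing: true,
  hasShaderTransitions: true,
  hasHdrContent: false,
  needsAlpha: false,
  videoCount: 0,
  captureMode: "screenshot", // assumed name for the non-beginFrame mode
};
```

Computing the decision once and passing the result to both consumers is what keeps the stub from leaking into HDR or alpha renders.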

Summary
Moves shader transition blending from the Node-side layered pipeline into Chrome's WebGL, using a two-phase drawElementImage capture protocol. Default-on for SDR shader-transition renders; no flag needed.
Performance (15 scenes, 14 shader transitions, 28s @ 30fps, Mac)
Published CLI gets slower with more workers on shader transitions (coordination overhead dominates). Page-side compositing scales linearly with workers.
drawElementImage (same capture path as preview mode)
How it works
- layoutsubtree staging canvases + set pending flag
- Page.captureScreenshot (1×1 clip) forces browser compositor to paint staging canvas children
- drawElementImage reads paint records → WebGL shader blend → GL overlay displayed
Why two phases?
Under virtual time, requestAnimationFrame is shimmed and the browser compositor doesn't paint new elements. drawElementImage requires a cached paint record, which only exists after a compositor pass. The micro-screenshot forces that pass without needing rAF.
Gating
Auto-enabled when all conditions met:
- compiled.hasShaderTransitions
- !hasHdrContent (HDR needs per-layer alpha compositing in Node)
- !needsAlpha (WebM/MOV/PNG-sequence need transparent captures)
- composition.videos.length === 0 (cloneNode loses video playback state)
- captureMode !== "beginframe" (CDP micro-screenshot conflicts with beginFrame compositor)
Use --no-page-side-compositing to force the Node-side layered path.
Changes by package
engine
- frameCapture.ts: two-phase protocol (detect pending flag after seek, micro-screenshot paint force, call resolve)
- config.ts: enablePageSideCompositing defaults to true
shader-transitions
- engineModePageComposite.ts: full rewrite; two-phase drawElementImage compositor replacing the original broken drawElementImage attempt and the intermediate html2canvas approach
- capture.ts: export stabilizeTransformedBoxShadows
producer
- renderOrchestrator.ts: unified gating predicate (usePageSideCompositingForTransitions) gates both stub injection and layered-path bypass; addPreHeadScript for probe-created fileServer; !needsAlpha and videos.length === 0 exclusions
- fileServer.ts: addPreHeadScript method on FileServerHandle for post-creation stub injection
cli
- render.ts: flag default flipped to true; --no-page-side-compositing to disable
- dockerRunArgs.ts: forwards --no-page-side-compositing when disabled
Test plan
- npx hyperframes (both 1 and 6 workers)
- --no-page-side-compositing falls back to layered path
🤖 Generated with Claude Code