Add scene-graph traversal benchmark for the incremental scene index by hmaarrfk · Pull Request #11 · pygfx/pygfx-benchmarks

hmaarrfk · 2026-06-07T18:02:30Z

Adds benchmarks/bm_scene_index.py, a benchmark for the per-render scene-graph traversal cost that the incremental scene-index work targets (pygfx issue #1298, PR hmaarrfk/pygfx#13).

FlatScene historically walked the whole scene tree on every render to bucket lights/shadow-casters/renderables and propagate group_order. The change maintains that categorization incrementally at scene-mutation time, so a steady-state render loop no longer pays the walk.
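To make the difference concrete, here is a minimal sketch of the two strategies. It is not pygfx code: `Node`, `SceneIndex`, `register_subtree`, and `categorize_by_walk` are hypothetical names invented for illustration. Shadow-casters, `group_order` propagation, and object removal are left out.

```python
class Node:
    """Hypothetical scene node (stand-in for a scene-graph object)."""

    def __init__(self, kind="group"):
        self.kind = kind          # "group", "light", or "renderable"
        self.children = []
        self.index = None         # set once the node belongs to an indexed scene

    def add(self, child):
        self.children.append(child)
        # Incremental path: categorize at mutation time, not at render time.
        if self.index is not None:
            self.index.register_subtree(child)
        return child


class SceneIndex:
    """Hypothetical index holding the flat lists a render would read."""

    def __init__(self, root):
        self.lights = []
        self.renderables = []
        self.register_subtree(root)

    def register_subtree(self, node):
        stack = [node]            # iterative, so deep scenes do not hit the recursion limit
        while stack:
            current = stack.pop()
            current.index = self
            if current.kind == "light":
                self.lights.append(current)
            elif current.kind == "renderable":
                self.renderables.append(current)
            stack.extend(current.children)


def categorize_by_walk(root):
    """Baseline path: walk the whole tree each time the lists are needed."""
    lights, renderables = [], []
    stack = [root]
    while stack:
        current = stack.pop()
        if current.kind == "light":
            lights.append(current)
        elif current.kind == "renderable":
            renderables.append(current)
        stack.extend(current.children)
    return lights, renderables
```

With this sketch, a steady-state frame reads `index.lights` and `index.renderables` directly, while the baseline pays `categorize_by_walk(root)` on every frame even when nothing changed.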

What's in here

- Pure-CPU micro-benchmark (main()): times FlatScene(scene) construction directly across flat, grouped, and deep scene shapes. GC is disabled during sampling and 5 warm-up iterations settle the per-object transform/uniform updates common to both code paths, so the median isolates the traversal/categorization delta. No GPU required.
- Full-render control (@benchmark functions): a no-regression / smoke check. At realistic sizes a full frame is dominated by GPU submission and per-object uniform updates, so the traversal saving is below the render noise floor. These are here to show no regression, not a speedup.
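The sampling method described for the micro-benchmark can be sketched with the standard library alone. This is an illustrative harness, not the contents of `bm_scene_index.py`: `median_time` and `speedup` are hypothetical helper names, and the defaults (200 samples, 5 warm-ups) simply mirror the numbers quoted in this description.

```python
import gc
import statistics
import time


def median_time(fn, samples=200, warmup=5):
    """Return the median wall-clock time of fn() in seconds."""
    # Warm-up calls absorb one-off costs before any sample is recorded.
    for _ in range(warmup):
        fn()

    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the samples
    try:
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    return statistics.median(timings)


def speedup(baseline_fn, candidate_fn, **kwargs):
    """Ratio of median times; values above 1 mean the candidate is faster."""
    return median_time(baseline_fn, **kwargs) / median_time(candidate_fn, **kwargs)
```

In an A/B run, `fn` would be something like `lambda: FlatScene(scene)` executed once against each checkout; the median is used because it is less sensitive to occasional scheduler spikes than the mean.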

Results (`results/scene_index/`)

A/B of pygfx@main (baseline) vs the incremental-scene-index branch, median of 200 FlatScene constructions.

Apple M5 Max: deep_500 6.32×, deep_200 3.74×, grouped_500x10 1.26×, flat_10000 1.14×.
Intel i7-1180G7 (low-power): deep_500 6.90×, deep_200 3.92×, grouped_200x10 1.38×, flat_10000 1.24×.

The win scales with scene structure (flat ~1.1–1.3×, grouped ~1.2–1.4×, deep up to ~6.9×) and is larger on the weaker CPU, where the Python scene-walk is a bigger share of frame time — exactly the low-power target. Full tables in results/scene_index/RESULTS.md.
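For readers unfamiliar with the shape names, the sketch below shows one plausible way to build flat, grouped, and deep scenes. It assumes that `flat_N` means N children under one root, `grouped_GxK` means G groups of K children, and `deep_D` means a chain nested D levels down; the benchmark file may define them differently. `Node` and the builder names are hypothetical.

```python
class Node:
    """Hypothetical scene node with only a child list."""

    def __init__(self):
        self.children = []


def make_flat(n):
    # One root with n direct children (assumed meaning of flat_N).
    root = Node()
    root.children = [Node() for _ in range(n)]
    return root


def make_grouped(groups, per_group):
    # groups intermediate nodes, each with per_group children (assumed grouped_GxK).
    root = Node()
    for _ in range(groups):
        group = Node()
        group.children = [Node() for _ in range(per_group)]
        root.children.append(group)
    return root


def make_deep(depth):
    # A single chain nested depth levels below the root (assumed deep_D).
    root = Node()
    current = root
    for _ in range(depth):
        child = Node()
        current.children.append(child)
        current = child
    return root


def count_and_depth(root):
    """Return (total node count, maximum depth), with the root at depth 0."""
    total, deepest = 0, 0
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        total += 1
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in node.children)
    return total, deepest
```

Under these assumptions the three shapes differ mainly in how many levels of hierarchy sit above each object, which is the axis along which the reported speedups vary.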

🤖 Generated with Claude Code

<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_scene_index.py`, which measures the per-render scene-graph traversal cost that the incremental scene-index work (pygfx#1298) targets. `FlatScene` previously walked the whole tree every render to bucket lights/shadow-casters/renderables and propagate `group_order`; the change maintains that categorization at scene-mutation time so a steady-state render loop no longer pays the walk.

The file has two parts:

- A pure-CPU micro-benchmark (`main()`) that times `FlatScene(scene)` construction directly across flat, grouped, and deep scene shapes. GC is disabled during sampling and 5 warm-up iterations settle the per-object transform/uniform updates common to both code paths, so the median reflects the traversal/categorization delta.
- A full offscreen-render control (`@benchmark` functions) that confirms no end-to-end regression. At realistic sizes a full frame is dominated by GPU submission and per-object uniform updates, so the traversal saving is below the render noise floor; these exist to show "no regression", not a speedup.

`results/scene_index/` records A/B runs on two machines (Apple M5 Max and a low-power Intel i7-1180G7). The win scales with scene structure (flat ~1.1-1.3x, grouped ~1.2-1.4x, deep up to ~6.3-6.9x) and is larger on the weaker CPU, where the Python scene-walk is a bigger share of frame time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:

```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```

</details>

<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_render_loop.py`, which measures the per-frame `FlatScene(scene, view_matrix) + sort()` cost for a large scene (3000 groups x 10 = ~33k objects), both with and without a moving camera. This is the CPU work the renderer does each frame before issuing GPU draws.

`results/scene_index/render_loop_optimizations.md` records the per-commit progression (main -> tip of the render-loop work on pygfx PR #13) on two machines: Apple M5 Max and a low-power Intel i7-1180G7. Each commit is attributed (scene-index, vectorize, cache, skip-id) with per-step and cumulative speedups, plus the raw `*_percommit_*.txt` output.

For a moving camera the cumulative speedup is ~15.7x (M5) and ~14x (nano); the absolute saving is largest on the weak CPU (439 ms -> 31 ms per frame).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:

```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```

</details>

hmaarrfk marked this pull request as draft June 7, 2026 18:20

hmaarrfk mentioned this pull request Jun 7, 2026

Avoid scene-graph traversal on every render pygfx/pygfx#1298

Open

hmaarrfk mentioned this pull request Jun 7, 2026

Maintain scene-graph index incrementally; render reads flat lists hmaarrfk/pygfx#13

Open

hmaarrfk wants to merge 2 commits into pygfx:main from hmaarrfk:scene-graph-index-benchmark
