Add scene-graph traversal benchmark for the incremental scene index #11
Draft
hmaarrfk wants to merge 2 commits into
<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_scene_index.py`, which measures the per-render scene-graph traversal cost that the incremental scene-index work (pygfx#1298) targets. `FlatScene` previously walked the whole tree every render to bucket lights/shadow-casters/renderables and propagate `group_order`; the change maintains that categorization at scene-mutation time, so a steady-state render loop no longer pays the walk.

The file has two parts:

- A pure-CPU micro-benchmark (`main()`) that times `FlatScene(scene)` construction directly across flat, grouped, and deep scene shapes. GC is disabled during sampling, and 5 warm-up iterations settle the per-object transform/uniform updates common to both code paths, so the median reflects the traversal/categorization delta.
- A full offscreen-render control (`@benchmark` functions) that confirms no end-to-end regression. At realistic sizes a full frame is dominated by GPU submission and per-object uniform updates, so the traversal saving is below the render noise floor; these exist to show "no regression", not a speedup.

`results/scene_index/` records A/B runs on two machines (Apple M5 Max and a low-power Intel i7-1180G7). The win scales with scene structure (flat ~1.1-1.3x, grouped ~1.2-1.4x, deep up to ~6.3-6.9x) and is larger on the weaker CPU, where the Python scene walk is a bigger share of frame time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:

```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```

</details>
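The central idea in the draft above is to replace a per-render walk with categorization that is maintained at mutation time. A toy tree can illustrate it. This is a minimal sketch under stated assumptions:

- `Node`, `categorize_by_walk`, and `IndexedScene` are hypothetical names, not pygfx classes.
- `kind` stands in for the light / shadow-caster / renderable categories.
- `group_order` propagation is left out.

```python
from collections import defaultdict


class Node:
    """Toy scene node (hypothetical, not pygfx's WorldObject).

    `kind` stands in for the light / shadow-caster / renderable categories;
    None marks a plain group.
    """

    def __init__(self, kind=None):
        self.kind = kind
        self.parent = None
        self.children = []


def categorize_by_walk(root):
    """Baseline: visit every node on every call, i.e. once per render."""
    buckets = defaultdict(set)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind is not None:
            buckets[node.kind].add(node)
        stack.extend(node.children)
    return buckets


class IndexedScene:
    """Incremental: keep the buckets current as nodes are added or removed."""

    def __init__(self):
        self.root = Node()
        self.buckets = defaultdict(set)

    def add(self, node, parent=None):
        parent = self.root if parent is None else parent
        node.parent = parent
        parent.children.append(node)
        self._update_index(node, adding=True)

    def remove(self, node):
        node.parent.children.remove(node)
        node.parent = None
        self._update_index(node, adding=False)

    def _update_index(self, subtree_root, adding):
        # Only the mutated subtree is visited, never the whole scene.
        stack = [subtree_root]
        while stack:
            node = stack.pop()
            if node.kind is not None:
                if adding:
                    self.buckets[node.kind].add(node)
                else:
                    self.buckets[node.kind].discard(node)
            stack.extend(node.children)
```

In this sketch, a render loop reads `scene.buckets` directly. Its per-frame cost therefore no longer grows with the number of nodes; the walk is paid only on `add`/`remove`, and only over the affected subtree.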
<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_render_loop.py`, which measures the per-frame `FlatScene(scene, view_matrix) + sort()` cost for a large scene (3000 groups x 10 = ~33k objects), both with and without a moving camera. This is the CPU work the renderer does each frame before issuing GPU draws.

`results/scene_index/render_loop_optimizations.md` records the per-commit progression (main -> tip of the render-loop work on pygfx PR #13) on two machines: Apple M5 Max and a low-power Intel i7-1180G7. Each commit is attributed to a step (scene-index, vectorize, cache, skip-id) with per-step and cumulative speedups, plus the raw `*_percommit_*.txt` output.

For a moving camera the cumulative speedup is ~15.7x (M5) and ~14x (nano, the i7-1180G7); the absolute saving is largest on the weak CPU (439 ms -> 31 ms per frame).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:

```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```

</details>
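The per-step and cumulative speedups in the render-loop table are related by simple arithmetic:

- Each step's speedup is the ratio of consecutive per-frame timings.
- Because those ratios telescope, their product equals the first timing divided by the last.

A minimal sketch follows; `step_speedups` and `cumulative_speedup` are hypothetical helpers, not functions in the benchmark files.

```python
import math


def step_speedups(frame_ms):
    """Per-step speedup: each timing divided by the one that follows it."""
    return [before / after for before, after in zip(frame_ms, frame_ms[1:])]


def cumulative_speedup(frame_ms):
    # The per-step ratios telescope, so their product equals first / last.
    return math.prod(step_speedups(frame_ms))


# Only the endpoints quoted for the i7-1180G7 with a moving camera are used here.
print(f"{cumulative_speedup([439.0, 31.0]):.1f}x")  # 14.2x
```

With just those two endpoints (439 ms and 31 ms), the result is about 14.2x, consistent with the "~14x" figure.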
Adds `benchmarks/bm_scene_index.py`, a benchmark for the per-render scene-graph traversal cost that the incremental scene-index work targets (pygfx issue #1298, PR hmaarrfk/pygfx#13). `FlatScene` historically walked the whole scene tree on every render to bucket lights, shadow-casters, and renderables and to propagate `group_order`. The change maintains that categorization incrementally at scene-mutation time, so a steady-state render loop no longer pays for the walk.

### What's in here
- **Pure-CPU micro-benchmark** (`main()`): times `FlatScene(scene)` construction directly across flat, grouped, and deep scene shapes. GC is disabled during sampling, and 5 warm-up iterations settle the per-object transform/uniform updates common to both code paths, so the median isolates the traversal/categorization delta. No GPU required.
- **Full offscreen-render control** (`@benchmark` functions): a no-regression smoke check. At realistic sizes a full frame is dominated by GPU submission and per-object uniform updates, so the traversal saving is below the render noise floor. These are here to show no regression, not a speedup.

### Results (`results/scene_index/`)

A/B of `pygfx@main` (baseline) vs the `incremental-scene-index` branch, median of 200 `FlatScene` constructions.

- Apple M5 Max: deep_500 6.32×, deep_200 3.74×, grouped_500x10 1.26×, flat_10000 1.14×.
- Intel i7-1180G7 (low-power): deep_500 6.90×, deep_200 3.92×, grouped_200x10 1.38×, flat_10000 1.24×.
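The figures above are medians over repeated `FlatScene` constructions, taken with GC paused and after a short warm-up. A harness along those lines might look like the following. `median_seconds` and `speedup` are hypothetical helpers. The defaults (200 samples, 5 warm-ups) mirror the numbers quoted in this description, not necessarily the actual contents of `bm_scene_index.py`.

```python
import gc
import statistics
import time


def median_seconds(fn, n_samples=200, n_warmup=5):
    """Median wall time of fn() after warm-up, with automatic GC paused while sampling."""
    for _ in range(n_warmup):
        fn()  # settle lazy/one-time work that both code paths share
    samples = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed samples
    try:
        for _ in range(n_samples):
            t0 = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - t0)
    finally:
        if gc_was_enabled:
            gc.enable()
    return statistics.median(samples)


def speedup(baseline_s, candidate_s):
    """A/B speedup: above 1 means the candidate (branch) beats the baseline (main)."""
    return baseline_s / candidate_s
```

Main and the branch are separate installs, so in this sketch each side is timed in its own run, for example `median_seconds(lambda: FlatScene(scene))` in each environment. The reported speedup then divides the two medians. The median is used rather than the mean because it is less sensitive to occasional OS-scheduling outliers.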
The win scales with scene structure (flat ~1.1–1.3×, grouped ~1.2–1.4×, deep up to ~6.9×) and is larger on the weaker CPU, where the Python scene walk is a bigger share of frame time, which is exactly the low-power target. Full tables are in `results/scene_index/RESULTS.md`.

🤖 Generated with Claude Code