
Add scene-graph traversal benchmark for the incremental scene index #11

Draft
hmaarrfk wants to merge 2 commits into pygfx:main from hmaarrfk:scene-graph-index-benchmark

hmaarrfk commented Jun 7, 2026

Contributor

Adds benchmarks/bm_scene_index.py, a benchmark for the per-render scene-graph traversal cost that the incremental scene-index work targets (pygfx issue #1298, PR hmaarrfk/pygfx#13).

FlatScene historically walked the whole scene tree on every render to bucket lights/shadow-casters/renderables and propagate group_order. The change maintains that categorization incrementally at scene-mutation time, so a steady-state render loop no longer pays the walk.
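
To illustrate the difference between the two strategies, here is a minimal sketch using toy classes. `Node`, `IndexedScene`, `categorize_by_walk`, and the `kind` strings are hypothetical names made up for this example; they are not pygfx's API, and the actual change may be structured quite differently.

```python
from collections import defaultdict


class Node:
    """Toy scene node; `kind` stands in for light/renderable/etc. (hypothetical)."""

    def __init__(self, kind="group"):
        self.kind = kind
        self.parent = None
        self.children = []

    def iter_subtree(self):
        # Iterative depth-first walk, so deep chains do not hit the recursion limit.
        stack = [self]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children)


def categorize_by_walk(root):
    """Per-render strategy: rebuild every bucket by walking the whole tree."""
    buckets = defaultdict(set)
    for node in root.iter_subtree():
        buckets[node.kind].add(node)
    return buckets


class IndexedScene(Node):
    """Root that updates the buckets whenever the tree is mutated."""

    def __init__(self):
        super().__init__("scene")
        self.buckets = defaultdict(set)
        self.buckets["scene"].add(self)

    def attach(self, parent, child):
        # Assumes `parent` is already part of this scene.
        child.parent = parent
        parent.children.append(child)
        for node in child.iter_subtree():  # cost scales with the added subtree only
            self.buckets[node.kind].add(node)

    def detach(self, child):
        child.parent.children.remove(child)
        child.parent = None
        for node in child.iter_subtree():
            self.buckets[node.kind].discard(node)


scene = IndexedScene()
group = Node("group")
scene.attach(scene, group)
scene.attach(group, Node("light"))
scene.attach(group, Node("mesh"))

# Reading the maintained buckets needs no traversal at render time.
lights = scene.buckets["light"]
```

In this toy version the mutation cost is proportional to the subtree being added or removed, while a steady-state frame that changes nothing only reads `scene.buckets`.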

What's in here

  • Pure-CPU micro-benchmark (main()) — times FlatScene(scene) construction directly across flat, grouped, and deep scene shapes. GC is disabled during sampling and 5 warm-up iterations settle the per-object transform/uniform updates common to both code paths, so the median isolates the traversal/categorization delta. No GPU required.
  • Full-render control (@benchmark functions) — a no-regression / smoke check. At realistic sizes a full frame is dominated by GPU submission and per-object uniform updates, so the traversal saving is below the render noise floor. These are here to show no regression, not a speedup.
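
The sampling approach described for the micro-benchmark (warm-up iterations, GC paused while sampling, median of the samples) can be sketched roughly as follows. This is not the code in `bm_scene_index.py`: `median_seconds` is a hypothetical helper, and the lambda is a stand-in workload where the real benchmark would construct `FlatScene(scene)`.

```python
import gc
import statistics
import time


def median_seconds(fn, samples=200, warmup=5):
    """Median wall-clock time of fn(), sampled with the garbage collector paused."""
    for _ in range(warmup):
        fn()  # let first-call work settle before any timing happens

    timings = []
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        for _ in range(samples):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
    finally:
        # Restore the collector only if it was on when we started.
        if gc_was_enabled:
            gc.enable()
    return statistics.median(timings)


# Stand-in workload (assumption); the benchmark would time FlatScene(scene) here.
elapsed = median_seconds(lambda: sum(range(10_000)))
print(f"median: {elapsed * 1e6:.1f} us")
```

Using the median rather than the mean keeps a few slow outlier samples (for example from OS scheduling) from skewing the reported number.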

Results (results/scene_index/)

A/B of pygfx@main (baseline) vs the incremental-scene-index branch, median of 200 FlatScene constructions.

Apple M5 Max: deep_500 6.32×, deep_200 3.74×, grouped_500x10 1.26×, flat_10000 1.14×.
Intel i7-1180G7 (low-power): deep_500 6.90×, deep_200 3.92×, grouped_200x10 1.38×, flat_10000 1.24×.

The win scales with scene structure (flat ~1.1–1.3×, grouped ~1.2–1.4×, deep up to ~6.9×) and is larger on the weaker CPU, where the Python scene-walk is a bigger share of frame time — exactly the low-power target. Full tables in results/scene_index/RESULTS.md.

🤖 Generated with Claude Code

<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_scene_index.py`, which measures the per-render
scene-graph traversal cost that the incremental scene-index work
(pygfx#1298) targets. `FlatScene` previously walked the whole tree every
render to bucket lights/shadow-casters/renderables and propagate
`group_order`; the change maintains that categorization at scene-mutation
time so a steady-state render loop no longer pays the walk.

The file has two parts:

- A pure-CPU micro-benchmark (`main()`) that times `FlatScene(scene)`
  construction directly across flat, grouped, and deep scene shapes. GC is
  disabled during sampling and 5 warm-up iterations settle the per-object
  transform/uniform updates common to both code paths, so the median
  reflects the traversal/categorization delta.
- A full offscreen-render control (`@benchmark` functions) that confirms no
  end-to-end regression. At realistic sizes a full frame is dominated by GPU
  submission and per-object uniform updates, so the traversal saving is
  below the render noise floor — these exist to show "no regression", not a
  speedup.

`results/scene_index/` records A/B runs on two machines (Apple M5 Max and a
low-power Intel i7-1180G7). The win scales with scene structure (flat
~1.1-1.3x, grouped ~1.2-1.4x, deep up to ~6.9x) and is larger on the
weaker CPU, where the Python scene-walk is a bigger share of frame time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:
```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```
</details>
<details><summary>Claude's draft</summary>

Adds `benchmarks/bm_render_loop.py`, which measures the per-frame
`FlatScene(scene, view_matrix) + sort()` cost for a large scene (3000
groups x 10 = ~33k objects) with a moving camera and without one — the CPU
work the renderer does each frame before issuing GPU draws.

`results/scene_index/render_loop_optimizations.md` records the per-commit
progression (main -> tip of the render-loop work on pygfx PR #13) on two
machines: Apple M5 Max and a low-power Intel i7-1180G7. Each commit is
attributed (scene-index, vectorize, cache, skip-id) with per-step and
cumulative speedups, plus the raw `*_percommit_*.txt` output.

For a moving camera the cumulative speedup is ~15.7x on the M5 Max and ~14x
on the i7-1180G7 ("nano"); the absolute saving is largest on the weaker CPU
(439 ms -> 31 ms per frame).
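
As a quick sanity check of the per-frame figures quoted above for the weaker CPU, the speedup is the baseline time divided by the optimized time. The variable names are illustrative only; the two timings are the ones stated in this description.

```python
baseline_ms = 439.0   # per-frame time before the render-loop work
optimized_ms = 31.0   # per-frame time at the tip of the branch

speedup = baseline_ms / optimized_ms
saved_ms = baseline_ms - optimized_ms

print(f"{speedup:.1f}x speedup, {saved_ms:.0f} ms saved per frame")
# prints "14.2x speedup, 408 ms saved per frame"
```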

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resume this Claude session:
```
claude --resume 32529552-c0e8-4a5d-82f4-572687adb193
```
</details>