ArenaNode for array-backed expressions by MilesCranmerBot · Pull Request #167 · SymbolicML/DynamicExpressions.jl

MilesCranmerBot · 2026-02-20T21:59:20Z

Prototype ArenaNode: an array-backed arena representation for expressions, implementing the full NodeInterface. Evaluation runs directly on the arena facade and matches Node on allocations (scripts/bench_arenanode.jl), with cheaper copy and a smaller memory footprint.

github-actions · 2026-02-20T22:15:12Z

Benchmark Results (Julia v1)

Time benchmarks

	master	`60d0023`...	master / `60d0023`...
eval/ComplexF32/evaluation	6.98 ± 0.52 ms	6.99 ± 0.55 ms	0.997 ± 0.11
eval/ComplexF64/evaluation	10.4 ± 0.77 ms	10.4 ± 0.96 ms	0.998 ± 0.12
eval/Float32/derivative	11.6 ± 1.5 ms	11.7 ± 2.1 ms	0.997 ± 0.22
eval/Float32/derivative_turbo	11.9 ± 2.2 ms	11.5 ± 1.7 ms	1.03 ± 0.24
eval/Float32/evaluation	2.5 ± 0.26 ms	2.49 ± 0.26 ms	1.01 ± 0.15
eval/Float32/evaluation_bumper	0.607 ± 0.017 ms	0.61 ± 0.019 ms	0.995 ± 0.043
eval/Float32/evaluation_turbo	0.542 ± 0.037 ms	0.537 ± 0.04 ms	1.01 ± 0.1
eval/Float32/evaluation_turbo_bumper	0.608 ± 0.018 ms	0.603 ± 0.017 ms	1.01 ± 0.041
eval/Float64/derivative	15.1 ± 3.6 ms	14.9 ± 3.6 ms	1.01 ± 0.35
eval/Float64/derivative_turbo	14.5 ± 3.1 ms	14.7 ± 3.9 ms	0.987 ± 0.34
eval/Float64/evaluation	2.89 ± 0.29 ms	2.91 ± 0.31 ms	0.994 ± 0.15
eval/Float64/evaluation_bumper	1.28 ± 0.046 ms	1.27 ± 0.046 ms	1.01 ± 0.052
eval/Float64/evaluation_turbo	1.04 ± 0.07 ms	1.03 ± 0.065 ms	1.01 ± 0.092
eval/Float64/evaluation_turbo_bumper	1.28 ± 0.045 ms	1.27 ± 0.046 ms	1.01 ± 0.051
utils/combine_operators/break_sharing	0.0431 ± 0.0011 ms	0.0408 ± 0.00081 ms	1.06 ± 0.035
utils/convert/break_sharing	27.9 ± 3.5 μs	27.8 ± 2.9 μs	1 ± 0.16
utils/convert/preserve_sharing	0.0999 ± 0.0095 ms	0.0992 ± 0.0098 ms	1.01 ± 0.14
utils/copy/break_sharing	27.8 ± 3.5 μs	27.2 ± 3.1 μs	1.02 ± 0.17
utils/copy/preserve_sharing	0.0997 ± 0.0092 ms	0.0995 ± 0.0099 ms	1 ± 0.14
utils/count_constant_nodes/break_sharing	12.7 ± 0.73 μs	13 ± 0.66 μs	0.975 ± 0.075
utils/count_constant_nodes/preserve_sharing	0.0854 ± 0.0083 ms	0.0851 ± 0.0067 ms	1 ± 0.13
utils/count_depth/break_sharing	13.5 ± 0.91 μs	14 ± 0.72 μs	0.964 ± 0.082
utils/count_nodes/break_sharing	12.8 ± 0.82 μs	12.7 ± 0.69 μs	1.01 ± 0.085
utils/count_nodes/preserve_sharing	0.0874 ± 0.0072 ms	0.0864 ± 0.0051 ms	1.01 ± 0.1
utils/get_set_constants!/break_sharing	0.0329 ± 0.004 ms	0.0326 ± 0.0034 ms	1.01 ± 0.16
utils/get_set_constants!/preserve_sharing	0.173 ± 0.011 ms	0.175 ± 0.012 ms	0.989 ± 0.093
utils/get_set_constants_parametric	0.0464 ± 0.0038 ms	0.0452 ± 0.0038 ms	1.03 ± 0.12
utils/has_constants/break_sharing	7.54 ± 0.8 μs	7.86 ± 1.3 μs	0.96 ± 0.19
utils/has_operators/break_sharing	2.62 ± 0.14 μs	2.68 ± 0.33 μs	0.978 ± 0.13
utils/hash/break_sharing	23.6 ± 1.5 μs	24.3 ± 2.1 μs	0.968 ± 0.1
utils/hash/preserve_sharing	0.1 ± 0.0094 ms	0.1 ± 0.0055 ms	1 ± 0.11
utils/index_constant_nodes/break_sharing	28.4 ± 1.8 μs	28.6 ± 1.9 μs	0.992 ± 0.091
utils/index_constant_nodes/preserve_sharing	0.0999 ± 0.0069 ms	0.0994 ± 0.0052 ms	1 ± 0.087
utils/is_constant/break_sharing	8.21 ± 0.89 μs	8.06 ± 0.82 μs	1.02 ± 0.15
utils/simplify_tree/break_sharing	30.6 ± 1.1 μs	30.4 ± 1 μs	1.01 ± 0.05
utils/simplify_tree/preserve_sharing	0.111 ± 0.0077 ms	0.11 ± 0.0051 ms	1.01 ± 0.084
utils/string_tree/break_sharing	0.451 ± 0.026 ms	0.456 ± 0.024 ms	0.989 ± 0.078
utils/string_tree/preserve_sharing	0.547 ± 0.028 ms	0.55 ± 0.026 ms	0.995 ± 0.069
time_to_load	0.175 ± 0.006 s	0.172 ± 0.0019 s	1.01 ± 0.036

Memory benchmarks

	master	`60d0023`...	master / `60d0023`...
eval/ComplexF32/evaluation	0.975 k allocs: 2.5 MB	0.981 k allocs: 2.51 MB	0.994
eval/ComplexF64/evaluation	0.969 k allocs: 4.94 MB	1.01 k allocs: 5.12 MB	0.964
eval/Float32/derivative	4.63 k allocs: 17.4 MB	4.69 k allocs: 17.7 MB	0.987
eval/Float32/derivative_turbo	4.7 k allocs: 17.7 MB	4.68 k allocs: 17.6 MB	1
eval/Float32/evaluation	0.984 k allocs: 1.28 MB	0.984 k allocs: 1.28 MB	1
eval/Float32/evaluation_bumper	0.303 k allocs: 0.393 MB	0.303 k allocs: 0.393 MB	1
eval/Float32/evaluation_turbo	0.972 k allocs: 1.27 MB	0.957 k allocs: 1.25 MB	1.02
eval/Float32/evaluation_turbo_bumper	0.303 k allocs: 0.393 MB	0.303 k allocs: 0.393 MB	1
eval/Float64/derivative	4.74 k allocs: 0.0346 GB	4.75 k allocs: 0.0348 GB	0.996
eval/Float64/derivative_turbo	4.83 k allocs: 0.0353 GB	4.77 k allocs: 0.0349 GB	1.01
eval/Float64/evaluation	0.999 k allocs: 2.56 MB	0.993 k allocs: 2.54 MB	1.01
eval/Float64/evaluation_bumper	0.303 k allocs: 0.771 MB	0.303 k allocs: 0.771 MB	1
eval/Float64/evaluation_turbo	0.99 k allocs: 2.53 MB	0.996 k allocs: 2.55 MB	0.994
eval/Float64/evaluation_turbo_bumper	0.303 k allocs: 0.771 MB	0.303 k allocs: 0.771 MB	1
utils/combine_operators/break_sharing	4 allocs: 0.953 kB	4 allocs: 0.953 kB	1
utils/convert/break_sharing	2 k allocs: 0.123 MB	2 k allocs: 0.123 MB	1
utils/convert/preserve_sharing	2.4 k allocs: 0.192 MB	2.4 k allocs: 0.192 MB	1
utils/copy/break_sharing	2 k allocs: 0.123 MB	2 k allocs: 0.123 MB	1
utils/copy/preserve_sharing	2.4 k allocs: 0.192 MB	2.4 k allocs: 0.192 MB	1
utils/count_constant_nodes/break_sharing	4 allocs: 0.953 kB	4 allocs: 0.953 kB	1
utils/count_constant_nodes/preserve_sharing	0.404 k allocs: 0.0696 MB	0.404 k allocs: 0.0696 MB	1
utils/count_depth/break_sharing	4 allocs: 0.953 kB	4 allocs: 0.953 kB	1
utils/count_nodes/break_sharing	4 allocs: 0.953 kB	4 allocs: 0.953 kB	1
utils/count_nodes/preserve_sharing	0.404 k allocs: 0.0696 MB	0.404 k allocs: 0.0696 MB	1
utils/get_set_constants!/break_sharing	0.898 k allocs: 25.2 kB	0.898 k allocs: 25.2 kB	1
utils/get_set_constants!/preserve_sharing	1.7 k allocs: 0.138 MB	1.7 k allocs: 0.138 MB	1
utils/get_set_constants_parametric	1.42 k allocs: 0.0663 MB	1.42 k allocs: 0.0663 MB	1
utils/has_constants/break_sharing	4 allocs: 0.203 kB	4 allocs: 0.203 kB	1
utils/has_operators/break_sharing	4 allocs: 0.203 kB	4 allocs: 0.203 kB	1
utils/hash/break_sharing	0.104 k allocs: 2.52 kB	0.104 k allocs: 2.52 kB	1
utils/hash/preserve_sharing	0.504 k allocs: 0.0711 MB	0.504 k allocs: 0.0711 MB	1
utils/index_constant_nodes/break_sharing	2.1 k allocs: 0.094 MB	2.1 k allocs: 0.094 MB	1
utils/index_constant_nodes/preserve_sharing	2.5 k allocs: 0.163 MB	2.5 k allocs: 0.163 MB	1
utils/is_constant/break_sharing	4 allocs: 0.203 kB	4 allocs: 0.203 kB	1
utils/simplify_tree/break_sharing	0.104 k allocs: 2.52 kB	0.104 k allocs: 2.52 kB	1
utils/simplify_tree/preserve_sharing	0.504 k allocs: 0.0711 MB	0.504 k allocs: 0.0711 MB	1
utils/string_tree/break_sharing	11.9 k allocs: 0.999 MB	11.9 k allocs: 0.999 MB	1
utils/string_tree/preserve_sharing	12.3 k allocs: 1.07 MB	12.3 k allocs: 1.07 MB	1
time_to_load	0.145 k allocs: 11 kB	0.145 k allocs: 11 kB	1

MilesCranmer · 2026-06-10T07:37:01Z

@MilesCranmerBot comments from Claude... pls integrate. Maybe benchmark against master too, to see if it works...

Please remove the eval shim and let evaluation run directly on the arena facade:

1. In `src/Evaluate.jl`, delete the `eval_tree_array(tree::ArenaNode{T}, ...)` method and the `import ..ArenaNodeModule: ArenaNode, tree_from_arena` line. ArenaNode is an AbstractExpressionNode, so the generic method should handle it. Converting to a heap `Node` on every call allocates O(tree size) garbage per eval, which defeats the purpose of the arena.

2. Run the full test suite with `dispatch_doctor_mode = "error"` to confirm nothing fires on the direct path. If something does fire, fix it locally with concrete accessors on the arena (same pattern as the existing `branch_equal`/`leaf_equal` specializations). Do not reintroduce any representation conversion to satisfy the linter.

3. In `test/test_arenanode.jl`, extend the fresh-subprocess allocation checks with an `eval_tree_array` case: after warmup, `@allocated eval_tree_array(atree, X, operators)` should be comparable to the `Node` equivalent on the same tree (within ~10%), not proportional to conversion overhead.

4. While you're in there: delete `ArenaScratch` (unused), and re-evaluate whether the `is_constant(::ArenaNode)` specialization in `src/NodeUtils.jl` is still needed once (2) passes. If it is needed, rewrite it to use a plain local `Vector{Int32}` stack instead of allocating an `ArenaCursor` per call.

Post the before/after numbers from `scripts/bench_arenanode.jl` in the PR description.

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

MilesCranmerBot · 2026-06-10T08:19:57Z

Integrated in 6b91ec70 and pushed to arenanode-prototype.

What changed:

removed the eval_tree_array(::ArenaNode, ...) heap-Node conversion shim
stabilized ArenaNode facade property access for the generic eval path
removed unused ArenaScratch
kept is_constant(::ArenaNode) with a plain Vector{Int32} stack
added the fresh-subprocess eval_tree_array allocation comparison

Validation: SR_TEST=main /usr/local/bin/julia --startup-file=no --project=. -e 'using Pkg; Pkg.test(; coverage=false)' passes: 899/899.

Benchmark (scripts/bench_arenanode.jl): Arena eval allocation went from 243352 bytes before to 241064 bytes after; Node baseline is 240216 bytes. origin/master does not have the ArenaNode prototype/script, so there is no direct master run for this benchmark.

codecov · 2026-06-10T09:46:03Z

Codecov Report

❌ Patch coverage is 91.32150% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.02%. Comparing base (efbc2aa) to head (60d0023).

Files with missing lines	Patch %	Lines
src/ArenaNode.jl	92.36%	22 Missing ⚠️
src/ArenaNodeEval.jl	92.89%	15 Missing ⚠️
ext/DynamicExpressionsLoopVectorizationExt.jl	0.00%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #167       +/-   ##
===========================================
+ Coverage   59.43%   74.02%   +14.59%     
===========================================
  Files          30       32        +2     
  Lines        2682     3188      +506     
===========================================
+ Hits         1594     2360      +766     
+ Misses       1088      828      -260

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MilesCranmer

@MilesCranmerBot please address above comments.

Also, shouldn't we have a tree_mapreduce? Then we wouldn't need to write a custom is_constant and so forth.

Btw, please expand your test suite. You should actually be using the full Node Interface tests. ArenaNode should pass ALL of them.

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

MilesCranmerBot · 2026-06-10T12:27:47Z

Addressed the review in the latest pushes (fb799dd0, a4b86496, 6b2aaed3).

Changes:

added full NodeInterface coverage for ArenaNode
removed the custom is_constant(::ArenaNode) in favor of generic tree_mapreduce
switched the allocation checks to PerformanceTestTools.include_foreach
moved the arena @testitem into test/test_arenanode.jl
restored the main Project.toml compat changes out of this PR
added the cross-arena child-copy regression test

Local validation:

JULIA_PKG_PRECOMPILE_AUTO=0 SR_TEST=main /usr/local/bin/julia --startup-file=no --project=. -e 'using Pkg; Pkg.test(; coverage=false)' -> 855/855 passed
JULIA_PKG_PRECOMPILE_AUTO=0 /usr/local/bin/julia --startup-file=no --project=. scripts/bench_arenanode.jl -> Node and ArenaNode eval both allocate 240216 bytes

All addressed review threads are resolved. CI is pending on the latest push.

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

MilesCranmer

@MilesCranmerBot

See new comments.

Also.

Please test with expressions types too. eg Expression with ArenaNode - do the full expression interface tests.

Also maybe try derivatives too?

- Test Expression wrapping ArenaNode with the full ExpressionInterface - Test eval_grad_tree_array and Zygote derivatives for ArenaNode-based expressions - Cover both variable and constant gradients Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

MilesCranmer

@MilesCranmerBot please see comments above

Also please resolve any comments that have already been addressed.

e/a/n/c/d/f/v/st and the kernel abbreviations (sp, ks, is, svs, svals, desc, nfree, fbase, doff, offs, isscal, off, rem, B) become entry, arena, node, child_idx, degree, feature, value, state, stack_top, kinds, idxs, scalar_args, scalar_vals, descriptors, num_free, free_base, dest_offset, offsets, is_scalar, offset, remaining, buffer_rows. No behavior change (2614 tests pass; bench ratios 0.55/0.43/0.39, unchanged).

MilesCranmer

Please address all comments. Resolve them individually as you address each comment locally. And before you are done, please audit your changes and ensure that all logic is maintained, and no speed is lost due to, e.g., type instabilities.

- expose ArenaNode/Arena as DynamicExpressions.ArenaNode etc.; tests use 'using DynamicExpressions: ArenaNode' instead of a module alias - _push_node! takes keyword defaults; push_constant!/push_feature! only override the relevant fields - PlanState/PlanRegisters (no leading underscore on internal types) - constrain ArenaEntry/Arena/ArenaNode to T<:Number - split _arena_eval into _materialize_features! and _write_root_to_output! - readable planner names: feature_mask, scalar_stack, permanent_stack, arity_mask, num_recyclable_args; the feature-slot computation lives in a named _feature_slot helper shared with _push_leaf! - iszero/isone instead of == 0 / != 1 comparisons - shorten testitem names so headers fit on one line 2614 tests pass; bench ratios 0.56/0.47/0.37 vs Node (unchanged).

_exec_op! is now a thin compile-time arity dispatch; _exec_op_arity! pops the operand descriptors and branches once into _fold_constant_args! (the scalar-lane fold, written with guard clauses) or _run_op_kernel! (slot recycling, destination allocation, operand validation, kernel dispatch). Max nesting depth drops from 6 to 2. No behavior or speed change: 2614 tests pass, and an interleaved A/B against the previous commit gives identical benchmark ratios.

…ding

set_child! and set_children! both duplicated the validate-then-link-or-copy logic; it now lives in one helper with the cross-arena copy rationale on it.

@inbounds

_plan_scratch previously re-implemented the executor's stack machine as raw bitmask arithmetic — the two had to agree exactly, since the executor trusts the planner's slot counts under @inbounds, but nothing tied them together. The kind and recycling policy now lives once: _leaf_kind, _op_result_kind, and _is_recyclable are called by both sides, and the planner's bitmask pair becomes a KindStack with push/pop/query operations named after the executor's concepts. Drift between the passes is now a compile error or an obviously-local edit instead of a silent buffer-size mismatch. No behavior change: 2614 tests pass, bench ratios within session noise.

…inbounds

…inbounds in set_scalar_constants! get_arena/get_index are now the only getfield call sites: internal code calls them instead of raw getfield (named, greppable) and instead of property access (functions reachable from getproperty, like get_child via the :l/:r branches, must not cycle back through it). Arena has no custom getproperty, so its getfield(arena, :nodes) calls become plain dot access. set_scalar_constants! gets its @inbounds back: refs are produced by get_scalar_constants, and callers passing stale refs are out of contract.

@inline

_pack_descriptor/_descriptor_kind/_descriptor_slot replace the raw kind|slot<<2 packing, _feature_bit names the used-feature bitset position, and _slot_offset replaces _slotoff. All @inline; perf-neutral (verified by interleaved A/B against the pre-refactor base 74438bd: with early_exit disabled the two are indistinguishable; the ~5-8% ratio gap with early_exit on is entirely the operand-validation correctness fix, not the refactors).

The parity fix had moved slot validation to consumption, re-summing feature columns at every use (~5-8% relative vs the pre-fix base). Validation now happens at equivalent-but-cheaper points: - features: one check at materialization (a separate pass over the just-written, cache-hot slot; fusing the sum into the copy loop spoiled its memcpy pattern). Skipped for single-leaf trees, where no operator consumes the feature. - intermediates: checked at production, right after the kernel writes them. Every non-root intermediate is consumed exactly once, so this rejects the same trees as consumption checks. - scalar operands: still checked at consumption (O(1)). - root output: never checked, as in the generic evaluator. Also adds mark_compact!/invalidate_compact!/is_compact instead of raw arena.compact[] writes. Interleaved A/B vs 74438bd: ratios now overlap at all sizes (0.69-0.72 / 0.59-0.64 / 0.57-0.61). 2616 tests pass, including a new bare-feature-root NaN parity case.

@enum

@enum OperandKind::UInt8 FoldedConstant PinnedSlot ScratchSlot replaces the raw _K_SCALAR/_K_PSLOT/_K_SLOT byte constants. Interleaved A/B shows no measurable cost: packing and comparisons compile to the same integer ops, and the membership check in the unpacking constructor is branch-predicted away. 2616 tests pass.

This reverts commit bd0a2c7.

Only the three types (ArenaEntry, Arena, ArenaNode) keep docstrings; every internal function's docstring becomes a short '#' comment, so Documenter does not pick them up.

@inline

All 38 @inline annotations removed (interleaved A/B shows the inliner does the same job unannotated; the zero-allocation copy tests still pass). The one survivor is the call-site @inline on the predicate in _arena_any, which forces the closure to specialize into the recursion. Also: setproperty! branches share a single trailing 'return value'; getproperty loads the entry once instead of in five branches; trivial helpers collapse to one-line definitions; _nullable_child and setproperty! use get_arena/get_index like everything else.

Mutation hot path; inlining guarantees the constant-symbol branch folds at the call site rather than relying on interprocedural constprop.

Documents the self-buffering experiment result: the generic fused unbuffered evaluator benches even with a plan running on a self-allocated pool, while allocating half the bytes, so the plan stays buffer-gated. Also shows buffered Node is slower than unbuffered Node (strided buffer rows), which is the headroom the plan's contiguous slots reclaim.

ArenaNode.jl keeps the types and the AbstractExpressionNode interface (~510 lines); ArenaNodeEval.jl holds the buffered plan evaluator (~530 lines), included from within the module. Also drops a tutorial-grade comment and records why the plan does not self-allocate a pool.

…t compacting copy CI: - _scalar_degn binds T via Tuple{T,Vararg{T}} and derives the arity from the tuple type, fixing Aqua's unbound-args failure - formatted with JuliaFormatter v1 (the CI pin; local runs had used v2) Investigation of the slow any(f, ::ArenaNode) found the generic AbstractNode traversal machinery is already fast on ArenaNode (within 15% of Node, zero allocations) -- the custom _arena_any/_arena_mapreduce/ is_constant overrides were unnecessary, and the closure-based recursion in the any override was nondeterministically up to 8x slower (inlining of the Base.any layers is compilation-order dependent). So: - the traversal overrides are deleted; ArenaNode rides the generic machinery, except a 3-line compact-arena flat scan for (1.4-1.7x faster than Node, and has_constants/is_constant inherit it) - the unset-child UndefRefError guard moves from the deleted override into _load_entry at the facade layer, so poison facades throw like Node's undefined fields for every generic consumer Copying out of a non-compact arena is now an entry-level compacting write (_write_subtree! into a pre-sized vector; no facade traversal, no per-push growth checks), also used by copy_into! and cross-arena attachment. copy(::ArenaNode): compact 5.7-11x faster than Node at n=15-255; non-compact crosses over near n=20 (1.1-1.2x above). ArenaNodeEval.jl is now its own module included from DynamicExpressions.jl rather than a nested include. The eval benchmark also reports unbuffered paths.

Property tests (using the existing supposition_utils generators): Node -> ArenaNode -> Node round-trips with all read-only interface results equal; evaluation matches Node (unbuffered exactly; the buffered plan's ok flag is best-effort, values must agree whenever both report ok); random valid mutations applied to both representations keep them equivalent; copy re-compacts and preserves evaluation.

…_nodes hash went through the generic tree_mapreduce machinery (0.64x vs Node); direct recursion over entries through the shared leaf_hash/branch_hash combinators recovers it to 0.92x -- the rest is the hash arithmetic itself, which must be identical across representations since cross-representation == requires equal hashes (now pinned by the supposition round-trip property). count_constant_nodes on a compact arena is a flat count over the entry vector (1.6x vs Node).

The generic traversal skeleton re-reads the entry through the facade for every field access and shuffles Nullable-wrapped children, which taxed all tree_mapreduce clients at once (hash 0.64x, collect 0.70x, count_depth 0.76x, count_constant_nodes 0.73x vs Node). One override of the primitive -- read each entry exactly once, dispatch on the local degree, hand f the facade -- normalizes the whole family (1.0x for collect/count_depth/ count_constant_nodes, 0.94x index_constant_nodes, 0.85x hash) in both compact and non-compact arenas. This replaces the per-function fixes: the previous hash and count_constant_nodes specializations are deleted, and since the generic hash combinators now just run on the fast skeleton, hash equality across representations holds again (pinned in the supposition round-trip property).

At 31-node trees the generic any falls to 0.52x vs Node on non-compact arenas (facade re-reads per field, like the tree_mapreduce skeleton); the entry-level recursion recovers to 0.67x. The remainder is f reading fields through the facade (arena -> nodes -> entry per access vs Node's single deref), inherent to the facade contract; compact arenas bypass it entirely via the flat scan (1.8x ahead of Node at n=31).

A linear postfix scan over compact arenas benches ~20% faster for pure reductions (hash 1.03x vs Node), but f-application order (parent first, siblings left to right) is observable API through collect/filter/foreach: the postfix scan reorders them, applying positional mutations to different nodes than Node would. Caught by the supposition properties with shrunk counterexamples (abs(5.0e-324), one set_val). Recursion stays; the comment on _entry_mapreduce documents the constraint.

The flat scan was the recursive traversal with recursion disabled, so the two share a single _entry_any whose children loop is gated on a static Val{recurse} flag: the compact path drives it from a flat loop with the recursion compiled out, the non-compact path recurses. Performance unchanged (compact 1.6-2.0x vs Node at n=31, non-compact 0.63-0.68x).

@inline

…tant ops One driver for order-free whole-tree visits, with both walks side by side and the unused one compiled out via a static flag: compact arenas walk the entry array (indices valid by construction, so the load is uncheckable and dead when unused); non-compact arenas walk the children pointers with the per-degree dispatch unrolled like the generic any (a children loop benches ~40% slower; a worklist allocates and loses at SR sizes; closure-driven recursion through Base.any inlines nondeterministically -- all measured). any/count_nodes/get_scalar_constants are one-line clients, replacing the recursive _entry_any, the count_nodes invoke fallback, and the filter_map path. Also, on the canonical-form principle (pure or order-free ops on a compact arena may use the entries directly): - == between compact trees is a flat field-compare (equal trees have identical entry vectors); 2.7x vs Node, from 1.1x - count_constant_nodes: flat count when compact (2.0x), entry-level mapreduce otherwise - set_scalar_constants! writes entries directly: val-only writes cannot change structure, so the setindex! degree/children diff is skipped - @inline restored on getproperty (the removal had been validated against an eval-only benchmark that never exercises property reads) n=31 vs Node: compact 1.4-3.4x on ten ops, floor 0.81x (hash); non-compact 0.85-1.75x, floor 0.65x (copy_into!, the re-compaction price that wins back above n~20).

MilesCranmer · 2026-06-12T23:52:42Z

@codex review

Older Supposition versions resolved by the downgrade-compat CI job lack length(::Data.OneOf); a single multi-arg map selecting the mutation kind by integer generates the same tuples on any version.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fda350b22

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-13T00:03:09Z

+function copy_into!(
+    dest::Arena{T,D},
+    src::ArenaNode{T,D};
+    ref::Union{Nothing,Base.RefValue{<:Integer}}=nothing,


Honor the copy_into! ref counter

When callers pass ref to copy_into!, this ArenaNode overload currently accepts it but never resets or increments it, so ref[] remains stale/zero even after a successful copy. The generic preallocation path uses this counter to report how many nodes were consumed and tests assert that behavior for other node types; ArenaNode should either update it to the copied node count (including the dest === src.arena fast path) or not expose the keyword.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-13T00:03:09Z

+        i = refs[j]
+        nodes[i] = _replace(nodes[i]; val=convert(T, constants[j]))
+    end
+    return nothing


Return the mutated tree from set_scalar_constants!

For ArenaNode callers, set_scalar_constants! now returns nothing, while the existing generic implementation returns the mutated tree. Code that follows the established API by chaining or reassigning the result, e.g. tree = set_scalar_constants!(tree, vals, refs), will lose the ArenaNode even though the constants were updated successfully.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-13T00:03:09Z

+end
+
+function set_child!(node::ArenaNode{T,D}, child::AbstractNode{D}, i::Int) where {T,D}
+    idx = _resolve_child_index!(node, child)


Validate child indices before copying

When i is out of range and child comes from another arena, this resolves the child by copying it into node.arena before the bounds-checked entry.children[i] access throws. That leaves an orphaned subtree appended to the parent arena after a failed set_child!, unlike the generic tuple-based implementation which throws without mutating state.

Useful? React with 👍 / 👎.

MilesCranmerBot force-pushed the arenanode-prototype branch from e8acc61 to 847c2c6 Compare February 21, 2026 12:56

MilesCranmerBot added 10 commits June 9, 2026 09:09

Prototype arena-backed ArenaNode

4871734

Add reusable ArenaCursor preorder traversal

5291356

Add postfix stack-based utilities inspired by symbolic_regression.rs

93c2422

Add postfix validity check

b5b4116

ArenaNode: postfix roundtrip + minimal commutative rewrite

79b6fe1

docs/tests: clarify postfix utilities are for roundtrips only

9285357

ArenaNode: allow cross-arena set_child!/set_children! via copy

3e36294

ArenaNode: traversal-based postfix emit + cursor-based rewrite

7dcd653

fix: stabilize ArenaNode prototype traversal and tests

c6f37f3

fix: stabilize ArenaNode follow-up

04e763d

MilesCranmerBot force-pushed the arenanode-prototype branch from 847c2c6 to 04e763d Compare June 9, 2026 09:23

fix: evaluate ArenaNode directly

6b91ec7

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

fix: stabilize ArenaNode CI follow-up

1427d4a

MilesCranmer reviewed Jun 10, 2026

View reviewed changes

Comment thread src/ArenaNode.jl

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread test/test_arenanode.jl Outdated

Comment thread test/unittest.jl Outdated

Comment thread Project.toml Outdated

Comment thread src/ArenaNode.jl Outdated

MilesCranmerBot and others added 3 commits June 10, 2026 12:11

test: expand ArenaNode interface coverage

fb799dd

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

chore: restore compat bounds

a4b8649

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

style: format ArenaNode follow-up

6b2aaed

MilesCranmer reviewed Jun 10, 2026

View reviewed changes

Comment thread src/NodeUtils.jl Outdated

fix: revert is_constant rewrite

b790948

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>

MilesCranmer requested changes Jun 10, 2026

View reviewed changes

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

MilesCranmerBot and others added 2 commits June 10, 2026 19:51

fix: stabilize ArenaNode follow-up CI

694d46a

MilesCranmer requested changes Jun 11, 2026

View reviewed changes

Comment thread src/ArenaNode.jl Outdated

Comment thread src/ArenaNode.jl Outdated

Comment thread scripts/bench_arenanode.jl Outdated

fix: address ArenaNode review cleanup

f90e4f6

MilesCranmer reviewed Jun 12, 2026

View reviewed changes

MilesCranmer added 23 commits June 12, 2026 19:10

refactor(ArenaNode): replace out() thunk with a plain output view bin…

fc73362

…ding

refactor(ArenaNode): share child attachment via _resolve_child_index!

622a787

set_child! and set_children! both duplicated the validate-then-link-or-copy logic; it now lives in one helper with the cross-arena copy rationale on it.

Revert "refactor(ArenaNode): OperandKind enum for descriptor kinds"

7d95417

This reverts commit bd0a2c7.

docs(ArenaNode): plain comments instead of docstrings on internals

8fb31a2

Only the three types (ArenaEntry, Arena, ArenaNode) keep docstrings; every internal function's docstring becomes a short '#' comment, so Documenter does not pick them up.

perf(ArenaNode): keep @inline on setproperty!

6695e1c

Mutation hot path; inlining guarantees the constant-symbol branch folds at the call site rather than relying on interprocedural constprop.

test(ArenaNode): avoid Data.OneOf in the mutation generator

6fda350

Older Supposition versions resolved by the downgrade-compat CI job lack length(::Data.OneOf); a single multi-arg map selecting the mutation kind by integer generates the same tuples on any version.

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

MilesCranmerBot added 2 commits June 13, 2026 09:37

test: raise downgraded test dependency bounds

25276ff

fix: address ArenaNode API regressions

60d0023

Conversation

MilesCranmerBot commented Feb 20, 2026 • edited by MilesCranmer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results (Julia v1)

Uh oh!

MilesCranmer commented Jun 10, 2026

Uh oh!

MilesCranmerBot commented Jun 10, 2026

Uh oh!

codecov Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

MilesCranmer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MilesCranmerBot commented Jun 10, 2026

Uh oh!

Uh oh!

MilesCranmer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MilesCranmer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MilesCranmer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MilesCranmer commented Jun 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

MilesCranmerBot commented Feb 20, 2026 •

edited by MilesCranmer

Loading

github-actions Bot commented Feb 20, 2026 •

edited

Loading

codecov Bot commented Jun 10, 2026 •

edited

Loading