Fixed-Step Simulation Hot-Path Improvements by CarlOlsson · Pull Request #21 · Gradient-Aerospace/SystemsOfSystems.jl

CarlOlsson · 2026-04-15T11:48:44Z

Summary

This PR speeds up fixed-step simulation in SystemsOfSystems.jl for the RK4(dt = 0.004 s) use case that drives the aircraft control-analysis benchmark.

The main goal was to remove generic immutable-tree overhead from the inner integration loop so that the top-level simulation can run well above 100x real time when logging is disabled.

Problem

The original fixed-step RK4 path still spent a large fraction of its time in framework code rather than model physics. The main issues were:

generic NamedTuple propagation through nested model-state trees on every RK stage
repeated work for deterministic random-variable subtrees
unnecessary discrete-update work between actual event boundaries
outer-loop stepping overhead when fixed-step RK4 could safely consume its own internal substeps

These are all hot-path problems. They matter at 250 Hz because a 120 s simulation requires 30000 solver steps, and RK4 evaluates the RHS four times per step.
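The step-count arithmetic behind the 100x goal can be checked directly. This is a minimal Julia sketch; `rk4_budget` is a hypothetical helper, not part of SystemsOfSystems.jl.

```julia
# Hypothetical helper (not part of SystemsOfSystems.jl): work and time budget
# for a fixed-step RK4 run that must finish `speedup` times faster than real time.
function rk4_budget(t_final, dt, speedup)
    n_steps = round(Int, t_final / dt)   # 120 s / 0.004 s = 30000 steps
    n_rhs = 4 * n_steps                  # RK4: four RHS evaluations per step
    wall = t_final / speedup             # allowed wall-clock time in seconds
    per_rhs = wall / n_rhs               # allowed seconds per RHS evaluation
    return (; n_steps, n_rhs, wall, per_rhs)
end

b = rk4_budget(120.0, 0.004, 100.0)
# b.n_steps == 30000, b.n_rhs == 120000, b.wall ≈ 1.2 s, b.per_rhs ≈ 10 µs
```

At 100x, each RK4 stage (physics plus framework) gets roughly 10 µs, so framework overhead measured in microseconds per call is already significant at this scale.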

What Changed

1. Specialized state propagation

The propagation helpers in src/Solvers.jl were changed from generic map-based NamedTuple reconstruction to generated, field-specialized builders for:

single-derivative propagation
multi-derivative propagation used by adaptive integrators
nested submodel propagation

This removes the runtime completion of missing submodel outputs and the repeated generic tuple plumbing from the solver hot path.
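A field-specialized builder of this kind can be sketched with `@generated`. This is a minimal illustration assuming states are (possibly nested) NamedTuples with explicit-Euler-style updates; `propagate_fields` and `_step` are hypothetical names, not the functions in src/Solvers.jl.

```julia
# Leaf rule: plain update for numbers or arrays.
_step(xi, dt, xdi) = xi + dt * xdi
# Branch rule: nested submodel states are NamedTuples, so recurse.
_step(xi::NamedTuple, dt, xdi::NamedTuple) = propagate_fields(xi, dt, xdi)

# Field-specialized builder. The generator runs once per concrete argument-type
# combination and emits straight-line code such as
#     NamedTuple{(:pos, :vel)}((_step(x.pos, dt, xdot.pos), _step(x.vel, dt, xdot.vel)))
@generated function propagate_fields(x::NamedTuple{N}, dt, xdot::NamedTuple{N}) where {N}
    # Only the field names N (part of the argument types) are inspected here,
    # which keeps the generator pure as Julia requires for @generated functions.
    fields = [:(_step(x.$name, dt, xdot.$name)) for name in N]
    return :(NamedTuple{$N}(($(fields...),)))
end
```

Whether this beats a plain `map` over the NamedTuple is an empirical question, and the review thread raises exactly that, so benchmarking both on the real model is the way to settle it.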

2. Deterministic random-subtree fast paths

TypedModelDescription now stores:

has_continuous_random_subtree
has_discrete_random_subtree

These flags are computed once during strip_fluff_from_model_description. draw_wc and draw_wd now return immediately for deterministic subtrees instead of rebuilding state descriptions that do not change.
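The compute-once flag plus early return can be sketched on a toy tree. `ToyDescription` and `visit_random` are hypothetical stand-ins for `TypedModelDescription` and `draw_wc`/`draw_wd`; the sketch only counts how many nodes would do draw work.

```julia
# Hypothetical toy type, not SystemsOfSystems.jl's TypedModelDescription.
struct ToyDescription
    has_own_random::Bool                 # this node draws random variables itself
    submodels::Vector{ToyDescription}
    has_random_subtree::Bool             # cached: this node or any descendant is random
end

# Compute the cached flag once, when the description is built.
function ToyDescription(has_own_random::Bool, submodels::Vector{ToyDescription})
    flag = has_own_random || any(sm -> sm.has_random_subtree, submodels)
    return ToyDescription(has_own_random, submodels, flag)
end

# Stand-in for a draw pass: returns how many nodes were visited.
function visit_random(d::ToyDescription)
    d.has_random_subtree || return 0     # deterministic subtree: return immediately
    return 1 + sum(visit_random, d.submodels; init = 0)
end

leaf_det = ToyDescription(false, ToyDescription[])
leaf_rand = ToyDescription(true, ToyDescription[])
det_branch = ToyDescription(false, [leaf_det, leaf_det])
root = ToyDescription(false, [det_branch, leaf_rand])
```

Here `visit_random(root)` touches only `root` and `leaf_rand`; the whole deterministic branch is skipped at its top node.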

3. Empty-update fast paths

The sim loop now short-circuits when there is no real work to do:

empty RatesOutput propagation returns the original state
empty UpdatesOutput returns the original state
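The empty-output short-circuit amounts to an emptiness check before any tree rebuilding. A minimal sketch, assuming a rates container that wraps a NamedTuple; `ToyRates`, `is_empty_rates`, and `propagate_toy` are hypothetical names.

```julia
# Hypothetical stand-in for RatesOutput; not the package's type.
struct ToyRates{R<:NamedTuple}
    rates::R
end

is_empty_rates(ro::ToyRates) = isempty(ro.rates)

function propagate_toy(x::NamedTuple, dt, ro::ToyRates)
    is_empty_rates(ro) && return x       # fast path: nothing to integrate, state unchanged
    return map((xi, ri) -> xi + dt * ri, x, ro.rates)
end
```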

4. Event-only discrete updates

step! now runs discrete updates only at actual user-requested or model-requested event boundaries. Fixed-step internal solver substeps no longer trigger discrete-update work that cannot change anything.
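The behavioral difference can be illustrated by counting discrete updates under both policies in a toy outer loop. This is a sketch under assumptions: times are integer milliseconds so equality is exact, the old loop is assumed to re-enter at every 4 ms step, and the names only mirror `t_next`, `t_next_from_user`, and `t_next_from_models` from the diff.

```julia
# Toy model of the two discrete-update policies (hypothetical loop, integer ms).
always_update(t_next, t_user, t_models) = true                                 # previous behavior
event_only_update(t_next, t_user, t_models) = t_next == t_user || t_next == t_models  # this PR

function count_discrete_updates(policy; t_end = 1000, dt = 4, sample_period = 20)
    n, t = 0, 0
    while t < t_end
        t_models = (t ÷ sample_period + 1) * sample_period   # next model-requested sample
        t_user = t_end                                       # user only requests the end time
        t_next = min(t + dt, t_models, t_user)
        policy(t_next, t_user, t_models) && (n += 1)
        t = t_next
    end
    return n
end
```

With a 20 ms model sample period over 1 s, the old policy runs 250 discrete updates and the event-only policy runs 50. If no model requests samples within the run, the event-only policy runs just one (at the user's end time), which is the semantic shift raised in review.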

5. Let fixed-step RK4 own its internal substeps

When logging is disabled and monitors are empty, step! now advances the outer loop only to true event boundaries and lets RungeKutta4 consume its internal dt = 0.004 s substeps inside solve.

This avoids forcing the top-level sim loop to re-enter framework logic for every fixed substep.
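Letting the integrator own its substeps means the outer loop calls it once per event interval. A minimal classic-RK4 sketch; `rk4_advance` is a hypothetical helper, not the package's `RungeKutta4`/`solve`, and it splits the interval into equal substeps of (approximately) `dt`.

```julia
# Fixed-step RK4 that consumes its own substeps between outer-loop times t0 and t1.
# f(t, x) returns dx/dt; x may be a number or anything supporting + and scalar *.
function rk4_advance(f, t0, x0, t1, dt)
    n = max(1, round(Int, (t1 - t0) / dt))   # number of internal substeps
    h = (t1 - t0) / n                        # equals dt when the interval is a multiple of dt
    x = x0
    for i in 0:(n - 1)
        t = t0 + i * h                       # recomputed each step to avoid round-off drift
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    end
    return x
end

decay(t, x) = -x
one_call = rk4_advance(decay, 0.0, 1.0, 0.04, 0.004)
two_calls = rk4_advance(decay, 0.02, rk4_advance(decay, 0.0, 1.0, 0.02, 0.004), 0.04, 0.004)
```

Splitting the interval at an event boundary gives the same result to round-off, which is why the outer loop only needs to stop at real events.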

Why These Changes

The key observation from profiling was that the fixed-step benchmark was still paying framework costs that scale with solver stage count:

rebuilding nested immutable state trees
rebuilding nested submodel rate trees
re-running boundary/event logic more often than necessary

The physics model was not the only bottleneck. The framework itself needed to become more monomorphic and allocation-free in the inner loop.

Validation

Micro-level

During the investigation, the hot propagate(msd, dt, ro) path on the benchmark model went from roughly:

about 14.4 us per call
about 28432 B allocated per call

to roughly:

about 0.93 us per call
0 B allocated per call

after specialization.
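Per-call allocation figures like these are usually collected with `@btime`/`@ballocated` from BenchmarkTools.jl. A Base-only sketch of the zero-versus-nonzero contrast, using hypothetical toy step functions in place of `propagate(msd, dt, ro)`:

```julia
# Specialized path: NamedTuple of Float64s in, NamedTuple out (isbits, no heap use).
step_nt(x::NamedTuple{N}, dt, xdot::NamedTuple{N}) where {N} =
    map((a, b) -> a + dt * b, x, xdot)

# Generic path for contrast: a Dict is rebuilt on every call, so it must allocate.
step_dict(x::Dict, dt, xdot::Dict) = Dict(k => x[k] + dt * xdot[k] for k in keys(x))

# Measure inside functions so argument types are concrete; the first call
# compiles and is excluded from the measurement.
function allocs_nt(x, dt, xdot)
    step_nt(x, dt, xdot)
    return @allocated step_nt(x, dt, xdot)
end

function allocs_dict(x, dt, xdot)
    step_dict(x, dt, xdot)
    return @allocated step_dict(x, dt, xdot)
end
```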

End-to-end

With the matching GradientModels.jl changes applied, the full 120 s benchmark at RK4(dt = 0.004 s) with logging disabled improved from about:

3.73 s wall time
about 32x real time

to warmed runs of about:

0.77 s to 0.81 s
about 148x to 155x real time

This clears the 100x target.
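The real-time factors follow from simulated time divided by wall time; a quick check of the quoted figures:

```julia
# Real-time factor = simulated seconds / wall-clock seconds.
realtime_factor(t_sim, t_wall) = t_sim / t_wall

before = realtime_factor(120.0, 3.73)                        # ≈ 32.2x
after_lo, after_hi = realtime_factor.(120.0, (0.81, 0.77))   # ≈ 148.1x and ≈ 155.8x
end_to_end_speedup = (3.73 / 0.81, 3.73 / 0.77)              # ≈ 4.6x to 4.8x faster
```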

Compatibility

This PR does not change the public simulation API. It changes internal execution behavior only:

model definitions are unchanged
initialization contracts are unchanged
solver option types are unchanged
YAML-facing model configuration is unchanged

The new fast path is activated by runtime conditions, primarily fixed-step RK4 with logging disabled and no monitors.
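The runtime gate can be expressed with a trait function that defaults to `false` and opts specific solvers in. The solver types below are hypothetical; only the shape of the condition mirrors `Solvers.handles_internal_substepping(solver)` and the `mh === nothing && isempty(monitors)` check in the diff.

```julia
# Hypothetical solver types and trait.
abstract type ToySolver end
struct ToyRK4 <: ToySolver
    dt::Float64
end
struct ToyAdaptive <: ToySolver
    rtol::Float64
end

handles_internal_substepping(::ToySolver) = false   # conservative default
handles_internal_substepping(::ToyRK4) = true        # fixed-step RK4 can substep internally

# `log_handle` plays the role of `mh` in the diff.
use_fast_path(solver, log_handle, monitors) =
    log_handle === nothing && isempty(monitors) && handles_internal_substepping(solver)
```

Defaulting the trait to `false` means a new solver never takes the fast path unless it explicitly opts in.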

Risks and Follow-Ups

The BasicLog path is now the next major bottleneck for logging-enabled runs.
If logging performance becomes the next target, the right follow-up is to optimize Logs.jl and TimeSeries.jl, not the solver core.

tuckermcclure

Some good stuff, some questionable stuff, and some big changes. I love the short-circuiting that was added for random subtrees and empty RatesOutputs and UpdatesOutputs.

I'm pretty unsure about the change to when discrete updates run (now: always; here: only when any model requests one). That might be a good paradigm, but it is different.

There's a failure in CI about a non-pure generated function, which spooks me. I'd need to spend more time looking at that.

tuckermcclure · 2026-04-15T23:50:13Z

        return (t_last, msd, stop, t_next_suggested)
    end

+    run_discrete_update = (t_next == t_next_from_user) || (t_next == t_next_from_models)


This is a big difference. Before, the discrete updates always ran after a continuous-time step. With this update, they only run when any model wants a discrete step. That is, at least some model has to explicitly request a discrete step for any model to actually get one. That might be fine, but it's a big switch from "discrete steps always happen at the end of continuous-time steps."

tuckermcclure · 2026-04-15T23:50:43Z

+
    # Make the discrete draws.
-    msd = draw_wd(t_next, ommd, msd)
+        msd = draw_wd(t_next, ommd, msd)


It's truly bizarre that the comment wasn't indented. Why is Codex so bad about comments?

tuckermcclure · 2026-04-15T23:55:55Z

+    # the models requested. For fixed-step RK4 with logging/monitors disabled, let the solver
+    # consume its own substeps internally so this outer loop advances only at real event
+    # boundaries.
+    t_next = if mh === nothing && isempty(monitors) && Solvers.handles_internal_substepping(solver)


Here, the internal sub-stepping is only used when we're not logging. Sure, that makes the run faster, but I wonder how often we'll run without logging. I was running without logging primarily to help me zoom in on inefficiencies. With this change, the models actually run differently (though the results will be the same). I'm not sure how much this helps as it is. However, there might be a feature for this, like only_log_on_discrete_samples. Do we really need the continuous-time outputs at every single point between the discrete updates? If not, this feature would help quite a bit.
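A rough sketch of how an `only_log_on_discrete_samples` option might behave. `ToyLog`, `maybe_log!`, and `run_toy` are hypothetical; the loop assumes 4 ms steps with a discrete sample every fifth step.

```julia
# Hypothetical logger honoring an only_log_on_discrete_samples flag.
struct ToyLog
    only_log_on_discrete_samples::Bool
    times::Vector{Float64}
end
ToyLog(flag::Bool) = ToyLog(flag, Float64[])

# Called after every outer step; `is_discrete_sample` marks event boundaries.
function maybe_log!(logbook::ToyLog, t, is_discrete_sample::Bool)
    if !logbook.only_log_on_discrete_samples || is_discrete_sample
        push!(logbook.times, t)
    end
    return logbook
end

function run_toy(flag; n_steps = 250, steps_per_sample = 5, dt = 0.004)
    logbook = ToyLog(flag)
    for i in 1:n_steps
        maybe_log!(logbook, i * dt, i % steps_per_sample == 0)
    end
    return length(logbook.times)
end
```

With the flag set, the logger no longer needs the points between discrete samples, so the internal-substepping path could in principle stay enabled while logging.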

tuckermcclure · 2026-04-15T23:56:49Z

 end

 function update(msd::ModelStateDescription, updates_output::UpdatesOutput)
+    is_empty_updates_output(updates_output) && return msd


I like these little short circuits. We might also have a singleton for an empty RatesOutput and empty DiscreteOutput and simply compare to those.
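The singleton idea can be sketched so that dispatch, rather than a runtime emptiness check, selects the fast path (comparing against the singleton with `===` would also work). `NoUpdates`, `NO_UPDATES`, `ToyUpdates`, and `update_state` are hypothetical names.

```julia
# Hypothetical types: a singleton marks "no updates".
struct ToyUpdates{U<:NamedTuple}
    updates::U
end
struct NoUpdates end
const NO_UPDATES = NoUpdates()

update_state(x::NamedTuple, ::NoUpdates) = x                      # fast path chosen by dispatch
update_state(x::NamedTuple, u::ToyUpdates) = merge(x, u.updates)  # overwrite the updated fields
```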

tuckermcclure · 2026-04-15T23:59:55Z

+    is_empty_rates_output(k1) && is_empty_rates_output(k2) &&
+        is_empty_rates_output(k3) && is_empty_rates_output(k4) && return msd


If any of these is empty when the others aren't, that would be an error, so it's sufficient to check only one.
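Following that suggestion, the guard reduces to a single check on `k1`. A sketch with hypothetical names, using NamedTuples as stand-ins for the stage rates:

```julia
is_empty_rates(k) = isempty(k)

function rk4_combine(x, dt, k1, k2, k3, k4)
    # All four stages come from the same model tree, so if k1 is empty the
    # others must be too; checking k1 alone is sufficient.
    is_empty_rates(k1) && return x
    return map((xi, a, b, c, d) -> xi + dt / 6 * (a + 2b + 2c + d), x, k1, k2, k3, k4)
end
```

A consistency check across all four stages, if wanted, fits better in the test suite than in the hot path.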

tuckermcclure · 2026-04-16T00:03:10Z

+    return map(
+        (sm, ro1, ro2, ro3, ro4) -> propagate_rk4(sm, dt, ro1, ro2, ro3, ro4),
+        submodels, complete_m1, complete_m2, complete_m3, complete_m4,


I'm surprised about this part! The multi-input map often seems to optimize more poorly.
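One way to check the concern on concrete types is `Test.@inferred` (it throws if the call's return type is not inferred), followed by an allocation measurement. A sketch of a five-input map shaped like the diff; `rk4_leaf` and `propagate_all` are hypothetical.

```julia
using Test: @inferred

# Hypothetical stand-in for propagate_rk4 on a single leaf state.
rk4_leaf(x, dt, a, b, c, d) = x + dt / 6 * (a + 2b + 2c + d)

# Five-input map over NamedTuples with matching field names.
propagate_all(xs::NamedTuple{N}, dt, k1::NamedTuple{N}, k2::NamedTuple{N},
              k3::NamedTuple{N}, k4::NamedTuple{N}) where {N} =
    map((x, a, b, c, d) -> rk4_leaf(x, dt, a, b, c, d), xs, k1, k2, k3, k4)

xs = (p = 1.0, v = 0.0)
k = (p = 1.0, v = 2.0)
y = @inferred propagate_all(xs, 0.6, k, k, k, k)
```

Passing `@inferred` on a small toy does not prove the same holds for the real model's deeper trees, so the check is most useful run against the benchmark model's actual types.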

tuckermcclure · 2026-04-16T00:06:33Z

 end

-function propagate_set(x::T1, dt, x_dot::T2) where {T1, T2}
+@generated function propagate_set(x::T1, dt, x_dot::T2) where {T1, T2}


I was pretty happy that I had no allocations for RK4 without generated functions! I wonder if this is actually an improvement.
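For comparison, the non-generated style can handle nested submodel states with ordinary multiple dispatch plus `map` over NamedTuples. A sketch with a hypothetical `prop`; whether it matches the generated version on the real model needs benchmarking.

```julia
# Leaf rule: numbers or arrays.
prop(x, dt, xdot) = x + dt * xdot
# Branch rule: a NamedTuple of states (e.g. submodels) recurses field by field.
prop(x::NamedTuple{N}, dt, xdot::NamedTuple{N}) where {N} =
    map((xi, xdi) -> prop(xi, dt, xdi), x, xdot)

x = (pos = 1.0, sub = (angle = 0.5, rate = 0.0))
xd = (pos = 2.0, sub = (angle = 1.0, rate = -1.0))
y = prop(x, 0.1, xd)
```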

prototype speedup

737ad5a

tuckermcclure reviewed Apr 16, 2026

CarlOlsson wants to merge 1 commit into main from carl/speed_up_test

2 participants
