This repository was archived by the owner on May 12, 2026. It is now read-only.

Tagging system, autoplot, and benchmark helpers for work-precision diagrams #177

@ChrisRackauckas-Claude

Motivation

The exemplary benchmarks like linear_wpd, allen_cahn_fdm_wpd, and BCR follow a pattern of within-family and between-family comparisons (e.g., all Rosenbrock methods, all 5th order methods, then best-of-each-family comparisons). Currently, achieving this requires significant manual boilerplate in each benchmark file: creating separate WorkPrecisionSets per family, manually selecting best methods, repeating tolerance ranges and error modes, and manually composing comparison plots.

This issue proposes a set of infrastructure improvements to DiffEqDevTools that would:

  1. Make it trivial to tag methods and generate family/cross-family comparison plots
  2. Run all methods once and generate many views from the same data
  3. Automatically identify "best of family" methods for cross-family comparisons
  4. Support interactive plots for dense comparison diagrams
  5. Eventually simplify SciMLBenchmarks code significantly via these helpers

Planned Features

Phase 1: Core tagging infrastructure

Add a tags field to WorkPrecision:

mutable struct WorkPrecision
    # ... existing fields ...
    tags::Vector{Symbol}  # NEW: e.g., [:rosenbrock, :stiff, Symbol("4th_order"), :autodiff]
end

Tags are specified per entry in the setups vector. This stays backward compatible because a setup without a :tags key gets an empty vector:

# Tags that start with a digit cannot be written as :5th_order in Julia
# (that parses as a number times a variable), so they use Symbol("...").
setups = [
    Dict(:alg => Rosenbrock23(), :tags => [:rosenbrock, Symbol("2nd_order"), :stiff]),
    Dict(:alg => Rodas5P(),      :tags => [:rosenbrock, Symbol("5th_order"), :stiff]),
    Dict(:alg => TRBDF2(),       :tags => [:bdf, Symbol("2nd_order"), :stiff]),
    Dict(:alg => KenCarp4(),     :tags => [:sdirk, Symbol("4th_order"), :stiff, :imex]),
    Dict(:alg => Tsit5(),        :tags => [:rk, Symbol("5th_order"), :nonstiff, :reference]),
    Dict(:alg => CVODE_BDF(),    :tags => [:bdf, :sundials, :reference]),
]
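The backward-compatible default could be sketched with a plain `get` on each setup Dict. Here `setup_tags` is a hypothetical helper name, not an existing DiffEqDevTools function, and symbols stand in for the algorithm objects so the snippet runs without any solver packages.

```julia
# Hypothetical helper: read the optional :tags entry, defaulting to no tags.
setup_tags(setup::AbstractDict) = convert(Vector{Symbol}, get(setup, :tags, Symbol[]))

setups = [
    Dict(:alg => :Rodas5P, :tags => [:rosenbrock, :stiff]),
    Dict(:alg => :Tsit5),  # untagged setup, as in existing benchmark files
]

all_tags = map(setup_tags, setups)
```

Existing benchmark files that never mention :tags would then keep working unchanged.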

Filtering helpers:

# Get subset matching ALL specified tags (AND logic)
filter_by_tags(wp_set, :rosenbrock)           # all rosenbrock methods
filter_by_tags(wp_set, Symbol("5th_order"), :stiff)    # 5th order AND stiff
filter_by_tags(wp_set, :reference)            # just reference methods

# Exclude by tags
exclude_by_tags(wp_set, :reference)           # everything except reference
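A minimal sketch of the filtering semantics, using a stand-in struct instead of the real WorkPrecision type. The names `TaggedEntry`, `filter_tagged`, and `exclude_tagged` are illustrative only, and dropping an entry that carries any of the excluded tags is an assumption about how exclude_by_tags would treat multiple arguments.

```julia
# Minimal stand-in for a WorkPrecision entry; the real struct has more fields.
struct TaggedEntry
    name::String
    tags::Vector{Symbol}
end

# Keep entries that carry ALL requested tags (AND logic).
filter_tagged(entries, tags::Symbol...) = filter(e -> all(in(e.tags), tags), entries)

# Drop entries that carry ANY of the listed tags (assumed semantics).
exclude_tagged(entries, tags::Symbol...) = filter(e -> !any(in(e.tags), tags), entries)

entries = [
    TaggedEntry("Rodas5P", [:rosenbrock, :stiff]),
    TaggedEntry("TRBDF2", [:bdf, :stiff]),
    TaggedEntry("CVODE_BDF", [:bdf, :sundials, :reference]),
]

stiff_bdf = filter_tagged(entries, :bdf, :stiff)
non_reference = exclude_tagged(entries, :reference)
```

Because both helpers only inspect metadata, they can run on an already computed set without re-solving anything.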

SDE/DAE compatibility: Tags are purely additive metadata. The existing WorkPrecisionSet constructors for AbstractRODEProblem, AbstractEnsembleProblem, AbstractBVProblem, etc., just need to pass tags through. No changes are required to numruns_error, error_estimate = :weak_final, prob_choice, or any other SDE/DAE-specific parameters.

DAE formalism comparisons can use tags naturally:

setups = [
    Dict(:alg => Rodas5P(), :prob_choice => 1, :tags => [:mass_matrix, :rosenbrock]),
    Dict(:alg => Rodas5P(), :prob_choice => 2, :tags => [:sparse, :rosenbrock]),
    Dict(:alg => IDA(),     :prob_choice => 3, :tags => [:sundials, :dae_residual]),
]

Phase 2: Multi-error-mode runs

Currently each WorkPrecisionSet uses a single error_estimate. To avoid re-running everything for each error mode, support computing multiple error metrics in one pass:

wp_set = WorkPrecisionSet(prob, abstols, reltols, setups;
    error_estimates = [:final, :l2, :L2],  # compute all three
    appxsol = test_sol)

The errors StructArray in WorkPrecision already stores all computed error types as a NamedTuple, so this is mainly about requesting timeseries_errors=true, dense_errors=true together and storing a vector of active error estimates for plotting.

For weak SDE errors (:weak_final, :weak_l2, etc.), the same mechanism applies — these are already stored in the errors dict. The expensive part (ensemble runs via numruns_error) only needs to happen once regardless of how many weak error metrics are extracted from the results.
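As a rough illustration of "solve once, view many ways", the sketch below keeps one NamedTuple of metrics per tolerance level and extracts a series per error mode. The numbers are made-up placeholders and `error_series` is a hypothetical accessor, not part of the current API.

```julia
# One NamedTuple of error metrics per tolerance level, computed in a single pass.
# The numbers are made-up placeholders, not benchmark results.
errors = [
    (final = 1.0e-3, l2 = 2.0e-3, L2 = 2.5e-3),
    (final = 1.0e-5, l2 = 3.0e-5, L2 = 3.5e-5),
]

# Hypothetical accessor: pull out one metric for plotting without re-solving.
error_series(errors, metric::Symbol) = [getproperty(e, metric) for e in errors]

# One x-axis series per requested error mode, all from the same stored data.
views = Dict(m => error_series(errors, m) for m in (:final, :l2, :L2))
```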

Phase 3: Tag-based and reference-method plotting

Extend the plot recipe to support tag-based subsetting and reference method overlays:

# Plot only rosenbrock family
plot(wp_set, tags = [:rosenbrock])

# Plot 5th order methods with reference methods always included
plot(wp_set, tags = [Symbol("5th_order")], include_tags = [:reference])

# Reference methods get distinct styling (dashed, thinner)
plot(wp_set, tags = [:imex], reference_tags = [:reference],
     reference_style = (linestyle = :dash, linewidth = 1, alpha = 0.5))

When reference methods fall far outside the frame of the main methods, the recipe could offer an option to:

  • Auto-adjust axis limits to include them
  • Or clip/omit them with a warning
  • Or use a secondary inset plot
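Choosing among these options first requires detecting that a reference curve is out of frame. One possible check, sketched under the assumption that "out of frame" means every reference point lies more than `pad` decades outside the range of the main methods; `out_of_frame` and its `pad` keyword are hypothetical, and the timings are placeholders.

```julia
# Hypothetical check: does a reference curve lie entirely outside the axis
# range spanned by the main methods, padded by `pad` decades on each side?
function out_of_frame(main_vals, ref_vals; pad = 1.0)
    lo = log10(minimum(main_vals)) - pad
    hi = log10(maximum(main_vals)) + pad
    return all(v -> !(lo <= log10(v) <= hi), ref_vals)
end

main_times = [0.01, 0.05, 0.2]   # placeholder timings in seconds
near_ref   = [0.5, 1.0]          # within one decade of the main methods
far_ref    = [500.0, 900.0]      # several decades slower
```

A recipe could call such a check per axis and then widen the limits, omit the curve with a warning, or draw an inset.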

Phase 4: Best-of-family helpers

Automatically identify standout methods per family:

# Get the top n methods within a family tag (by Pareto efficiency on the error-time curve)
best = best_by_tag(wp_set, :rosenbrock; n = 2, error_estimate = :final)

# Create a "best of all families" WorkPrecisionSet
families = [:rosenbrock, :bdf, :sdirk, :rk, :imex, :exponential]
best_of = best_of_families(wp_set, families; n = 2)
plot(best_of)  # cross-family comparison with the top 2 from each

"Best" should be determined by Pareto efficiency on the work-precision curve (not just minimum error or minimum time, but the overall curve quality). Candidate metrics include the area under the log-log curve, the minimum time at a reference error level, or a combination of the two.
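One of the candidate metrics, minimum time at a reference error level, might look like the sketch below. It interpolates linearly in log-log space and assumes each curve's errors decrease strictly along the tolerance sweep. `time_at_error` and `top_n` are hypothetical names, and the curves are placeholder data.

```julia
# Estimate the time needed to reach `target` error by linear interpolation
# in log-log space. Assumes errors decrease strictly along the curve.
function time_at_error(errors, times, target)
    for i in 1:(length(errors) - 1)
        e_hi, e_lo = errors[i], errors[i + 1]
        if e_lo <= target <= e_hi
            w = (log10(e_hi) - log10(target)) / (log10(e_hi) - log10(e_lo))
            return 10.0^((1 - w) * log10(times[i]) + w * log10(times[i + 1]))
        end
    end
    return Inf  # the curve never brackets the target error
end

# Rank named curves by that score and keep the n fastest.
function top_n(curves, target; n = 2)
    scores = [name => time_at_error(c.errors, c.times, target) for (name, c) in curves]
    return first.(sort(scores; by = last)[1:min(n, length(scores))])
end

curves = [
    "A" => (errors = [1e-2, 1e-4, 1e-6], times = [0.1, 1.0, 10.0]),
    "B" => (errors = [1e-2, 1e-4, 1e-6], times = [0.01, 0.1, 1.0]),
    "C" => (errors = [1e-1, 1e-2], times = [0.001, 0.002]),
]

best = top_n(curves, 1e-4)
```

A method that never reaches the target error scores `Inf` and sorts last, which is one way to keep cheap but inaccurate methods from winning.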

Phase 5: Time cutoff for slow methods

Some methods are extremely slow at certain tolerances. The current NaN-filtering handles crashes, but not the case where a method takes 100x longer than others. Options:

  1. Process-level timeout via Distributed: Run each solve in a worker process with a timeout. If it exceeds the cutoff, kill it and mark as NaN.

    WorkPrecisionSet(prob, abstols, reltols, setups;
        timeout = 300.0,  # seconds per solve
        parallel = :distributed)  # use Distributed workers

  2. Relative timeout: Set timeout as a multiple of the fastest solve at each tolerance level (e.g., 50x the fastest).

  3. Callback-based: Use a DiscreteCallback that checks wall-clock time and terminates the solve once the cutoff is exceeded.

The Distributed approach is cleanest for hard timeouts but adds a dependency. The callback approach works within a single process. We should support both.

For SDE weak benchmarks (which are already very expensive with numruns_error = 1000), the timeout should apply per-trajectory or per-ensemble, not per-individual-solve.
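The relative-timeout rule can be illustrated on a plain matrix of timings. This sketch only masks results after the fact, so the slow solves have still run; an actual cutoff would need one of the mechanisms above to stop them early. `apply_relative_cutoff` and its `factor` keyword are hypothetical, and the timings are placeholders.

```julia
# times[i, j] is the wall time of method j at tolerance level i (placeholder data).
# Entries slower than `factor` times the fastest method at that level become NaN.
function apply_relative_cutoff(times::AbstractMatrix; factor = 50.0)
    out = float.(times)  # work on a copy so the input is left untouched
    for i in axes(out, 1)
        fastest = minimum(filter(!isnan, out[i, :]); init = Inf)
        for j in axes(out, 2)
            if out[i, j] > factor * fastest
                out[i, j] = NaN
            end
        end
    end
    return out
end

times = [0.1 0.2 30.0;
         1.0 2.0 40.0]
cut = apply_relative_cutoff(times)
```

Marking cut entries as NaN matches the existing NaN-filtering path, so the plotting code would not need a separate case for timed-out solves.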

Phase 6: AutoDiff on/off comparison helpers

# Automatically create AD vs no-AD variants of setups
setups_with_ad = with_autodiff_variants(setups;
    ad_backends = [AutoForwardDiff(), AutoFiniteDiff()],
    methods = [:best, 3]  # only for top 3 methods or :all
)

# Or tag-based: only show AD comparison for reference + best methods
plot(wp_set, tags = [:autodiff_forward], compare_tags = [:autodiff_finite])

Phase 7: Autoplot — generate comprehensive plot sets

A single function that generates all standard comparison plots:

plots = autoplot(wp_set;
    families = [:rosenbrock, :bdf, :sdirk, :rk, :imex],
    tolerance_ranges = Dict(:low => (1e-3, 1e-8), :high => (1e-8, 1e-13)),
    error_modes = [:final, :l2, :L2],
    reference_tags = [:reference],
    autodiff_compare = true,
    best_n = 2,
    backend = :gr  # or :plotlyjs for interactive
)

Returns a structured collection of plots:

  • Per-family plots (within-family comparison)
  • Cross-family "best of" plots
  • AD on/off comparison (with best methods only)
  • Low tolerance and high tolerance versions of each
  • Each error mode version
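The size of that collection follows from the cross product of the keyword arguments. A sketch of enumerating the per-family plot specifications without any plotting backend; `plot_specs` and the field names of the returned NamedTuples are hypothetical, not the proposed return type.

```julia
# Enumerate one plot specification per (family, tolerance range, error mode).
# `plot_specs` is a hypothetical name; no plotting backend is needed to run it.
function plot_specs(families, tolerance_ranges, error_modes)
    return [(family = f, range = r, error_mode = m)
            for f in families for r in keys(tolerance_ranges) for m in error_modes]
end

specs = plot_specs([:rosenbrock, :bdf],
                   Dict(:low => (1e-3, 1e-8), :high => (1e-8, 1e-13)),
                   [:final, :l2, :L2])
```

Two families, two tolerance ranges, and three error modes already give twelve per-family plots, which suggests the returned collection should be indexable by these keys instead of being a flat list.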

Phase 8: Interactive Plotly support

For plots with many overlapping curves, interactive Plotly plots would help:

  • Hover to see method name and exact values
  • Click legend to toggle individual methods
  • Zoom into regions of interest

This could be a separate package extension (DiffEqDevToolsPlotlyExt) or just work via PlotlyJS backend for Plots.jl. The plot recipes should degrade gracefully — same recipe, different backend.

Phase 9: SciMLBenchmarks migration

After the DiffEqDevTools infrastructure is in place, update SciMLBenchmarks to use it:

  • Replace manual family-grouping with tags
  • Replace repeated WorkPrecisionSet calls with single tagged runs
  • Use autoplot to generate the standard comparison plots
  • Dramatically reduce per-benchmark boilerplate

Design Constraints

  • Full backward compatibility: All new fields have defaults, all new parameters are keyword-only with defaults matching current behavior.
  • SDE weak benchmarks: The most expensive benchmarks (1000+ trajectories × multiple methods). Tagging/filtering must be zero-cost at solve time — it's purely metadata for post-hoc plot generation.
  • DAE problem formalism: prob_choice pattern must continue to work. Tags complement it by adding semantic meaning (:mass_matrix vs :dae_residual vs :mtk_reduced).
  • No new hard dependencies: Plotly support via package extension. Distributed timeout via package extension or optional import.

Implementation Plan

  • PR 1: Phase 1 — Core tagging infrastructure (tags field, filtering helpers)
  • PR 2: Phase 2 — Multi-error-mode support
  • PR 3: Phase 3 — Tag-based plotting and reference method overlays
  • PR 4: Phase 4 — Best-of-family helpers
  • PR 5: Phase 5 — Time cutoff mechanism
  • PR 6: Phase 6 — AutoDiff comparison helpers
  • PR 7: Phase 7 — Autoplot
  • PR 8: Phase 8 — Plotly extension
  • PR 9+: Phase 9 — SciMLBenchmarks migration (multiple PRs, one per benchmark category)

PRs 1–4 are the core value and can be done incrementally. PRs 5–8 are enhancements. PR 9+ is the payoff in simplified benchmark code.
