Skip to content

Dense startup time profiling#5072

Open
AntoineRichard wants to merge 12 commits intoisaac-sim:developfrom
AntoineRichard:antoiner/feat/startup_profiler
Open

Dense startup time profiling#5072
AntoineRichard wants to merge 12 commits intoisaac-sim:developfrom
AntoineRichard:antoiner/feat/startup_profiler

Conversation

@AntoineRichard
Copy link
Collaborator

Description

Allows for dense startup time profiling.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

- Add error handling for YAML whitelist loading (OSError, YAMLError,
  type validation)
- Use None instead of 0.0 for missing Timer sub-timings with info log
- Warn when IsaacLab source directory is missing (empty prefix list)
- Warn on unmatched whitelist patterns to catch typos
- Use try/finally for cProfile disable to handle exceptions cleanly
@github-actions github-actions bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Mar 20, 2026
- Fix orphaned docstring (convert to comment)
- Improve main() and module docstrings for accuracy
- Remove phase numbering from section comments (avoid confusion)
- Fix Timer comment wording (backends -> environment types)
- Fix top_n docstring (remove "per phase")
- Remove dead print_startup_summary function
The humanoid observations pattern needs a leading wildcard to match the
full module path (isaaclab_tasks.manager_based.classic.humanoid...).
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR introduces a dense startup-time profiling benchmark (benchmark_startup.py) that wraps each phase of an IsaacLab environment's startup sequence (app launch, Python imports, task config resolution, env creation, and first step) in its own cProfile session, emitting per-function timings as SingleMeasurement entries through the standard benchmark backend. It also extends SummaryMetrics to dynamically render any benchmark phases it doesn't already handle by name, provides a default whitelist YAML config for dashboard-stable function tracking, and ships corresponding documentation and a minor version bump.

Key changes:

  • New: scripts/benchmarks/benchmark_startup.py — five-phase cProfile benchmark script with optional whitelist-mode function filtering
  • New: scripts/benchmarks/startup_whitelist.yaml — default per-phase fnmatch pattern whitelist
  • Modified: scripts/benchmarks/utils.py — adds parse_cprofile_stats() helper using the internal pstats.Stats.stats dict
  • Modified: source/isaaclab/isaaclab/test/benchmark/backends.pySummaryMetrics now renders unknown phases dynamically instead of silently dropping them
  • Docs / metadata: benchmarks.rst, CHANGELOG.rst, extension.toml updated accordingly

Three logic-level issues were found in benchmark_startup.py: (1) env_cfg.seed is unconditionally overridden with None when --seed is not supplied, silently clearing any config-level seed; (2) env_creation_profile is not disabled if gym.make() raises a BaseException (e.g. KeyboardInterrupt), because the handler uses except Exception instead of except BaseException or a finally; and (3) the first-step action sampling hard-codes a 1D continuous Box space assumption via .single_action_space.shape[0], which will raise or produce incorrect results for discrete or multi-dimensional action spaces.

Confidence Score: 2/5

  • The new benchmark script has three logic-level bugs that could cause silent correctness issues or unexpected crashes during profiling runs.
  • The core infrastructure changes (backends.py dynamic rendering, utils.py parse_cprofile_stats) are solid and well-guarded. However, benchmark_startup.py carries three reproducible logic bugs: unconditional seed override strips config-level seeds; bare except Exception leaves the profile enabled on keyboard interrupts or SystemExit; and the action-space shape assumption will crash on non-1D Box environments. These do not affect the existing library but will affect correctness and usability of the new benchmark tool.
  • scripts/benchmarks/benchmark_startup.py requires attention for the seed override, BaseException handling, and action space shape assumption.

Important Files Changed

Filename Overview
scripts/benchmarks/benchmark_startup.py New startup profiling script with three logic issues: unconditional seed override with None, env_creation_profile not disabled on BaseException from gym.make(), and hard-coded 1D Box action space assumption for first_step sampling.
scripts/benchmarks/utils.py Adds parse_cprofile_stats() helper. Relies on the internal pstats.Stats.stats dict (acknowledged in a code comment). Whitelist pattern-matching and placeholder emission logic looks correct. Minor: print_startup_summary defined in a prior thread is noted as dead code.
source/isaaclab/isaaclab/test/benchmark/backends.py Adds a dynamic fallback renderer for unknown phases in SummaryMetrics._print_summary(). Implementation is correct and consistent with existing frametime rendering. known_phases set is defined inline; if a future developer adds another hard-coded phase after this loop, it could render twice — minor future-maintenance concern.
scripts/benchmarks/startup_whitelist.yaml Default whitelist config covering app_launch, env_creation, and first_step phases. Pattern syntax is consistent with fnmatch; no issues found.
docs/source/testing/benchmarks.rst Adds comprehensive documentation for the startup profiling benchmark and updates the SummaryMetrics description. Content is accurate relative to the implementation.

Sequence Diagram

sequenceDiagram
    participant CLI as __main__
    participant LS as launch_simulation()
    participant M as main()
    participant GYM as gym.make()
    participant ENV as env
    participant BP as BaseIsaacLabBenchmark
    participant BE as Backend (SummaryMetrics etc.)

    CLI->>LS: app_launch_profile.enable() → with launch_simulation(env_cfg, args_cli)
    LS-->>CLI: context entered (app_launch_profile.disable())
    CLI->>M: main(env_cfg, app_launch_profile, app_launch_wall_ms)
    M->>GYM: env_creation_profile.enable() → gym.make(task, cfg)
    GYM-->>M: env
    M->>ENV: env.reset()  [inner try/finally disables env_creation_profile]
    ENV-->>M: obs
    M->>ENV: env.step(actions)  [first_step_profile try/finally]
    ENV-->>M: obs, reward, ...
    M->>BP: benchmark.add_measurement(phase, SingleMeasurement) ×N phases
    M->>BP: benchmark.update_manual_recorders()
    M->>BP: benchmark._finalize_impl()
    BP->>BE: backend.finalize(output_path, output_filename)
    BE-->>CLI: JSON + console summary printed
    M->>ENV: env.close()  [outer finally]
Loading

Last reviewed commit: "Document reliance on..."

- Make --task required to fail early on missing argument
- Validate whitelist YAML values are list[str] per phase
- Warn on unknown phase names in whitelist config
- Wrap post-env-creation code in try/finally for env.close()
- Remove [UNMATCHED] prefix from placeholder labels to keep
  dashboard keys stable (matches docstring/RST contract)
- Move fnmatch import to function top, replace warnings.warn
  with print("[WARNING]") for consistency
- Remove stale "Launch Isaac Sim Simulator first" comment
- Use _ for unused lineno in pstats tuple unpacking
@AntoineRichard
Copy link
Collaborator Author

@greptile

Move the try/finally guarding env.close() to wrap everything after
gym.make() succeeds, so env.reset() failures also trigger cleanup.
@AntoineRichard
Copy link
Collaborator Author

@greptile

Comment on lines +177 to +180
{"name": "seed", "data": args_cli.seed},
{"name": "num_envs", "data": args_cli.num_envs},
{"name": "top_n", "data": args_cli.top_n},
{"name": "presets", "data": get_preset_string(hydra_args)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 env_creation_profile left enabled on BaseException

except Exception does not catch KeyboardInterrupt, SystemExit, or any other BaseException subclass. If gym.make() is interrupted by a signal or a hard abort, the profile object is left in the enabled state for the remainder of the process lifetime, which silently continues accumulating CPU-time into the stale profile.

The standard pattern is to use a finally clause (or a bare except) so the profile is always stopped:

env_creation_profile.enable()
try:
    env = gym.make(args_cli.task, cfg=env_cfg)
except BaseException:
    env_creation_profile.disable()
    raise

Or, more idiomatically, fold the disable into a try/finally around the whole block, the same way first_step_profile is handled a few lines below.

AntoineRichard and others added 2 commits March 20, 2026 15:27
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
except Exception misses KeyboardInterrupt and SystemExit, leaving the
profiler enabled for the rest of the process lifetime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant