Skip to content

Expand Visualizer Tests and Patch Visualizer Bugs#5075

Open
matthewtrepte wants to merge 1 commit intoisaac-sim:developfrom
matthewtrepte:mtrepte/expand_viz_tests
Open

Expand Visualizer Tests and Patch Visualizer Bugs#5075
matthewtrepte wants to merge 1 commit intoisaac-sim:developfrom
matthewtrepte:mtrepte/expand_viz_tests

Conversation

@matthewtrepte
Copy link
Contributor

Description

Add visualizer tests which load cartpole scene and check the viewport isn't black.

Newton Visualizer

  • Fix "Pause Rendering" button
  • Add Munch library dependency

Kit Visualizer

  • Fix Play/Pause button not pausing simulation and only pausing rendering.
  • TODO: fix Unpause behavior, currently body transforms remain stale after Unpausing... recent regression

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 21, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR fixes bugs in the Kit and Newton visualizers (Play/Pause button behavior, Pause Rendering button), adds munch as a Newton visualizer dependency, and expands the smoke-test suite with tiled-camera non-black frame assertions and a Newton viewer frame check.

Key changes:

  • simulation_context.py: When KitVisualizer signals training-paused, SimulationContext.pause() is now called to bridge the Kit transport state into the simulation level. The rendering-pause branch now calls viz.step(0.0) to keep UI events alive while paused.
  • kit_visualizer.py: Minor comment clarification; core logic (temporarily disabling /app/player/playSimulations around app.update()) is unchanged.
  • setup.py: munch added to the newton extras.
  • test_visualizer_smoke_logs.py: New test_cartpole_visualizer_non_black_camera_frame (all 4 visualizers × 2 backends) and test_newton_visualizer_non_black_viewer_frame tests; viser port is now dynamically allocated like rerun.

Issues found:

  • The is_training_paused() check in update_visualizers uses type(viz).__name__ == "KitVisualizer" (string comparison) instead of a polymorphic method on BaseVisualizer, making it fragile for subclasses and typo-prone.
  • The new viz.step(0.0) call inside the rendering-pause branch can silently discard a user-initiated pause that arrives during app.update(), because step() unconditionally restores /app/player/playSimulations to True after each update — a new bug not present before this PR.
  • The test suite accesses env._tiled_camera (private attribute) without a guard, and the preset config lookup in _make_cartpole_camera_env lacks a type assertion, making future breakage harder to diagnose.
  • Dynamic port allocation for viser (and existing rerun) is subject to a TOCTOU race that can cause intermittent CI failures.

Confidence Score: 2/5

  • Not safe to merge as-is due to a newly introduced bug where user-initiated pauses can be silently discarded when rendering is also paused.
  • The fragile string-based type dispatch and — more critically — the new viz.step(0.0) call in the rendering-pause branch introduce a regression: a pause action processed during app.update() is immediately overwritten by the unconditional set_bool(True) restore in step(), so the pause is never propagated. The PR description already acknowledges a related unpause regression but does not address this new path. The test additions are welcome but have minor robustness gaps (private attribute access, missing type guard, TOCTOU port race). The setup.py and kit_visualizer.py changes are safe.
  • Pay close attention to source/isaaclab/isaaclab/sim/simulation_context.py (pause-bridge logic and rendering-pause branch) and source/isaaclab_visualizers/test/test_visualizer_smoke_logs.py (private attribute access and preset config lookup).

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/sim/simulation_context.py Adds KitVisualizer-specific pause bridging and keeps UI alive during rendering-pause; string-based type dispatch is fragile and the new viz.step(0.0) call in the rendering-pause branch can silently discard user-initiated pauses.
source/isaaclab_visualizers/isaaclab_visualizers/kit/kit_visualizer.py Minor comment improvement and blank-line removal; the functional logic (disabling /app/player/playSimulations before app.update and restoring after) is unchanged and correct for its intended purpose.
source/isaaclab_visualizers/setup.py Adds munch as a dependency for the newton visualizer extra; straightforward change with no issues.
source/isaaclab_visualizers/test/test_visualizer_smoke_logs.py Adds tiled-camera non-black frame tests and a Newton viewer frame test; port allocation is subject to TOCTOU races, a private attribute is accessed without a guard, and the preset config lookup lacks a type assertion.

Sequence Diagram

sequenceDiagram
    participant SC as SimulationContext
    participant UV as update_visualizers()
    participant KV as KitVisualizer
    participant KS as Kit Settings<br/>(/app/player/playSimulations)

    SC->>UV: update_visualizers(dt)
    UV->>KV: is_training_paused()?
    KV->>KS: get("/app/player/playSimulations")
    KS-->>KV: False (user pressed Pause)
    KV-->>UV: True
    UV->>SC: pause()  [simulation paused]
    UV->>KV: is_rendering_paused()?
    KV-->>UV: False
    UV->>KV: [while is_training_paused] step(0.0)
    KV->>KS: set_bool(False)
    KV->>KV: app.update()
    KV->>KS: set_bool(True) ← pause flag reset!
    KV-->>UV: step() returns
    Note over UV,KS: Flag is now True (not paused)<br/>Simulation stays paused only via self.pause().<br/>UI shows "Playing" — unpause regression.

    SC->>UV: update_visualizers(dt) [next call, rendering paused]
    UV->>KV: is_training_paused()?
    KS-->>KV: True (flag=True → not paused)
    KV-->>UV: False  ← pause not re-detected
    UV->>KV: is_rendering_paused()?
    KV-->>UV: True
    UV->>KV: step(0.0)
    KV->>KS: set_bool(False)
    KV->>KV: app.update() ← user presses Pause here
    KV->>KS: set_bool(True) ← NEW BUG: concurrent pause discarded
    UV->>UV: continue (pause never propagates)
Loading

Last reviewed commit: "init"

continue
# Treat Kit transport pause as a simulation-level pause trigger.
# Intentionally no auto-resume bridge: unpause should be explicit.
if type(viz).__name__ == "KitVisualizer" and viz.is_training_paused() and self.is_playing():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Fragile string-based type check

type(viz).__name__ == "KitVisualizer" bypasses Python's type system. This will silently fail for any subclass of KitVisualizer (since the class name would differ), and the string is a typo-prone magic constant with no import-time validation.

A more idiomatic and robust approach is to add a capability method on BaseVisualizer (e.g. needs_transport_pause_bridge() -> bool, returning False by default) and override it to return True in KitVisualizer. That way the dispatch is polymorphic, avoids string comparison, and scales to future visualizer types without modifying simulation_context.py.

Comment on lines 700 to 703
if viz.is_rendering_paused():
# Keep polling viewer/UI events while rendering is paused so users can unpause.
viz.step(0.0)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 step() inside rendering-pause branch silently discards concurrent user pauses

KitVisualizer.step() always ends by calling settings.set_bool("/app/player/playSimulations", True), unconditionally restoring the play flag. If the user clicks the "Pause Simulation" button during the app.update() call inside step(), the resulting flag-flip to False is immediately overwritten by the restore back to True. Because the is_training_paused() guard at the top of this loop already ran before step() was entered, that protect-pause window is missed entirely.

Concretely:

  1. is_training_paused()False (play flag is True) — KitVisualizer check is skipped.
  2. is_rendering_paused()True — enters this branch.
  3. viz.step(0.0) → sets flag False, calls app.update() (user presses Pause here), restores flag to True — pause discarded.
  4. Next iteration: is_training_paused()False again; pause never propagates.

Before this PR the rendering-pause branch simply called continue without invoking step(), so this path did not exist. Consider re-checking is_training_paused() after viz.step(0.0) and calling self.pause() at that point if needed, or restructuring step() not to restore the flag when it was set by the user rather than by step() itself.

actions = torch.zeros((env.num_envs, env.action_space.shape[-1]), device=env.device)
for _ in range(_SMOKE_STEPS):
env.step(action=actions)
rgb = env._tiled_camera.data.output["rgb"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Access to private attribute _tiled_camera

env._tiled_camera is a private (name-mangled by convention) attribute. If the CartpoleCameraEnv implementation renames or restructures this attribute, the test will raise an AttributeError at assertion time rather than at the step that would surface a black-frame regression. Consider either exposing a stable public property on CartpoleCameraEnv, or at minimum adding a guard:

Suggested change
rgb = env._tiled_camera.data.output["rgb"]
tiled_camera = getattr(env, "_tiled_camera", None)
assert tiled_camera is not None, "CartpoleCameraEnv does not expose _tiled_camera."
rgb = tiled_camera.data.output["rgb"]
_assert_non_black_tensor(rgb)

Comment on lines +100 to +101
port = _find_free_tcp_port(host="127.0.0.1")
return ViserVisualizerCfg(open_browser=False, port=port), ViserVisualizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 TOCTOU race condition in port allocation

_find_free_tcp_port() works by binding to port 0, reading the OS-assigned port, then releasing the socket. Between that release and the viser server's actual bind(), another process on the system can claim the port, causing a ConnectionError or silent fall-back to a default port. The same race applies to the rerun ports allocated in _allocate_rerun_test_ports.

The viser application would ideally accept a socket.socket object that is already bound (keeping the kernel reservation alive), but that requires upstream support. A pragmatic mitigation until then is a retry loop in the test harness. This is worth noting as a known flakiness source in CI environments with many parallel jobs.

Comment on lines +248 to +266
def _make_cartpole_camera_env(visualizer_kind: str, backend_kind: str) -> CartpoleCameraEnv:
"""Create cartpole camera env configured with selected visualizer and physics backend."""
env_cfg_root = CartpoleCameraPresetsEnvCfg()
# PresetCfg wrappers may expose concrete presets either on the instance or class.
env_cfg = getattr(env_cfg_root, "default", None)
if env_cfg is None:
env_cfg = getattr(type(env_cfg_root), "default", None)
if env_cfg is None:
raise RuntimeError(
"CartpoleCameraPresetsEnvCfg does not expose a 'default' preset config. "
f"Available attributes: {sorted(vars(env_cfg_root).keys())}"
)
env_cfg = copy.deepcopy(env_cfg)
env_cfg.scene.num_envs = 1
env_cfg.seed = 42
env_cfg.sim.physics, _ = _get_physics_cfg(backend_kind)
visualizer_cfg, _ = _get_visualizer_cfg(visualizer_kind)
env_cfg.sim.visualizer_cfgs = visualizer_cfg
return CartpoleCameraEnv(env_cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _make_cartpole_camera_env silently succeeds with wrong config if default attribute is a non-config value

The function looks up env_cfg_root.default (or type(env_cfg_root).default) and deep-copies it. If a future refactor of CartpoleCameraPresetsEnvCfg adds a default attribute that is not an env-cfg object (e.g., a string label or integer), copy.deepcopy will succeed, subsequent attribute assignments (env_cfg.scene.num_envs = 1) will raise an opaque AttributeError, and the RuntimeError guard will never fire.

Adding a isinstance check after the lookup would make the failure mode explicit and easier to diagnose:

if not isinstance(env_cfg, DirectRLEnvCfg):
    raise RuntimeError(
        f"CartpoleCameraPresetsEnvCfg.default is {type(env_cfg)!r}, expected a DirectRLEnvCfg subclass."
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant