Expand Visualizer Tests and Patch Visualizer Bugs#5075
Expand Visualizer Tests and Patch Visualizer Bugs#5075matthewtrepte wants to merge 1 commit intoisaac-sim:developfrom
Conversation
Greptile SummaryThis PR fixes bugs in the Kit and Newton visualizers (Play/Pause button behavior, Pause Rendering button), adds Key changes:
Issues found:
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant SC as SimulationContext
participant UV as update_visualizers()
participant KV as KitVisualizer
participant KS as Kit Settings<br/>(/app/player/playSimulations)
SC->>UV: update_visualizers(dt)
UV->>KV: is_training_paused()?
KV->>KS: get("/app/player/playSimulations")
KS-->>KV: False (user pressed Pause)
KV-->>UV: True
UV->>SC: pause() [simulation paused]
UV->>KV: is_rendering_paused()?
KV-->>UV: False
UV->>KV: [while is_training_paused] step(0.0)
KV->>KS: set_bool(False)
KV->>KV: app.update()
KV->>KS: set_bool(True) ← pause flag reset!
KV-->>UV: step() returns
Note over UV,KS: Flag is now True (not paused)<br/>Simulation stays paused only via self.pause().<br/>UI shows "Playing" — unpause regression.
SC->>UV: update_visualizers(dt) [next call, rendering paused]
UV->>KV: is_training_paused()?
KS-->>KV: True (flag=True → not paused)
KV-->>UV: False ← pause not re-detected
UV->>KV: is_rendering_paused()?
KV-->>UV: True
UV->>KV: step(0.0)
KV->>KS: set_bool(False)
KV->>KV: app.update() ← user presses Pause here
KV->>KS: set_bool(True) ← NEW BUG: concurrent pause discarded
UV->>UV: continue (pause never propagates)
Last reviewed commit: "init" |
| continue | ||
| # Treat Kit transport pause as a simulation-level pause trigger. | ||
| # Intentionally no auto-resume bridge: unpause should be explicit. | ||
| if type(viz).__name__ == "KitVisualizer" and viz.is_training_paused() and self.is_playing(): |
There was a problem hiding this comment.
Fragile string-based type check
type(viz).__name__ == "KitVisualizer" bypasses Python's type system. This will silently fail for any subclass of KitVisualizer (since the class name would differ), and the string is a typo-prone magic constant with no import-time validation.
A more idiomatic and robust approach is to add a capability method on BaseVisualizer (e.g. needs_transport_pause_bridge() -> bool, returning False by default) and override it to return True in KitVisualizer. That way the dispatch is polymorphic, avoids string comparison, and scales to future visualizer types without modifying simulation_context.py.
| if viz.is_rendering_paused(): | ||
| # Keep polling viewer/UI events while rendering is paused so users can unpause. | ||
| viz.step(0.0) | ||
| continue |
There was a problem hiding this comment.
step() inside rendering-pause branch silently discards concurrent user pauses
KitVisualizer.step() always ends by calling settings.set_bool("/app/player/playSimulations", True), unconditionally restoring the play flag. If the user clicks the "Pause Simulation" button during the app.update() call inside step(), the resulting flag-flip to False is immediately overwritten by the restore back to True. Because the is_training_paused() guard at the top of this loop already ran before step() was entered, that protect-pause window is missed entirely.
Concretely:
is_training_paused()→False(play flag isTrue) — KitVisualizer check is skipped.is_rendering_paused()→True— enters this branch.viz.step(0.0)→ sets flagFalse, callsapp.update()(user presses Pause here), restores flag toTrue— pause discarded.- Next iteration:
is_training_paused()→Falseagain; pause never propagates.
Before this PR the rendering-pause branch simply called continue without invoking step(), so this path did not exist. Consider re-checking is_training_paused() after viz.step(0.0) and calling self.pause() at that point if needed, or restructuring step() not to restore the flag when it was set by the user rather than by step() itself.
| actions = torch.zeros((env.num_envs, env.action_space.shape[-1]), device=env.device) | ||
| for _ in range(_SMOKE_STEPS): | ||
| env.step(action=actions) | ||
| rgb = env._tiled_camera.data.output["rgb"] |
There was a problem hiding this comment.
Access to private attribute
_tiled_camera
env._tiled_camera is a private (name-mangled by convention) attribute. If the CartpoleCameraEnv implementation renames or restructures this attribute, the test will raise an AttributeError at assertion time rather than at the step that would surface a black-frame regression. Consider either exposing a stable public property on CartpoleCameraEnv, or at minimum adding a guard:
| rgb = env._tiled_camera.data.output["rgb"] | |
| tiled_camera = getattr(env, "_tiled_camera", None) | |
| assert tiled_camera is not None, "CartpoleCameraEnv does not expose _tiled_camera." | |
| rgb = tiled_camera.data.output["rgb"] | |
| _assert_non_black_tensor(rgb) |
| port = _find_free_tcp_port(host="127.0.0.1") | ||
| return ViserVisualizerCfg(open_browser=False, port=port), ViserVisualizer |
There was a problem hiding this comment.
TOCTOU race condition in port allocation
_find_free_tcp_port() works by binding to port 0, reading the OS-assigned port, then releasing the socket. Between that release and the viser server's actual bind(), another process on the system can claim the port, causing a ConnectionError or silent fall-back to a default port. The same race applies to the rerun ports allocated in _allocate_rerun_test_ports.
The viser application would ideally accept a socket.socket object that is already bound (keeping the kernel reservation alive), but that requires upstream support. A pragmatic mitigation until then is a retry loop in the test harness. This is worth noting as a known flakiness source in CI environments with many parallel jobs.
| def _make_cartpole_camera_env(visualizer_kind: str, backend_kind: str) -> CartpoleCameraEnv: | ||
| """Create cartpole camera env configured with selected visualizer and physics backend.""" | ||
| env_cfg_root = CartpoleCameraPresetsEnvCfg() | ||
| # PresetCfg wrappers may expose concrete presets either on the instance or class. | ||
| env_cfg = getattr(env_cfg_root, "default", None) | ||
| if env_cfg is None: | ||
| env_cfg = getattr(type(env_cfg_root), "default", None) | ||
| if env_cfg is None: | ||
| raise RuntimeError( | ||
| "CartpoleCameraPresetsEnvCfg does not expose a 'default' preset config. " | ||
| f"Available attributes: {sorted(vars(env_cfg_root).keys())}" | ||
| ) | ||
| env_cfg = copy.deepcopy(env_cfg) | ||
| env_cfg.scene.num_envs = 1 | ||
| env_cfg.seed = 42 | ||
| env_cfg.sim.physics, _ = _get_physics_cfg(backend_kind) | ||
| visualizer_cfg, _ = _get_visualizer_cfg(visualizer_kind) | ||
| env_cfg.sim.visualizer_cfgs = visualizer_cfg | ||
| return CartpoleCameraEnv(env_cfg) |
There was a problem hiding this comment.
_make_cartpole_camera_env silently succeeds with wrong config if default attribute is a non-config value
The function looks up env_cfg_root.default (or type(env_cfg_root).default) and deep-copies it. If a future refactor of CartpoleCameraPresetsEnvCfg adds a default attribute that is not an env-cfg object (e.g., a string label or integer), copy.deepcopy will succeed, subsequent attribute assignments (env_cfg.scene.num_envs = 1) will raise an opaque AttributeError, and the RuntimeError guard will never fire.
Adding a isinstance check after the lookup would make the failure mode explicit and easier to diagnose:
if not isinstance(env_cfg, DirectRLEnvCfg):
raise RuntimeError(
f"CartpoleCameraPresetsEnvCfg.default is {type(env_cfg)!r}, expected a DirectRLEnvCfg subclass."
)
Description
Add visualizer tests which load cartpole scene and check the viewport isn't black.
Newton Visualizer
Kit Visualizer
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there