17 changes: 9 additions & 8 deletions docker/Dockerfile.isaaclab_arena
@@ -1,7 +1,10 @@
ARG BASE_IMAGE=nvcr.io/nvidia/isaac-sim:5.0.0
ARG BASE_IMAGE=nvcr.io/nvidia/isaac-sim:5.1.0

FROM ${BASE_IMAGE}

# Isaac Sim 5.1.0+ runs as a non-root user; switch to root for installation steps.
USER root

# GR00T policy build arguments; these are only used if INSTALL_GROOT is true
ARG INSTALL_GROOT=false

@@ -22,9 +25,6 @@ RUN apt-get update && apt-get install -y \
sudo \
python3-pip

# Update pip to the latest version
RUN pip3 install --upgrade pip

################################
# Install Isaac Lab
################################
@@ -37,9 +37,10 @@ ENV TERM=xterm
# Symlink isaac sim to IsaacLab
RUN ln -s /isaac-sim/ ${WORKDIR}/submodules/IsaacLab/_isaac_sim
# Install IsaacLab dependencies
RUN for DIR in ${WORKDIR}/submodules/IsaacLab/source/isaaclab*/; do pip install --no-deps -e "$DIR"; done
RUN for DIR in ${WORKDIR}/submodules/IsaacLab/source/isaaclab*/; do /isaac-sim/python.sh -m pip install --no-deps -e "$DIR"; done
# Logs and other runtime files are written under dist-packages by default, so this directory must be writable.
RUN chmod 777 -R /isaac-sim/kit/
RUN chmod a+x /isaac-sim
# NOTE(alexmillane, 2026-02-10): We started having issues with the flatdict 4.0.1 installation
# during the IsaacLab install. Installing it here without build isolation seems to fix the issue.
RUN /isaac-sim/python.sh -m pip install flatdict==4.0.1 --no-build-isolation
@@ -49,7 +50,7 @@ RUN ${ISAACLAB_PATH}/isaaclab.sh -i
# Patch for osqp in IsaacLab. Downgrade qpsolvers
# TODO(alexmillane): Watch the thread here: https://nvidia.slack.com/archives/C06HLQ6CB41/p1764680205807019
# and remove this patch when IsaacLab has a fix.
RUN if python -c "import qpsolvers; print(qpsolvers.available_solvers)" | grep -q "osqp"; then \
RUN if /isaac-sim/python.sh -c "import qpsolvers; print(qpsolvers.available_solvers)" | grep -q "osqp"; then \
echo "OSQP is installed. You can remove this clause from the Arena dockerfile."; \
else \
echo "OSQP missing, installing... This is a patch for an Isaac Lab bug."; \
@@ -79,7 +80,7 @@ ENV LW_API_ENDPOINT="https://api-dev.lightwheel.net"

# HuggingFace for downloading datasets and models.
# NOTE(alexmillane, 2025-10-28): For some reason the CLI has issues when installed in the IsaacSim version of python.
RUN pip install huggingface-hub[cli]
RUN pip install huggingface-hub[cli] --break-system-packages
# Create alias for hf command to use the system-installed version
RUN echo "alias hf='/usr/local/bin/hf'" >> /etc/bash.bashrc

@@ -136,7 +137,7 @@ RUN echo "alias pytest='/isaac-sim/python.sh -m pytest'" >> /etc/bash.bashrc
# It will pause waiting for the debugger to attach.
# 3) Attach to the running container with VSCode using the "Attach to debugpy session"
# configuration from the Run and Debug panel.
RUN pip3 install debugpy
RUN /isaac-sim/python.sh -m pip install debugpy
RUN echo "alias debugpy='python -Xfrozen_modules=off -m debugpy --listen localhost:5678 --wait-for-client'" >> /etc/bash.bashrc

# Change prompt so it's obvious we're inside the arena container
2 changes: 1 addition & 1 deletion docker/setup/entrypoint.sh
@@ -21,7 +21,7 @@ userdel ubuntu || true
useradd --no-log-init \
--uid "$DOCKER_RUN_USER_ID" \
--gid "$DOCKER_RUN_GROUP_NAME" \
--groups sudo \
--groups sudo,isaac-sim \
--shell /bin/bash \
$DOCKER_RUN_USER_NAME
chown $DOCKER_RUN_USER_NAME:$DOCKER_RUN_GROUP_NAME /home/$DOCKER_RUN_USER_NAME
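The added `isaac-sim` group membership can be sanity-checked from inside the container. A minimal sketch (the `check_group` helper is ours, not part of the entrypoint):

```shell
#!/bin/sh
# check_group USER GROUP -> exit 0 if USER is a member of GROUP.
# Illustrative helper only; not part of entrypoint.sh itself.
check_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Example: every user is a member of its own primary group.
check_group "$(id -un)" "$(id -gn)" && echo "membership OK"
```

Inside the running container, `check_group "$DOCKER_RUN_USER_NAME" isaac-sim` should succeed if the `useradd` above took effect.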
51 changes: 24 additions & 27 deletions docs/README.md
@@ -1,45 +1,42 @@
# `isaaclab_arena` Dox - Developer Guide
# `isaaclab_arena` Docs - Developer Guide

To build the `isaaclab_arena` docs locally follow the following instructions.
The docs are built on the **host machine** (not inside Docker) using a dedicated Python 3.11 venv.

Enter the `isaaclab_arena` docker.
## Prerequisites

```
./docker/run_docker.sh
```

The version of sphinx that we use requires a newer version of python.
Install a newer version of `python` and `venv`:
`python3.11` and `python3.11-venv` must be installed on the host:

```
sudo apt-get install python3.11 python3.11-venv
```bash
sudo apt-get install -y python3.11 python3.11-venv
```

> It looks like this actually overwrites the currently installed version of python
> inside.
## First-time setup

Create a `venv` and install the dependencies
From the repo root, create the venv and install dependencies:

```
```bash
cd docs
python3.11 -m venv venv_docs
source venv_docs/bin/activate
cd ./docs
python3.11 -m pip install -r requirements.txt
venv_docs/bin/pip install -r requirements.txt
```

To make the current version of docs

```
make html
## Build and view

```bash
cd docs
venv_docs/bin/sphinx-build -M html . _build/current
xdg-open _build/current/html/index.html
```

To view the docs, navigate to `isaaclab_arena/docs/_build/current/html/index.html`, and double-click.

To make the multi version docs. Note that this will only build docs for the set branches, such
as release, main etc. Only docs committed to these branches will be reflected.
## Multi-version docs

```
Builds docs for committed branches only (e.g. `main`, `release`). Local uncommitted changes are **not** reflected.

```bash
cd docs
source venv_docs/bin/activate
make multi-docs
xdg-open _build/index.html
```

To view the multi version docs, navigate to `isaaclab_arena/docs/_build/index.html`, and double-click.
@@ -155,17 +155,18 @@ See :doc:`../../concepts/concept_environment_design` for environment composition
Validation: Run Random Policy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To validate the environment setup, we can run a policy with random weights to ensure everything loads correctly:
To validate the environment loads correctly, run one training iteration and check for errors:

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 64 \
--max_iterations 1 \
lift_object
--headless

This command will load the environment, initialize 64 parallel environments, and exit immediately
(``max_iterations=1``). If successful, the environment is ready for training.
If the environment is set up correctly, you will see one iteration of training output before the script exits.

You should see output indicating the start of training:

@@ -1,135 +1,72 @@
Policy Training
---------------

This workflow covers training an RL policy from scratch using RSL-RL's PPO implementation.
The training is fully parallelized across hundreds of environments for sample-efficient learning.

**Docker Container**: Base (see :doc:`../../quickstart/docker_containers` for more details)

:docker_run_default:


Training Overview
^^^^^^^^^^^^^^^^^

We use **Proximal Policy Optimization (PPO)** from the `RSL-RL <https://github.com/leggedrobotics/rsl_rl>`_ library,
a proven on-policy RL algorithm for robot learning. The training process:

1. **Parallel Simulation**: Runs 512 parallel environments simultaneously
2. **Dense Rewards**: Provides shaped rewards for reaching, grasping, lifting, and goal achievement
3. **Command Sampling**: Randomly samples target positions within a workspace range
4. **Automatic Checkpointing**: Saves model checkpoints every 500 iterations
5. **Tensorboard Logging**: Monitors training progress in real-time

Training Command
^^^^^^^^^^^^^^^^

To train the policy, run:
Training uses IsaacLab's RSL-RL training script directly. The ``--external_callback`` argument
points to an Arena function that runs before training starts — it reads the ``--task`` argument,
builds the environment, and registers it with gym so IsaacLab's script can find it by name.

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
--env_spacing 5.0 \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--save_interval 500 \
--headless \
lift_object
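The registration hook described above can be pictured as a plug-in pattern: the training script imports the dotted callback path and invokes it before building the environment. A toy sketch of that pattern (all names below are hypothetical stand-ins, not the actual Arena API):

```python
# Sketch of an external registration callback. The real one lives at
# isaaclab_arena.environments.isaaclab_interop.environment_registration_callback
# and registers with gym's registry; this stand-in uses a plain dict.
REGISTRY: dict[str, str] = {}

def register(task_id: str, entry_point: str) -> None:
    """Mimic gym's registry: map a task name to an entry point."""
    REGISTRY[task_id] = entry_point

def environment_registration_callback(cli_args: dict) -> str:
    # Read the --task argument, build the env entry point, and register
    # it under that name so the training script can look it up later.
    task = cli_args["task"]
    entry_point = f"arena_envs:{task}_env"  # stand-in for the built env cfg
    register(task, entry_point)
    return entry_point
```

In the real flow, IsaacLab's `train.py` then resolves `--task lift_object` through gym's registry as usual.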

**Command Breakdown:**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Argument
- Description
* - ``--env_spacing 5.0``
- Spacing between parallel environments (meters)
* - ``--num_envs 512``
- Number of parallel environments for training
* - ``--max_iterations 12000``
- Total training iterations (each iteration = 24 timesteps × 512 envs = 12,288 samples)
* - ``--save_interval 500``
- Save checkpoint every 500 iterations
* - ``--headless``
- Run without GUI for faster training
* - ``lift_object``
- Environment name (must be last argument)
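The per-iteration sample count quoted in the table can be checked with quick arithmetic (the 24-step rollout length is taken from the table; treat it as an assumed RSL-RL rollout setting):

```python
# Samples collected per PPO iteration = rollout steps per env * number of envs.
steps_per_env = 24   # assumed rollout length per environment
num_envs = 512       # from --num_envs above

samples_per_iteration = steps_per_env * num_envs
print(samples_per_iteration)  # 12288
```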

**Additional Arguments (Optional):**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Argument
- Description
* - ``--seed <int>``
- Random seed for reproducibility (default: 42)
* - ``--device <str>``
- Device to use: 'cuda' or 'cpu' (default: 'cuda')
* - ``--video``
- Record training videos periodically
* - ``--video_interval 2000``
- Interval for recording videos (iterations)


Training Configuration
^^^^^^^^^^^^^^^^^^^^^^

The training uses the default RSL-RL PPO configuration, which can be found at:

``isaaclab_arena/policy/rl_policy/generic_policy.json``

Key hyperparameters:

.. code-block:: json

{
"algorithm": {
"class_name": "PPO",
"num_learning_epochs": 5,
"num_mini_batches": 4,
"learning_rate": 0.001,
"gamma": 0.99,
"lam": 0.95,
"clip_param": 0.2
},
"policy": {
"class_name": "ActorCritic",
"activation": "elu",
"actor_hidden_dims": [256, 256, 256],
"critic_hidden_dims": [256, 256, 256]
}
}

To use a custom configuration, specify the path with ``--agent_cfg_path <path>``.
--headless

Checkpoints are written to ``logs/rsl_rl/generic_experiment/<timestamp>/``.
The agent configuration is saved alongside as ``params/agent.yaml``,
which the evaluation script uses to reconstruct the policy at inference time.
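Given that layout, the newest checkpoint of the latest run can be located with a few lines of Python. A sketch under the directory convention quoted above (`latest_checkpoint` is our illustrative helper, not an Arena API):

```python
from pathlib import Path

def latest_checkpoint(log_root) -> Path:
    """Return the highest-numbered model_<iter>.pt of the newest run.

    Assumes the layout from the docs:
    logs/rsl_rl/generic_experiment/<timestamp>/model_<iter>.pt
    Timestamped run names sort lexicographically in chronological order.
    """
    runs = sorted(p for p in Path(log_root).iterdir() if p.is_dir())
    ckpts = sorted(runs[-1].glob("model_*.pt"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return ckpts[-1]
```

Note the numeric sort key: a plain lexicographic sort would put ``model_900.pt`` after ``model_2000.pt``.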

Monitoring Training
^^^^^^^^^^^^^^^^^^^

Training logs are saved to ``logs/rsl_rl/generic_experiment/<timestamp>/``.
Overriding Hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^^^

Hyperparameters come from ``RLPolicyCfg`` in ``isaaclab_arena_examples/policy/base_rsl_rl_policy.py``
and can be overridden with Hydra syntax appended to the training command:

.. code-block:: bash

**1. View Training Metrics with Tensorboard**
# Change network activation function to relu (default: elu)
agent.policy.activation=relu

Launch Tensorboard to monitor training progress:
# Adjust the learning rate (default: 0.0001)
agent.algorithm.learning_rate=0.001

# Save a checkpoint more frequently (default: every 200 iterations)
agent.save_interval=500

For example, to train with relu activation and a higher learning rate:

.. code-block:: bash

tensorboard --logdir logs/rsl_rl
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--headless \
agent.policy.activation=relu \
agent.algorithm.learning_rate=0.001
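The dotted override syntax above can be pictured as walking a nested config. A toy sketch of the idea (Hydra/OmegaConf do this for real, including the type coercion this sketch omits):

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' style override to a nested dict in place."""
    dotted_path, value = override.split("=", 1)
    *parents, leaf = dotted_path.split(".")
    node = cfg
    for key in parents:
        node = node[key]
    node[leaf] = value  # kept as a string here; real Hydra coerces types
    return cfg

cfg = {"agent": {"policy": {"activation": "elu"},
                 "algorithm": {"learning_rate": 0.0001}}}
apply_override(cfg, "agent.policy.activation=relu")
print(cfg["agent"]["policy"]["activation"])  # relu
```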


Navigate to ``http://localhost:6006`` in your browser to view:
Monitoring Training
^^^^^^^^^^^^^^^^^^^

Launch Tensorboard to monitor progress:

- **Episode rewards**: Total reward per episode
- **Episode length**: Steps per episode
- **Policy loss**: Actor and critic losses
- **Learning rate**: Current learning rate schedule
.. code-block:: bash

**2. Training Output**
/isaac-sim/python.sh -m tensorboard.main --logdir logs/rsl_rl

During training, you'll see periodic console output:
During training, each iteration prints a summary to the console:

.. code-block:: text

@@ -159,43 +96,28 @@ During training, you'll see periodic console output:
Time elapsed: 00:00:04
ETA: 00:00:49

[INFO] Saved checkpoint to: logs/rsl_rl/generic_experiment/<timestamp>/model_<iteration>.pt

**3. Checkpoints**

Model checkpoints are saved to:

``logs/rsl_rl/generic_experiment/<timestamp>/model_<iteration>.pt``

Example: ``logs/rsl_rl/generic_experiment/2026-01-29_12-30-00/model_2000.pt``


Multi-GPU Training
^^^^^^^^^^^^^^^^^^

For faster training on multi-GPU systems, use the ``--distributed`` flag:
Add ``--distributed`` to spread environments across all available GPUs:

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
--env_spacing 5.0 \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--save_interval 500 \
--headless \
--distributed \
lift_object

This automatically distributes environments across available GPUs.
--distributed


Expected Results
^^^^^^^^^^^^^^^^

After 12,000 iterations (~6 hours on a single GPU with 512 environments):

The trained policy should reliably grasp and lift objects to commanded target positions.
Please refer to the following gif for an example of the trained policy:
After 12,000 iterations (~6 hours on a single GPU with 512 environments), the trained
policy should reliably grasp and lift objects to commanded target positions.

.. image:: ../../../images/lift_object_rl_task.gif
:align: center