17 changes: 9 additions & 8 deletions docker/Dockerfile.isaaclab_arena
@@ -1,7 +1,10 @@
ARG BASE_IMAGE=nvcr.io/nvidia/isaac-sim:5.0.0
ARG BASE_IMAGE=nvcr.io/nvidia/isaac-sim:5.1.0

FROM ${BASE_IMAGE}

# Isaac Sim 5.1.0+ runs as a non-root user; switch to root for installation steps.
USER root

# GR00T policy build arguments; these are only used if INSTALL_GROOT is true
ARG INSTALL_GROOT=false

@@ -22,9 +25,6 @@ RUN apt-get update && apt-get install -y \
sudo \
python3-pip

# Update pip to the latest version
RUN pip3 install --upgrade pip

################################
# Install Isaac Lab
################################
@@ -37,9 +37,10 @@ ENV TERM=xterm
# Symlink isaac sim to IsaacLab
RUN ln -s /isaac-sim/ ${WORKDIR}/submodules/IsaacLab/_isaac_sim
# Install IsaacLab dependencies
RUN for DIR in ${WORKDIR}/submodules/IsaacLab/source/isaaclab*/; do pip install --no-deps -e "$DIR"; done
RUN for DIR in ${WORKDIR}/submodules/IsaacLab/source/isaaclab*/; do /isaac-sim/python.sh -m pip install --no-deps -e "$DIR"; done
# Logs and other runtime files are written under dist-packages by default, so this directory must be writable.
RUN chmod 777 -R /isaac-sim/kit/
RUN chmod a+x /isaac-sim
# NOTE(alexmillane, 2026-02-10): We started having issues with the flatdict 4.0.1 installation
# during the IsaacLab install. Installing it here without build isolation seems to fix the issue.
RUN /isaac-sim/python.sh -m pip install flatdict==4.0.1 --no-build-isolation
@@ -49,7 +50,7 @@ RUN ${ISAACLAB_PATH}/isaaclab.sh -i
# Patch for osqp in IsaacLab. Downgrade qpsolvers
# TODO(alexmillane): Watch the thread here: https://nvidia.slack.com/archives/C06HLQ6CB41/p1764680205807019
# and remove this patch when IsaacLab has a fix.
RUN if python -c "import qpsolvers; print(qpsolvers.available_solvers)" | grep -q "osqp"; then \
RUN if /isaac-sim/python.sh -c "import qpsolvers; print(qpsolvers.available_solvers)" | grep -q "osqp"; then \
echo "OSQP is installed. You can remove this clause from the Arena dockerfile."; \
else \
echo "OSQP missing, installing... This is a patch for an Isaac Lab bug."; \
@@ -79,7 +80,7 @@ ENV LW_API_ENDPOINT="https://api-dev.lightwheel.net"

# HuggingFace for downloading datasets and models.
# NOTE(alexmillane, 2025-10-28): For some reason the CLI has issues when installed in the IsaacSim version of python.
RUN pip install huggingface-hub[cli]
RUN pip install huggingface-hub[cli] --break-system-packages
# Create alias for hf command to use the system-installed version
RUN echo "alias hf='/usr/local/bin/hf'" >> /etc/bash.bashrc

@@ -136,7 +137,7 @@ RUN echo "alias pytest='/isaac-sim/python.sh -m pytest'" >> /etc/bash.bashrc
# It will pause waiting for the debugger to attach.
# 3) Attach to the running container with VSCode using the "Attach to debugpy session"
# configuration from the Run and Debug panel.
RUN pip3 install debugpy
RUN /isaac-sim/python.sh -m pip install debugpy
RUN echo "alias debugpy='python -Xfrozen_modules=off -m debugpy --listen localhost:5678 --wait-for-client'" >> /etc/bash.bashrc

# Change prompt so it's obvious we're inside the arena container
2 changes: 1 addition & 1 deletion docker/setup/entrypoint.sh
@@ -21,7 +21,7 @@ userdel ubuntu || true
useradd --no-log-init \
--uid "$DOCKER_RUN_USER_ID" \
--gid "$DOCKER_RUN_GROUP_NAME" \
--groups sudo \
--groups sudo,isaac-sim \
--shell /bin/bash \
$DOCKER_RUN_USER_NAME
chown $DOCKER_RUN_USER_NAME:$DOCKER_RUN_GROUP_NAME /home/$DOCKER_RUN_USER_NAME
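The added `isaac-sim` group membership can be sanity-checked from inside the container. A minimal sketch (the `check_group` helper is ours, not part of the entrypoint):

```shell
#!/bin/sh
# check_group USER GROUP -> exit 0 if USER is a member of GROUP.
# Illustrative helper only; not part of entrypoint.sh itself.
check_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Example: every user is a member of its own primary group.
check_group "$(id -un)" "$(id -gn)" && echo "membership OK"
```

Inside the running container, `check_group "$DOCKER_RUN_USER_NAME" isaac-sim` should succeed if the `useradd` above took effect.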
51 changes: 24 additions & 27 deletions docs/README.md
@@ -1,45 +1,42 @@
# `isaaclab_arena` Dox - Developer Guide
# `isaaclab_arena` Docs - Developer Guide

To build the `isaaclab_arena` docs locally follow the following instructions.
The docs are built on the **host machine** (not inside Docker) using a dedicated Python 3.11 venv.

Enter the `isaaclab_arena` docker.
## Prerequisites

```
./docker/run_docker.sh
```

The version of sphinx that we use requires a newer version of python.
Install a newer version of `python` and `venv`:
`python3.11` and `python3.11-venv` must be installed on the host:

```
sudo apt-get install python3.11 python3.11-venv
```bash
sudo apt-get install -y python3.11 python3.11-venv
```

> It looks like this actually overwrites the currently installed version of python
> inside.
## First-time setup

Create a `venv` and install the dependencies
From the repo root, create the venv and install dependencies:

```
```bash
cd docs
python3.11 -m venv venv_docs
source venv_docs/bin/activate
cd ./docs
python3.11 -m pip install -r requirements.txt
venv_docs/bin/pip install -r requirements.txt
```

To make the current version of docs

```
make html
## Build and view

```bash
cd docs
venv_docs/bin/sphinx-build -M html . _build/current
xdg-open _build/current/html/index.html
```

To view the docs, navigate to `isaaclab_arena/docs/_build/current/html/index.html`, and double-click.

To make the multi version docs. Note that this will only build docs for the set branches, such
as release, main etc. Only docs committed to these branches will be reflected.
## Multi-version docs

```
Builds docs for committed branches only (e.g. `main`, `release`). Local uncommitted changes are **not** reflected.

```bash
cd docs
source venv_docs/bin/activate
make multi-docs
xdg-open _build/index.html
```

To view the multi version docs, navigate to `isaaclab_arena/docs/_build/index.html`, and double-click.
@@ -155,17 +155,18 @@ See :doc:`../../concepts/concept_environment_design` for environment composition
Validation: Run Random Policy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To validate the environment setup, we can run a policy with random weights to ensure everything loads correctly:
To validate the environment loads correctly, run one training iteration and check for errors:

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 64 \
--max_iterations 1 \
lift_object
--headless

This command will load the environment, initialize 64 parallel environments, and exit immediately
(``max_iterations=1``). If successful, the environment is ready for training.
If the environment is set up correctly, you will see one iteration of training output before the script exits.

You should see output indicating the start of training:

@@ -1,135 +1,72 @@
Policy Training
---------------

This workflow covers training an RL policy from scratch using RSL-RL's PPO implementation.
The training is fully parallelized across hundreds of environments for sample-efficient learning.

**Docker Container**: Base (see :doc:`../../quickstart/docker_containers` for more details)

:docker_run_default:


Training Overview
^^^^^^^^^^^^^^^^^

We use **Proximal Policy Optimization (PPO)** from the `RSL-RL <https://github.com/leggedrobotics/rsl_rl>`_ library,
a proven on-policy RL algorithm for robot learning. The training process:

1. **Parallel Simulation**: Runs 512 parallel environments simultaneously
2. **Dense Rewards**: Provides shaped rewards for reaching, grasping, lifting, and goal achievement
3. **Command Sampling**: Randomly samples target positions within a workspace range
4. **Automatic Checkpointing**: Saves model checkpoints every 500 iterations
5. **Tensorboard Logging**: Monitors training progress in real-time

Training Command
^^^^^^^^^^^^^^^^

To train the policy, run:
Training uses IsaacLab's RSL-RL training script directly. The ``--external_callback`` argument
points to an Arena function that runs before training starts — it reads the ``--task`` argument,
builds the environment, and registers it with gym so IsaacLab's script can find it by name.

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
--env_spacing 5.0 \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--save_interval 500 \
--headless \
lift_object
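The registration hook described above can be pictured as a plug-in pattern: the training script imports the dotted callback path and invokes it before building the environment. A toy sketch of that pattern (all names below are hypothetical stand-ins, not the actual Arena API):

```python
# Sketch of an external registration callback. The real one lives at
# isaaclab_arena.environments.isaaclab_interop.environment_registration_callback
# and registers with gym's registry; this stand-in uses a plain dict.
REGISTRY: dict[str, str] = {}

def register(task_id: str, entry_point: str) -> None:
    """Mimic gym's registry: map a task name to an entry point."""
    REGISTRY[task_id] = entry_point

def environment_registration_callback(cli_args: dict) -> str:
    # Read the --task argument, build the env entry point, and register
    # it under that name so the training script can look it up later.
    task = cli_args["task"]
    entry_point = f"arena_envs:{task}_env"  # stand-in for the built env cfg
    register(task, entry_point)
    return entry_point
```

In the real flow, IsaacLab's `train.py` then resolves `--task lift_object` through gym's registry as usual.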

**Command Breakdown:**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Argument
- Description
* - ``--env_spacing 5.0``
- Spacing between parallel environments (meters)
* - ``--num_envs 512``
- Number of parallel environments for training
* - ``--max_iterations 12000``
- Total training iterations (each iteration = 24 timesteps × 512 envs = 12,288 samples)
* - ``--save_interval 500``
- Save checkpoint every 500 iterations
* - ``--headless``
- Run without GUI for faster training
* - ``lift_object``
- Environment name (must be last argument)
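The per-iteration sample count quoted in the table can be checked with quick arithmetic (the 24-step rollout length is taken from the table; treat it as an assumed RSL-RL rollout setting):

```python
# Samples collected per PPO iteration = rollout steps per env * number of envs.
steps_per_env = 24   # assumed rollout length per environment
num_envs = 512       # from --num_envs above

samples_per_iteration = steps_per_env * num_envs
print(samples_per_iteration)  # 12288
```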

**Additional Arguments (Optional):**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Argument
- Description
* - ``--seed <int>``
- Random seed for reproducibility (default: 42)
* - ``--device <str>``
- Device to use: 'cuda' or 'cpu' (default: 'cuda')
* - ``--video``
- Record training videos periodically
* - ``--video_interval 2000``
- Interval for recording videos (iterations)


Training Configuration
^^^^^^^^^^^^^^^^^^^^^^

The training uses the default RSL-RL PPO configuration, which can be found at:

``isaaclab_arena/policy/rl_policy/generic_policy.json``

Key hyperparameters:

.. code-block:: json

{
"algorithm": {
"class_name": "PPO",
"num_learning_epochs": 5,
"num_mini_batches": 4,
"learning_rate": 0.001,
"gamma": 0.99,
"lam": 0.95,
"clip_param": 0.2
},
"policy": {
"class_name": "ActorCritic",
"activation": "elu",
"actor_hidden_dims": [256, 256, 256],
"critic_hidden_dims": [256, 256, 256]
}
}

To use a custom configuration, specify the path with ``--agent_cfg_path <path>``.
--headless

Checkpoints are written to ``logs/rsl_rl/generic_experiment/<timestamp>/``.
The agent configuration is saved alongside as ``params/agent.yaml``,
which the evaluation script uses to reconstruct the policy at inference time.
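Given that layout, the newest checkpoint of the latest run can be located with a few lines of Python. A sketch under the directory convention quoted above (`latest_checkpoint` is our illustrative helper, not an Arena API):

```python
from pathlib import Path

def latest_checkpoint(log_root) -> Path:
    """Return the highest-numbered model_<iter>.pt of the newest run.

    Assumes the layout from the docs:
    logs/rsl_rl/generic_experiment/<timestamp>/model_<iter>.pt
    Timestamped run names sort lexicographically in chronological order.
    """
    runs = sorted(p for p in Path(log_root).iterdir() if p.is_dir())
    ckpts = sorted(runs[-1].glob("model_*.pt"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return ckpts[-1]
```

Note the numeric sort key: a plain lexicographic sort would put ``model_900.pt`` after ``model_2000.pt``.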

Monitoring Training
^^^^^^^^^^^^^^^^^^^

Training logs are saved to ``logs/rsl_rl/generic_experiment/<timestamp>/``.
Overriding Hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^^^

Hyperparameters come from ``RLPolicyCfg`` in ``isaaclab_arena_examples/policy/base_rsl_rl_policy.py``
and can be overridden with Hydra syntax appended to the training command:

.. code-block:: bash

**1. View Training Metrics with Tensorboard**
# Change network activation function to relu (default: elu)
agent.policy.activation=relu

Launch Tensorboard to monitor training progress:
# Adjust the learning rate (default: 0.0001)
agent.algorithm.learning_rate=0.001

# Save a checkpoint more frequently (default: every 200 iterations)
agent.save_interval=500

For example, to train with relu activation and a higher learning rate:

.. code-block:: bash

tensorboard --logdir logs/rsl_rl
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--headless \
agent.policy.activation=relu \
agent.algorithm.learning_rate=0.001
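The dotted override syntax above can be pictured as walking a nested config. A toy sketch of the idea (Hydra/OmegaConf do this for real, including the type coercion this sketch omits):

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' style override to a nested dict in place."""
    dotted_path, value = override.split("=", 1)
    *parents, leaf = dotted_path.split(".")
    node = cfg
    for key in parents:
        node = node[key]
    node[leaf] = value  # kept as a string here; real Hydra coerces types
    return cfg

cfg = {"agent": {"policy": {"activation": "elu"},
                 "algorithm": {"learning_rate": 0.0001}}}
apply_override(cfg, "agent.policy.activation=relu")
print(cfg["agent"]["policy"]["activation"])  # relu
```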


Navigate to ``http://localhost:6006`` in your browser to view:
Monitoring Training
^^^^^^^^^^^^^^^^^^^

Launch Tensorboard to monitor progress:

- **Episode rewards**: Total reward per episode
- **Episode length**: Steps per episode
- **Policy loss**: Actor and critic losses
- **Learning rate**: Current learning rate schedule
.. code-block:: bash

**2. Training Output**
/isaac-sim/python.sh -m tensorboard.main --logdir logs/rsl_rl

During training, you'll see periodic console output:
During training, each iteration prints a summary to the console:

.. code-block:: text

@@ -159,43 +96,28 @@ During training, you'll see periodic console output:
Time elapsed: 00:00:04
ETA: 00:00:49

[INFO] Saved checkpoint to: logs/rsl_rl/generic_experiment/<timestamp>/model_<iteration>.pt

**3. Checkpoints**

Model checkpoints are saved to:

``logs/rsl_rl/generic_experiment/<timestamp>/model_<iteration>.pt``

Example: ``logs/rsl_rl/generic_experiment/2026-01-29_12-30-00/model_2000.pt``


Multi-GPU Training
^^^^^^^^^^^^^^^^^^

For faster training on multi-GPU systems, use the ``--distributed`` flag:
Add ``--distributed`` to spread environments across all available GPUs:

.. code-block:: bash

python isaaclab_arena/scripts/reinforcement_learning/train.py \
--env_spacing 5.0 \
/isaac-sim/python.sh submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--save_interval 500 \
--headless \
--distributed \
lift_object

This automatically distributes environments across available GPUs.
--distributed


Expected Results
^^^^^^^^^^^^^^^^

After 12,000 iterations (~6 hours on a single GPU with 512 environments):

The trained policy should reliably grasp and lift objects to commanded target positions.
Please refer to the following gif for an example of the trained policy:
After 12,000 iterations (~6 hours on a single GPU with 512 environments), the trained
policy should reliably grasp and lift objects to commanded target positions.

.. image:: ../../../images/lift_object_rl_task.gif
:align: center