On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning: Manipulation Experiments
Code for the manipulation experiments (Push-T and LIBERO-10) from our ICML 2026 paper On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning.
This codebase was originally a fork of the UVA repo. See also the UVA Paper.
We updated the original UVA environment to work on our clusters, using Miniconda.
First, clone this repository and the LIBERO repository into your project directory:
cd $PROJECT_DIR
git clone git@github.com:sachaMorin/uva.git
git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git
Then build the environment.
cd $PROJECT_DIR/uva
conda env create -f conda_environment_mila.yml
The code assumes checkpoints and data directories under uva.
mkdir checkpoints data
Then install LIBERO:
conda activate uva
cd $PROJECT_DIR/LIBERO
pip install -e .
The LIBERO UVA wrapper depends on mujoco_py which requires an old-school mujoco install. We found the easiest way to install mujoco was to follow the instructions from DINO-WM. You only need this to run the LIBERO-10 experiments.
On our cluster, loading cudatoolkit/12.6 was also required.
Download the following datasets and unzip them in the uva/data folder.
- Push-T from Diffusion Policy.
- LIBERO-10 from LIBERO. The UVA authors replayed the data to extract the absolute actions and appended language tokens from CLIP using AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32"). Download both the original hdf5 file and the converted dataset. Extract libero_10.zip in the data folder. Unzipping the zarr file is not required.
We start from a pretrained VAE model and a pretrained image generation model MAR. Run the following command to download the pretrained models.
python unified_video_action/utils/download.py
Per the original UVA instructions, we use a pretrained video model (VM) as the initialization for all models. We recommend using at least 4 GPUs for training. To train the VM on the Push-T dataset, run the following command:
accelerate launch --num_processes=4 train.py \
--config-dir=. \
--config-name=luva_pusht.yaml \
hydra.run.dir=checkpoints/pusht_video \
logging.id=pusht_video \
logging.mode=disabled \
checkpoint=video_fvd \
model.policy.action_model_params.predict_action=false \
training.main_split_ratio=1.00 \
training.main_task_modes=video_model \
training.val_task_modes=video_model \
training.val_video=true \
training.val_every=2000 \
training.num_steps=200_000 \
dataloader.batch_size=32 \
model.policy.optimizer.learning_rate=0.0001 \
training.lr_warmup_steps=1000
To train a BC policy on Push-T starting from the pretrained VM, run:
accelerate launch --num_processes=4 train.py \
--config-dir=. \
--config-name=luva_pusht.yaml \
model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
hydra.run.dir=checkpoints/pusht_policy \
logging.id=pusht_policy \
logging.mode=disabled \
checkpoint=val_loss_policy \
model.policy.action_model_params.predict_action=true \
model.policy.autoregressive_model_params.policy_linear_head=false \
training.seed=0 \
training.seed_python=0 \
training.main_split_ratio=0.05 \
training.main_task_modes=policy_model \
training.secondary_task_modes=null \
training.val_task_modes=policy_model \
training.val_rollout=true \
training.val_every=5000 \
training.num_steps=50_000 \
dataloader.batch_size=32 \
model.policy.optimizer.learning_rate=0.00002 \
training.lr_warmup_steps=1000
training.main_split_ratio=0.05 sets the fraction of the dataset used as labeled data for BC (5% here). Set model.policy.autoregressive_model_params.{idm_linear_head,policy_linear_head}=true to use linear heads instead of diffusion heads.
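To make the role of the labeled fraction concrete, here is a minimal sketch of how a ratio such as 0.05 could partition episodes into a labeled subset and an unlabeled remainder. The helper name `split_labeled_unlabeled`, the episode count of 200, and the shuffling and rounding choices are all assumptions made for illustration; the split performed inside this codebase may select episodes differently.

```python
import random


def split_labeled_unlabeled(num_episodes, main_split_ratio, seed=0):
    """Partition episode indices into a labeled subset and the remainder.

    Hypothetical helper for illustration only; it is not the split
    logic used by train.py.
    """
    if not 0.0 <= main_split_ratio <= 1.0:
        raise ValueError("main_split_ratio must be in [0, 1]")
    indices = list(range(num_episodes))
    # Seeded shuffle so the same seed always yields the same split.
    random.Random(seed).shuffle(indices)
    # Round to the nearest episode count, but keep at least one labeled
    # episode whenever the ratio is positive.
    num_labeled = round(num_episodes * main_split_ratio)
    if main_split_ratio > 0 and num_episodes > 0:
        num_labeled = max(1, num_labeled)
    labeled = sorted(indices[:num_labeled])
    unlabeled = sorted(indices[num_labeled:])
    return labeled, unlabeled


if __name__ == "__main__":
    # 200 is an arbitrary episode count chosen for the example.
    labeled, unlabeled = split_labeled_unlabeled(200, 0.05, seed=0)
    print(len(labeled), len(unlabeled))
```

In this sketch the labeled episodes are the ones whose actions would be used for BC or IDM training, while the remaining episodes contribute observations only.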
To train an IDM on Push-T starting from the pretrained VM, run:
accelerate launch --num_processes=4 train.py \
--config-dir=. \
--config-name=luva_pusht.yaml \
model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
hydra.run.dir=checkpoints/pusht_idm \
logging.id=pusht_idm \
logging.mode=disabled \
checkpoint=val_loss_idm \
model.policy.action_model_params.predict_action=true \
model.policy.autoregressive_model_params.idm_linear_head=false \
training.main_split_ratio=0.05 \
training.main_task_modes=inverse_model \
training.secondary_task_modes=null \
training.val_task_modes=inverse_model \
training.val_every=5000 \
training.num_steps=50_000 \
dataloader.batch_size=32 \
model.policy.optimizer.learning_rate=0.00002 \
training.lr_warmup_steps=1000
To train a BC policy on Push-T using labels generated by a trained IDM, run:
accelerate launch --num_processes=4 train.py \
--config-dir=. \
--config-name=luva_pusht.yaml \
model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
hydra.run.dir=checkpoints/pusht_policy_idm_labels \
logging.id=pusht_policy_idm_labels \
logging.mode=disabled \
checkpoint=val_loss_policy \
model.policy.action_model_params.predict_action=true \
model.policy.autoregressive_model_params.policy_linear_head=false \
model.policy.use_idm_labels=true \
model.policy.checkpoint_idm=checkpoints/pusht_idm/checkpoints/latest.ckpt \
training.main_split_ratio=1.00 \
training.main_task_modes=policy_model \
training.secondary_task_modes=null \
training.val_task_modes=policy_model \
training.val_rollout=true \
training.val_every=5000 \
training.num_steps=50_000 \
dataloader.batch_size=32 \
model.policy.optimizer.learning_rate=0.00002 \
training.lr_warmup_steps=1000
training.main_split_ratio=1.00 is fine here: with model.policy.use_idm_labels=true, the model trains only on IDM-generated labels.
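The idea behind IDM labeling can be sketched on a toy problem: fit an inverse dynamics model on a few labeled transitions, then use it to assign pseudo-actions to action-free observations, which become the BC training set. Everything below is an assumption made for illustration: scalar states, a linear IDM fit by least squares, and the toy dynamics next_state = state + 0.5 * action. The models in this repository are learned networks with diffusion or linear heads, not this closed-form fit.

```python
def fit_linear_idm(transitions):
    """Least-squares fit of action ~ w * (next_state - state) + b.

    `transitions` is a list of (state, next_state, action) tuples with
    scalar entries. This toy model stands in for a learned IDM.
    """
    deltas = [s_next - s for s, s_next, _ in transitions]
    actions = [a for _, _, a in transitions]
    n = len(transitions)
    mean_d = sum(deltas) / n
    mean_a = sum(actions) / n
    var_d = sum((d - mean_d) ** 2 for d in deltas)
    if var_d == 0:
        raise ValueError("need at least two distinct state deltas")
    cov = sum((d - mean_d) * (a - mean_a) for d, a in zip(deltas, actions))
    w = cov / var_d
    b = mean_a - w * mean_d
    return lambda s, s_next: w * (s_next - s) + b


def label_with_idm(idm, observations):
    """Turn an action-free trajectory into (state, pseudo_action) pairs."""
    return [(s, idm(s, s_next))
            for s, s_next in zip(observations, observations[1:])]


if __name__ == "__main__":
    # Small labeled set generated by the toy dynamics
    # next_state = state + 0.5 * action (an assumption of this sketch).
    labeled = [(0.0, 0.5, 1.0), (0.5, 0.0, -1.0), (0.0, 1.0, 2.0)]
    idm = fit_linear_idm(labeled)

    # Action-free trajectory: observations only.
    unlabeled_observations = [0.0, 0.25, 1.25, 1.0]
    bc_dataset = label_with_idm(idm, unlabeled_observations)
    print(bc_dataset)
```

A BC policy trained on `bc_dataset` never sees ground-truth actions for those observations, which mirrors why the labeled split ratio does not restrict the data in the command above.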
To train the original UVA model on Push-T starting from the pretrained VM, run:
accelerate launch --num_processes=4 train.py \
--config-dir=. \
--config-name=luva_pusht.yaml \
model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
hydra.run.dir=checkpoints/pusht_uva \
logging.id=pusht_uva \
logging.mode=disabled \
checkpoint=val_loss_policy \
model.policy.action_model_params.predict_action=true \
model.policy.autoregressive_model_params.idm_linear_head=false \
model.policy.autoregressive_model_params.policy_linear_head=false \
training.main_split_ratio=0.05 \
training.main_task_modes=uva \
training.val_task_modes=uva \
training.val_rollout=true \
training.val_every=5000 \
training.num_steps=50_000 \
dataloader.batch_size=32 \
model.policy.optimizer.learning_rate=0.00002 \
training.lr_warmup_steps=1000
To evaluate a model in policy mode (BC, IDM labeling, or UVA (Policy)), set --checkpoint to the desired checkpoint and use --policy_sampling_mode policy, e.g.:
python eval_sim.py \
--checkpoint checkpoints/pusht_policy/checkpoints/latest.ckpt \
--output_dir checkpoints/pusht_policy/eval_policy \
--policy_sampling_mode policy \
--n_train 0 \
--n_test 50
To evaluate VM-IDM, provide both the IDM checkpoint (--checkpoint) and the VM checkpoint (--checkpoint_video):
python eval_sim.py \
--checkpoint checkpoints/pusht_idm/checkpoints/latest.ckpt \
--checkpoint_video checkpoints/pusht_video/checkpoints/latest.ckpt \
--output_dir checkpoints/pusht_idm/eval_video_idm \
--policy_sampling_mode video_idm \
--n_train 0 \
--n_test 50
To evaluate VM-IDM (UVA), run:
python eval_sim.py \
--checkpoint checkpoints/pusht_uva/checkpoints/latest.ckpt \
--output_dir checkpoints/pusht_uva/eval_video_idm_shared \
--policy_sampling_mode video_idm_shared \
--n_train 0 \
--n_test 50
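When evaluating several checkpoints, it can be convenient to assemble the three evaluation commands above programmatically. The sketch below only builds argument lists from the flags already shown (--checkpoint, --checkpoint_video, --output_dir, --policy_sampling_mode, --n_train, --n_test). The helper `build_eval_command` is hypothetical and not part of this repository, and the check that video_idm requires a video checkpoint is an assumption drawn from the commands above rather than from eval_sim.py itself.

```python
import shlex

# Sampling modes that appear in the evaluation commands above.
SAMPLING_MODES = ("policy", "video_idm", "video_idm_shared")


def build_eval_command(checkpoint, output_dir, mode,
                       checkpoint_video=None, n_train=0, n_test=50):
    """Return the eval_sim.py argument list for one sampling mode.

    Hypothetical convenience wrapper; it does not run anything.
    """
    if mode not in SAMPLING_MODES:
        raise ValueError(f"unknown policy_sampling_mode: {mode}")
    if mode == "video_idm" and checkpoint_video is None:
        raise ValueError("video_idm needs a separate video model checkpoint")
    cmd = ["python", "eval_sim.py", "--checkpoint", checkpoint]
    if checkpoint_video is not None:
        cmd += ["--checkpoint_video", checkpoint_video]
    cmd += [
        "--output_dir", output_dir,
        "--policy_sampling_mode", mode,
        "--n_train", str(n_train),
        "--n_test", str(n_test),
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_eval_command(
        checkpoint="checkpoints/pusht_idm/checkpoints/latest.ckpt",
        checkpoint_video="checkpoints/pusht_video/checkpoints/latest.ckpt",
        output_dir="checkpoints/pusht_idm/eval_video_idm",
        mode="video_idm",
    )
    # Print a shell-safe string; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting issues.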
You may notice some code, logs and configs mentioning TTP (Test-Time Planning). While we explored TTP in an earlier version of the project (using UVA as both a policy prior and a World Model for planning), none of the results in the paper use it. Our TTP code with default parameters reduces to simply rolling out the policy.
If you use this code, please cite both our paper and the original UVA paper:
@article{morin2026sample,
title={On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning},
author={Morin, Sacha and Byeon, Moonsub and Jolicoeur-Martineau, Alexia and Lachapelle, S{\'e}bastien},
journal={arXiv preprint arXiv:2602.02762},
year={2026}
}

@article{li2025unified,
title={Unified Video Action Model},
author={Li, Shuang and Gao, Yihuai and Sadigh, Dorsa and Song, Shuran},
journal={arXiv preprint arXiv:2503.00200},
year={2025}
}