On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning: Manipulation Experiments

Code for the manipulation experiments (Push-T and LIBERO-10) from our ICML 2026 paper "On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning".

This codebase was originally a fork of the UVA repo. See also the UVA Paper.

Installation

We updated the original UVA environment to work on our clusters, using Miniconda.

First, clone this repository and the LIBERO repository into your project directory:

cd $PROJECT_DIR
git clone git@github.com:sachaMorin/uva.git
git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git

Then build the environment:

cd $PROJECT_DIR/uva
conda env create -f conda_environment_mila.yml

The code expects checkpoints and data directories under uva:

mkdir checkpoints data

Then install LIBERO:

conda activate uva
cd $PROJECT_DIR/LIBERO
pip install -e .

The LIBERO UVA wrapper depends on mujoco_py, which requires a legacy MuJoCo installation. We found it easiest to follow the MuJoCo installation instructions from DINO-WM. This is only needed to run the LIBERO-10 experiments.

On our cluster, loading cudatoolkit/12.6 was also required.

Datasets

Download the following datasets and unzip them in the uva/data folder.

  • Push-T from Diffusion Policy.
  • LIBERO-10 from LIBERO. The UVA authors replayed the data to extract the absolute actions and appended language tokens from CLIP using AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32"). Download both the original hdf5 file and the converted dataset. Extract libero_10.zip in the data folder. Unzipping the zarr file is not required.

Training

Download Pretrained Models

We start from a pretrained VAE model and a pretrained image generation model (MAR). Run the following command to download both:

python unified_video_action/utils/download.py

Train Video Generation Model (VM)

Per the original UVA instructions, we use a pretrained video model (VM) as the initialization for all models. We recommend using at least 4 GPUs for training. To train the VM on the Push-T dataset, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    hydra.run.dir=checkpoints/pusht_video \
    logging.id=pusht_video \
    logging.mode=disabled \
    checkpoint=video_fvd \
    model.policy.action_model_params.predict_action=false \
    training.main_split_ratio=1.00 \
    training.main_task_modes=video_model \
    training.val_task_modes=video_model \
    training.val_video=true \
    training.val_every=2000 \
    training.num_steps=200_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.0001 \
    training.lr_warmup_steps=1000

Train Policy (BC)

To train a BC policy on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_policy \
    logging.id=pusht_policy \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    training.seed=0 \
    training.seed_python=0 \
    training.main_split_ratio=0.05 \
    training.main_task_modes=policy_model \
    training.secondary_task_modes=null \
    training.val_task_modes=policy_model \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

training.main_split_ratio=0.05 sets the fraction of the dataset used as labeled data for BC (5% here). Set model.policy.autoregressive_model_params.{idm_linear_head,policy_linear_head}=true to use linear heads instead of diffusion heads.
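
As a rough illustration of what a labeled fraction means in practice, the sketch below splits a set of episodes into a labeled and an unlabeled subset. It is not the repository's implementation: the function name split_labeled_episodes, the episode count of 200, and the choice to split per episode (rather than per sample) are assumptions made for the example.

```python
import random


def split_labeled_episodes(num_episodes, labeled_ratio, seed=0):
    """Return (labeled, unlabeled) episode indices for a labeled fraction.

    Hypothetical helper: splitting per episode is an assumption,
    not something the README states.
    """
    indices = list(range(num_episodes))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    num_labeled = round(num_episodes * labeled_ratio)
    return sorted(indices[:num_labeled]), sorted(indices[num_labeled:])


# 5% of a hypothetical 200-episode dataset keeps action labels.
labeled, unlabeled = split_labeled_episodes(num_episodes=200, labeled_ratio=0.05)
print(len(labeled), len(unlabeled))  # 10 190
```

Fixing the seed keeps the labeled subset identical across runs, which matters when comparing methods at the same label budget.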

Train Inverse Dynamics Model (IDM)

To train an IDM on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_idm \
    logging.id=pusht_idm \
    logging.mode=disabled \
    checkpoint=val_loss_idm \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.idm_linear_head=false \
    training.main_split_ratio=0.05 \
    training.main_task_modes=inverse_model \
    training.secondary_task_modes=null \
    training.val_task_modes=inverse_model \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

Train Policy (IDM Labeling)

To train a BC policy on labels generated by a trained IDM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_policy_idm_labels \
    logging.id=pusht_policy_idm_labels \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    model.policy.use_idm_labels=true \
    model.policy.checkpoint_idm=checkpoints/pusht_idm/checkpoints/latest.ckpt \
    training.main_split_ratio=1.00 \
    training.main_task_modes=policy_model \
    training.secondary_task_modes=null \
    training.val_task_modes=policy_model \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

Setting training.main_split_ratio=1.00 is fine here: with model.policy.use_idm_labels=true, the model trains only on IDM-generated labels.
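
The IDM labeling recipe (fit an IDM on the small labeled subset, then relabel the full dataset with it before BC) can be sketched on a toy problem. Everything below is an assumption made for illustration: the 1-D dynamics s_next = s + a, the one-parameter linear IDM, and the helper names fit_linear_idm and label_with_idm. The repository's IDM is a learned model on images.

```python
import random


def fit_linear_idm(transitions):
    """Least-squares fit of a ~ w * (s_next - s) on labeled (s, a, s_next) triples."""
    num = sum((s_next - s) * a for s, a, s_next in transitions)
    den = sum((s_next - s) ** 2 for s, a, s_next in transitions)
    return num / den


def label_with_idm(w, observation_pairs):
    """Attach IDM-predicted actions to action-free (s, s_next) pairs."""
    return [(s, w * (s_next - s), s_next) for s, s_next in observation_pairs]


rng = random.Random(0)


def make_transition():
    # Toy dynamics (assumption): the next state is the state plus the action.
    s = rng.uniform(-1.0, 1.0)
    a = rng.uniform(-0.1, 0.1)
    return s, a, s + a


dataset = [make_transition() for _ in range(200)]

# Step 1: fit the IDM on a 5% labeled subset.
w = fit_linear_idm(dataset[:10])

# Step 2: relabel the whole dataset from observations only,
# ignoring the ground-truth actions.
pseudo_labeled = label_with_idm(w, [(s, s_next) for s, _, s_next in dataset])

# Step 3 (not shown): train the BC policy on pseudo_labeled.
max_error = max(abs(a_hat - a) for (_, a_hat, _), (_, a, _) in zip(pseudo_labeled, dataset))
```

In this toy setting the IDM is exact, so max_error is at floating-point level. With a learned IDM, the quality of the pseudo-labels bounds what the downstream policy can learn.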

Train UVA

To train the original UVA model on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_uva \
    logging.id=pusht_uva \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.idm_linear_head=false \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    training.main_split_ratio=0.05 \
    training.main_task_modes=uva \
    training.val_task_modes=uva \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

Evaluation

Policies

To evaluate a model in policy mode (BC, IDM labeling, or UVA (Policy)), set --checkpoint to the desired checkpoint and pass --policy_sampling_mode policy, e.g.:

python eval_sim.py \
    --checkpoint checkpoints/pusht_policy/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_policy/eval_policy \
    --policy_sampling_mode policy \
    --n_train 0 \
    --n_test 50

VM-IDM

To evaluate VM-IDM, provide both the IDM checkpoint (--checkpoint) and the VM checkpoint (--checkpoint_video):

python eval_sim.py \
    --checkpoint checkpoints/pusht_idm/checkpoints/latest.ckpt \
    --checkpoint_video checkpoints/pusht_video/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_idm/eval_video_idm \
    --policy_sampling_mode video_idm \
    --n_train 0 \
    --n_test 50
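
For intuition, here is a minimal sketch of a VM-IDM control loop, assuming that the video model proposes the next observation and the IDM infers the action that reaches it. The README does not spell out this loop, so treat the structure as an assumption. The 1-D environment and the functions video_model, inverse_dynamics, and env_step are hypothetical stand-ins, not the repository's API.

```python
def video_model(obs, goal=1.0, step=0.1):
    """Hypothetical VM stand-in: predict the next observation, moving toward a goal."""
    delta = max(-step, min(step, goal - obs))
    return obs + delta


def inverse_dynamics(obs, next_obs):
    """Hypothetical IDM stand-in: infer the action linking two observations."""
    return next_obs - obs


def env_step(obs, action):
    """Toy environment (assumption): the action is added to the state."""
    return obs + action


def rollout_vm_idm(obs, horizon):
    for _ in range(horizon):
        predicted_obs = video_model(obs)               # 1. imagine the next observation
        action = inverse_dynamics(obs, predicted_obs)  # 2. infer the action that gets there
        obs = env_step(obs, action)                    # 3. execute it in the environment
    return obs


final_obs = rollout_vm_idm(obs=0.0, horizon=20)
```

The policy never predicts actions directly: the IDM alone grounds the VM's predictions in actions, which is why both checkpoints are needed.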

VM-IDM (UVA)

To evaluate VM-IDM (UVA), pass only the UVA checkpoint and use --policy_sampling_mode video_idm_shared:

python eval_sim.py \
    --checkpoint checkpoints/pusht_uva/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_uva/eval_video_idm_shared \
    --policy_sampling_mode video_idm_shared \
    --n_train 0 \
    --n_test 50

Test-Time Planning

You may notice code, logs, and configs that mention TTP (Test-Time Planning). We explored TTP in an earlier version of the project, using UVA as both a policy prior and a world model for planning, but none of the results in the paper use it. With default parameters, our TTP code reduces to rolling out the policy.

Citation

If you use this code, please cite both our paper and the original UVA paper:

@article{morin2026sample,
  title={On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning},
  author={Morin, Sacha and Byeon, Moonsub and Jolicoeur-Martineau, Alexia and Lachapelle, S{\'e}bastien},
  journal={arXiv preprint arXiv:2602.02762},
  year={2026}
}
@article{li2025unified,
  title={Unified Video Action Model},
  author={Li, Shuang and Gao, Yihuai and Sadigh, Dorsa and Song, Shuran},
  journal={arXiv preprint arXiv:2503.00200},
  year={2025}
}
