On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning: Manipulation Experiments

Code for the manipulation experiments (Push-T and LIBERO-10) from our ICML 2026 paper "On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning".

This codebase was originally a fork of the UVA repo. See also the UVA Paper.

Installation

We updated the original UVA environment to work on our clusters. We used miniconda.

First, clone this repository and the LIBERO repository into your project directory:

cd $PROJECT_DIR
git clone git@github.com:sachaMorin/uva.git
git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git

Then build the environment.

cd $PROJECT_DIR/uva
conda env create -f conda_environment_mila.yml

The code expects checkpoints and data directories under uva:

mkdir checkpoints data

Then install LIBERO:

conda activate uva
cd $PROJECT_DIR/LIBERO
pip install -e .

The LIBERO UVA wrapper depends on mujoco_py, which requires an old-school MuJoCo install. We found the easiest way to install MuJoCo was to follow the instructions from DINO-WM. You only need this to run the LIBERO-10 experiments.

On our cluster, loading cudatoolkit/12.6 was also required.

Datasets

Download the following datasets and unzip them in the uva/data folder.

Push-T from Diffusion Policy.
LIBERO-10 from LIBERO. The UVA authors replayed the data to extract the absolute actions and appended language tokens from CLIP using AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32"). Download both the original hdf5 file and the converted dataset. Extract libero_10.zip in the data folder. Unzipping the zarr file is not required.

Training

Download Pretrained Models

We start from a pretrained VAE model and a pretrained image generation model (MAR). Run the following command to download both:

python unified_video_action/utils/download.py

Train Video Generation Model (VM)

Per the original UVA instructions, we use a pretrained video model (VM) as the initialization for all models. We recommend using at least 4 GPUs for training. To train the VM on the Push-T dataset, run the following command:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    hydra.run.dir=checkpoints/pusht_video \
    logging.id=pusht_video \
    logging.mode=disabled \
    checkpoint=video_fvd \
    model.policy.action_model_params.predict_action=false \
    training.main_split_ratio=1.00 \
    training.main_task_modes=video_model \
    training.val_task_modes=video_model \
    training.val_video=true \
    training.val_every=2000 \
    training.num_steps=200_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.0001 \
    training.lr_warmup_steps=1000

Train Policy (BC)

To train a BC policy on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_policy \
    logging.id=pusht_policy \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    training.seed=0 \
    training.seed_python=0 \
    training.main_split_ratio=0.05 \
    training.main_task_modes=policy_model \
    training.secondary_task_modes=null \
    training.val_task_modes=policy_model \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

training.main_split_ratio=0.05 sets the fraction of the dataset used as labeled data for BC (5% here). Set model.policy.autoregressive_model_params.{idm_linear_head,policy_linear_head}=true to use linear heads instead of diffusion heads.
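
To make the role of training.main_split_ratio concrete, the sketch below shows one way a labeled fraction could be carved out of a set of episodes. It is an illustration only: split_episodes, the seeded shuffle, and the episode count are hypothetical, and the actual splitting logic in this codebase may differ (for example, it may not split per episode).

```python
import random

def split_episodes(n_episodes, main_split_ratio, seed=0):
    """Hypothetical helper: partition episode indices into labeled / unlabeled.

    Assumption: the split is done per episode after a seeded shuffle.
    """
    if not 0.0 <= main_split_ratio <= 1.0:
        raise ValueError("main_split_ratio must be in [0, 1]")
    indices = list(range(n_episodes))
    random.Random(seed).shuffle(indices)
    n_labeled = round(n_episodes * main_split_ratio)
    labeled = sorted(indices[:n_labeled])      # episodes with action labels
    unlabeled = sorted(indices[n_labeled:])    # episodes treated as action-free
    return labeled, unlabeled

# 200 is an illustrative episode count, not the size of any dataset used here.
labeled, unlabeled = split_episodes(n_episodes=200, main_split_ratio=0.05)
print(len(labeled), len(unlabeled))  # 10 190
```

With a ratio of 1.00 the unlabeled set is empty, which matches how the VM and IDM-labeling commands use the full dataset.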

Train Inverse Dynamics Model (IDM)

To train an IDM on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_idm \
    logging.id=pusht_idm \
    logging.mode=disabled \
    checkpoint=val_loss_idm \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.idm_linear_head=false \
    training.main_split_ratio=0.05 \
    training.main_task_modes=inverse_model \
    training.secondary_task_modes=null \
    training.val_task_modes=inverse_model \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

Train Policy (IDM Labeling)

To train a BC policy on labels generated by a trained IDM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_policy_idm_labels \
    logging.id=pusht_policy_idm_labels \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    model.policy.use_idm_labels=true \
    model.policy.checkpoint_idm=checkpoints/pusht_idm/checkpoints/latest.ckpt \
    training.main_split_ratio=1.00 \
    training.main_task_modes=policy_model \
    training.secondary_task_modes=null \
    training.val_task_modes=policy_model \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

training.main_split_ratio=1.00 is fine here: with model.policy.use_idm_labels=true, the policy trains only on IDM-generated labels, never on the dataset's ground-truth actions.
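
The idea behind IDM labeling can be sketched on a toy problem. The example below is not the implementation in this repository: it assumes a 1-D system with dynamics next_state = state + action, fits a one-parameter IDM on a small labeled subset, and then uses it to label every transition. fit_idm and label_with_idm are hypothetical names.

```python
import random

def fit_idm(transitions):
    """Least-squares fit of a toy IDM: action ~ w * (next_state - state)."""
    num = sum((s_next - s) * a for s, a, s_next in transitions)
    den = sum((s_next - s) ** 2 for s, a, s_next in transitions)
    return num / den

def label_with_idm(w, observations):
    """Predict an action for each action-free (state, next_state) pair."""
    return [(s, w * (s_next - s), s_next) for s, s_next in observations]

rng = random.Random(0)
# Toy dynamics (an assumption for illustration): next_state = state + action.
full = []
for _ in range(100):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    full.append((s, a, s + a))

labeled = full[:5]                                      # small action-labeled subset (5%)
observations = [(s, s_next) for s, _, s_next in full]   # action-free view of all data

w = fit_idm(labeled)
pseudo_labeled = label_with_idm(w, observations)        # BC would train on these labels
max_err = max(abs(a_hat - a)
              for (_, a_hat, _), (_, a, _) in zip(pseudo_labeled, full))
print(f"w={w:.3f}, max label error={max_err:.2e}")
```

In this toy setting the IDM recovers the true actions almost exactly from five labeled transitions, so the downstream policy can train on the whole dataset even though only 5% of it carried action labels.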

Train UVA

To train the original UVA model on Push-T starting from the pretrained VM, run:

accelerate launch --num_processes=4 train.py \
    --config-dir=. \
    --config-name=luva_pusht.yaml \
    model.policy.autoregressive_model_params.pretrained_model_path=checkpoints/pusht_video/checkpoints/latest.ckpt \
    hydra.run.dir=checkpoints/pusht_uva \
    logging.id=pusht_uva \
    logging.mode=disabled \
    checkpoint=val_loss_policy \
    model.policy.action_model_params.predict_action=true \
    model.policy.autoregressive_model_params.idm_linear_head=false \
    model.policy.autoregressive_model_params.policy_linear_head=false \
    training.main_split_ratio=0.05 \
    training.main_task_modes=uva \
    training.val_task_modes=uva \
    training.val_rollout=true \
    training.val_every=5000 \
    training.num_steps=50_000 \
    dataloader.batch_size=32 \
    model.policy.optimizer.learning_rate=0.00002 \
    training.lr_warmup_steps=1000

Evaluation

Policies

To evaluate a model in policy mode (BC, IDM labeling, or UVA (Policy)), set --checkpoint to the desired checkpoint and pass --policy_sampling_mode policy, e.g.:

python eval_sim.py \
    --checkpoint checkpoints/pusht_policy/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_policy/eval_policy \
    --policy_sampling_mode policy \
    --n_train 0 \
    --n_test 50

VM-IDM

To evaluate VM-IDM, provide both the IDM and the VM checkpoints:

python eval_sim.py \
    --checkpoint checkpoints/pusht_idm/checkpoints/latest.ckpt \
    --checkpoint_video checkpoints/pusht_video/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_idm/eval_video_idm \
    --policy_sampling_mode video_idm \
    --n_train 0 \
    --n_test 50

VM-IDM (UVA)

To evaluate VM-IDM (UVA), run:

python eval_sim.py \
    --checkpoint checkpoints/pusht_uva/checkpoints/latest.ckpt \
    --output_dir checkpoints/pusht_uva/eval_video_idm_shared \
    --policy_sampling_mode video_idm_shared \
    --n_train 0 \
    --n_test 50
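
When evaluating several runs, it can help to assemble these commands programmatically. The sketch below is a hypothetical convenience helper, not part of the repository: build_eval_command only reuses the eval_sim.py flags, sampling modes, and directory layout shown in the three commands above, and it assumes checkpoints live at checkpoints/<run>/checkpoints/latest.ckpt.

```python
import shlex

def build_eval_command(run_name, mode, checkpoint_video=None, n_train=0, n_test=50):
    """Hypothetical helper: assemble an eval_sim.py command as an argv list."""
    # Output sub-directories follow the naming used in the commands above.
    output_suffix = {
        "policy": "eval_policy",
        "video_idm": "eval_video_idm",
        "video_idm_shared": "eval_video_idm_shared",
    }
    if mode not in output_suffix:
        raise ValueError(f"unknown policy_sampling_mode: {mode}")
    if mode == "video_idm" and checkpoint_video is None:
        raise ValueError("video_idm needs a separate video model checkpoint")
    cmd = ["python", "eval_sim.py",
           "--checkpoint", f"checkpoints/{run_name}/checkpoints/latest.ckpt"]
    if checkpoint_video is not None:
        cmd += ["--checkpoint_video", checkpoint_video]
    cmd += ["--output_dir", f"checkpoints/{run_name}/{output_suffix[mode]}",
            "--policy_sampling_mode", mode,
            "--n_train", str(n_train),
            "--n_test", str(n_test)]
    return cmd

# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_eval_command("pusht_uva", "video_idm_shared")))
```

The returned list can be passed directly to subprocess.run once the printed command looks right.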

Test-Time Planning

You may notice some code, logs and configs mentioning TTP (Test-Time Planning). While we explored TTP in an earlier version of the project (using UVA as both a policy prior and a World Model for planning), none of the results in the paper use it. Our TTP code with default parameters reduces to simply rolling out the policy.

Citation

If you use this code, please cite both our paper and the original UVA paper:

@article{morin2026sample,
  title={On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning},
  author={Morin, Sacha and Byeon, Moonsub and Jolicoeur-Martineau, Alexia and Lachapelle, S{\'e}bastien},
  journal={arXiv preprint arXiv:2602.02762},
  year={2026}
}

@article{li2025unified,
  title={Unified Video Action Model},
  author={Li, Shuang and Gao, Yihuai and Sadigh, Dorsa and Song, Shuran},
  journal={arXiv preprint arXiv:2503.00200},
  year={2025}
}
