To clone this repository along with all submodules, run:

```shell
git clone --recursive https://github.com/rguntz/ibrl-docker.git
```
## Download data and BC models

Download the dataset and models from Google Drive and put the folders under the `release` folder. The `release` folder should then contain `release/cfgs` (already shipped with the repo), plus `release/data` and `release/model` (the latter two come from the downloaded zip file).
Unzip the file:

```shell
unzip data_and_model.zip
```

## Build and run the Docker image
Build the image:

```shell
docker build -t ibrl-gpu .
```

Run the container:

```shell
docker run --gpus all -it --rm \
    -v $(pwd):/app \
    ibrl-gpu
```

To run with GUI support, first allow local Docker containers to access the X server, then start the container with the display forwarded:

```shell
xhost +local:docker
docker run --gpus all -it --rm \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v $(pwd):/app \
    ibrl-gpu
```

## Compile the C++ files
```shell
cd common_utils
mkdir build
cd build
cmake ..
make -j
```

## Set up Weights & Biases

Visit https://docs.wandb.ai/models/quickstart to create an account and generate an API key so you can monitor the training pipeline online. Then run the following command in the terminal:
```shell
export WANDB_API_KEY="<your_api_key>"
```

Replace `<your_api_key>` with the key from your own W&B account; avoid committing a real key to the repository.
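As a quick sanity check that the key is visible to Python before launching training, you can run a snippet like the following (the helper is our own illustration, not part of the repo):

```python
import os

def wandb_key_is_set():
    """Return True if WANDB_API_KEY is present and non-empty in the environment."""
    return bool(os.environ.get("WANDB_API_KEY"))

if __name__ == "__main__":
    print("WANDB_API_KEY set:", wandb_key_is_set())
```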
## Train RL policy using the BC policy provided in the release folder

```shell
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_ibrl.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_ibrl.yaml
```

Use `--save_dir PATH` to specify where to store the logs and models. Use `--use_wb 0` to disable logging to Weights & Biases.
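If you prefer launching runs from Python (e.g. for sweeps), the flags can be assembled like this; the helper `make_train_cmd` and the example paths are our own illustration, not part of the repo:

```python
def make_train_cmd(config_path, save_dir=None, use_wb=True):
    """Assemble a train_rl.py argument list using the optional flags above."""
    cmd = ["python", "train_rl.py", "--config_path", config_path]
    if save_dir is not None:
        cmd += ["--save_dir", save_dir]
    if not use_wb:
        cmd += ["--use_wb", "0"]
    return cmd

if __name__ == "__main__":
    # Example: train can IBRL with a custom log dir and W&B disabled.
    print(" ".join(make_train_cmd(
        "release/cfgs/robomimic_rl/can_ibrl.yaml",
        save_dir="exps/can_ibrl",
        use_wb=False,
    )))
```

Pass the resulting list to `subprocess.run` to actually launch a run.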
## Train a BC policy from scratch

Use the following commands to train a BC policy from scratch. We find that IBRL is not sensitive to the exact performance of the BC policy.

```shell
# can
python train_bc.py --config_path release/cfgs/robomimic_bc/can.yaml
# square
python train_bc.py --config_path release/cfgs/robomimic_bc/square.yaml
```

## RLPD

```shell
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rlpd.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rlpd.yaml
```

## RFT

These commands run RFT from the pretrained models in the `release` folder:
```shell
# can rft
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rft.yaml
# square rft
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rft.yaml
```

To only perform pretraining:

```shell
# can, pretraining for 5 x 10,000 steps
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rft.yaml --pretrain_only 1 --pretrain_num_epoch 5 --load_pretrained_agent None
# square, pretraining for 10 x 10,000 steps
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rft.yaml --pretrain_only 1 --pretrain_num_epoch 10 --load_pretrained_agent None
```

## State-based experiments

Train IBRL using the provided state BC policies:
```shell
# can state
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_ibrl.yaml
# square state
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_ibrl.yaml
```

To train a state BC policy from scratch:

```shell
# can
python train_bc.py --config_path release/cfgs/robomimic_bc/can_state.yaml
# square
python train_bc.py --config_path release/cfgs/robomimic_bc/square_state.yaml
```

State RLPD:

```shell
# can state
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_rlpd.yaml
# square state
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_rlpd.yaml
```

Since state policies are fast to train, pretraining and RL fine-tuning can be run in one step:

```shell
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_rft.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_rft.yaml
```

## Meta-World

Train the RL policy using the BC policy provided in the `release` folder:
```shell
# assembly
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy assembly
# boxclose
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy boxclose
# coffeepush
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy coffeepush
# stickpull
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy stickpull
```
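To queue all four Meta-World tasks one after another, a small launcher sketch can help; the loop below is our own convenience wrapper around the commands above, not part of the repo:

```python
# Task names taken from the four commands above.
TASKS = ["assembly", "boxclose", "coffeepush", "stickpull"]

def mw_ibrl_cmd(task):
    """Build the Meta-World IBRL training command for one task."""
    return [
        "python", "mw_main/train_rl_mw.py",
        "--config_path", "release/cfgs/metaworld/ibrl_basic.yaml",
        "--bc_policy", task,
    ]

if __name__ == "__main__":
    for task in TASKS:
        print(" ".join(mw_ibrl_cmd(task)))
```

Replace the `print` with `subprocess.run(mw_ibrl_cmd(task), check=True)` to actually launch the runs sequentially.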
To train a BC policy from scratch:

```shell
python mw_main/train_bc_mw.py --dataset.path Assembly --save_dir SAVE_DIR
```

Note that we still pass `--bc_policy` to specify the task name, but the policy itself is not used in the baselines. This is specific to `train_rl_mw.py`.

```shell
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/rlpd.yaml --bc_policy assembly --use_wb 0
```

For simplicity, this single command performs both pretraining and RL training:
```shell
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/rft.yaml --bc_policy assembly --use_wb 0
```

## Citation

```bibtex
@misc{hu2023imitation,
    title={Imitation Bootstrapped Reinforcement Learning},
    author={Hengyuan Hu and Suvir Mirchandani and Dorsa Sadigh},
    year={2023},
    eprint={2311.02198},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```