RoboBrain-Dex: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model
RoboBrain-Dex is a foundation model pre-trained on multi-source egocentric datasets, demonstrating strong capabilities in dexterous manipulation.

RoboBrain-Dex uses RoboBrain 2.0 as its base Vision-Language Model. It is distilled with our custom-designed motion dynamics model while simultaneously being pre-trained on multi-source human egocentric demonstration data. All real-world post-training is conducted on embodiment data that was not encountered during pre-training. The model architecture is illustrated in the figure below:
```
git clone https://github.com/FlagOpen/RoboBrain_Dex
conda create -n robobrain-dex python=3.10 -y
conda activate robobrain-dex
# Our experiments are conducted with torch 2.9.0
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation
```

We use RLDS-format data for model fine-tuning. We provide a script to convert JSON data from Unitree's official data collection format (avp_teleoperate/tree/g1) to RLDS.
- G1 (Unitree): Convert from the Unitree official JSON format to RLDS (data layout: `<input_root>/<task_name>/episode_XXXXXX/data.json` and `colors/images`). The JSON must record arm end-effector poses in the base frame (`left_hand_pos_state`, `right_hand_pos_state`) and fingertip poses in the wrist frame (`left_fingertip_pos_state`, `right_fingertip_pos_state`).
We provide post-training data for the pour_the_drink task, which can be used as a reference for the data format.
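For illustration, a single frame of `data.json` might carry the required state keys as sketched below. The array shapes (a 7-D end-effector pose as xyz + quaternion, five 3-D fingertip positions per hand) are assumptions for this sketch, not the repository's verified schema — check the conversion script for the exact layout:

```python
import json

# Hypothetical data.json frame illustrating the required state keys.
# Shapes (7-D pose, 5 fingertips x 3 coordinates) are assumptions.
frame = {
    "left_hand_pos_state":  [0.30,  0.20, 0.95, 0.0, 0.0, 0.0, 1.0],  # base frame
    "right_hand_pos_state": [0.30, -0.20, 0.95, 0.0, 0.0, 0.0, 1.0],  # base frame
    "left_fingertip_pos_state":  [[0.02,  0.01, 0.10]] * 5,           # wrist frame
    "right_fingertip_pos_state": [[0.02, -0.01, 0.10]] * 5,           # wrist frame
}

# Sanity-check that all keys the converter reads are present.
required = (
    "left_hand_pos_state", "right_hand_pos_state",
    "left_fingertip_pos_state", "right_fingertip_pos_state",
)
missing = [k for k in required if k not in frame]
print(json.dumps({"missing_keys": missing}))
```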
```
cd data_process/g1_process
python build_rlds_from_g1_state_action.py --input_root [input_root] --task_name [task_name] --output_dir [output_dir]
```

After converting to RLDS, register the dataset (for example, pour_the_drink) with our dataloader by adding an entry for it in configs.py (here) and transforms.py (here). For reference, each of these files contains sample entries for the G1 datasets used in our paper.
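The registration step can be sketched as follows. The field names and the transform signature here are illustrative assumptions patterned on typical RLDS dataloader configs — mirror the actual G1 sample entries in configs.py and transforms.py rather than this sketch:

```python
# Hypothetical registration entries; copy the structure of the existing
# G1 samples in configs.py / transforms.py, not these field names.

# configs.py-style entry: tells the dataloader which episode keys to read.
DATASET_CONFIGS = {
    "pour_the_drink": {
        "image_obs_keys": ["image"],   # camera stream(s) stored in the episode
        "state_obs_keys": ["state"],   # proprioceptive state vector
    },
}

# transforms.py-style entry: maps raw episode fields to the model's layout.
def pour_the_drink_transform(trajectory: dict) -> dict:
    # Identity placeholder; a real transform renames/reshapes obs and actions.
    return trajectory

DATASET_TRANSFORMS = {"pour_the_drink": pour_the_drink_transform}
```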
```
# Download the pretrained model
hf auth login
hf download BAAI/RoboBrain-Dex --repo-type model --local-dir /your/local/path

# Download the pretrained motion dynamics model
hf download BAAI/Motion_Dynamic_Model --repo-type model --local-dir /your/local/path
```

```
cd vla-scripts
torchrun --standalone --nnodes 1 --nproc-per-node 8 finetune_g1_rq.py \
  --vla_path "/path/to/robobrain-dex-checkpoint" \
  --data_root_dir "/path/to/datasets" \
  --motion_dynamics_path "/path/to/motion_dynamics.pt" \
  --dataset_name "pour_the_drink" \
  --run_root_dir "finetune_log/test"
```
```
# On the server
conda activate robobrain-dex
cd vla-scripts
torchrun --standalone --nnodes 1 --nproc-per-node 1 deploy_server.py
```

The following features are planned for future implementation:
- Complete data processing scripts and documentation
- Complete deployment scripts on real robots
- Complete pretraining scripts and documentation
RoboBrain-Dex builds on the following excellent open-source projects:
We thank the authors for their contributions to the robotics and machine learning communities.
If you find our work useful, please consider citing us and giving our repository a star! 🌟🌟🌟
RoboBrain-Dex
@article{fu2025metis,
title={METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model},
author={Fu, Yankai and Chen, Ning and Zhao, Junkai and Shan, Shaozhe and Yao, Guocai and Wang, Pengwei and Wang, Zhongyuan and Zhang, Shanghang},
journal={arXiv preprint arXiv:2511.17366},
year={2025}
}

RoboBrain
@article{ji2025robobrain,
title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
journal={arXiv preprint arXiv:2502.21257},
year={2025}
}