mini-pi0

mini-pi0 is a compact research codebase for training flow-matching robot action policies on ManiSkill demonstrations. The current stack centers on mini_pi0_fm: an image-conditioned action-chunk policy with transformer, CNN1D, and UNet1D denoisers.

Demo Gallery

StackCube Motion Planning

95.5% success over 200 eval episodes with the medium transformer + ViT policy.

Click the preview to open the MP4.

StackPyramid Motion Planning

26.0% success over 50 eval episodes with the medium transformer + ViT policy.

Base camera success	Wrist camera success

Click a preview to open the MP4.

PegInsertionSide Diagnostics

PegInsertionSide is the current hard task. The latest contact + hole-camera policy reaches 10.0% success over 100 eval episodes, with most remaining failures reaching the hole but timing out before stable insertion.

Base camera success	Left hole camera success	Right hole camera success

Click a preview to open the MP4.

What Is In This Repo

ManiSkill simulation workflows for collection, replay, conversion, training, eval, deploy-sim, and diagnostics.
mini_pi0_fm flow-matching policy with:
- action backbones: transformer, cnn1d, unet1d
- vision backbones: resnet18, timm
- multi-camera cross-attention conditioning
- observation history and chunked action prediction
Robomimic-style HDF5 loading for converted ManiSkill trajectories.
Action diagnostics for comparing flow steps and per-dimension clipping.
PegInsertion support with close hole cameras and optional contact features.
IsaacLab adapter scaffold kept for future integration work.

Repository Layout

mini_pi0/
  cli/         # train/eval/convert/collect command entrypoints
  config/      # typed dataclass config schema and YAML loading
  dataset/     # HDF5 episode loading, conversion, torch datasets
  sim/         # simulator adapters and custom ManiSkill environments
  models/      # mini_pi0_fm model and registry
  train/       # training loop, optimizer, checkpointing
  eval/        # rollout eval, action diagnostics, grid videos
  deploy/      # simulation deployment loop
  utils/       # runtime/device/parity helpers

examples/configs/
  maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml
  maniskill3_tray_transformer_vit_hist2_chunk16.yaml
  maniskill3_peginsertion_motionplanning_transformer_vit_hist3_medium_holecam_contacts.yaml

Install

This project is uv first. Use Python 3.11 for the ManiSkill + LeRobot stack.

uv venv --python 3.11 .venv
. .venv/bin/activate
uv pip install -e ".[maniskill3,vision,dev]"

For LeRobot v3 support in the same environment as ManiSkill, install with the repo constraints so the simulator stack stays compatible:

uv pip install -e ".[maniskill3,lerobot,vision,dev]" \
  -c constraints-maniskill-lerobot.txt

Pip fallback:

python -m venv .venv
. .venv/bin/activate
pip install -e ".[maniskill3,lerobot,vision,dev]" \
  -c constraints-maniskill-lerobot.txt

On shared machines with read-only Hugging Face cache paths, set a writable cache:

export HF_HOME=/tmp/minipi_hf_cache

Quickstart

Train the StackCube transformer + ViT policy:

mini-pi0 train \
  --config examples/configs/maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml

Evaluate a checkpoint:

mini-pi0 eval \
  --config examples/configs/maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml \
  --set eval.checkpoint=runs/<experiment>/run1/checkpoints/best.pt \
  --set eval.action_stats_path=runs/<experiment>/run1/artifacts/action_stats.json \
  --set eval.record_grid=true \
  --set eval.grid_cameras='["base_camera","hand_camera"]'

Run offline action diagnostics:

python -m mini_pi0.eval.action_diagnostics \
  --config examples/configs/maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml \
  --checkpoint runs/<experiment>/run1/checkpoints/best.pt \
  --action_stats runs/<experiment>/run1/artifacts/action_stats.json \
  --flow_steps 4,6,8

ManiSkill Data Workflow

Download built-in ManiSkill demonstrations:

python -m mani_skill.utils.download_demo StackCube-v1 \
  --output_dir demos/maniskill

Replay demonstrations with RGBD observations and the target controller:

python -m mani_skill.trajectory.replay_trajectory \
  --traj-path demos/maniskill/StackCube-v1/motionplanning/trajectory.h5 \
  --obs-mode rgbd \
  --target-control-mode pd_joint_pos \
  --save-traj

Convert the replayed trajectory into the training schema:

mini-pi0 convert-maniskill-trajectory \
  --input_hdf5 demos/maniskill/StackCube-v1/motionplanning/trajectory.rgbd.pd_joint_pos.physx_cpu.h5 \
  --output_hdf5 data/robomimic/maniskill/stackcube/mp/rgbd_pd_joint_pos.hdf5 \
  --overwrite

Or convert a replayed ManiSkill trajectory directly into LeRobot v3 format with ManiSkill's official converter:

python -m mani_skill.trajectory.convert_to_lerobot \
  --traj-path demos/maniskill/StackCube-v1/motionplanning/trajectory.rgbd.pd_joint_pos.physx_cpu.h5 \
  --output-dir data/lerobot/stackcube-rgbd-pd-joint-pos \
  --task-name "Stack cube" \
  --fps 20 \
  --image-size 128x128 \
  --robot-type panda

Existing robomimic HDF5 files can also be ported to LeRobot v3:

mini-pi0 convert-robomimic-to-lerobot \
  --input_hdf5 data/robomimic/maniskill/stackcube/mp/rgbd_pd_joint_pos.hdf5 \
  --output_dir data/lerobot/stackcube-rgbd-pd-joint-pos \
  --repo_id local/stackcube-rgbd-pd-joint-pos \
  --task_name "Stack cube" \
  --fps 20 \
  --image_keys agentview_image,robot0_eye_in_hand_image \
  --overwrite

Train from LeRobot v3 by switching the dataset format:

mini-pi0 train \
  --config examples/configs/maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml \
  --set data.format=lerobot_v3 \
  --set data.lerobot_repo_id=local/stackcube-rgbd-pd-joint-pos \
  --set data.lerobot_root=data/lerobot/stackcube-rgbd-pd-joint-pos \
  --set data.lerobot_image_keys='["observation.images.agentview_image","observation.images.robot0_eye_in_hand_image"]'

For full data notes, custom environment collection, and task conversion details, see docs/DATASETS.md and docs/SIMULATION.md.

PegInsertionSide

PegInsertionSide is tracked separately because it needs better visual access to the hole and benefits from contact diagnostics. This branch includes:

MiniPi0PegInsertionSide-v1, a repo-local ManiSkill environment with close hole_left_camera and hole_right_camera sensors.
Replay helpers for local env registration.
Contact extraction into HDF5 under obs/* keys.
A contact-aware config: examples/configs/maniskill3_peginsertion_motionplanning_transformer_vit_hist3_medium_holecam_contacts.yaml.

Prepare the hole-camera contact dataset:

NUM_ENVS=16 tools/prepare_peginsertion_holecam_contacts.sh

Train the current PegInsertion config:

mini-pi0 train \
  --config examples/configs/maniskill3_peginsertion_motionplanning_transformer_vit_hist3_medium_holecam_contacts.yaml

Read the task-specific notes in docs/PEG_INSERTION.md.

Config Notes

The main knobs are:

model.action_backbone: transformer, cnn1d, or unet1d
model.vision_backbone: resnet18 or timm
model.conditioning_mode: cross_attention or global
model.obs_horizon: number of observation frames
data.chunk_size and model.chunk_size: predicted action horizon
robot.image_keys: ordered camera observations
robot.state_keys: proprio/contact state keys
eval.grid_cameras: one or more cameras for saved rollout grids

When camera keys, state keys, action dimension, or chunk size change, retrain or use a checkpoint trained with the same interface.

Validation

Run focused checks:

python -m pytest \
  tests/test_config.py \
  tests/test_model_registry.py \
  tests/test_fm_architecture.py \
  tests/test_training_stability_controls.py \
  tests/test_eval_weight_source.py \
  -q

Run the full suite:

python -m pytest -q

Results

Full task benchmark tracking lives in docs/TASK_BENCHMARK.md.

StackCube-v1

Config: examples/configs/maniskill3_stackcube_motionplanning_transformer_vit_hist2_medium.yaml

Metric	Value
Success rate	95.5%
CI95	92.5% - 98.0%
Episodes	200
Mean episode length	166.2 steps
Mean inference speed	31.9 ms/chunk

Artifacts:

StackPyramid-v1

Config: examples/configs/maniskill3_stackpyramid_motionplanning_transformer_vit_hist2_medium.yaml

Metric	Value
Success rate	26.0%
CI95	14.0% - 38.0%
Episodes	50
Mean episode length	444.9 steps
Mean inference speed	32.6 ms/chunk
Failure modes	37 no progress, 13 success

Artifacts:

PullCubeTool-v1

Variant: pd_ee_delta_pose small policy with hist2 base + wrist cameras.

Metric	Value
Success rate	31.0%
CI95	23.0% - 41.0%
Episodes	100
Mean episode length	408.4 steps
Mean inference speed	27.7 ms/chunk
Failure modes	69 timeout after progress, 31 success

Base camera success	Wrist camera success

Artifacts:

PegInsertionSide-v1

Config: examples/configs/maniskill3_peginsertion_motionplanning_transformer_vit_hist3_medium_holecam_contacts.yaml

Metric	Value
Success rate	10.0%
CI95	5.0% - 16.0%
Episodes	100
Mean episode length	474.8 steps
Mean inference speed	43.4 ms/chunk
Failure modes	90 timeout after progress, 10 success

Artifacts:

Current interpretation: the policy now solves a small but real fraction of rollouts. Most failures still make progress toward the hole, so the next useful step is phase diagnostics: grasp state, peg-hole distance, angular alignment, insertion depth, contact force, and jamming signals.

For the task-wise benchmark table, see docs/TASK_BENCHMARK.md.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
assets		assets
docs		docs
examples/configs		examples/configs
mini_pi0		mini_pi0
tests		tests
tools		tools
.gitignore		.gitignore
README.md		README.md
constraints-maniskill-lerobot.txt		constraints-maniskill-lerobot.txt
deploy_so100.py		deploy_so100.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mini-pi0

Demo Gallery

StackCube Motion Planning

StackPyramid Motion Planning

PegInsertionSide Diagnostics

What Is In This Repo

Repository Layout

Install

Quickstart

ManiSkill Data Workflow

PegInsertionSide

Config Notes

Validation

Results

StackCube-v1

StackPyramid-v1

PullCubeTool-v1

PegInsertionSide-v1

TODO

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mini-pi0

Demo Gallery

StackCube Motion Planning

StackPyramid Motion Planning

PegInsertionSide Diagnostics

What Is In This Repo

Repository Layout

Install

Quickstart

ManiSkill Data Workflow

PegInsertionSide

Config Notes

Validation

Results

StackCube-v1

StackPyramid-v1

PullCubeTool-v1

PegInsertionSide-v1

TODO

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages