We reproduced Cosmos3-Nano on LIBERO-10 and matched Table 20's 97.4% at checkpoint 2000.
We're working towards fine-tuning Cosmos3 as a robot policy for the Unitree G1 humanoid. LIBERO-10 is a good warm-up for that: like the G1, it's post-training onto an out-of-distribution embodiment.
A few things weren't obvious from the shipped scripts, so we're writing them down for the next person.
1. Complementing the repo
Things that are missing or won't load as shipped:
- The LIBERO recipe isn't on main (only the DROID config + server script are); we found it on the mharrim-nv-patch-1 branch.
- We had to reconstruct two imported files even there: libero_pose_utils and cosmos3_action_lerobot._patch_decoder_cache.
2. Training
The shipped config is fine; there's just one thing to know:
- The config trains on all four suites by default. That gives libero_10 only ~1 pass in 2000 steps, which scores ~82%. The same 2000 steps on libero_10 alone (~2.7 passes) reach the 97% in the paper.
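The pass counts above imply a rough step budget if you want to keep the four-suite mix. This is only back-of-the-envelope arithmetic from the two numbers we quoted, assuming samples are drawn uniformly across the mix; the function names are ours, and we did not run this longer schedule, so it says nothing about the score you would get.

```python
def steps_for_passes(target_passes, observed_passes, observed_steps):
    """Scale a step budget linearly so a dataset is seen target_passes times.

    Assumes passes grow linearly with steps (uniform sampling, fixed batch size).
    """
    return observed_steps * target_passes / observed_passes


def implied_share(passes_in_mix, passes_alone):
    """Fraction of the mixed training stream that a single suite accounts for."""
    return passes_in_mix / passes_alone


# Numbers from the note above: ~1 pass in the mix vs ~2.7 passes alone, both at 2000 steps.
share = implied_share(1.0, 2.7)            # libero_10 is roughly 37% of the mix
steps = steps_for_passes(2.7, 1.0, 2000)   # about 5400 steps for ~2.7 passes in the mix
print(f"libero_10 share of mix: {share:.2f}, steps for 2.7 passes: {steps:.0f}")
```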
3. Writing the eval loop — the repo has none
The repo has no closed-loop LIBERO eval client (just the inference server), so we wrote our own.
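For orientation, the skeleton of such a client looks roughly like the sketch below. It is not our exact client: `env` stands for any object with gym-style `reset()` and `step(action)` methods, `policy` is a hypothetical callable that wraps the request to the inference server, and treating `done=True` as task success is an assumption you should check against your environment wrapper.

```python
def run_episode(env, policy, max_steps):
    """Roll out one episode; return True if the env reports done within the step budget."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)  # hypothetical: one round trip to the inference server
        obs, _reward, done, _info = env.step(action)
        if done:  # assumption: done=True signals task success
            return True
    return False


def success_rate(make_env, policy, n_episodes, max_steps):
    """Fraction of successful episodes; make_env() must return a fresh env each call."""
    successes = sum(
        run_episode(make_env(), policy, max_steps) for _ in range(n_episodes)
    )
    return successes / n_episodes
```

If the model returns a chunk of several actions per request, the `policy` callable is the natural place to buffer the chunk and hand out one action per step.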
Conventions we had to match, or else the score tanks:
- Image: the sim frames are rotated 180° vs training, so rotate them back with img[::-1,::-1].
- Gripper: the model outputs [0,1] but robosuite wants [-1,1] (negative = open). Convert with 1 - 2·g; pass it through raw and the gripper never opens.
- Server: the server's normalization flags already exist; just remember to use them (--action-normalization quantile_rot + the libero rot6d stats file), or actions come out at the wrong scale.
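The image and gripper conventions above can be sketched as two small helpers. The function names are ours, not the repo's, and the sketch assumes frames arrive as (H, W, C) NumPy arrays. The `np.ascontiguousarray` call is our addition: the reversed slice is a negative-stride view, which some downstream consumers reject.

```python
import numpy as np


def rotate_frame_180(img):
    """Undo the 180° rotation by reversing both the row and column axes."""
    # img[::-1, ::-1] is a view with negative strides; copy it into contiguous memory.
    return np.ascontiguousarray(img[::-1, ::-1])


def gripper_to_robosuite(g):
    """Map the model's gripper output from [0, 1] to robosuite's [-1, 1] via 1 - 2*g."""
    return 1.0 - 2.0 * np.asarray(g, dtype=np.float32)


frame = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
fixed = rotate_frame_180(frame)             # pixel (0, 0) now holds the old pixel (1, 1)
command = gripper_to_robosuite([0.0, 1.0])  # 0 maps to +1, 1 maps to -1 (open)
```

With this mapping a raw output of 1 becomes -1, the open command. Passing g through unconverted keeps it non-negative, which is why the gripper never opens.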
Hope this saves someone the digging.