docs(gr00t_inference): symmetric posttrain disclaimer for all POSTTRAIN_TAGS embodiments #213

@cagataycali

Context

Follow-up from PR #128 review (R3, R5, R7 threads).

The gr00t_inference tool docstring now warns that unitree_g1_sonic requires a finetuned checkpoint, but per upstream Isaac-GR00T@3df8b38/gr00t/data/embodiment_tags.py::POSTTRAIN_TAGS, the full posttrain set is:

POSTTRAIN_TAGS = {
    EmbodimentTag.UNITREE_G1,           # -> unitree_g1_full_body_with_waist_height_nav_cmd
    EmbodimentTag.UNITREE_G1_SONIC,     # -> unitree_g1_sonic         (covered by #128)
    EmbodimentTag.SIMPLER_ENV_GOOGLE,
    EmbodimentTag.SIMPLER_ENV_WIDOWX,
    EmbodimentTag.LIBERO_PANDA,         # -> libero_sim
}

Problem

A user reading the gr00t_inference docstring today would assume all listed Unitree-G1 / LIBERO embodiments are interchangeable with unitree_g1_real (which is in PRETRAIN_TAGS, baked into nvidia/GR00T-N1.7-3B). They aren't: running the base checkpoint with any posttrain tag raises no error and silently emits garbage actions.
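To make the failure mode concrete, here is a minimal sketch of a guard that would turn the silent failure into a loud one. It is not part of the current tool: `POSTTRAIN_TAG_VALUES` is a hypothetical local mirror of the tag strings quoted in this issue (the source of truth stays upstream's POSTTRAIN_TAGS), and `check_checkpoint_for_tag` is an illustrative name.

```python
# Hypothetical guard; names below are illustrative, not existing APIs.
BASE_CHECKPOINT = "nvidia/GR00T-N1.7-3B"

# Assumed local mirror of the posttrain tag strings listed in this issue.
# The real source of truth is upstream's POSTTRAIN_TAGS.
POSTTRAIN_TAG_VALUES = frozenset(
    {
        "unitree_g1_full_body_with_waist_height_nav_cmd",
        "unitree_g1_sonic",
        "simpler_env_google",
        "simpler_env_widowx",
        "libero_sim",
    }
)


def check_checkpoint_for_tag(embodiment_tag: str, checkpoint: str) -> None:
    """Raise if a posttrain tag is paired with the base checkpoint."""
    if embodiment_tag in POSTTRAIN_TAG_VALUES and checkpoint == BASE_CHECKPOINT:
        raise ValueError(
            f"{embodiment_tag!r} is a posttrain embodiment; "
            f"{checkpoint!r} is the base model and needs finetuning first"
        )


# A pretrain tag with the base checkpoint passes without error.
check_checkpoint_for_tag("unitree_g1_real", BASE_CHECKPOINT)
```

Whether such a runtime check belongs in the tool is a separate question; the docstring change proposed below is the scope of this issue.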

Additional concern raised in PR #128 R7 review

The disclaimer says "requires a finetuned checkpoint", which warns about the model side but not the action side: action.motion_token returned by SONIC inference is a 64-dim learned latent, NOT an executable joint command. A user wiring the policy output straight into a Unitree G1 SDK call gets garbage even with a properly fine-tuned checkpoint, because the SONIC C++ decoder (separate process, 50 Hz, from NVlabs/GR00T-WholeBodyControl) must sit in the loop between the policy and the robot.

Reviewer suggestion (PR #128 R7, non-blocking):

unitree_g1_sonic (SONIC whole-body controller -- VLA emits 64-dim motion-token latents that must be decoded to joint commands by the SONIC runtime from GR00T-WholeBodyControl; requires a finetuned checkpoint, not the base nvidia/GR00T-N1.7-3B)
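The action-side hazard can be sketched as a small check on the policy output. This is only an illustration under assumptions: the key name `action.motion_token` comes from the text above, while `ensure_not_latent` and the dict-shaped output are hypothetical and not taken from the tool's code.

```python
# Hypothetical helper; illustrates the latent-vs-joint-command distinction.
MOTION_TOKEN_KEY = "action.motion_token"  # key name as quoted in this issue


def ensure_not_latent(action: dict) -> dict:
    """Refuse to treat SONIC motion-token latents as joint commands.

    Assumes the policy output is a dict keyed by action name.
    """
    if MOTION_TOKEN_KEY in action:
        raise TypeError(
            f"{MOTION_TOKEN_KEY!r} is a learned latent; decode it with the "
            "SONIC runtime from GR00T-WholeBodyControl before sending it "
            "to the robot"
        )
    return action
```

A caller would run this just before handing actions to a robot SDK, so a SONIC output fails fast instead of driving the joints with latent values.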

Proposal

Replace the per-line disclaimer (currently only on the SONIC entry) with a structural split that mirrors upstream's PRETRAIN_TAGS / POSTTRAIN_TAGS partitioning, AND extend the SONIC entry to call out the decoder requirement. Sketch:

**Pretrain configs (work directly with nvidia/GR00T-N1.7-3B):**
  - ``unitree_g1_real``
  - ``fourier_gr1_arms_only`` ...

**Posttrain configs (require a finetuned checkpoint, not the base model):**
  - ``unitree_g1`` / ``unitree_g1_full_body``
  - ``unitree_g1_sonic`` (action stream is SONIC motion-token latents;
    requires GR00T-WholeBodyControl decoder to produce joint commands)
  - ``libero_panda`` / ``libero_sim``
  - ``simpler_env_google``, ``simpler_env_widowx``

This scales as upstream adds new posttrain entries: one footnote per category instead of per-line annotations.

Pin contract

Update / extend tests/policies/groot/test_data_config.py::test_gr00t_inference_docstring_flags_sonic_as_posttrain into a parametrised test covering every entry in upstream's POSTTRAIN_TAGS so a future addition of a posttrain embodiment fails CI until the docstring is updated.
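One possible shape for the check behind that parametrised test is sketched below. It assumes the docstring adopts the "**Posttrain configs" header from the proposal above; `tags_missing_from_posttrain_section` and `SAMPLE_DOC` are hypothetical names, and the tag list would come from upstream's POSTTRAIN_TAGS rather than being hard-coded.

```python
# Hypothetical helper for the pin test; not existing project code.
POSTTRAIN_HEADER = "**Posttrain configs"  # header text from the proposed sketch


def tags_missing_from_posttrain_section(doc: str, tags: list[str]) -> list[str]:
    """Return tags not listed as ``tag`` after the posttrain header."""
    _, found, section = doc.partition(POSTTRAIN_HEADER)
    if not found:
        # No posttrain section at all: every tag counts as missing.
        return sorted(tags)
    return sorted(tag for tag in tags if f"``{tag}``" not in section)


# Stand-in docstring that deliberately omits one posttrain tag.
SAMPLE_DOC = """
**Pretrain configs (work directly with nvidia/GR00T-N1.7-3B):**
  - ``unitree_g1_real``

**Posttrain configs (require a finetuned checkpoint, not the base model):**
  - ``unitree_g1_sonic``
  - ``libero_sim``
"""

missing = tags_missing_from_posttrain_section(
    SAMPLE_DOC, ["unitree_g1_sonic", "libero_sim", "simpler_env_google"]
)
```

In the test suite, the same check could be driven by `pytest.mark.parametrize` over the upstream tag values, so each undocumented tag shows up as its own failing case.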

Additionally, harden the SONIC-specific pin so EVERY occurrence of unitree_g1_sonic in the docstring carries the disclaimer, not just the first (R7 reviewer suggestion):

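import re

# `doc` is assumed to hold the gr00t_inference docstring under test.
# Each match is a ``unitree_g1_sonic`` mention plus the text up to the next backtick.
occurrences = re.findall(r"``unitree_g1_sonic``[^`]*", doc)
assert occurrences, "docstring never mentions unitree_g1_sonic"
for occ in occurrences:
    assert "finetuned checkpoint" in occ, f"missing disclaimer in: {occ!r}"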
