
Cosmos3 Inverse/Forward Dynamics: how to obtain observation.state (LeRobot-compatible) alongside predicted actions? #192

@BlackMamba19

I am evaluating Cosmos3 inverse dynamics (video → action) and would like to build LeRobot-format episodes for downstream training/fine-tuning (observation.images.*, observation.state, action, timestamps, etc.).
Cosmos3 inference currently returns an action tensor (e.g., [T, raw_action_dim]), but it is unclear how to obtain the corresponding state observations (observation.state) that LeRobot-style robot-learning datasets require.


Context / Goal

  • I run Cosmos3 inverse dynamics on an input video and obtain:
    • action.data with shape [T, raw_action_dim]
  • I want to append new episodes to an existing dataset in LeRobot/GR00T format (parquet + meta + mp4), which expects at minimum:
    • observation.images.<cam> (video frames)
    • observation.state (per-timestep state vector)
    • action (per-timestep action vector)
    • timestamps / frame_index / episode_index, etc.
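
To make the target layout concrete, the per-frame rows could be assembled roughly as follows. This is a minimal sketch, not Cosmos3 or LeRobot API: `build_episode_rows`, `fps`, and `start_index` are hypothetical names, the key names (`timestamp`, `frame_index`, `episode_index`, `index`) are assumed from the fields listed above and should be checked against the target dataset's metadata, and the `states` array is a zero placeholder standing in for exactly the data that is missing. Camera frames are left out because they live in the mp4 files rather than in the per-frame table.

```python
import numpy as np


def build_episode_rows(actions, states, episode_index, fps=30.0, start_index=0):
    """Pair each predicted action with a state vector and bookkeeping fields.

    actions: array of shape [T, raw_action_dim] (e.g. the predicted action.data)
    states:  array of shape [T, state_dim] (source unknown; see the question below)
    """
    actions = np.asarray(actions, dtype=np.float32)
    states = np.asarray(states, dtype=np.float32)
    if actions.ndim != 2 or states.ndim != 2:
        raise ValueError("actions and states must both be 2-D: [T, dim]")
    if len(actions) != len(states):
        raise ValueError("actions and states must cover the same T timesteps")

    rows = []
    for t in range(len(actions)):
        rows.append(
            {
                "observation.state": states[t],
                "action": actions[t],
                "timestamp": t / fps,       # seconds since episode start
                "frame_index": t,           # index within this episode
                "episode_index": episode_index,
                "index": start_index + t,   # assumed global frame counter
            }
        )
    return rows


# Example with placeholder data (all values are illustrative).
T, raw_action_dim, state_dim = 4, 7, 7
actions = np.zeros((T, raw_action_dim), dtype=np.float32)
states = np.zeros((T, state_dim), dtype=np.float32)  # placeholder, not real state
rows = build_episode_rows(actions, states, episode_index=12, fps=30.0, start_index=1000)
```

The rows can then be written to parquet with whatever tooling the existing dataset already uses; the open question is only what to put in `states`.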

My blocker: given only (video, predicted action), how should I obtain observation.state to create a coherent episode?
