diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md
index 0562c8b2..f0ebffd6 100644
--- a/cookbooks/cosmos3/generator/action/README.md
+++ b/cookbooks/cosmos3/generator/action/README.md
@@ -42,6 +42,32 @@ fingers.
 
 Action data samples across different embodiments can be inspected interactively in the [Cosmos3 Action Viewer](https://huggingface.co/spaces/nvidia/Cosmos3-Action-Viewer) Hugging Face Space.
 
+### Dataset state, action layout, and normalization
+
+The inverse-dynamics examples predict action trajectories from video inputs.
+They do not infer a full LeRobot episode by themselves: fields such as
+`observation.state`, timestamps, frame indexes, and camera streams must come
+from the source dataset or the robot logging pipeline used to create the
+episode. When building a LeRobot-format dataset, align the predicted `action`
+rows with the original per-timestep observations instead of treating inverse
+dynamics as a replacement for state estimation.
+
+For a concrete LeRobot-style robotics sample, inspect
+[`assets/droid_lerobot_example/`](./assets/droid_lerobot_example/). Its
+metadata declares the DROID state streams
+`observation.state.cartesian_position`,
+`observation.state.joint_positions`, and
+`observation.state.gripper_position`, while the parquet/video assets provide the
+corresponding timestep-aligned records. Use it as the current checked-in
+reference for how action-conditioned robotics inputs relate to dataset state
+fields.
+
+The currently checked-in action assets cover the AV, DROID, and UMI examples
+listed above. Use the action JSON files and notebooks as the canonical layout
+and normalization references for those examples. Treat other embodiment layouts
+and normalization statistics as specific to a model or data release until their
+example assets are published.
+
 ## Run with Cosmos Framework
 
 ### Quickstart