docs: explain action string controls #58

Open: Travor278 wants to merge 1 commit into Robbyant:main from Travor278:docs/action-string-controls

Description

Documents how --action_string expands into WASD/IJKL action arrays and how those keys are converted into camera movement for LingBot-World-Base (Act).
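A minimal sketch of that expansion, written to reproduce the parser smoke test below rather than copied from wan/utils/wasd_ijkl_to_c2ws.py. The names parse_action_string and frame_keys are hypothetical, and the sketch assumes comma-separated "<keys>-<frames>" segments, a none segment meaning no key is held, and column orders [w, a, s, d] and [i, j, k, l].

```python
# Hypothetical sketch, not the code in wan/utils/wasd_ijkl_to_c2ws.py.
# Assumed: segments "<keys>-<frames>" joined by commas, "none" means no key,
# and column orders [w, a, s, d] and [i, j, k, l].
WASD = "wasd"
IJKL = "ijkl"


def parse_action_string(action_string):
    """Expand e.g. "w-2,ij-1" into per-frame 0/1 rows for WASD and IJKL."""
    wasd_rows, ijkl_rows = [], []
    for segment in action_string.split(","):
        keys, _, count = segment.strip().rpartition("-")
        if not keys or not count.isdigit() or int(count) < 1:
            raise ValueError(f"bad segment {segment!r}, expected '<keys>-<frames>'")
        pressed = set() if keys == "none" else set(keys)
        unknown = pressed - set(WASD + IJKL)
        if unknown:
            raise ValueError(f"unknown keys {sorted(unknown)} in {segment!r}")
        # One row per frame; every frame in a segment holds the same keys.
        for _ in range(int(count)):
            wasd_rows.append([int(k in pressed) for k in WASD])
            ijkl_rows.append([int(k in pressed) for k in IJKL])
    return wasd_rows, ijkl_rows, len(wasd_rows)


def frame_keys(wasd_rows, ijkl_rows):
    """Turn 0/1 rows back into the list of keys held on each frame."""
    return [
        [k for k, on in zip(WASD, w) if on] + [k for k, on in zip(IJKL, i) if on]
        for w, i in zip(wasd_rows, ijkl_rows)
    ]


wasd, ijkl, frames = parse_action_string("w-2,ij-1,none-2,d-1")
print(frames)                  # 6
print(frame_keys(wasd, ijkl))  # [['w'], ['w'], ['i', 'j'], [], [], ['d']]
```

Because each key owns a fixed column, a multi-key segment such as ij-1 simply sets two flags in the same row, and none-2 produces two all-zero rows.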

The README now covers the segment syntax, multi-key segments, the none segment (no key held), inferred and padded frame_num, key-column order, the movement and rotation meaning of each key, and a ViPE preprocessing note for videos with UI overlays. The CLI help for --action_string now also names the WASD/IJKL mapping.
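The smoke test below pads an inferred frame_num of 6 to 9, which is consistent with rounding up to the next value of the form 4n + 1. A sketch of that rounding follows; the helper name is hypothetical and it does not show how actions are filled in for the extra frames. The 4n + 1 constraint most likely comes from the Wan-style video VAE, which encodes the first frame on its own and then compresses time by a factor of 4.

```python
def pad_to_4n_plus_1(frame_num):
    """Round frame_num up to the smallest 4n + 1 that is >= frame_num (hypothetical helper)."""
    if frame_num < 1:
        raise ValueError("frame_num must be at least 1")
    n = (frame_num - 1 + 3) // 4  # ceil((frame_num - 1) / 4) with integer math
    return 4 * n + 1


for f in (1, 5, 6, 9, 10):
    print(f, "->", pad_to_4n_plus_1(f))
# prints one line each: 1 -> 1, 5 -> 5, 6 -> 9, 9 -> 9, 10 -> 13
```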

Related Issue

Resolves #37

Related to #26

Motivation and Context

Several users asked how WASD is derived from camera poses and what action vectors such as [0,0,1,1] mean. The implementation already defines this mapping in wan/utils/wasd_ijkl_to_c2ws.py, but users currently need to read the code to understand the control semantics.
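To make the vector semantics concrete: assuming the [w, a, s, d] column order, [0,0,1,1] reads as S and D held together (back and to the right). The sketch below decodes rows and integrates them into simple first-person poses. Everything about the motion model is an illustrative assumption, not the mapping in wan/utils/wasd_ijkl_to_c2ws.py: OpenCV-style axes (x right, y down, z forward), W/S and A/D translating in the ground plane along the current heading, J/L changing yaw, I/K changing pitch, and the hypothetical step sizes MOVE_STEP and TURN_STEP.

```python
import math

# Hypothetical conventions for illustration only (not read from the repo):
# OpenCV-style axes (x right, y down, z forward), yaw about the y axis,
# positive yaw turns right, positive pitch looks up, y stays fixed.
WASD = "wasd"
IJKL = "ijkl"
MOVE_STEP = 0.1              # assumed translation per frame, world units
TURN_STEP = math.radians(3)  # assumed rotation per frame


def row_to_keys(wasd_row, ijkl_row):
    """Decode one frame's flags, e.g. [0, 0, 1, 1] + [0, 0, 0, 0] -> ['s', 'd']."""
    return [k for k, on in zip(WASD, wasd_row) if on] + [
        k for k, on in zip(IJKL, ijkl_row) if on
    ]


def integrate_camera(wasd_rows, ijkl_rows):
    """Accumulate per-frame key flags into (x, y, z, yaw, pitch) poses."""
    x = y = z = yaw = pitch = 0.0
    poses = [(x, y, z, yaw, pitch)]
    for (w, a, s, d), (i, j, k, l) in zip(wasd_rows, ijkl_rows):
        # Rotate first so this frame's translation follows the new heading.
        yaw += (l - j) * TURN_STEP
        pitch += (i - k) * TURN_STEP
        forward = (w - s) * MOVE_STEP  # W forward, S back; both held cancel out
        right = (d - a) * MOVE_STEP    # D right, A left
        # Ground-plane forward is (sin yaw, 0, cos yaw); right is (cos yaw, 0, -sin yaw).
        x += forward * math.sin(yaw) + right * math.cos(yaw)
        z += forward * math.cos(yaw) - right * math.sin(yaw)
        poses.append((x, y, z, yaw, pitch))
    return poses


print(row_to_keys([0, 0, 1, 1], [0, 0, 0, 0]))  # ['s', 'd']
for pose in integrate_camera([[1, 0, 0, 0]] * 2, [[0, 0, 0, 1]] * 2):
    print(tuple(round(v, 3) for v in pose))
```

Swapping in the repository's real axis convention, step sizes, and per-key directions would only change the constants and the update lines; decoding a row depends solely on the key-column order.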

This PR documents the core control contract in the README and CLI help without changing generation behavior.

How Has This Been / Can This Be Tested?

Environment: WSL2 Ubuntu 22.04, Python 3.10.12 venv at /home/Travor/venvs/starvla-332.

python -m py_compile generate.py wan/utils/wasd_ijkl_to_c2ws.py

Parser smoke test:

python -c 'from wan.utils.wasd_ijkl_to_c2ws import action_string_to_wasd_ijkl, infer_frame_num_from_action_string, pad_frame_num_to_4n_plus_1, wasd_array_to_frame_keys; wasd, ijkl, frames = action_string_to_wasd_ijkl("w-2,ij-1,none-2,d-1"); assert frames == 6; assert infer_frame_num_from_action_string("w-2,ij-1,none-2,d-1") == 6; assert pad_frame_num_to_4n_plus_1(6) == 9; assert wasd.shape == (6, 4); assert ijkl.shape == (6, 4); assert wasd_array_to_frame_keys(wasd, ijkl) == [["w"], ["w"], ["i", "j"], [], [], ["d"]]'

Also ran:

git diff --check origin/main...HEAD

Checklist

  • Documents the action string syntax and key-column order.
  • Explains inferred frame_num and 4n+1 padding behavior.
  • Adds CLI help without changing runtime behavior.

Travor278 marked this pull request as ready for review May 28, 2026 23:35
Successfully merging this pull request may close these issues: How WASD is derived from the camera pose