docs: explain action string controls #58
Open
Travor278 wants to merge 1 commit into
Description
Documents how `--action_string` expands into WASD/IJKL action arrays and how those keys are converted into camera movement for LingBot-World-Base (Act).

The README now covers segment syntax, multi-key segments, `none`, inferred/padded `frame_num`, key-column order, movement/rotation meanings, and a ViPE preprocessing note for videos with UI overlays. The CLI help for `--action_string` now also names the WASD/IJKL mapping.

Related Issue
Resolves #37
Related to #26
Motivation and Context
Several users asked how WASD is derived from camera poses and what action vectors such as `[0,0,1,1]` mean. The implementation already defines this mapping in `wan/utils/wasd_ijkl_to_c2ws.py`, but users currently need to read the code to understand the control semantics.

This PR moves the most important control contract into the README and CLI help without changing generation behavior.
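For reviewers who want the segment syntax in one place, here is a minimal standalone sketch of the expansion and padding described in this PR. It is not the code in `wan/utils/wasd_ijkl_to_c2ws.py`: the function names `expand_action_string` and `pad_to_4n_plus_1` are hypothetical, the column orders `w, a, s, d` and `i, j, k, l` are assumptions, and the round-up padding rule is inferred only from the `6 -> 9` case in the smoke test.

```python
WASD_KEYS = "wasd"  # assumed column order: w, a, s, d
IJKL_KEYS = "ijkl"  # assumed column order: i, j, k, l


def expand_action_string(action_string):
    """Expand segments such as "w-2,ij-1,none-2,d-1" into per-frame 0/1 rows.

    Each segment is "<keys>-<frame count>"; "none" means no key is pressed.
    Returns (wasd_rows, ijkl_rows), one 4-element row per frame in each list.
    """
    wasd_rows, ijkl_rows = [], []
    for segment in action_string.split(","):
        keys, _, count = segment.strip().rpartition("-")
        if not keys or not count.isdigit():
            raise ValueError(f"malformed segment: {segment!r}")
        # "none" is a keyword, not the letters n, o, n, e.
        pressed = set() if keys == "none" else set(keys)
        unknown = pressed - set(WASD_KEYS + IJKL_KEYS)
        if unknown:
            raise ValueError(f"unknown keys in {segment!r}: {sorted(unknown)}")
        wasd = [int(k in pressed) for k in WASD_KEYS]
        ijkl = [int(k in pressed) for k in IJKL_KEYS]
        for _ in range(int(count)):
            wasd_rows.append(list(wasd))
            ijkl_rows.append(list(ijkl))
    return wasd_rows, ijkl_rows


def pad_to_4n_plus_1(frame_num):
    """Round up to the nearest value of the form 4n + 1 (assumed rule)."""
    remainder = (frame_num - 1) % 4
    return frame_num if remainder == 0 else frame_num + (4 - remainder)


wasd_rows, ijkl_rows = expand_action_string("w-2,ij-1,none-2,d-1")
inferred_frames = len(wasd_rows)
padded_frames = pad_to_4n_plus_1(inferred_frames)
```

With the string from the smoke test, the sketch infers 6 frames and pads them to 9. The multi-key segment `ij-1` sets two columns in a single IJKL row, and `none-2` yields two all-zero rows in both arrays.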
How Has This Been / Can This Be Tested?
Environment: WSL2 Ubuntu 22.04, Python 3.10.12 venv at `/home/Travor/venvs/starvla-332`.

Parser smoke:

python -c 'from wan.utils.wasd_ijkl_to_c2ws import action_string_to_wasd_ijkl, infer_frame_num_from_action_string, pad_frame_num_to_4n_plus_1, wasd_array_to_frame_keys; wasd, ijkl, frames = action_string_to_wasd_ijkl("w-2,ij-1,none-2,d-1"); assert frames == 6; assert infer_frame_num_from_action_string("w-2,ij-1,none-2,d-1") == 6; assert pad_frame_num_to_4n_plus_1(6) == 9; assert wasd.shape == (6, 4); assert ijkl.shape == (6, 4); assert wasd_array_to_frame_keys(wasd, ijkl) == [["w"], ["w"], ["i", "j"], [], [], ["d"]]'

Also ran:
Checklist
`frame_num` and `4n+1` padding behavior.