feat: support vae for wan2.2 #1447
ethan686 commented on May 14, 2026
- The VAE still has a very small precision loss. Even so, after 40 steps without CFG, the output still reaches 99.8% precision compared with mindiesd.
- The current precision depends on changing the resize mode from kNearest to kBicubic. Because a new resize method is coming, that change is not committed in this PR.
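Switching the resize mode changes the numerics even on tiny inputs, which is why it moves the precision figure. Below is a minimal 1-D sketch, not the PR's code. It assumes PyTorch's `align_corners=false` conventions: nearest uses `floor(dst * scale)`, and bicubic uses the Keys kernel with `A = -0.75`, sampling at `(dst + 0.5) * scale - 0.5` with clamped borders. The helpers `cubic_weight`, `resize_nearest`, and `resize_bicubic` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keys cubic convolution kernel; a = -0.75 is assumed to match PyTorch's bicubic.
double cubic_weight(double x, double a = -0.75) {
  x = std::fabs(x);
  if (x <= 1.0) return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0;
  if (x < 2.0) return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a;
  return 0.0;
}

// Nearest-neighbour resize: each output sample copies one input sample.
std::vector<double> resize_nearest(const std::vector<double>& in, std::size_t out_len) {
  assert(!in.empty() && out_len > 0);
  std::vector<double> out(out_len);
  const double scale = static_cast<double>(in.size()) / out_len;
  for (std::size_t i = 0; i < out_len; ++i) {
    const auto src = static_cast<std::size_t>(std::floor(i * scale));
    out[i] = in[std::min(src, in.size() - 1)];
  }
  return out;
}

// Bicubic resize (align_corners = false): blends 4 neighbours, clamping at the edges.
std::vector<double> resize_bicubic(const std::vector<double>& in, std::size_t out_len) {
  assert(!in.empty() && out_len > 0);
  std::vector<double> out(out_len);
  const double scale = static_cast<double>(in.size()) / out_len;
  const long last = static_cast<long>(in.size()) - 1;
  for (std::size_t i = 0; i < out_len; ++i) {
    const double src = (i + 0.5) * scale - 0.5;  // may be negative near the left edge
    const long base = static_cast<long>(std::floor(src));
    const double t = src - base;                 // fractional offset in [0, 1)
    double acc = 0.0;
    for (long k = -1; k <= 2; ++k) {
      const long idx = std::clamp(base + k, 0L, last);
      acc += in[idx] * cubic_weight(k - t);
    }
    out[i] = acc;
  }
  return out;
}
```

Upsampling `{0, 1}` to four samples gives `{0, 0, 1, 1}` with nearest, but roughly `{-0.105, 0.227, 0.773, 1.105}` with bicubic. The bicubic result is smoother but overshoots at the edges, so the two modes are never bit-identical.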
Co-authored-by: bubaishenhua112 <bubaishenhua112@gmail.com>
Code Review
This pull request introduces the AutoencoderKLWan model and a corresponding VideoProcessor for the Wan2.2 architecture. Key changes include the implementation of various 3D causal convolution blocks, residual blocks, and attention mechanisms for video encoding and decoding. Feedback focuses on several critical bugs, such as uninitialized member variables in WanResidualDownBlockImpl leading to null pointer dereferences, and incorrect index resets for shared pointers in the VAE implementation. Additionally, multiple style guide violations were identified, including the use of TORCH_CHECK instead of CHECK, non-compliant naming for local variables and constants, and the use of std::map where std::unordered_map is preferred.
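The summary mentions 3D causal convolution blocks. The sketch below shows the padding bookkeeping for such a block and is not code from this PR. It assumes the layout of the Diffusers reference `WanCausalConv3d`: symmetric spatial padding, all temporal padding in front, and the front padding reduced by the number of frames carried over from the previous chunk's cache. The helpers `causal_pad3d` and `causal_output_frames` are hypothetical, and odd kernel sizes with stride 1 are assumed.

```cpp
#include <array>
#include <cassert>

// Returns pads in (w_left, w_right, h_top, h_bottom, t_front, t_back) order,
// the last-dimension-first order torch's functional pad uses for (N, C, T, H, W).
std::array<int, 6> causal_pad3d(int kt, int kh, int kw, int cached_frames = 0) {
  assert(kt >= 1 && cached_frames >= 0 && cached_frames <= kt - 1);
  const int ph = kh / 2, pw = kw / 2;  // symmetric "same" padding in space
  // Temporal padding goes only in front, so output frame t never sees frames > t.
  // Frames reused from the cache replace part of that padding.
  const int front = (kt - 1) - cached_frames;
  return {pw, pw, ph, ph, front, 0};
}

// Output frame count of a stride-1 causal conv over t_in new frames.
int causal_output_frames(int t_in, int kt, int cached_frames = 0) {
  const int padded = cached_frames + t_in + causal_pad3d(kt, 1, 1, cached_frames)[4];
  return padded - (kt - 1);
}
```

With a 3×3×3 kernel and no cache, `causal_pad3d` gives `{1, 1, 1, 1, 2, 0}`: two frames of front padding and none behind. The frame count is preserved whether the front padding comes from zeros or from cached frames.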