Hello,
There is a new Wan2.1-based video editing model released this week called SAMA (https://github.com/Cynthiazxy123/SAMA). They used some of your amazing DITTO-1m dataset for training, and mention your model a number of times in reference to their results (paper: https://arxiv.org/abs/2603.19228).
I assume that, since the true DITTO edit model has not, afaik, been released, the scores they used for DITTO's benchmarks came from either the STYLE or GLOBAL models, correct? If that assumption is accurate, how do you feel the model you trained (or partially trained) specifically for editing would fare against SAMA?
Thanks again for sharing all of your code and models for Ditto! Ditto-1m is clearly helping others continue research into this exciting field!
