Question about DITTO model benchmarks mentioned in new paper about the SAMA Model (Open Sourced Instruction-Guided Video Editing model based on Wan2.1)

Hello,

There is a new Wan2.1-based video editing model released this week called SAMA (https://github.com/Cynthiazxy123/SAMA). They used some of your amazing DITTO-1m dataset for training, and they mention your model a number of times in reference to their results (paper: https://arxiv.org/abs/2603.19228).

Since the true DITTO edit model has not, afaik, been released, I assume the scores they used for DITTO's benchmarks came from either the STYLE or GLOBAL models, correct? If that assumption is accurate, I'm wondering how you feel the model you trained/partially trained specifically for editing would fare against SAMA?

Thanks again for sharing all of your code and models for Ditto! Ditto-1m is clearly helping others continue research into this exciting field!

<img width="1591" height="420" alt="Image" src="https://github.com/user-attachments/assets/d476f364-2b19-4bb5-9fbf-479e259b20d0" />

<img width="1633" height="1411" alt="Image" src="https://github.com/user-attachments/assets/c301d855-83b4-47a7-bb0b-902bf40b0d2d" />
