Hi,
I tried to reproduce mova-360p’s LSE-C and LSE-D metrics on Verse-Bench, but ran into some issues. I followed the instructions in Verse-Bench (Hugging Face dataset) and Verse-Bench (GitHub repo) to prepare the test data and run the evaluation. For video inference, I used the same settings as the official demo code.
However, my average LSE-C is 5.41 and average LSE-D is 13.303, both considerably worse than the numbers reported in the paper. When I visualized the generated videos, they didn’t look obviously wrong.
I uploaded the first two inference results from set3 here:
Google Drive link
In my tests:
- case0: LSE-C = 4.4768114, LSE-D = 12.80221240751205
- case1: LSE-C = 6.9611187, LSE-D = 13.244998070501513
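For reference, this is how I aggregate the per-case scores into the averages above (a minimal sketch; the dictionary layout is my own, not from the Verse-Bench repo, and the reported 5.41 / 13.303 averages come from the full set, not just these two cases):

```python
# Sketch of my aggregation step. The `scores` structure is my own
# assumption for illustration; the actual eval script may differ.
scores = {
    "case0": {"lse_c": 4.4768114, "lse_d": 12.80221240751205},
    "case1": {"lse_c": 6.9611187, "lse_d": 13.244998070501513},
    # ... remaining cases in set3 omitted here
}

avg_lse_c = sum(s["lse_c"] for s in scores.values()) / len(scores)
avg_lse_d = sum(s["lse_d"] for s in scores.values()) / len(scores)
print(f"avg LSE-C = {avg_lse_c:.4f}, avg LSE-D = {avg_lse_d:.4f}")
```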
I’m not sure what might be causing this discrepancy. Any pointers would be appreciated.