Hi,
I noticed there are known issues with the Emu Edit benchmark: some image-caption pairs appear incorrect (e.g., the malformed caption 'a train station in city'), and some examples have identical source and target captions. Given that, how should the clip_dir metric be calculated? How did you process the benchmark dataset?
Looking forward to your reply.
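For reference, here is a minimal sketch of how I currently understand clip_dir (CLIP directional similarity), assuming precomputed CLIP embeddings; the function name and inputs are my own for illustration, not taken from your code. It also shows why identical source/target captions are a problem: the caption direction vector becomes zero and the metric degenerates.

```python
import numpy as np

def clip_directional_similarity(img_src, img_tgt, txt_src, txt_tgt, eps=1e-8):
    """Cosine similarity between the image-embedding change and the
    caption-embedding change. All inputs are assumed to be precomputed
    CLIP embeddings as 1-D arrays (hypothetical interface)."""
    img_dir = np.asarray(img_tgt, dtype=float) - np.asarray(img_src, dtype=float)
    txt_dir = np.asarray(txt_tgt, dtype=float) - np.asarray(txt_src, dtype=float)
    # eps guards against a zero denominator; with identical captions,
    # txt_dir is all zeros and the result collapses to 0.
    denom = np.linalg.norm(img_dir) * np.linalg.norm(txt_dir) + eps
    return float(np.dot(img_dir, txt_dir) / denom)

# Aligned edit directions score near 1; identical captions score 0.
print(clip_directional_similarity([0.0, 0.0], [1.0, 0.0],
                                  [0.0, 0.0], [1.0, 0.0]))
print(clip_directional_similarity([0.0, 0.0], [1.0, 0.0],
                                  [0.5, 0.5], [0.5, 0.5]))
```

Is this roughly what your evaluation does, and do you filter out the degenerate pairs before computing it?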