Hi, I'm Huanxin Sheng. I'm so excited that you followed our work and extended it to the VLM-as-a-judge paradigm!
I briefly skimmed your work, and it is really great! I noticed that the intervals in your work seem quite wide, which might already indicate that VLMs are still not strong enough to act as good graders for image understanding.
On this point, I'm not sure whether you can test stronger models (e.g. the newest Gemini model) to make this claim more convincing, but you could also try some newer CP methods as low-cost evidence. In my experience, CIR (https://arxiv.org/pdf/2601.02769) is also a good choice if you are interested, just as good as R2CCP and BoostedCP.
I'm still active in this topic, though I'm currently working on agents, so I'm very much looking forward to further discussion or collaboration. :)
Best,
Huanxin