Hi, Yushi.
Comparing the captions generated by the PromptCap module with the pred_caption values in OKVQA_val_gpt3.json, I find that they are slightly different.
import os

# `model` is the PromptCap instance loaded beforehand
image_id = "285291".zfill(12)
image_path = os.path.join("/ai/liuchenhui/Dataset/okvqa/okvqa_val_vinvl", f"COCO_val2014_{image_id}.jpg")
prompt = "please describe this image according to the given question: What is the common name for this type of hill?"
result = model.caption(prompt, image_path)
# result: a girl in a red jacket skiing down a snowy hill
# pred_caption: a little girl in a red jacket standing on a snow covered slope
image_id = "315668".zfill(12)
image_path = os.path.join("/ai/liuchenhui/Dataset/okvqa/okvqa_val_vinvl", f"COCO_val2014_{image_id}.jpg")
prompt = "please describe this image according to the given question: What country does this appear to be?"
result = model.caption(prompt, image_path)
# result: a herd of sheep in a field in england
# pred_caption: a man standing next to a herd of sheep in england
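To make the mismatch easier to inspect, here is a minimal sketch that lists the word-level differences between a generated caption and the logged one, using only Python's standard `difflib`. The helper name `caption_diff` is hypothetical and not part of PromptCap; the two strings are the captions quoted above.

```python
import difflib


def caption_diff(generated: str, logged: str) -> list:
    """Return the words that differ between two captions.

    Entries starting with "- " appear only in `generated`;
    entries starting with "+ " appear only in `logged`.
    """
    gen_words = generated.split()
    log_words = logged.split()
    # ndiff also emits unchanged ("  ") and hint ("? ") entries; keep only real differences.
    return [d for d in difflib.ndiff(gen_words, log_words) if d[0] in "+-"]


generated = "a herd of sheep in a field in england"
logged = "a man standing next to a herd of sheep in england"
for entry in caption_diff(generated, logged):
    print(entry)
```

Identical captions give an empty list, so this could also be used to check whether a given setup reproduces the logged captions exactly.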
I wonder if there is a way to generate the same captions as in the evaluation logs.