feat: add CRPE-Relation task #1354

Merged: kcz358 merged 1 commit into EvolvingLMMs-Lab:main from njb-nvidia:add-crpe_relation-task on May 28, 2026

@njb-nvidia
Contributor

Summary

Adds CRPE-Relation, a 7,576-item single-image MCQ on object / predicate / subject relationships drawn from The All-Seeing Project V2.

Dataset: `nv-njb/CRPE` — a bundled re-host of `OpenGVLab/CRPE`.

Why a re-host

The original `OpenGVLab/CRPE` repo ships the `crpe_relation.jsonl` annotation file alongside 544 `abnormal_images/` JPEGs. The remaining 5,400 records reference COCO val2017 images by relative path, and those JPEGs are not in the HF repo, so `load_dataset` cannot resolve them out of the box.

The re-host inlines all 1,081 unique referenced images (537 from COCO val2017 + 544 from abnormal_images) as JPEG bytes under an `Image()` feature. Result: a self-contained parquet (~1 GB across 4 shards) that loads end-to-end via standard `load_dataset` — no extra COCO download needed.
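The inlining step can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the script used to build `nv-njb/CRPE`: the `image` field name, the `inline_images` helper, and the directory layout are hypothetical. It reads each referenced file once and attaches the bytes in the `{"bytes": ..., "path": ...}` dict shape that the `datasets` `Image()` feature accepts.

```python
import json
from pathlib import Path


def inline_images(jsonl_path, image_root, image_key="image"):
    """Attach raw image bytes to each JSONL record.

    `image_key` is a hypothetical field name; the real annotation
    schema may use a different one.
    """
    image_root = Path(image_root)
    cache = {}  # relative path -> bytes, so each unique image is read once
    rows = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the annotation file
            record = json.loads(line)
            rel_path = record[image_key]
            if rel_path not in cache:
                cache[rel_path] = (image_root / rel_path).read_bytes()
            row = dict(record)
            row[image_key] = {"bytes": cache[rel_path], "path": rel_path}
            rows.append(row)
    # Second value is the number of unique images that were inlined.
    return rows, len(cache)
```

Rows in this shape can then be handed to the `datasets` library and written out as parquet shards. The returned unique-image count is a cheap sanity check against the expected 1,081.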

Files

  • `lmms_eval/tasks/crpe_relation/crpe_relation.yaml`: task config.
  • `lmms_eval/tasks/crpe_relation/utils.py`: doc transforms, `MultiChoiceRegexFilter` (letter-first, then choice-text substring; strips `<think>`/`<answer>` wrappers).
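The extraction order that this PR's commit message describes for `MultiChoiceRegexFilter` (parse the inline choices out of the question, then try a leading letter, then a choice-text substring) can be sketched roughly as below. This is an illustration only, not the code in `utils.py`: the function names and regexes are assumptions, and the leading-letter rule here is deliberately conservative (the letter must be followed by `.`, `)`, `:` or the end of the text).

```python
import re

# Inline choices such as "A. on top of", up to the next choice label or end of text.
CHOICE_RE = re.compile(r"\b([A-D])\.\s*(.+?)(?=\s+[A-D]\.|$)", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
# A leading letter counts only when followed by ".", ")", ":" or end of text.
LEADING_RE = re.compile(r"\(?([A-D])(?:[.):]|\s*$)")


def parse_choices(question):
    """Return {letter: choice text} parsed from the question string."""
    return {letter: text.strip() for letter, text in CHOICE_RE.findall(question)}


def extract_letter(response, question):
    """Best-effort MCQ letter extraction; returns None if nothing matches."""
    # Drop the reasoning wrapper, then prefer the contents of <answer> if present.
    text = THINK_RE.sub("", response)
    answer = ANSWER_RE.search(text)
    if answer:
        text = answer.group(1)
    text = text.strip()

    # (1) leading uppercase letter, e.g. "B", "B.", "(B) under"
    leading = LEADING_RE.match(text)
    if leading:
        return leading.group(1)

    # (2) case-insensitive substring match against the choice texts
    lowered = text.lower()
    for letter, choice in parse_choices(question).items():
        if choice and choice.lower() in lowered:
            return letter
    return None
```

Substring matching is a fallback of last resort: a short choice such as "on" also occurs inside unrelated words, so the first matching choice wins and can be wrong. That is one reason the letter check runs first.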

Parity vs. local fork

Qwen3-VL-2B-Instruct, full `test` split (7,576 items), 8x H100, greedy decoding.

| Source | Accuracy | Stderr |
| --- | --- | --- |
| Fork (vllm backend) | 0.7401 | ±0.005 |
| Upstream (HF simple/qwen3_vl) | 0.7418 | ±0.005 |
| Identical `filtered_resps` | 7,174 / 7,576 (94.7%) | |
| Verdict agreement | 95.7% | |
| Δ | +0.17 pp | |

The two runs are essentially identical: the +0.17 pp gap is well within one standard error (±0.5 pp).
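The reported figures are consistent with the usual binomial standard error for an accuracy over n items, sqrt(p(1-p)/n). A small check, assuming that is how the stderr was computed (the PR does not say; `binomial_stderr` is an illustrative helper):

```python
import math


def binomial_stderr(accuracy, n_items):
    """Standard error of a proportion, assuming independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_items)


n = 7576
fork_acc, upstream_acc = 0.7401, 0.7418

fork_se = binomial_stderr(fork_acc, n)
upstream_se = binomial_stderr(upstream_acc, n)
delta_pp = (upstream_acc - fork_acc) * 100  # gap in percentage points
identical_rate = 7174 / n  # share of items with identical filtered responses

print(f"fork stderr     = {fork_se:.4f}")
print(f"upstream stderr = {upstream_se:.4f}")
print(f"delta           = {delta_pp:+.2f} pp")
print(f"identical resps = {identical_rate:.1%}")
```

Both standard errors round to 0.005 (0.5 pp), so the 0.17 pp gap is roughly a third of one standard error. Because both runs score the same 7,576 items, a paired comparison would be the stricter test; this unpaired check only confirms the table's arithmetic.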

Test plan

  • `uv run lmms-eval --tasks crpe_relation --limit 8` smoke
  • Full 7,576-doc run on 8x H100 with Qwen3-VL-2B-Instruct; matches the fork's vllm score within 0.2 pp
  • Per-doc analysis: 94.7% identical predictions, 95.7% verdict agreement
  • Bundled parquet loads via `load_dataset("nv-njb/CRPE", split="test")` without external dependencies

CRPE-Relation is a 7,576-item single-image MCQ on object/predicate/
subject relationships, drawn from The All-Seeing Project V2.

Dataset: nv-njb/CRPE — a bundled re-host of the original
OpenGVLab/CRPE annotations (which ship only the 544 abnormal_images/
JPEGs, while the remaining 5,400 records reference COCO val2017 by
relative path). The re-host inlines all 1,081 unique images
(537 COCO val2017 + 544 abnormal) as JPEG bytes under an Image()
feature so the parquet loads end-to-end via standard load_dataset
with no extra COCO download.

Metric: exact_match (flexible-extract) on the MCQ letter. The filter
parses inline A./B./C./D. choices out of the question text, then
tries (1) leading uppercase letter, (2) substring-match against any
choice text. Handles common reasoning wrappers (<think>...</think>,
<answer>...</answer>).
kcz358 merged commit b71d25a into EvolvingLMMs-Lab:main on May 28, 2026
1 of 3 checks passed