Skip to content

openmed-labs/synthvision

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SynthVision

SynthVision

Synthetic medical VQA dataset generation and VLM fine-tuning pipeline.

119K medical images annotated by two frontier VLMs (Qwen 3.5, Kimi K2.5), cross-validated at 93% agreement, producing 110K training records. Fine-tuning 3 small models (2-3B params) improves all benchmarks — best model reaches +15.0% average exact match.

Blog post: SynthVision: Building a 110K Synthetic Medical VQA Dataset

Setup

# Requires Python 3.11+
uv sync

Pipeline

1. Build seed dataset

Aggregate and deduplicate images from 4 open medical datasets (ROCO, MultiCaRe, PathVQA, VQA-RAD):

uv run python scripts/build_seeds.py \
    --config configs/datasets.yaml \
    --output data/seeds

2. Annotate with frontier VLMs

Send seed images to a VLM API for multi-turn clinical annotation. The annotation pipeline supports batch inference via Doubleword:

uv run python -c "
from openmed.annotation.pipeline import AnnotationPipeline
pipeline = AnnotationPipeline('configs/annotation.yaml')
pipeline.run(seeds_dir='data/seeds', annotated_dir='data/annotated')
"

3. Prepare training data

Merge validated annotations, deduplicate, and convert to ShareGPT JSONL:

uv run python scripts/prepare_training.py --output data/training

4. Fine-tune

LoRA fine-tuning with assistant-only label masking. Each model family has its own config:

# Qwen2.5-VL-3B
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
    scripts/finetune.py --model qwen --data data/training --config configs/qwen2vl_v6.yaml

# Qwen3.5-2B (best model)
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
    scripts/finetune.py --model qwen35 --data data/training --config configs/qwen35_d.yaml

# Ministral-3B
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
    scripts/finetune.py --model ministral --data data/training --config configs/ministral_d.yaml

5. Evaluate

Benchmark on VQA-RAD, PathVQA, and SLAKE using vLLM batched inference:

# Base model
uv run python scripts/evaluate.py \
    --model base --model-id Qwen/Qwen2.5-VL-3B-Instruct \
    --benchmarks all --output data/eval/base.json

# Fine-tuned model
uv run python scripts/evaluate.py \
    --model OpenMed/Qwen2.5-3B-MedVL \
    --benchmarks all --output data/eval/finetuned.json

Results

Model VQA-RAD PathVQA SLAKE Avg EM vs Base
Qwen3.5-2B (D) 0.5521 0.4748 0.6880 0.5716 +15.0%
Qwen2.5-VL-3B (v6) 0.5211 0.3468 0.6032 0.4904 +8.9%
Ministral-3B (D) 0.4789 0.3669 0.5664 0.4707 +9.6%

HF Hub Artifacts

Asset Link
Seed images OpenMed/synthvision-seeds
Qwen 3.5 annotations OpenMed/synthvision-annotated-qwen
Kimi K2.5 annotations OpenMed/synthvision-annotated-kimi
Qwen validated by Kimi OpenMed/synthvision-validated-qwen-by-kimi
Kimi validated by Qwen OpenMed/synthvision-validated-kimi-by-qwen
Training data OpenMed/synthvision-training
Qwen2.5-3B-MedVL OpenMed/Qwen2.5-3B-MedVL
Qwen3.5-2B-MedVL OpenMed/Qwen3.5-2B-MedVL
Ministral-3B-MedVL OpenMed/Ministral-3B-MedVL

License

Apache-2.0

About

Synthetic medical VQA pipeline: 119K images annotated by frontier VLMs, cross-validated at 93% agreement, fine-tuned on 3 model families (2-3B params)

Topics

Resources

Stars

Watchers

Forks

Contributors

Languages