Medical Vision-Language Model Optimization Framework

This framework evaluates and optimizes medical vision-language models (VLMs) using DSPy. It bundles evaluation metrics, five medical imaging experiments, and four DSPy optimization strategies.

5 Medical VLM Experiments:
- VQA RAD: Visual Question Answering on Radiology images
- Gastrovision: Gastroenterology endoscopy classification
- CheXpert: Chest X-ray classification
- DDI Disease: Dermatology disease diagnosis
- DDI Skintone: Skin tone classification

All datasets analyzed in this study are publicly available. No proprietary or restricted datasets are used.

4 DSPy Optimization Strategies:
- BootstrapFewShotWithRandomSearch
- MIPROv2
- SIMBA
- GEPA
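
DSPy optimizers are driven by a metric callable, conventionally written as `metric(example, prediction, trace=None)`. The sketch below shows what a simple exact-match metric of that shape might look like. It is not taken from this repository's src/metrics.py: the `answer` field name and the normalization rule are assumptions, and plain `SimpleNamespace` objects stand in for DSPy examples and predictions so the snippet runs without DSPy installed.

```python
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Trim whitespace, lowercase, and drop trailing periods."""
    return text.strip().lower().rstrip(".")


def exact_match_metric(example, prediction, trace=None) -> bool:
    """Return True when the predicted answer equals the gold answer.

    `answer` is a hypothetical field name; `trace` is accepted only to
    match the conventional metric signature and is unused here.
    """
    return normalize(example.answer) == normalize(prediction.answer)


# Stand-ins for a DSPy example and prediction (assumption, not repo code).
gold = SimpleNamespace(answer="Yes")
pred = SimpleNamespace(answer=" yes. ")
print(exact_match_metric(gold, pred))
```

A classification experiment such as CheXpert would likely need a different comparison (for example, per-label matching), so treat this only as a starting point.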

Framework Overview

Installation

pip install -r requirements.txt

Usage

Running Single Experiments

python scripts/run_experiment.py \
  --experiment vqa_rad \
  --model "your-model-name" \
  --api_base "your-api-base" \
  --api_key "your-api-key"

Running Batch Experiments

python scripts/batch_run.py \
  --model "your-model-name" \
  --api_base "your-api-base" \
  --api_key "your-api-key" \
  --experiments vqa_rad chexpert
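
If you prefer to drive several runs from your own Python code, one option is to build the single-experiment command once per experiment name. The sketch below is an assumption about how such a loop could look, not a description of what scripts/batch_run.py does internally. It uses only the flags shown above; `build_command` is a hypothetical helper.

```python
import shlex
import sys


def build_command(experiment: str, model: str, api_base: str, api_key: str) -> list[str]:
    """Build the argument list for one run of scripts/run_experiment.py."""
    return [
        sys.executable, "scripts/run_experiment.py",
        "--experiment", experiment,
        "--model", model,
        "--api_base", api_base,
        "--api_key", api_key,
    ]


for name in ["vqa_rad", "chexpert"]:
    cmd = build_command(name, "your-model-name", "your-api-base", "your-api-key")
    # Print the command for inspection; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a model name or API base contains spaces or special characters.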

Available Experiments

vqa_rad: Visual Question Answering on Radiology images
chexpert: Chest X-ray classification
ddi_disease: Dermatology disease diagnosis
ddi_skintone: Skin tone classification
gastrovision: Gastroenterology endoscopy classification

Configuration

Dataset paths are configured in config/paths.py. Set BASE_DATA_DIR to point to your data directory:

BASE_DATA_DIR = Path("/your/data/directory")
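
As an illustration of how per-dataset paths could be derived from BASE_DATA_DIR, here is a minimal sketch. The `MEDVLM_DATA_DIR` environment variable, the `dataset_dir` helper, and the subdirectory names are all hypothetical; check config/paths.py for the names the framework actually uses.

```python
import os
from pathlib import Path

# Hypothetical override: read the data root from an environment variable,
# falling back to the placeholder path from config/paths.py.
BASE_DATA_DIR = Path(os.environ.get("MEDVLM_DATA_DIR", "/your/data/directory"))


def dataset_dir(name: str) -> Path:
    """Return the directory for one dataset under BASE_DATA_DIR."""
    return BASE_DATA_DIR / name


# Subdirectory names below are assumptions for illustration only.
missing = [n for n in ("chexpert", "ddi", "gastrovision") if not dataset_dir(n).is_dir()]
if missing:
    print(f"Missing dataset directories under {BASE_DATA_DIR}: {', '.join(missing)}")
```

Checking for missing directories up front gives a clearer error than a failure partway through an experiment.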

Directory Structure

medvlm_optimization/
├── config/               # Configuration files
├── src/
│   ├── experiments/      # Individual experiment implementations
│   ├── utils/           # Utility functions
│   ├── metrics.py       # Evaluation metrics
│   └── main.py          # Main execution logic
├── scripts/             # CLI scripts
├── outputs/             # Generated logs and results
└── requirements.txt     # Python dependencies

Output

Results are logged to outputs/logs/ with detailed experiment information and performance metrics.

Data Requirements

The framework expects each dataset in the following form:

CheXpert: CSV files with image paths and labels
DDI: CSV metadata files and image directories
Gastrovision: CSV files with base64-encoded images
VQA RAD: Loaded from Hugging Face datasets
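
To make the Gastrovision format concrete, the sketch below decodes base64-encoded images from CSV text using only the standard library. The column names `image_b64` and `label`, the `iter_images` helper, and the demo row are assumptions for illustration; the real files may use different headers.

```python
import base64
import csv
import io


def iter_images(csv_text: str, image_col: str = "image_b64", label_col: str = "label"):
    """Yield (image_bytes, label) pairs from CSV text with base64 images."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield base64.b64decode(row[image_col]), row[label_col]


# Tiny in-memory demo with placeholder bytes instead of a real image.
payload = base64.b64encode(b"\x89PNG fake bytes").decode("ascii")
demo_csv = f"image_b64,label\n{payload},polyp\n"

for image_bytes, label in iter_images(demo_csv):
    print(label, len(image_bytes))
```

Real encoded images can be large enough to exceed the csv module's default field size limit; if that happens, `csv.field_size_limit` can raise it before reading.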

License

This project is licensed under the MIT License.
