CliniBench is the first comprehensive benchmark for comparing encoder-based classifiers and generative large language models (LLMs) for discharge diagnosis prediction from admission notes in the MIMIC-IV dataset.
- Python 3.11
- CUDA-capable GPU (recommended for encoder training and LLM inference)
- Access to MIMIC-IV dataset (requires PhysioNet credentialing)
```bash
# Clone the repository
git clone https://github.com/your-org/clinibench.git
cd clinibench

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate
```

```bash
# Build the Docker image
docker build -t clinibench:latest .

# Run container with GPU support
docker run --gpus all -it -v $(pwd):/clinibench clinibench:latest {script_name.py}
```

- Complete the CITI "Data or Specimens Only Research" course
- Request access to MIMIC-IV on PhysioNet
- Download MIMIC-IV dataset (version 2.2)
- Download MIMIC-IV-Note dataset (version 2.2)
- Follow https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres to load the core MIMIC-IV tables into PostgreSQL
- Follow https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iv-note/buildmimic/postgres/README.md to load the MIMIC-IV notes
Adjust the PostgreSQL connection string (username, password, database name, hostname, and port) in the create_admission_note_dataset.sh script and execute it.
You may adjust the paths where the dataset is stored within the script; all following steps and default parameters assume the paths defined there.
The script will:
1. create the admission-note dataset from the discharge notes,
2. create the train/dev/test splits from the full dataset, and
3. create a file containing the notes from all train splits, which is used for the few-shot experiments.
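The split step (2 above) can be illustrated with a minimal sketch. This is not the script's actual logic: the column names, seed, and 80/10/10 ratios here are placeholders for illustration only.

```python
import pandas as pd

def make_splits(df, seed=42, train=0.8, dev=0.1):
    """Randomly split a notes DataFrame into train/dev/test (illustrative only)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(shuffled)
    n_train, n_dev = int(n * train), int(n * dev)
    return (
        shuffled.iloc[:n_train],                  # train split
        shuffled.iloc[n_train:n_train + n_dev],   # dev split
        shuffled.iloc[n_train + n_dev:],          # test split
    )

# Tiny synthetic example with hypothetical column names
notes = pd.DataFrame({"note_id": range(10), "text": ["..."] * 10})
train_df, dev_df, test_df = make_splits(notes)
print(len(train_df), len(dev_df), len(test_df))  # 8 1 1
```

Shuffling once and slicing guarantees the three splits are disjoint, which matters here because the train split also feeds the few-shot candidate file.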
Step 1: Start the vLLM server
Launch a vLLM server that serves your model. By default, the prediction script assumes the server is running on localhost; if it runs on a different host, adjust the --vllm_ip parameter in Step 2.
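For example, a server for a Qwen2.5 3B instruct model could be started like this (the model name and port are placeholders; substitute whichever model you evaluate):

```bash
# Example only: expose an OpenAI-compatible endpoint on port 8000
vllm serve Qwen/Qwen2.5-3B-Instruct --port 8000
```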
Step 2a: Generate diagnosis predictions (zero-shot)
Run the prediction script with your model and data:
```bash
python src/generative_models/predict_diagnoses.py \
  --vllm_ip=http://localhost:8000 \
  --test_data=data/mimic-iv/icd-10/icu/test_10_icu.parquet \
  --icd10_code_file=data/icd10_codes.csv \
  --icd9_code_file=data/icd9_codes.csv \
  --results_file=qwen2.5-3B-mimic-iv-icu-icd10.parquet
```

Step 2b: Generate diagnosis predictions (few-shot)
Run the prediction script including few-shot examples:
```bash
python src/generative_models/predict_diagnoses.py \
  --vllm_ip=http://localhost:8000 \
  --test_data=data/mimic-iv/icd-10/icu/test_10_icu.parquet \
  --icd10_code_file=data/icd10_codes.csv \
  --icd9_code_file=data/icd9_codes.csv \
  --results_file=qwen2.5-3B-mimic-iv-icu-icd10.parquet \
  --few_shot \
  --few_shot_note_data=./data/fewshot-candidates/notes.parquet \
  --few_shot_ids=./data/fewshot-candidates/gold_shots/test_few_shots.pq \
  --few_shot_column=icd10_icu \
  --num_few_shot_candidates=5
```

Step 3: Evaluate the predictions
Map the generated diagnosis descriptions to ICD codes and calculate evaluation metrics:
```bash
python src/generative_models/map_and_evaluate.py \
  --predictions_file_path=qwen2.5-3B-mimic-iv-icu-icd10.parquet \
  --results_save_path=results.json \
  --icd_version=10 \
  --icd10_code_file=data/icd10_codes.csv \
  --icd9_code_file=data/icd9_codes.csv
```

Step 4: View results
All evaluation metrics will be saved to the file specified in --results_save_path (e.g., results.json).
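To give an intuition for what the mapping and evaluation involve, here is a minimal sketch: each generated description is matched to the closest ICD-10 description, and micro-averaged precision/recall/F1 are computed over per-admission code sets. The miniature code table, the difflib matcher, and the 0.6 cutoff are illustrative assumptions; map_and_evaluate.py's actual mapping strategy and metric set may differ.

```python
from difflib import get_close_matches

# Hypothetical miniature ICD-10 table (the real one comes from data/icd10_codes.csv)
icd10 = {
    "sepsis, unspecified organism": "A41.9",
    "acute kidney failure, unspecified": "N17.9",
    "essential (primary) hypertension": "I10",
}

def map_to_code(description, code_table):
    """Map a generated diagnosis description to the closest ICD description (illustrative)."""
    match = get_close_matches(description.lower(), code_table.keys(), n=1, cutoff=0.6)
    return code_table[match[0]] if match else None

def micro_prf(predicted, gold):
    """Micro-averaged precision/recall/F1 over per-admission code sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    pred_total = sum(len(p) for p in predicted)
    gold_total = sum(len(g) for g in gold)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / gold_total if gold_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One synthetic admission: a free-text prediction plus an already-coded one
preds = [{map_to_code("sepsis due to unspecified organism", icd10), "I10"}]
golds = [{"A41.9", "I10"}]
print(micro_prf(preds, golds))
```

Micro-averaging pools true positives across all admissions before dividing, so admissions with many gold codes weigh more than those with few.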
If you use CliniBench in your research, please cite:
```bibtex
@misc{grundmann2025clinibenchclinicaloutcomeprediction,
  title={CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models},
  author={Paul Grundmann and Dennis Fast and Jan Frick and Thomas Steffek and Felix Gers and Wolfgang Nejdl and Alexander Löser},
  year={2025},
  eprint={2509.26136},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.26136},
}
```

This project is licensed under the MIT License - see the LICENSE file for details.