# CliniBench: Clinical Outcome Prediction Benchmark

CliniBench is the first comprehensive benchmark for comparing encoder-based classifiers and generative large language models (LLMs) for discharge diagnosis prediction from admission notes in the MIMIC-IV dataset.

## Installation

### Prerequisites

- Python 3.11
- CUDA-capable GPU (recommended for encoder training and LLM inference)
- Access to the MIMIC-IV dataset (requires PhysioNet credentialing)

### Setup with uv (recommended)

```shell
# Clone the repository
git clone https://github.com/your-org/clinibench.git
cd clinibench

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate
```

### Docker Setup

```shell
# Build the Docker image
docker build -t clinibench:latest .

# Run the container with GPU support; replace {script_name.py} with the script to run
docker run --gpus all -it -v $(pwd):/clinibench clinibench:latest {script_name.py}
```

## Dataset Preparation

### Accessing MIMIC-IV

1. Complete the CITI "Data or Specimens Only Research" course
2. Request access to MIMIC-IV on PhysioNet
3. Download the MIMIC-IV dataset (version 2.2)
4. Download the MIMIC-IV-Note dataset (version 2.2)
5. Follow https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres to load the general MIMIC-IV content into PostgreSQL
6. Follow https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iv-note/buildmimic/postgres/README.md to load the MIMIC-IV notes

### Generating the admission notes and dataset splits

Adjust the PostgreSQL connection string (username, password, database name, hostname, and port) in the create_admission_note_dataset.sh script and execute it. You may adjust the paths where the dataset is stored within the script; all subsequent steps and default parameters assume the paths defined there.

This script will:

1. create the admission-notes dataset from the discharge notes,
2. create the train/dev/test splits from the full dataset, and
3. create a file containing the notes from all train splits, which is used for the few-shot experiments.
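Conceptually, the split step assigns each admission to exactly one of train/dev/test. A minimal sketch of one deterministic way to do this (illustrative only; the actual script defines its own split logic, column names, and paths):

```python
import hashlib

def assign_split(hadm_id: str, dev_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign an admission to train/dev/test by hashing its ID."""
    # Stable hash mapped into [0, 1]; the same hadm_id always lands in the same split.
    digest = hashlib.sha256(str(hadm_id).encode()).hexdigest()
    u = int(digest[:8], 16) / 0xFFFFFFFF
    if u < test_frac:
        return "test"
    if u < test_frac + dev_frac:
        return "dev"
    return "train"

# Hypothetical admission IDs, just to show the mapping is stable
splits = {hid: assign_split(hid) for hid in ["20000001", "20000002", "20000003"]}
```

Hash-based assignment keeps splits reproducible across reruns without storing a random seed, which is one common design choice for clinical datasets.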

## Usage

### Step 1: Start the vLLM server

Launch a vLLM server that provides access to your model. By default, the prediction script assumes it is running on localhost. If your server is on a different host, adjust the --vllm_ip parameter in Step 2.
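For example, with vLLM installed, an OpenAI-compatible server could be started like this (the model name matches the results filename used below; model and port are illustrative, adjust to your setup):

```shell
# Serve an OpenAI-compatible API on port 8000
vllm serve Qwen/Qwen2.5-3B-Instruct --port 8000
```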

### Step 2a: Generate diagnosis predictions (zero-shot)

Run the prediction script with your model and data:

```shell
python src/generative_models/predict_diagnoses.py \
    --vllm_ip=http://localhost:8000 \
    --test_data=data/mimic-iv/icd-10/icu/test_10_icu.parquet \
    --icd10_code_file=data/icd10_codes.csv \
    --icd9_code_file=data/icd9_codes.csv \
    --results_file=qwen2.5-3B-mimic-iv-icu-icd10.parquet
```
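Under the hood, the script talks to the vLLM server's OpenAI-compatible API. A hedged sketch of what one such chat-completion request payload might look like (the actual prompt wording and sampling parameters are defined by the script, not here):

```python
import json

def build_chat_request(admission_note: str,
                       model: str = "Qwen/Qwen2.5-3B-Instruct") -> dict:
    """Build an illustrative /v1/chat/completions payload asking for diagnoses."""
    return {
        "model": model,
        "messages": [
            # Hypothetical instruction; the repo's real system prompt may differ.
            {"role": "system",
             "content": "You are a clinical assistant. List the likely discharge diagnoses."},
            {"role": "user", "content": admission_note},
        ],
        "temperature": 0.0,  # deterministic decoding for reproducible predictions
    }

payload = build_chat_request("Chief complaint: chest pain. History: ...")
body = json.dumps(payload)  # what would be POSTed to http://localhost:8000/v1/chat/completions
```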

### Step 2b: Generate diagnosis predictions (few-shot)

Run the prediction script including few-shot examples:

```shell
python src/generative_models/predict_diagnoses.py \
    --vllm_ip=http://localhost:8000 \
    --test_data=data/mimic-iv/icd-10/icu/test_10_icu.parquet \
    --icd10_code_file=data/icd10_codes.csv \
    --icd9_code_file=data/icd9_codes.csv \
    --results_file=qwen2.5-3B-mimic-iv-icu-icd10.parquet \
    --few_shot \
    --few_shot_note_data=./data/fewshot-candidates/notes.parquet \
    --few_shot_ids=./data/fewshot-candidates/gold_shots/test_few_shots.pq \
    --few_shot_column=icd10_icu \
    --num_few_shot_candidates=5
```
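Conceptually, the few-shot flags prepend solved example admissions to the prompt before the target note. A hypothetical sketch of that assembly (the repo's actual prompt template may differ):

```python
def build_few_shot_prompt(note: str, examples: list[tuple[str, list[str]]]) -> str:
    """Prepend (note, diagnoses) example pairs before the target admission note."""
    parts = []
    for ex_note, ex_diagnoses in examples:
        parts.append(f"Admission note:\n{ex_note}\nDiagnoses: {'; '.join(ex_diagnoses)}\n")
    # The target note ends with an open "Diagnoses:" cue for the model to complete.
    parts.append(f"Admission note:\n{note}\nDiagnoses:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Pt presents with fever and cough.",
    [("Pt with crushing chest pain.", ["I21.4 NSTEMI"])],
)
```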

### Step 3: Evaluate the predictions

Map the generated diagnosis descriptions to ICD codes and calculate evaluation metrics:

```shell
python src/generative_models/map_and_evaluate.py \
    --predictions_file_path=qwen2.5-3B-mimic-iv-icu-icd10.parquet \
    --results_save_path=results.json \
    --icd_version=10 \
    --icd10_code_file=data/icd10_codes.csv \
    --icd9_code_file=data/icd9_codes.csv
```
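The mapping step can be thought of as matching each generated diagnosis string against a code table and scoring the resulting code set against the gold codes. A simplified stdlib sketch with fuzzy description matching and set-based precision/recall/F1 (the script's actual mapping and metrics are more involved; the code table here is a made-up fragment, not real file contents):

```python
import difflib

# Tiny illustrative description -> ICD-10 code table
CODE_TABLE = {
    "acute myocardial infarction": "I21.9",
    "type 2 diabetes mellitus": "E11.9",
    "essential hypertension": "I10",
}

def map_to_codes(predicted_descriptions, cutoff=0.6):
    """Map free-text diagnoses to ICD codes via the closest description match."""
    codes = set()
    for desc in predicted_descriptions:
        match = difflib.get_close_matches(desc.lower(), CODE_TABLE, n=1, cutoff=cutoff)
        if match:
            codes.add(CODE_TABLE[match[0]])
    return codes

def set_prf(predicted: set, gold: set):
    """Precision, recall, and F1 over predicted vs. gold code sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = map_to_codes(["acute myocardial infarctions", "essential hypertension"])
p, r, f = set_prf(pred, {"I21.9", "I10", "E11.9"})
```

Fuzzy matching tolerates small surface variations in generated descriptions (plurals, typos) that an exact string lookup would miss.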

### Step 4: View results

All evaluation metrics will be saved to the file specified in --results_save_path (e.g., results.json).
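The saved JSON can be inspected with a few lines of Python (the metric name below is a placeholder; use whatever keys your results file actually contains):

```python
import json
from pathlib import Path

def print_results(path):
    """Load the evaluation metrics JSON and print each metric on its own line."""
    metrics = json.loads(Path(path).read_text())
    for name, value in sorted(metrics.items()):
        print(f"{name}: {value}")
    return metrics

# Demo with a placeholder file; point this at results.json from Step 3 instead
Path("demo_results.json").write_text(json.dumps({"micro_f1": 0.42}))
m = print_results("demo_results.json")
```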

## Citation

If you use CliniBench in your research, please cite:

```bibtex
@misc{grundmann2025clinibenchclinicaloutcomeprediction,
      title={CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models},
      author={Paul Grundmann and Dennis Fast and Jan Frick and Thomas Steffek and Felix Gers and Wolfgang Nejdl and Alexander Löser},
      year={2025},
      eprint={2509.26136},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.26136},
}
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.
