WZRJohn/TreeDDx

TreeDDx

Code release for TreeDDx: Benchmarking Differential Diagnostic Reasoning in Large Language Models Using Structured Clinical Decision Trees.

Data availability

The original case data are based on the JAMA Network Clinical Challenge collection:

https://jamanetwork.com/collections/44038/clinical-challenge

The underlying Clinical Challenge content is not included in this repository. Users must obtain appropriate access, permission, or licensing from the data source before preparing and using the input data.

Input data format

Prepare a local file named original_data.json. It should be a JSON list, where each item is one clinical case.

Required fields:

[
  {
    "question": "Clinical case question text",
    "opa": "Option A",
    "opb": "Option B",
    "opc": "Option C",
    "opd": "Option D",
    "answer_idx": "A",
    "dicussion": "Ground-truth diagnostic discussion text",
    "model_answer": "The answer and diagnostic reasoning generated by another LLM"
  }
]

Notes:

  • model_answer must be filled in before you run step 2 (llm_decisiontree_generation.py), which generates decision trees from the LLM answers.
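Before spending API calls, it can help to confirm that the input file matches the format above. The following is a minimal sketch, not part of the released scripts: the helper names `find_problems` and `check_file` are hypothetical, and the check that `answer_idx` is one of A to D is an assumption based on the four options. The key `dicussion` is spelled exactly as in the example above.

```python
import json

# Field names copied verbatim from the "Required fields" example,
# including the "dicussion" spelling used there.
REQUIRED_FIELDS = (
    "question", "opa", "opb", "opc", "opd",
    "answer_idx", "dicussion", "model_answer",
)


def find_problems(cases):
    """Return a list of human-readable problems found in the case list."""
    if not isinstance(cases, list):
        return ["top-level JSON value must be a list of cases"]
    problems = []
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"item {i}: expected an object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in case]
        if missing:
            problems.append(f"item {i}: missing {', '.join(missing)}")
        # Assumption: answer_idx names one of the four options.
        if "answer_idx" in case and case["answer_idx"] not in ("A", "B", "C", "D"):
            problems.append(f"item {i}: answer_idx should be A, B, C, or D")
    return problems


def check_file(path="original_data.json"):
    """Load the input file and return any problems found in it."""
    with open(path, encoding="utf-8") as f:
        return find_problems(json.load(f))
```

Calling `check_file()` in the directory that holds original_data.json returns an empty list when every case has the required fields.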

API configuration

The scripts use an OpenAI-compatible chat-completions API. The default model is:

gpt-5.4-mini

For security, do not hard-code your API key in the scripts. Pass credentials with command-line arguments instead:

python gt_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL
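One way to keep the key out of shell history is to read it from the environment and build the command in a small wrapper. This is a sketch under assumptions: the variable names `TREEDDX_API_KEY` and `TREEDDX_BASE_URL` and the helpers `build_command` and `run_script` are hypothetical and not defined by this repository; only the `--api-key` and `--base-url` flags come from the command above.

```python
import os
import subprocess
import sys


def build_command(script, env=None):
    """Build the argv list for one pipeline script from environment variables."""
    env = os.environ if env is None else env
    # TREEDDX_API_KEY and TREEDDX_BASE_URL are hypothetical names chosen
    # for this sketch; the scripts themselves only document the two flags.
    try:
        api_key = env["TREEDDX_API_KEY"]
        base_url = env["TREEDDX_BASE_URL"]
    except KeyError as exc:
        raise SystemExit(f"environment variable {exc.args[0]} is not set")
    return [sys.executable, script, "--api-key", api_key, "--base-url", base_url]


def run_script(script):
    """Run one script and stop if it fails."""
    # check=True raises CalledProcessError when the script exits non-zero.
    subprocess.run(build_command(script), check=True)
```

With both variables exported, `run_script("gt_decisiontree_generation.py")` is equivalent to the command shown above.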

Pipeline

Run the three scripts in order.

1. Generate ground-truth decision trees

Default command:

python gt_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

original_data.json

Default output:

gt_decisiontree_output.json

This step appends a new field to each item:

gt_tree

2. Generate decision trees from LLM answers

Default command:

python llm_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

gt_decisiontree_output.json

Default output:

gtandllm_decisiontree_output.json

This step appends a new field to each item:

llm_tree
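Because evaluation compares the two trees, it may be worth checking that every item carries both fields before moving on. The sketch below assumes that gtandllm_decisiontree_output.json is a JSON list of case objects like the input file, which this README does not state explicitly; `items_missing_trees` is a hypothetical helper name.

```python
def items_missing_trees(items, fields=("gt_tree", "llm_tree")):
    """Return (index, missing_fields) pairs for items lacking a tree field."""
    missing = []
    for i, item in enumerate(items):
        # An absent key and an empty value are both treated as missing.
        absent = [f for f in fields if not item.get(f)]
        if absent:
            missing.append((i, absent))
    return missing
```

Load the file with `json.load` and pass the resulting list to `items_missing_trees`; an empty result means every item has both trees.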

3. Evaluate LLM trees against ground-truth trees

Default command:

python evaluation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

gtandllm_decisiontree_output.json

Default output:

eva_output.json

This step appends a new field to each item:

evaluation

The output file has this top-level structure:

{
  "items": [],
  "overall_mean_scores": {},
  "field_mean_scores": {}
}
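To inspect the results, the three top-level keys can be read directly. The following sketch assumes only the structure shown above; the score names inside the two mean-score objects are not documented here, so the code prints whatever keys it finds. `summarize` is a hypothetical helper name.

```python
import json


def summarize(report):
    """Return printable summary lines for a loaded eva_output.json report."""
    lines = [f"evaluated items: {len(report['items'])}"]
    for section in ("overall_mean_scores", "field_mean_scores"):
        lines.append(f"{section}:")
        # Sort keys so the output order is stable between runs.
        for name, value in sorted(report[section].items()):
            lines.append(f"  {name}: {value}")
    return lines


def summarize_file(path="eva_output.json"):
    """Load the evaluation output and return its summary lines."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Running `print("\n".join(summarize_file()))` after step 3 lists the item count followed by each mean score.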

Citation

If you use this code, please cite the TreeDDx paper:

TreeDDx: Benchmarking Differential Diagnostic Reasoning in Large Language Models Using Structured Clinical Decision Trees
