TreeDDx

Code release for TreeDDx: Benchmarking Differential Diagnostic Reasoning in Large Language Models Using Structured Clinical Decision Trees.

Data availability

The original case data are based on the JAMA Network Clinical Challenge collection:

https://jamanetwork.com/collections/44038/clinical-challenge

The underlying Clinical Challenge content is not included in this repository. Users must obtain appropriate access, permission, or licensing from the data source before preparing and using the input data.

Input data format

Prepare a local file named original_data.json. It should be a JSON list, where each item is one clinical case.

Required fields:

[
  {
    "question": "Clinical case question text",
    "opa": "Option A",
    "opb": "Option B",
    "opc": "Option C",
    "opd": "Option D",
    "answer_idx": "A",
    "dicussion": "Ground-truth diagnostic discussion text",
    "model_answer": "The answer and diagnostic reasoning generated by another LLM"
  }
]

Notes:

model_answer must be filled in before running step 2 (llm_decisiontree_generation.py), which builds the LLM decision tree from it.
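Before spending API calls, it can help to check that original_data.json has the expected shape. The following is a minimal sketch, not part of the released scripts: the helper name find_problems is hypothetical, and the check that answer_idx is one of A to D is an assumption based on the four options shown above.

```python
import json

# Field names copied from the format above ("dicussion" is spelled as shown there).
REQUIRED_FIELDS = (
    "question", "opa", "opb", "opc", "opd",
    "answer_idx", "dicussion", "model_answer",
)


def find_problems(cases):
    """Return a list of human-readable problems; empty if none were found."""
    if not isinstance(cases, list):
        return ["top level must be a JSON list"]
    problems = []
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"item {i}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in case:
                problems.append(f"item {i}: missing field '{field}'")
        # Assumption: answers are limited to the four option letters.
        answer = case.get("answer_idx")
        if answer is not None and answer not in ("A", "B", "C", "D"):
            problems.append(f"item {i}: unexpected answer_idx {answer!r}")
    return problems


def check_file(path="original_data.json"):
    """Load the input file and return the problems found in it."""
    with open(path, encoding="utf-8") as f:
        return find_problems(json.load(f))
```

Calling check_file() in the directory that holds original_data.json returns an empty list when every item carries all required fields.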

API configuration

The scripts use an OpenAI-compatible chat-completions API. The default model is:

gpt-5.4-mini

For security, do not hard-code your API key in the scripts. Pass credentials as command-line arguments instead:

python gt_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL
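One way to avoid typing the key on every call is to keep it in environment variables and build the command from them. This is only a sketch: the variable names TREEDDX_API_KEY and TREEDDX_BASE_URL and the helper build_command are hypothetical and are not read by the released scripts. Only the --api-key and --base-url flags come from this README.

```python
import os
import sys


def build_command(script, env=None):
    """Build the argument list for one pipeline script from environment variables."""
    env = os.environ if env is None else env
    # Hypothetical variable names; pick whatever fits your setup.
    names = ("TREEDDX_API_KEY", "TREEDDX_BASE_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return [
        sys.executable, script,
        "--api-key", env["TREEDDX_API_KEY"],
        "--base-url", env["TREEDDX_BASE_URL"],
    ]


# Example use (not executed here):
#   import subprocess
#   subprocess.run(build_command("gt_decisiontree_generation.py"), check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting issues if the key or URL contains special characters.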

Pipeline

Run the three scripts in order.
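The three steps chain through their default file names: each script reads the file written by the previous one. The sketch below records that chain as data and reports which steps still need to run. The script names, file names, and appended fields are the defaults listed in this README; the helper pending_steps and the idea of treating an existing output file as "step done" are assumptions for illustration.

```python
from pathlib import Path

# (script, default input, default output, field appended to each item)
STEPS = [
    ("gt_decisiontree_generation.py", "original_data.json",
     "gt_decisiontree_output.json", "gt_tree"),
    ("llm_decisiontree_generation.py", "gt_decisiontree_output.json",
     "gtandllm_decisiontree_output.json", "llm_tree"),
    ("evaluation.py", "gtandllm_decisiontree_output.json",
     "eva_output.json", "evaluation"),
]


def pending_steps(existing_files):
    """Return the steps whose default output is not among the existing file names."""
    return [step for step in STEPS if step[2] not in existing_files]


def files_in(directory="."):
    """Collect the names of the regular files in a directory."""
    return {p.name for p in Path(directory).iterdir() if p.is_file()}
```

For example, pending_steps(files_in()) lists every step that has not yet produced its default output in the current directory. It does not detect a partially written output file.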

1. Generate ground-truth decision trees

Default command:

python gt_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

original_data.json

Default output:

gt_decisiontree_output.json

This step appends a new field to each item:

gt_tree

2. Generate decision trees from LLM answers

Default command:

python llm_decisiontree_generation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

gt_decisiontree_output.json

Default output:

gtandllm_decisiontree_output.json

This step appends a new field to each item:

llm_tree

3. Evaluate LLM trees against ground-truth trees

Default command:

python evaluation.py \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL

Default input:

gtandllm_decisiontree_output.json

Default output:

eva_output.json

This step appends a new field to each item:

evaluation

The output file has this top-level structure:

{
  "items": [],
  "overall_mean_scores": {},
  "field_mean_scores": {}
}
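A quick way to inspect the results is to load eva_output.json and print the two score dictionaries. This sketch relies only on the three top-level keys shown above; the metric names inside overall_mean_scores and field_mean_scores are not documented here, so the code prints whatever keys it finds. The helper name summarize is hypothetical.

```python
import json

EXPECTED_KEYS = ("items", "overall_mean_scores", "field_mean_scores")


def summarize(result):
    """Return printable summary lines for a loaded eva_output.json object."""
    missing = [key for key in EXPECTED_KEYS if key not in result]
    if missing:
        raise KeyError("missing top-level keys: " + ", ".join(missing))
    lines = [f"items: {len(result['items'])}"]
    for section in ("overall_mean_scores", "field_mean_scores"):
        # Metric names are unknown in advance, so list them in sorted order.
        for name, value in sorted(result[section].items()):
            lines.append(f"{section}.{name}: {value}")
    return lines


def summarize_file(path="eva_output.json"):
    """Load the evaluation output and return its summary lines."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Running print("\n".join(summarize_file())) after step 3 prints the item count followed by one line per score.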

Citation

If you use this code, please cite the TreeDDx paper:

TreeDDx: Benchmarking Differential Diagnostic Reasoning in Large Language Models Using Structured Clinical Decision Trees
