Hierarchical Reinforcement Learning for Detecting Safety and Reliability Vulnerabilities in Large Language Model-Assisted Healthcare Systems
IEEE conference publication, available on IEEE Xplore.
This repository contains the implementation for our IEEE-accepted paper on using Hierarchical Reinforcement Learning (HRL) to detect safety and reliability vulnerabilities in Large Language Model (LLM)-assisted healthcare systems.
As LLMs become integrated into clinical workflows, ensuring their robustness against failure modes is critical. This framework proactively identifies vulnerabilities using a hierarchical RL architecture trained on healthcare-relevant scenarios.
├── main.py # Entry point – runs training and evaluation
├── train.py # HRL training loop
├── models.py # HRL model definitions (high-level & low-level policies)
├── environment.py # Custom healthcare LLM environment
├── plot_rewards.py # Reward curve visualization
├── training_log.csv # Training metrics log
├── reward_plot.png # Reward convergence plot
├── entropy_plot.png # Entropy plot over training
└── hrlenv/ # Virtual environment (not tracked)
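`models.py` is described as defining high-level and low-level policies. Its code is not reproduced in this README, so the following is only a minimal, stdlib-only sketch of that general two-level pattern, not the paper's model. A high-level softmax policy picks a sub-goal and holds it for a fixed number of steps, while a low-level softmax policy picks primitive actions conditioned on the state and the current sub-goal. The class names (`SoftmaxPolicy`, `HierarchicalAgent`), the tabular logits, and the sub-goal and action labels are all hypothetical, and the learning update is omitted.

```python
import math
import random
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxPolicy:
    """Hypothetical tabular softmax policy: one row of logits per context string."""

    def __init__(self, n_actions: int) -> None:
        self.n_actions = n_actions
        self.logits: Dict[str, List[float]] = {}

    def probs(self, context: str) -> List[float]:
        # Unseen contexts start with zero logits, i.e. a uniform distribution.
        row = self.logits.setdefault(context, [0.0] * self.n_actions)
        return softmax(row)

    def sample(self, context: str, rng: random.Random) -> int:
        return rng.choices(range(self.n_actions), weights=self.probs(context))[0]


class HierarchicalAgent:
    """High-level policy picks a sub-goal; low-level policy picks actions given it."""

    def __init__(self, subgoals: List[str], actions: List[str], seed: int = 0) -> None:
        self.subgoals = subgoals
        self.actions = actions
        self.high = SoftmaxPolicy(len(subgoals))
        self.low = SoftmaxPolicy(len(actions))
        self.rng = random.Random(seed)

    def rollout(self, state: str, horizon: int) -> List[Tuple[str, str]]:
        # Temporal abstraction: the high-level choice is fixed for `horizon` low-level steps.
        subgoal = self.subgoals[self.high.sample(state, self.rng)]
        steps = []
        for _ in range(horizon):
            context = f"{state}|{subgoal}"  # low level is conditioned on the sub-goal
            action = self.actions[self.low.sample(context, self.rng)]
            steps.append((subgoal, action))
        return steps


if __name__ == "__main__":
    agent = HierarchicalAgent(
        subgoals=["probe_dosage_advice", "probe_fabricated_citations"],  # hypothetical
        actions=["rephrase_query", "add_ambiguous_symptom", "request_source"],  # hypothetical
    )
    for subgoal, action in agent.rollout(state="triage_chat", horizon=3):
        print(subgoal, "->", action)
```

Holding the sub-goal fixed across several low-level steps is what gives the high level its temporal abstraction; in a trained system both logit tables (or the networks that replace them) would be updated from the environment's reward.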
- Python 3.8+
- pip
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
python -m venv hrlenv
- Windows:
hrlenv\Scripts\activate
- macOS/Linux:
source hrlenv/bin/activate
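Before installing dependencies, it can help to confirm that the interpreter meets the Python 3.8+ requirement and that the `hrlenv` environment is actually active. A minimal sketch, saved as a hypothetical `check_env.py` (not part of this repository). It relies on the standard-library `venv` behavior where `sys.prefix` differs from `sys.base_prefix` inside a virtual environment, so it may not recognize other environment managers such as conda.

```python
import sys
from typing import List, Tuple

MIN_VERSION: Tuple[int, int] = (3, 8)  # from the prerequisites listed above


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment(min_version: Tuple[int, int] = MIN_VERSION) -> List[str]:
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    if sys.version_info < min_version:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        required = f"{min_version[0]}.{min_version[1]}"
        problems.append(f"Python {found} found, but {required}+ is required")
    if not in_virtualenv():
        problems.append("no virtual environment is active; activate hrlenv first")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK:", sys.version.split()[0], "at", sys.prefix)
```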
pip install -r requirements.txt
⚠️ If `requirements.txt` is not present, install the dependencies manually based on the imports in `models.py` and `train.py`.
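One way to do that manual step is to list the top-level modules imported by `models.py` and `train.py` using Python's `ast` module, dropping standard-library modules and the repository's own files. This is a sketch of a hypothetical helper, not something shipped with the repo. `sys.stdlib_module_names` only exists on Python 3.10+, so on 3.8 and 3.9 standard-library modules are not filtered out.

```python
import ast
import sys
from pathlib import Path
from typing import Iterable, List, Set


def imported_top_level(source: str) -> Set[str]:
    """Top-level module names from `import x.y` and absolute `from x.y import z`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 means a relative import (a local module), so skip it
            names.add(node.module.split(".")[0])
    return names


def third_party_candidates(names: Iterable[str], local_modules: Iterable[str]) -> List[str]:
    """Remove stdlib (Python 3.10+ only) and local module names; return sorted."""
    stdlib = set(getattr(sys, "stdlib_module_names", ()))
    return sorted(set(names) - stdlib - set(local_modules))


if __name__ == "__main__":
    root = Path(".")
    local = {p.stem for p in root.glob("*.py")}  # e.g. models, train, environment
    found: Set[str] = set()
    for name in ("models.py", "train.py"):
        path = root / name
        if path.exists():
            found |= imported_top_level(path.read_text(encoding="utf-8"))
    if not found:
        print("models.py / train.py not found in the current directory")
    for module in third_party_candidates(found, local):
        print(module)
```

Import names do not always match PyPI package names (for example, `sklearn` is installed as `scikit-learn`), so check each entry before running `pip install`.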
python main.py
python plot_rewards.py
Training metrics are saved to `training_log.csv`, and the plots are exported as PNG files.
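The column layout of `training_log.csv` is not documented here. Assuming it has a header row with a per-episode `reward` column (a hypothetical name; adjust it to the actual header), the sketch below reads that column and applies a trailing moving average, a common way to smooth a noisy reward curve when checking convergence. It is not the code in `plot_rewards.py`.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def read_column(lines: Iterable[str], column: str) -> List[float]:
    """Read one numeric column from CSV text that has a header row."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
    return [float(row[column]) for row in reader]


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


if __name__ == "__main__":
    log_path = Path("training_log.csv")
    if log_path.exists():
        with log_path.open(newline="") as f:
            rewards = read_column(f, "reward")  # hypothetical column name
        if rewards:
            smoothed = moving_average(rewards, window=min(50, len(rewards)))
            print(f"episodes: {len(rewards)}, final smoothed reward: {smoothed[-1]:.2f}")
```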
Our HRL framework scores highest on every reported metric, ahead of both baselines (a flat REINFORCE agent and a rule-based filter):
| Model | Accuracy (%) | Reward | Entropy |
|---|---|---|---|
| HRL (Ours) | 92.5 | 2450 | 0.89 |
| Flat RL (REINFORCE) | 84.3 | 1930 | 0.72 |
| Rule-Based Filter | 79.1 | 1810 | 0.55 |
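The README does not state how the Entropy column is computed. Because the rule-based filter has no learned policy, the column may measure entropy over outputs rather than over actions. Either way, the quantity is presumably the Shannon entropy of a discrete distribution, H(p) = -Σ_i p_i log p_i. A minimal sketch under that assumption, reporting nats (natural log); the log base behind the table values is not stated.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute 0 (0 * log 0 := 0)."""
    total = sum(probs)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch: Sequence[Sequence[float]]) -> float:
    """Average entropy over a batch of distributions (e.g. one per visited state)."""
    return sum(entropy(p) for p in batch) / len(batch)


if __name__ == "__main__":
    print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4), the maximum for 4 options
    print(entropy([0.97, 0.01, 0.01, 0.01]))  # concentrated distribution: much lower
```

A uniform distribution over k options has the maximum entropy, log k; values near zero mean the distribution is concentrated on a single option.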
Reward and entropy convergence plots are saved as `reward_plot.png` and `entropy_plot.png`.
If you use this code, please cite our paper:
@inproceedings{Addobea2025hrl,
title = {Hierarchical Reinforcement Learning for Detecting Safety and Reliability Vulnerabilities in Large Language Model-Assisted Healthcare Systems},
author = {Your Name and Co-authors},
booktitle = {Proceedings of the IEEE Conference},
year = {2025}
}
For questions or collaborations, feel free to reach out via LinkedIn or open a GitHub issue.
This work is published by IEEE. Please refer to the official publication for full details.


