GitHub - AdamSkog/Datathon-DataMed

Datathon 2025

Team DataMed's submission for the DubsTech Datathon 2025.

About the Project

Team DataMed's submission for the 2025 Datathon hosted by DubsTech at the University of Washington. We have chosen the prompt of Health: Drug Overdose in USA on the Machine Learning track.

Summary of the Project

We focused on Health: Analyzing the New York State of Health as our prompt for the Datathon. We used the dataset provided with the prompt to build multiple machine learning models that:

- Predict the following for a specific hospital and DRG type for the next year:
  - Discharges
  - Median Costs
  - Median Charges
- Predict the total expected discharges for each DRG type for the State of New York
- Predict the mean cost of a discharge using features like:
  - Hospital
  - APR DRG code
  - Severity of illness
  - Year
  - Medical/Surgical classification

1. `meancost.ipynb`

Mean Cost Prediction of Inpatient Discharges

This notebook predicts the mean cost of inpatient hospital discharges from historical data. It uses features such as Facility Name, APR DRG Code, Severity of Illness, and Medical/Surgical Description. Four machine learning models are trained and compared (Linear Regression, Decision Tree, Random Forest, and XGBoost) to identify cost trends and support efforts to reduce discharge costs. The goal is to deliver actionable insights for hospital cost management.

| Metric | XGBoost | Random Forest | Decision Tree | Linear Regression |
|---|---|---|---|---|
| MAE (Mean Absolute Error) | 6169.16 (Best) | 7687.41 | 7984.86 | 10624.81 (Worst) |
| RMSE (Root Mean Squared Error) | 11298.45 (Best) | 13066.42 | 13461.71 | 16555.23 (Worst) |
| MSE (Mean Squared Error) | 1.28×10⁸ (Best) | 1.71×10⁸ | 1.81×10⁸ | 2.74×10⁸ (Worst) |
| R² Score | 0.6576 (Best) | 0.5420 | 0.5139 | 0.2648 (Lowest) |
| Explained Variance Score | 0.6576 (Best) | 0.5420 | 0.5139 | 0.2648 (Lowest) |
| Best Params | `learning_rate=0.0317, max_depth=..., n_estimators=...` | `n_estimators=175, max_depth=10, min_samples_split=5, min_samples_leaf=1` | `max_depth=9, min_samples_split=8, min_samples_leaf=1` | `{}` |
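As a reading aid for the table above, the following is a minimal sketch of how these five metrics can be computed from predictions. The function name `regression_metrics` and the sample costs are hypothetical and are not taken from the notebook or the dataset. It also illustrates that R² and explained variance coincide when the residuals average to zero, which is consistent with the identical values in the two rows above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R^2 and explained variance for two equal-length lists."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)

    # Variance of the targets and of the residuals (population form).
    mean_true = sum(y_true) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n

    # R^2 penalizes a systematic bias; explained variance ignores it.
    r2 = 1 - mse / var_true
    explained_variance = 1 - var_res / var_true

    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": rmse,
        "R2": r2,
        "Explained variance": explained_variance,
    }


# Hypothetical mean costs in USD, for illustration only.
y_true = [12000.0, 8500.0, 30000.0, 15000.0]
y_pred = [11000.0, 9500.0, 27000.0, 17000.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE is the square root of MSE, the two rows always rank the models in the same order; MAE can rank them differently because it weights large errors less heavily.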

2. `drg_discharge_pred.ipynb`

DRG Discharge Volume Prediction for New York State

This notebook forecasts the total expected discharges for each Diagnosis-Related Group (DRG) type across New York State. After cleaning and preparing the dataset, it applies four models: Random Forest, Prophet, XGBoost, and LightGBM. Model performance is evaluated using MAE, RMSE, and MAPE to select the best discharge prediction approach, supporting resource and capacity planning in healthcare settings.

| Metric | Random Forest | Prophet | XGBoost | LightGBM |
|---|---|---|---|---|
| MAE (Mean Absolute Error) | 798.65 (Best) | 1375.26 (Second Best) | ~4800–5000 (High) | ~4800–5000 (High) |
| RMSE (Root Mean Squared Error) | 1725.33 (Best) | 3669.35 (Second Best) | 17492.90 (Extremely High) | Not listed (assumed high) |
| MAPE (Mean Absolute Percentage Error) | 22.21% (Best) | 36.17% (Second Best) | >300% (Very High) | >300% (Very High) |
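MAPE measures relative error per DRG, so it can behave very differently from MAE. The sketch below uses hypothetical discharge counts (not values from the dataset) and hypothetical helper names `mae` and `mape` to show how a single low-volume DRG can dominate MAPE while barely affecting MAE. It is an illustration of the metric, not a diagnosis of the results above.

```python
def mae(y_true, y_pred):
    """Mean absolute error in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Rows with a zero actual value are skipped to avoid division by zero.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Hypothetical statewide discharges per DRG: two high-volume, one low-volume.
actual = [20000, 5000, 10]
predicted = [19000, 5500, 60]

mae_all = mae(actual, predicted)
mape_all = mape(actual, predicted)

# Same comparison restricted to the two high-volume DRGs.
mape_high_volume = mape(actual[:2], predicted[:2])

print(f"MAE (all DRGs): {mae_all:.2f}")
print(f"MAPE (all DRGs): {mape_all:.2f}%")
print(f"MAPE (high-volume only): {mape_high_volume:.2f}%")
```

Here the low-volume DRG is off by only 50 discharges, yet that is a 500% relative error, which is why MAPE is worth reading alongside MAE and RMSE rather than on its own.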

3. `1yearpred.ipynb`

One-Year Forecast of Hospital Metrics

This notebook predicts three key metrics (Discharges, Median Costs, and Median Charges) for the next year for a specific hospital and DRG type. Time series models (ARIMA and Auto-ARIMA) are fitted to the historical values of each metric and used to generate the forecasts. These insights assist hospitals in strategic planning, budgeting, and operational management.

| Metric | Predicted Value (2022) | Model Used |
|---|---|---|
| Discharges | 119.74 | ARIMA |
| Median Cost | $11,940.99 | ARIMA |
| Median Charges | $60,310.68 | ARIMA |
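To make the one-step-ahead idea concrete, here is a minimal sketch of the simplest member of the ARIMA family: a random walk with drift, that is, ARIMA(0,1,0) with a constant. This is a stand-in chosen for illustration and is not the model order used in the notebook, which is not stated here. The function name `drift_forecast`, the yearly values, and the 2017 to 2021 range are all hypothetical.

```python
def drift_forecast(history, steps=1):
    """Forecast with a random walk plus drift (ARIMA(0,1,0) with a constant).

    The drift is the mean of the first differences, which simplifies to
    (last - first) / (n - 1). Each forecast step adds one more drift.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * h for h in range(1, steps + 1)]


# Hypothetical yearly discharges for one hospital and DRG, 2017 to 2021.
discharges = [100.0, 104.0, 109.0, 112.0, 116.0]

forecast_2022 = drift_forecast(discharges)[0]
print(f"Forecast for 2022: {forecast_2022:.2f}")
```

A full ARIMA model adds autoregressive and moving-average terms on top of the differencing step shown here, and Auto-ARIMA searches over those orders automatically. With only a handful of yearly observations per hospital and DRG, a simple baseline like this is a useful sanity check for the richer models.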

Built With

Contact

Adam Skoglund - @AdamSkog - ajskog@uw.edu
Swastik Singh - @swassingh - swas@uw.edu
Navneeth Dhamotharan - @Navneethd8 - nd17@uw.edu
