Team DataMed's submission for the DubsTech Datathon 2025.
Team DataMed's submission for the 2025 Datathon hosted by DubsTech at the University of Washington, entered on the Machine Learning track.
We chose the Health prompt, Analyzing the New York State of Health. Using the dataset provided with the prompt, we analyzed the data and built multiple machine learning models to:
- Predict the following for a specific hospital and DRG type for the next year:
  - Discharges
  - Median Costs
  - Median Charges
- Predict the total expected discharges for each DRG type for the State of New York
- Predict the mean cost of a discharge using features like:
  - Hospital
  - APR DRG code
  - Severity of illness
  - Year
  - Medical/Surgical classification
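
Most of the features listed above are categorical, so they need a numeric encoding before a regression model can use them. The notebooks' actual preprocessing is not described here; the sketch below assumes one-hot encoding, and the field names (`hospital`, `apr_drg_code`, `severity`, `med_surg`, `year`) and sample records are hypothetical. In practice a library routine such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` would usually do this step.

```python
def one_hot_encode(records, categorical_fields, numeric_fields):
    """Turn dict records into numeric rows: one 0/1 column per category value."""
    # Build a sorted vocabulary per categorical field so column order is stable.
    vocab = {f: sorted({r[f] for r in records}) for f in categorical_fields}
    columns = [f"{f}={v}" for f in categorical_fields for v in vocab[f]]
    columns += list(numeric_fields)

    rows = []
    for r in records:
        row = [1.0 if r[f] == v else 0.0
               for f in categorical_fields for v in vocab[f]]
        row.extend(float(r[f]) for f in numeric_fields)  # numeric fields pass through
        rows.append(row)
    return columns, rows


# Hypothetical records, not taken from the dataset.
sample_records = [
    {"hospital": "Hospital A", "apr_drg_code": "194", "severity": "Minor",
     "med_surg": "Medical", "year": 2019},
    {"hospital": "Hospital B", "apr_drg_code": "194", "severity": "Major",
     "med_surg": "Medical", "year": 2020},
]

columns, rows = one_hot_encode(
    sample_records,
    categorical_fields=["hospital", "apr_drg_code", "severity", "med_surg"],
    numeric_fields=["year"],
)
for name, value in zip(columns, rows[0]):
    print(f"{name}: {value}")
```

The resulting rows can be fed to any of the regressors named in this README. With many hospitals and DRG codes the encoded matrix becomes wide and sparse, which is one reason tree-based models are a common choice for this kind of data.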
Mean Cost Prediction of Inpatient Discharges

This notebook predicts the mean cost of inpatient hospital discharges from historical data, using features like Facility Name, APR DRG Code, Severity of Illness, and Medical/Surgical Description. Four machine learning models are trained and compared: Linear Regression, Decision Tree, Random Forest, and XGBoost. XGBoost performed best on every metric in the table below. The goal is to identify cost trends and deliver actionable insights for hospital cost management.

| Metric | XGBoost | Random Forest | Decision Tree | Linear Regression |
|---|---|---|---|---|
| MAE (Mean Absolute Error) | 6169.16 (Best) | 7687.41 | 7984.86 | 10624.81 (Worst) |
| RMSE (Root Mean Squared Error) | 11298.45 (Best) | 13066.42 | 13461.71 | 16555.23 (Worst) |
| MSE (Mean Squared Error) | 1.28×10⁸ (Best) | 1.71×10⁸ | 1.81×10⁸ | 2.74×10⁸ (Worst) |
| R² Score | 0.6576 (Best) | 0.5420 | 0.5139 | 0.2648 (Lowest) |
| Explained Variance Score | 0.6576 (Best) | 0.5420 | 0.5139 | 0.2648 (Lowest) |
| Best Params | learning_rate=0.0317, max_depth=..., n_estimators=... | n_estimators=175, max_depth=10, min_samples_split=5, min_samples_leaf=1 | max_depth=9, min_samples_split=8, min_samples_leaf=1 | {} |
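
For readers unfamiliar with the metrics in the table, the sketch below computes them from scratch on a small hypothetical set of costs (the numbers are made up, not from the dataset). It also shows two relationships visible in the table: RMSE is the square root of MSE, and R² equals the explained variance score whenever the residuals average to zero, which is consistent with the two rows being identical. The function name is illustrative; scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R² and explained variance for paired values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot

    # Explained variance ignores a constant bias: it uses the variance of the
    # residuals around their own mean rather than around zero.
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    explained_variance = 1 - var_res / (ss_tot / n)

    return {"mae": mae, "mse": mse, "rmse": rmse,
            "r2": r2, "explained_variance": explained_variance}


# Hypothetical mean costs in dollars.
actual = [10000, 20000, 30000, 40000]
predicted = [12000, 18000, 33000, 37000]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Shifting every prediction by a constant lowers R² but leaves the
# explained variance unchanged.
biased = regression_metrics(actual, [p + 1000 for p in predicted])
```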
DRG Discharge Volume Prediction for New York State

This notebook forecasts the total expected discharges for each Diagnosis-Related Group (DRG) type across New York State. After cleaning and preparing the dataset, it fits Random Forest, Prophet, XGBoost, and LightGBM models and compares them on MAE, RMSE, and MAPE to select the best discharge prediction approach. Random Forest scored best on all three metrics. The forecasts are intended to support resource and capacity planning in healthcare settings.

| Metric | Random Forest | Prophet | XGBoost | LightGBM |
|---|---|---|---|---|
| MAE (Mean Absolute Error) | 798.65 (Best) | 1375.26 (Second Best) | ~4800–5000 (High) | ~4800–5000 (High) |
| RMSE (Root Mean Squared Error) | 1725.33 (Best) | 3669.35 (Second Best) | 17492.90 (Extremely High) | Not listed (assumed high) |
| MAPE (Mean Absolute Percentage Error) | 22.21% (Best) | 36.17% (Second Best) | >300% (Very High) | >300% (Very High) |
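
MAPE can look very different from MAE because it weights every row by its relative error, so DRG types with few discharges can dominate it. The sketch below illustrates this on two hypothetical DRG volumes; the function name and the choice to skip rows with an actual value of zero are assumptions, not the notebook's implementation.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Rows whose actual value is zero are skipped (an assumption made here,
    since the percentage error is undefined for them).
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Hypothetical discharge counts: one high-volume and one low-volume DRG.
actual = [1000, 10]
predicted = [1100, 40]

mae = sum(abs(t - p) for t, p in zip(actual, predicted)) / len(actual)
print(f"MAE:  {mae:.1f} discharges")
print(f"MAPE: {mape(actual, predicted):.1f}%")
```

Here the high-volume DRG is off by 10% and the low-volume one by 300%, so the MAPE is far above 100% even though the MAE is only 65 discharges.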
One-Year Forecast of Hospital Metrics

This notebook predicts three key metrics (Discharges, Median Costs, and Median Charges) one year ahead for a specific hospital and DRG type. Time series models (ARIMA and Auto-ARIMA) are fit to the historical trend of each metric to generate the forecasts, which can assist hospitals with strategic planning, budgeting, and operational management.

| Metric | Predicted Value (2022) | Model Used |
|---|---|---|
| Discharges | 119.74 | ARIMA |
| Median Cost | $11,940.99 | ARIMA |
| Median Charges | $60,310.68 | ARIMA |
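
The notebook's ARIMA and Auto-ARIMA fitting is not reproduced here. As a rough illustration of how a one-year-ahead value comes out of such a model, the sketch below hand-rolls the simplest differenced autoregressive case, ARIMA(1,1,0) without drift, estimated by least squares. The function name and the yearly series are hypothetical, and the model order is an assumption; a library such as statsmodels or pmdarima would normally handle order selection and estimation.

```python
def forecast_next_year(series):
    """One-step-ahead forecast from a simplified ARIMA(1,1,0) without drift.

    Steps: difference the series once (d=1), fit d_t = phi * d_{t-1} by
    least squares (p=1, no MA term), then add the predicted change to the
    last observed value.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")

    diffs = [b - a for a, b in zip(series, series[1:])]
    lagged, current = diffs[:-1], diffs[1:]

    denom = sum(x * x for x in lagged)
    # If all lagged differences are zero, fall back to phi = 0.
    phi = sum(x * y for x, y in zip(lagged, current)) / denom if denom else 0.0

    return series[-1] + phi * diffs[-1]


# Hypothetical yearly discharge counts for one hospital and DRG type.
yearly_discharges = [100.0, 104.0, 106.0, 107.0]
print(f"Next-year forecast: {forecast_next_year(yearly_discharges):.2f}")
```

With only a handful of yearly observations per hospital and DRG type, any such fit rests on very little data, so forecasts like the ones in the table are best read as point estimates with wide uncertainty.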
- Adam Skoglund - @AdamSkog - ajskog@uw.edu
- Swastik Singh - @swassingh - swas@uw.edu
- Navneeth Dhamotharan - @Navneethd8 - nd17@uw.edu