Tuminha/NHANES-Periodontitis-Machine-Learning-Project

NHANES Periodontitis Prediction Benchmark

This repository contains a reproducible benchmark of low-cost predictors for periodontitis classification in NHANES. The current manuscript framing is methodological: it estimates realistic performance bounds, checks calibration and missingness sensitivity, and documents why questionnaire/metabolic predictors should not be presented as a stand-alone diagnostic replacement for periodontal examination.

Current Study Framing

  • Development cohort: NHANES 2011-2014 adults aged 30 or older with full periodontal examination, n=9,034.
  • Same-source temporal validation cohort: NHANES 2009-2010, n=5,037.
  • Outcome: any periodontitis versus no periodontitis using CDC/AAP case definitions.
  • Primary model: calibrated soft-voting ensemble with 29 predictors after excluding treatment-seeking/reverse-causality variables.
  • Secondary model: 33 predictors with the treatment-seeking variables restored for upper-bound sensitivity analysis.
  • Scope: research benchmarking and risk stratification only; not diagnosis, treatment planning, or unvalidated use in non-NHANES clinical settings.

Canonical Results

These values are the source-of-truth values enforced by scripts/check_publication_consistency.py.

| Analysis | Model | Features | AUC-ROC | PR-AUC | Notes |
|---|---|---|---|---|---|
| Internal 5-fold CV | Primary no reverse-causality | 29 | 0.6896 | 0.8240 | Main development estimate |
| Internal 5-fold CV | Secondary full-feature | 33 | 0.6996 | 0.8295 | Adds dental visit, flossing, loose teeth, and floss-missing flag |
| Same-source temporal validation | Frozen primary model on 2009-2010 | 29 | 0.6495 | 0.7727 | Same survey system, earlier cycle |
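When reading the table above, it helps to remember that the two metrics have different no-skill baselines: AUC-ROC starts at 0.5, while PR-AUC starts at the outcome prevalence. The sketch below illustrates this on synthetic data. The 0.7 prevalence and the score distribution are assumptions, and `average_precision_score` is one common PR-AUC estimator that may differ from the one used in the repository.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 5000
prevalence = 0.7  # illustrative assumption, not a reported cohort value

# Synthetic labels and weakly informative scores (cases shifted upward).
y_true = (rng.random(n) < prevalence).astype(int)
scores = rng.normal(loc=0.7 * y_true, scale=1.0)

# A constant score carries no ranking information at all.
no_skill = np.full(n, 0.5)

auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)
baseline_auc = roc_auc_score(y_true, no_skill)
baseline_pr = average_precision_score(y_true, no_skill)

print(f"model:    AUC-ROC={auc:.3f}  PR-AUC={pr_auc:.3f}")
print(f"no skill: AUC-ROC={baseline_auc:.3f}  PR-AUC={baseline_pr:.3f}")
```

In a high-prevalence cohort, a PR-AUC above 0.8 can therefore coexist with modest discrimination, which is why the two columns should be read together.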

Temporal operating points for the frozen primary model:

| Threshold | Sensitivity | Specificity | PPV | NPV | Interpretation |
|---|---|---|---|---|---|
| 0.35 | 98.9% | 5.5% | 70.0% | 69.1% | High-sensitivity triage; negative screens are not definitive |
| 0.65 | 77.7% | 45.2% | 76.0% | 47.5% | More balanced but still requires clinical confirmation |
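The four rates in each row are tied together by Bayes' rule, so the table can be sanity-checked by hand. The sketch below back-calculates the prevalence implied by sensitivity, specificity, and PPV, then recomputes NPV from it. The function names are hypothetical helpers written for this illustration, not code from the repository, and the inputs are the rounded values from the table.

```python
def implied_prevalence(sens, spec, ppv):
    """Solve PPV = sens*p / (sens*p + (1 - spec)*(1 - p)) for p."""
    fp_rate = 1.0 - spec
    return ppv * fp_rate / (sens * (1.0 - ppv) + ppv * fp_rate)


def ppv_npv(sens, spec, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sens * prevalence
    fn = (1.0 - sens) * prevalence
    tn = spec * (1.0 - prevalence)
    fp = (1.0 - spec) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# threshold -> (sensitivity, specificity, PPV, NPV), as reported in the table.
rows = {
    0.35: (0.989, 0.055, 0.700, 0.691),
    0.65: (0.777, 0.452, 0.760, 0.475),
}

for threshold, (sens, spec, ppv, npv) in rows.items():
    prev = implied_prevalence(sens, spec, ppv)
    _, npv_check = ppv_npv(sens, spec, prev)
    print(f"threshold={threshold}: implied prevalence={prev:.3f}, "
          f"recomputed NPV={npv_check:.3f}, reported NPV={npv:.3f}")
```

Both rows imply a prevalence of about 69% and reproduce the reported NPV to within rounding. With prevalence that high, a PPV near 70% at the 0.35 threshold is close to what labeling every participant positive would achieve.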

The key conclusion is deliberately modest: with these low-cost predictors, discrimination (AUC-ROC) is around 0.69 internally and around 0.65 under same-source temporal validation. The observed performance is useful as a benchmark, not as proof of readiness for clinical implementation.

Reproducibility

Set up a development environment:

make setup
source venv/bin/activate

Set up the pinned publication target:

make setup-lock
source venv/bin/activate

Run lightweight checks that do not require NHANES data:

make test
make consistency
make verify-submission

Run the full local reproduction after NHANES data are available:

make reproduce-full

The legacy notebooks are retired as source-of-truth artifacts. The maintained publication surface is the script targets, result artifacts, model card, tests, and manuscript source, with consistency checks to prevent silent drift across those files.
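To make the idea of a drift guard concrete, here is a hypothetical sketch of how canonical values could be checked against a piece of text. It is not the implementation of scripts/check_publication_consistency.py. The `CANONICAL` mapping, the regular expression, and the `find_drift` helper are assumptions invented for this example, and the stale value 0.7071 is made up.

```python
import re

# Canonical values copied from the Canonical Results table (AUC-ROC only).
CANONICAL = {
    "internal primary AUC-ROC": "0.6896",
    "internal secondary AUC-ROC": "0.6996",
    "temporal primary AUC-ROC": "0.6495",
}

# Treat any standalone four-decimal number below 1 as a reported metric.
METRIC_PATTERN = re.compile(r"(?<![\d.])0\.\d{4}(?!\d)")


def find_drift(text, canonical=CANONICAL, allowed_extra=()):
    """Return (missing, unexpected) metric strings for one document."""
    found = set(METRIC_PATTERN.findall(text))
    expected = set(canonical.values())
    missing = sorted(expected - found)
    unexpected = sorted(found - expected - set(allowed_extra))
    return missing, unexpected


good = "Internal AUC-ROC 0.6896 (primary), 0.6996 (secondary); temporal 0.6495."
stale = "Internal AUC-ROC 0.6896 (primary), 0.7071 (secondary); temporal 0.6495."

print(find_drift(good))   # no missing or unexpected values
print(find_drift(stale))  # one canonical value missing, one stray value found
```

A check of this shape fails loudly when a manuscript, model card, or README quotes a number that no longer matches the saved result artifacts.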

Repository Structure

| Path | Purpose |
|---|---|
| src/labels.py | CDC/AAP case-definition implementation and synthetic test fixtures |
| src/evaluation.py | Metrics, threshold selection, calibration, and plotting helpers |
| src/publication_analysis.py | Survey-weighted prevalence, subgroup performance, and missingness tables |
| scripts/check_publication_consistency.py | Guards canonical values and conservative publication wording |
| scripts/verify_submission.py | Runs lightweight submission-readiness gates |
| scripts/reproduce_v13_primary.py | Regenerates internal v1.3 benchmark result artifacts |
| scripts/run_temporal_validation.py | Regenerates same-source temporal validation artifacts |
| scripts/04_publication_analyses.py | Generates publication sensitivity tables from processed predictions |
| scripts/06_generate_publication_figures.py | Generates submission figures from canonical result artifacts |
| results/publication_sensitivity_tables.md | Survey-weighted prevalence and subgroup performance summary generated by the full reproduction |
| figures/19_publication_performance_summary.png | Main performance and operating-point figure |
| figures/20_publication_sensitivity_summary.png | Survey-weighted prevalence, subgroup AUC, and missingness figure |
| results/ | Saved result artifacts used by the manuscript and model card |
| docs/publication/ARTICLE_DRAFT.md | Current manuscript source |

Known Limitations

  • The validation cohort is temporally distinct but comes from the same NHANES survey system; this is not geographic or prospective clinical validation.
  • The analytic cohort has high periodontitis prevalence because it is restricted to adults with full periodontal examination data.
  • NHANES missingness patterns may encode survey logistics. The deployment-ready model without missingness indicators is therefore emphasized as a more realistic lower-bound benchmark.
  • Subgroup and survey-weighted analyses require processed prediction tables and should be regenerated before journal submission.
  • The repository is research software. Any clinical use would require independent validation, local recalibration, governance review, and workflow-specific safety assessment.
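The missingness limitation above can be illustrated with a small sketch contrasting a design matrix that carries missingness indicator columns with one that does not. It uses scikit-learn's `SimpleImputer`; the three columns and their values are invented for this example, and median imputation is an assumption rather than the repository's documented strategy.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up rows with three hypothetical predictors; NaN marks a missing answer.
X = np.array([
    [5.4, 27.0, np.nan],
    [np.nan, 31.5, 1.0],
    [6.1, 24.2, 0.0],
    [5.9, 29.8, np.nan],
])

# Variant 1: impute and append one 0/1 flag per column that had missing values.
with_flags = SimpleImputer(strategy="median", add_indicator=True).fit_transform(X)

# Variant 2: impute only, so the model never sees who skipped a question.
without_flags = SimpleImputer(strategy="median").fit_transform(X)

print(with_flags.shape, without_flags.shape)
print(with_flags[:, 3:])  # indicator columns for the first and third predictors
```

If a model performs noticeably better with the flag columns, part of its signal may come from survey logistics rather than from the participant's health, which is why the variant without flags is the more conservative benchmark.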

Citation

@software{barbosa_nhanes_periodontitis_benchmark,
  author = {Barbosa, Francisco Teixeira and Brizuela-Velasco, Aritza and Robles Cantero, Daniel},
  title = {NHANES Periodontitis Prediction Benchmark},
  year = {2026},
  url = {https://github.com/Tuminha/NHANES-Periodontitis-Machine-Learning-Project}
}
