This repository contains a reproducible benchmark of low-cost predictors for periodontitis classification in NHANES. The current manuscript framing is methodological: it estimates realistic performance bounds, checks calibration and missingness sensitivity, and documents why questionnaire/metabolic predictors should not be presented as a stand-alone diagnostic replacement for periodontal examination.
- Development cohort: NHANES 2011-2014 adults age 30+ with full periodontal examination, n=9,034.
- Same-source temporal validation cohort: NHANES 2009-2010, n=5,037.
- Outcome: any periodontitis versus no periodontitis using CDC/AAP case definitions.
- Primary model: calibrated soft-voting ensemble with 29 predictors after excluding treatment-seeking/reverse-causality variables.
- Secondary model: 33 predictors with the treatment-seeking variables restored for upper-bound sensitivity analysis.
- Scope: research benchmarking and risk stratification only; not diagnosis, treatment planning, or unvalidated use in non-NHANES clinical settings.
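
Soft voting, as named for the primary model, means averaging the class probabilities predicted by several base models instead of counting their hard votes. The sketch below shows only that averaging step. The base learners, their weights, and the calibration method used in this repository are not described here, so the function name `soft_vote` and all probabilities are made up for illustration.

```python
def soft_vote(prob_by_model, weights=None):
    """Average positive-class probabilities across models (soft voting)."""
    n_models = len(prob_by_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * n_models  # equal weights unless stated otherwise
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_rows = len(prob_by_model[0])
    if any(len(probs) != n_rows for probs in prob_by_model):
        raise ValueError("all models must score the same participants")
    total_weight = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_by_model)) / total_weight
        for i in range(n_rows)
    ]


# Hypothetical probabilities from three base models for four participants.
model_a = [0.80, 0.40, 0.65, 0.30]
model_b = [0.70, 0.50, 0.55, 0.20]
model_c = [0.90, 0.30, 0.60, 0.40]

ensemble = soft_vote([model_a, model_b, model_c])
flags = [p >= 0.65 for p in ensemble]  # apply one decision threshold
print([round(p, 2) for p in ensemble], flags)
```

Averaging probabilities keeps the ensemble output on a probability scale, which is what makes a later calibration step and threshold selection meaningful.
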

The values below are the source of truth and are enforced by `scripts/check_publication_consistency.py`.

| Analysis | Model | Features | AUC-ROC | PR-AUC | Notes |
|---|---|---|---|---|---|
| Internal 5-fold CV | Primary no reverse-causality | 29 | 0.6896 | 0.8240 | Main development estimate |
| Internal 5-fold CV | Secondary full-feature | 33 | 0.6996 | 0.8295 | Adds dental visit, flossing, loose teeth, and floss-missing flag |
| Same-source temporal validation | Frozen primary model on 2009-2010 | 29 | 0.6495 | 0.7727 | Same survey system, earlier cycle |
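
For readers less familiar with the two discrimination metrics in the table, the following is a minimal pure-Python sketch of how AUC-ROC and PR-AUC (computed here as average precision) can be obtained from labels and predicted scores. It is not the code in `src/evaluation.py`. The function names and the toy data are illustrative assumptions, and the average-precision version assumes there are no tied scores.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy data, not NHANES results.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_auc = auc_roc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_auc, 4), round(toy_ap, 4))
```

Unlike AUC-ROC, average precision depends on outcome prevalence, so the two columns in the table are not on a directly comparable scale.
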

Temporal operating points for the frozen primary model:

| Threshold | Sensitivity | Specificity | PPV | NPV | Interpretation |
|---|---|---|---|---|---|
| 0.35 | 98.9% | 5.5% | 70.0% | 69.1% | High-sensitivity triage; negative screens are not definitive |
| 0.65 | 77.7% | 45.2% | 76.0% | 47.5% | More balanced but still requires clinical confirmation |
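
The PPV and NPV columns follow from sensitivity, specificity, and outcome prevalence through Bayes' rule. The sketch below shows that arithmetic. The prevalence of 0.69 is not reported in this section; it is an assumption back-solved from the 0.35-threshold row, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ASSUMED_PREVALENCE = 0.69  # assumption: back-solved from the table, not a reported value

ppv_035, npv_035 = predictive_values(0.989, 0.055, ASSUMED_PREVALENCE)
ppv_065, npv_065 = predictive_values(0.777, 0.452, ASSUMED_PREVALENCE)
print(round(ppv_035, 3), round(npv_035, 3))
print(round(ppv_065, 3), round(npv_065, 3))
```

Under this assumed prevalence the sketch approximately reproduces the tabulated PPV and NPV, with small differences attributable to rounding of the inputs. It also shows why the PPV stays near 70% even at 5.5% specificity: when most of the cohort has the outcome, flagging nearly everyone is right about as often as the prevalence.
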

The key conclusion is deliberately modest: with these low-cost predictors, discrimination (AUC-ROC) is about 0.69 in internal cross-validation and about 0.65 under same-source temporal validation. This performance is useful as a benchmark, not as evidence of readiness for clinical implementation.

Set up a development environment:

    make setup
    source venv/bin/activate

Set up the pinned publication target:

    make setup-lock
    source venv/bin/activate

Run lightweight checks that do not require NHANES data:

    make test
    make consistency
    make verify-submission

Run the full local reproduction after NHANES data are available:

    make reproduce-full

The legacy notebooks are retired as source-of-truth artifacts. The maintained publication surface is the script targets, result artifacts, model card, tests, and manuscript source, with consistency checks to prevent silent drift across those files.

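
The idea of guarding canonical values against silent drift can be illustrated with a very small check. This is not the implementation of `scripts/check_publication_consistency.py`, which is not shown here. The dictionary keys, the function name, and the sample text are hypothetical, and only the three AUC-ROC values are taken from the results table.

```python
# Canonical AUC-ROC values copied from the results table; key names are hypothetical.
CANONICAL = {
    "internal_primary_auc": "0.6896",
    "internal_secondary_auc": "0.6996",
    "temporal_primary_auc": "0.6495",
}


def missing_canonical_values(text, canonical=CANONICAL):
    """Return the names of canonical values that do not appear verbatim in `text`."""
    return sorted(name for name, value in canonical.items() if value not in text)


# A made-up model-card snippet that rounds one value and omits another.
model_card = "Internal AUC-ROC 0.6896; temporal AUC-ROC 0.65."
drifted = missing_canonical_values(model_card)
print(drifted)
```

A verbatim string match is deliberately strict: a rounded value such as 0.65 is reported as drift, which forces every file to quote the same number of decimal places.
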
| Path | Purpose |
|---|---|
| `src/labels.py` | CDC/AAP case-definition implementation and synthetic test fixtures |
| `src/evaluation.py` | Metrics, threshold selection, calibration, and plotting helpers |
| `src/publication_analysis.py` | Survey-weighted prevalence, subgroup performance, and missingness tables |
| `scripts/check_publication_consistency.py` | Guards canonical values and conservative publication wording |
| `scripts/verify_submission.py` | Runs lightweight submission-readiness gates |
| `scripts/reproduce_v13_primary.py` | Regenerates internal v1.3 benchmark result artifacts |
| `scripts/run_temporal_validation.py` | Regenerates same-source temporal validation artifacts |
| `scripts/04_publication_analyses.py` | Generates publication sensitivity tables from processed predictions |
| `scripts/06_generate_publication_figures.py` | Generates submission figures from canonical result artifacts |
| `results/publication_sensitivity_tables.md` | Survey-weighted prevalence and subgroup performance summary generated by the full reproduction |
| `figures/19_publication_performance_summary.png` | Main performance and operating-point figure |
| `figures/20_publication_sensitivity_summary.png` | Survey-weighted prevalence, subgroup AUC, and missingness figure |
| `results/` | Saved result artifacts used by the manuscript and model card |
| `docs/publication/ARTICLE_DRAFT.md` | Current manuscript source |

- The validation cohort is temporally distinct but comes from the same NHANES survey system; this is not geographic or prospective clinical validation.
- The analytic cohort has high periodontitis prevalence because it is restricted to adults with full periodontal examination data.
- NHANES missingness patterns may encode survey logistics. The deployment-ready model without missingness indicators is therefore emphasized as a more realistic lower-bound benchmark.
- Subgroup and survey-weighted analyses require processed prediction tables and should be regenerated before journal submission.
- The repository is research software. Any clinical use would require independent validation, local recalibration, governance review, and workflow-specific safety assessment.
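
Because local recalibration is listed as a precondition for any clinical use, it may help to see what a basic calibration check looks like. The sketch below computes a Brier score and a binned reliability table in pure Python. It is an illustration under stated assumptions, not the calibration helpers in `src/evaluation.py`: the function names, the equal-width binning, and the toy data are all made up.

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def reliability_bins(labels, probs, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for _, p in members) / n
            observed = sum(y for y, _ in members) / n
            rows.append((mean_pred, observed, n))
    return rows


# Toy data, not NHANES results.
cal_labels = [1, 0, 1, 1, 0, 1]
cal_probs = [0.92, 0.15, 0.85, 0.55, 0.35, 0.72]

cal_brier = brier_score(cal_labels, cal_probs)
cal_rows = reliability_bins(cal_labels, cal_probs)
for mean_pred, observed, n in cal_rows:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={n}")
```

In a well-calibrated model the mean predicted probability and the observed event rate are close within each bin. A systematic gap in a new population is the signal that recalibration is needed before thresholds are reused.
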
@software{barbosa_nhanes_periodontitis_benchmark,
author = {Barbosa, Francisco Teixeira and Brizuela-Velasco, Aritza and Robles Cantero, Daniel},
title = {NHANES Periodontitis Prediction Benchmark},
year = {2026},
url = {https://github.com/Tuminha/NHANES-Periodontitis-Machine-Learning-Project}
}