Tuminha · Jun 2, 2026
diff --git a/.github/workflows/submission-readiness.yml b/.github/workflows/submission-readiness.yml
@@ -0,0 +1,31 @@
+name: submission-readiness
+
+on:
+  push:
+    branches: ["main", "checks/**"]
+  pull_request:
+    branches: ["main"]
+
+jobs:
+  lightweight-checks:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.14"
+
+      - name: Install pinned dependencies
+        run: make setup-lock
+
+      - name: Run unit tests
+        run: make test
+
+      - name: Check publication consistency
+        run: make consistency
+
+      - name: Run submission readiness checks
+        run: make verify-submission
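
Reviewer note: the workflow above gates merges on `make consistency`, which the model card describes as enforcing agreement between result artifacts, README, model card, and manuscript. The contents of `scripts/check_publication_consistency.py` are not part of this diff, so the following is only a minimal sketch of how such a gate could work. The `CANONICAL` mapping and the `find_missing` helper are hypothetical names introduced for illustration; the two AUC values are taken from the updated tables in this change.

```python
import re

# Hypothetical canonical values. A real check would presumably load these
# from the saved result artifacts rather than hard-coding them.
CANONICAL = {
    "internal_cv_auc_primary": "0.6896",
    "temporal_auc_primary": "0.6495",
}


def find_missing(canonical, documents):
    """Return (document_name, key) pairs whose canonical value is absent.

    `documents` maps a document name to its full text. Values are matched
    as whole numeric tokens, so "0.6896" does not match inside "0.68961".
    """
    missing = []
    for doc_name, text in documents.items():
        for key, value in canonical.items():
            pattern = rf"(?<![\d.]){re.escape(value)}(?!\d)"
            if not re.search(pattern, text):
                missing.append((doc_name, key))
    return missing


if __name__ == "__main__":
    docs = {
        "README.md": "| Internal 5-fold CV | 0.6896 | ... | Temporal | 0.6495 |",
        "MODEL_CARD.md": "| Internal 5-fold CV | 0.7172 | ... | Temporal | 0.6495 |",
    }
    for doc_name, key in find_missing(CANONICAL, docs):
        print(f"{doc_name}: missing canonical value for {key}")
```

A check of this shape fails loudly when one document keeps a stale number after the others are updated, which is the kind of silent drift this change is meant to prevent.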
diff --git a/MODEL_CARD.md b/MODEL_CARD.md
@@ -6,8 +6,8 @@
 |---|---|
 | Model name | `v1.3_primary_no_reverse_causality` |
 | Model type | Calibrated soft-voting ensemble of CatBoost, XGBoost, and LightGBM |
-| Development data | NHANES 2011-2014 adults age 30+ with full periodontal examination, `n=9,379` |
-| Same-source temporal validation data | NHANES 2009-2010, `n=5,177` |
+| Development data | NHANES 2011-2014 adults age 30+ with full periodontal examination, `n=9,034` |
+| Same-source temporal validation data | NHANES 2009-2010, `n=5,037` |
 | Outcome | Any CDC/AAP periodontitis versus no periodontitis |
 | Primary feature count | 29 predictors |
 | Secondary feature count | 33 predictors |
@@ -21,16 +21,16 @@ This model is intended for research benchmarking, methods comparison, and risk-s
 
 | Evaluation | AUC-ROC | PR-AUC | Brier | Notes |
 |---|---:|---:|---:|---|
-| Internal 5-fold CV, primary 29-feature model | 0.7172 | 0.8157 | 0.1812 | Excludes treatment-seeking variables |
-| Internal 5-fold CV, secondary 33-feature model | 0.7255 | 0.8207 | 0.1793 | Includes treatment-seeking variables |
-| Same-source temporal validation, frozen primary model | 0.6771 | 0.7735 | 0.2003 | NHANES 2009-2010 |
+| Internal 5-fold CV, primary 29-feature model | 0.6896 | 0.8240 | 0.1871 | Excludes treatment-seeking variables |
+| Internal 5-fold CV, secondary 33-feature model | 0.6996 | 0.8295 | 0.1844 | Includes treatment-seeking variables |
+| Same-source temporal validation, frozen primary model | 0.6495 | 0.7727 | 0.2023 | NHANES 2009-2010 |
 
 Temporal operating points for the frozen primary model:
 
 | Threshold | Sensitivity | Specificity | PPV | NPV | Appropriate interpretation |
 |---:|---:|---:|---:|---:|---|
-| 0.35 | 97.1% | 18.1% | 70.8% | 75.2% | High-sensitivity triage threshold; many false positives and some false negatives remain |
-| 0.65 | 82.6% | 43.3% | 74.9% | 54.9% | More balanced threshold; still not sufficient for diagnosis |
+| 0.35 | 98.9% | 5.5% | 70.0% | 69.1% | High-sensitivity triage threshold; many false positives and some false negatives remain |
+| 0.65 | 77.7% | 45.2% | 76.0% | 47.5% | More balanced threshold; still not sufficient for diagnosis |
 
 ## Feature Sets
 
@@ -58,7 +58,7 @@ The temporal validation cohort is useful because the model is frozen and evaluat
 
 Known applicability limits:
 
-- High analytic-sample prevalence, around 67-68%, limits direct PPV/NPV transfer to lower-prevalence populations.
+- High analytic-sample prevalence, around 66-72% depending on cycle and weighting, limits direct PPV/NPV transfer to lower-prevalence populations.
 - Missingness indicators may learn survey logistics, so the deployment-ready no-indicator model should be reported as a conservative benchmark.
 - Subgroup calibration and discrimination should be regenerated before journal submission using `scripts/04_publication_analyses.py`.
 - Any implementation outside NHANES-like research data requires local recalibration and independent safety assessment.
@@ -70,8 +70,8 @@ make setup-lock
 source venv/bin/activate
 make test
 make consistency
-make reproduce
-make temporal
+make verify-submission
+make reproduce-full
 ```
 
 The consistency check enforces agreement between result artifacts, README, this model card, and the manuscript source.
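
Reviewer note: the applicability limit about prevalence in this model card can be made concrete with Bayes' rule. The sketch below recomputes PPV and NPV from the sensitivity and specificity reported at the 0.65 threshold (77.7% and 45.2%). The prevalence of 0.69 is an assumed illustrative value inside the stated 66-72% range, and 0.30 is a hypothetical lower-prevalence population; neither is a result from the repository. The calculation also assumes sensitivity and specificity stay fixed across populations, which may not hold in practice.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Operating point at threshold 0.65, from the table in this model card.
    sens, spec = 0.777, 0.452
    # 0.69 is assumed (within the stated range); 0.30 is hypothetical.
    for prevalence in (0.69, 0.30):
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, PPV is about 0.76 and NPV about 0.48 at a prevalence of 0.69, close to the reported 76.0% and 47.5%. At a prevalence of 0.30, PPV falls to about 0.38 while NPV rises to about 0.83, which is why the reported predictive values should not be carried over to lower-prevalence populations.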

diff --git a/Makefile b/Makefile
@@ -1,4 +1,7 @@
-.PHONY: help setup setup-lock download process train reproduce temporal test consistency notebook clean figures lock dirs manuscript
+SHELL := /bin/bash
+PYTHON ?= ./venv/bin/python
+
+.PHONY: help setup setup-lock download process train reproduce temporal test consistency verify-submission reproduce-full notebook clean figures lock dirs manuscript
 
 help:
 	@echo "NHANES Periodontitis ML Project - Make Commands"
@@ -21,6 +24,8 @@ help:
 	@echo ""
 	@echo "Publication:"
 	@echo "  make consistency  - Check result and manuscript consistency"
+	@echo "  make verify-submission - Run lightweight submission-readiness checks"
+	@echo "  make reproduce-full - Run full local reproduction workflow"
 	@echo "  make manuscript   - Render PDF manuscript if pandoc is installed"
 	@echo "  make figures      - Generate publication figures from saved results"
 	@echo ""
@@ -44,36 +49,61 @@ setup-lock:
 
 test:
 	@echo "Running pytest unit tests..."
-	./venv/bin/python -m pytest tests/ -v --tb=short
+	$(PYTHON) -m pytest tests/ -v --tb=short
 	@echo "Tests complete"
 
 consistency:
 	@echo "Checking publication consistency..."
-	python3 scripts/check_publication_consistency.py
+	$(PYTHON) scripts/check_publication_consistency.py
 	@echo "Publication consistency checks passed"
 
+verify-submission:
+	@echo "Running submission-readiness checks..."
+	$(MAKE) test
+	$(MAKE) consistency
+	$(PYTHON) scripts/verify_submission.py
+	$(PYTHON) scripts/05_number_manuscript_lines.py
+	@echo "Submission-readiness checks complete"
+
 download:
 	@echo "Downloading NHANES data..."
-	python3 scripts/01_download_nhanes_data.py
+	$(PYTHON) scripts/01_download_nhanes_data.py
 	@echo "Download complete"
 
 process:
 	@echo "Processing and merging NHANES components..."
-	python3 scripts/02_process_nhanes_data.py
+	$(PYTHON) scripts/02_process_nhanes_data.py
 	@echo "Processing complete"
 
 train:
 	@echo "Training models..."
-	python3 scripts/03_train_models.py
+	$(PYTHON) scripts/03_train_models.py
 	@echo "Training complete"
 
 reproduce:
 	@echo "Running primary-model reproduction workflow..."
-	bash scripts/run_v13_primary.sh
+	$(PYTHON) scripts/reproduce_v13_primary.py
 
 temporal:
 	@echo "Running same-source temporal validation workflow..."
-	bash scripts/run_external_validation.sh
+	$(PYTHON) scripts/run_temporal_validation.py
+
+reproduce-full:
+	@mkdir -p logs
+	@set -euo pipefail; \
+	LOG="logs/full_reproduction_$$(date -u +%Y%m%dT%H%M%SZ).log"; \
+	echo "Writing full reproduction log to $$LOG"; \
+	{ \
+		$(MAKE) download; \
+		$(MAKE) process; \
+		$(MAKE) reproduce; \
+		$(MAKE) temporal; \
+		$(PYTHON) scripts/04_publication_analyses.py \
+			--input data/processed/publication_predictions.parquet \
+			--feature-cols age bmi waist_cm waist_height height_cm systolic_bp diastolic_bp glucose triglycerides hdl; \
+		$(MAKE) consistency; \
+		$(MAKE) verify-submission; \
+	} 2>&1 | tee "$$LOG"
 
 notebook:
 	@echo "Launching Jupyter notebook..."
@@ -85,7 +115,7 @@ figures:
 
 manuscript:
 	@echo "Rendering manuscript if pandoc is installed..."
-	python3 scripts/05_number_manuscript_lines.py
+	$(PYTHON) scripts/05_number_manuscript_lines.py
 	@if command -v pandoc >/dev/null 2>&1; then \
 		mkdir -p reports; \
 		pandoc docs/publication/ARTICLE_DRAFT.md \

diff --git a/README.md b/README.md
@@ -4,8 +4,8 @@ This repository contains a reproducible benchmark of low-cost predictors for per
 
 ## Current Study Framing
 
-- Development cohort: NHANES 2011-2014 adults age 30+ with full periodontal examination, `n=9,379`.
-- Same-source temporal validation cohort: NHANES 2009-2010, `n=5,177`.
+- Development cohort: NHANES 2011-2014 adults age 30+ with full periodontal examination, `n=9,034`.
+- Same-source temporal validation cohort: NHANES 2009-2010, `n=5,037`.
 - Outcome: any periodontitis versus no periodontitis using CDC/AAP case definitions.
 - Primary model: calibrated soft-voting ensemble with 29 predictors after excluding treatment-seeking/reverse-causality variables.
 - Secondary model: 33 predictors with the treatment-seeking variables restored for upper-bound sensitivity analysis.
@@ -17,18 +17,18 @@ These values are the source-of-truth values enforced by `scripts/check_publicati
 
 | Analysis | Model | Features | AUC-ROC | PR-AUC | Notes |
 |---|---:|---:|---:|---:|---|
-| Internal 5-fold CV | Primary no reverse-causality | 29 | 0.7172 | 0.8157 | Main development estimate |
-| Internal 5-fold CV | Secondary full-feature | 33 | 0.7255 | 0.8207 | Adds dental visit, flossing, loose teeth, and floss-missing flag |
-| Same-source temporal validation | Frozen primary model on 2009-2010 | 29 | 0.6771 | 0.7735 | Same survey system, earlier cycle |
+| Internal 5-fold CV | Primary no reverse-causality | 29 | 0.6896 | 0.8240 | Main development estimate |
+| Internal 5-fold CV | Secondary full-feature | 33 | 0.6996 | 0.8295 | Adds dental visit, flossing, loose teeth, and floss-missing flag |
+| Same-source temporal validation | Frozen primary model on 2009-2010 | 29 | 0.6495 | 0.7727 | Same survey system, earlier cycle |
 
 Temporal operating points for the frozen primary model:
 
 | Threshold | Sensitivity | Specificity | PPV | NPV | Interpretation |
 |---:|---:|---:|---:|---:|---|
-| 0.35 | 97.1% | 18.1% | 70.8% | 75.2% | High-sensitivity triage; negative screens are not definitive |
-| 0.65 | 82.6% | 43.3% | 74.9% | 54.9% | More balanced but still requires clinical confirmation |
+| 0.35 | 98.9% | 5.5% | 70.0% | 69.1% | High-sensitivity triage; negative screens are not definitive |
+| 0.65 | 77.7% | 45.2% | 76.0% | 47.5% | More balanced but still requires clinical confirmation |
 
-The key conclusion is deliberately modest: with these low-cost predictors, discrimination is around 0.72 internally and around 0.68 under same-source temporal validation. The observed performance is useful as a benchmark, not as proof of readiness for clinical implementation.
+The key conclusion is deliberately modest: with these low-cost predictors, discrimination is around 0.69 internally and around 0.65 under same-source temporal validation. The observed performance is useful as a benchmark, not as proof of readiness for clinical implementation.
 
 ## Reproducibility
 
@@ -51,18 +51,13 @@ Run lightweight checks that do not require NHANES data:
 ```bash
 make test
 make consistency
+make verify-submission
 ```
 
-Run the full workflows after NHANES data are available:
+Run the full local reproduction after NHANES data are available:
 
 ```bash
-make download
-make process
-make reproduce
-make temporal
-python3 scripts/04_publication_analyses.py \
-  --input data/processed/publication_predictions.parquet \
-  --feature-cols age bmi waist_cm systolic_bp diastolic_bp glucose triglycerides hdl
+make reproduce-full
 ```
 
 The legacy notebooks are retired as source-of-truth artifacts. The maintained publication surface is the script targets, result artifacts, model card, tests, and manuscript source, with consistency checks to prevent silent drift across those files.
@@ -75,7 +70,11 @@ The legacy notebooks are retired as source-of-truth artifacts. The maintained pu
 | `src/evaluation.py` | Metrics, threshold selection, calibration, and plotting helpers |
 | `src/publication_analysis.py` | Survey-weighted prevalence, subgroup performance, and missingness tables |
 | `scripts/check_publication_consistency.py` | Guards canonical values and conservative publication wording |
+| `scripts/verify_submission.py` | Runs lightweight submission-readiness gates |
+| `scripts/reproduce_v13_primary.py` | Regenerates internal v1.3 benchmark result artifacts |
+| `scripts/run_temporal_validation.py` | Regenerates same-source temporal validation artifacts |
 | `scripts/04_publication_analyses.py` | Generates publication sensitivity tables from processed predictions |
+| `results/publication_sensitivity_tables.md` | Survey-weighted prevalence and subgroup performance summary generated by the full reproduction |
 | `results/` | Saved result artifacts used by the manuscript and model card |
 | `docs/publication/ARTICLE_DRAFT.md` | Current manuscript source |