Experimental explorer for the CMS National Provider Directory (NPD) public use files.
Live: https://ainpi.vercel.app
Work in progress. AINPI is research/educational. Data may be incomplete, stale, or incorrect. Every number should be verified against primary sources before any business or clinical decision. See the
/insightspage for a full provenance analysis.
CMS released the National Provider Directory as FHIR R4 NDJSON public use files from directory.cms.gov — 21.7M records across 6 resource types (Practitioner, PractitionerRole, Organization, OrganizationAffiliation, Location, Endpoint) in the May 2026-05-08 release. AINPI:
- Ingests the full dataset into Google BigQuery, then runs ~30 pre-registered findings (H1–H39) against both the directory itself and federal claims/payment data (Medicaid, Medicare Part B + Part D, NPPES-deactivated × billing, Open Payments, DMEPOS, nursing-home ownership disclosures)
- Serves an interactive US choropleth at
/with a 3-style theme switcher, plus per-state CMO-facing pages built for the state Medicaid Director-letter response window - Cross-audits federal exclusion lists (OIG LEIE + SAM.gov) against the directory, against MMIS-flagged providers, and against federal claims data — closing 3 of the 4 federal database checks the 2026-04-23 CMS State Medicaid Director letter requires (NPPES + LEIE + SAM; SSA-DMF stays restricted)
- 2026-05-18 · PECOS-as-authoritative workstream (H37–H39). CMS designated PECOS as authoritative for Medicare enrollment. State Medicaid systems must demonstrate alignment under the 2026 verification rules. AINPI pre-registered three findings: PECOS PROVIDER_TYPE vs NPPES NUCC taxonomy disagreement (H37), the behavioral-health subset (H38, highest recoupment risk), and multi-state-enrollment NPIs with conflicting addresses (H39). See
/pecos. - 2026-05-17 · Map-first homepage.
/is now an interactive US choropleth with 3 selectable styles (Light cards / Dark dashboard / Minimal map). Click a state for an inline side panel with the 5 claims-side findings, CSV download, and primary-source NPI verification. - 2026-05-15 · For state Medicaid CMOs. New
/for-state-medicaid/<state>per-state pages built for the state Medicaid CMO listserve audience. Count-and-action lede, no H-numbers, citation-ready for SMD-letter Elements 2 + 4. - 2026-05-14 · Claims-side cross-audit shipped for all 50 states + DC + PR. H29–H36 — Medicaid spending, Medicare Part B / Part D, NPPES-deactivated × billing, Open Payments, DMEPOS, nursing-home ownership (Stage B via the CMS PPEF cross-walk), NDH completeness. 99.99984% NDH completeness against material Medicare Part B billers.
- 2026-05-08 · NDH May release ingested. First release-to-release deltas published. Endpoint −73%, Location −61%, OrgAffiliation +147% vs April. Two source-side schema breaks AINPI caught and patched.
| Path | What it is |
|---|---|
/ |
Map-first homepage. Click any state for inline findings. 3-style theme switcher. |
/for-state-medicaid |
Index of per-state CMO-facing pages. Forwardable for the SMD-letter response window. |
/findings |
Pre-registered findings (H1–H39). Each states null hypothesis + denominator before numbers drop |
/methodology |
Versioned audit methodology — DAMA DMBOK mapping, L0–L7 scoring, reproducibility commands |
/pecos |
PECOS-as-authoritative-source brief — implications of the 2026 verification rules |
/smd-revalidation |
Citation-ready language for the 2026-05-23 SMD-letter response (Elements 1–5) |
/data-quality |
D3 dashboard: choropleth, sankey, knowledge graph, drill-down, validation |
/insights |
Provenance + variance analysis (NPD vs published org numbers) |
/provider-search |
Real-time search against live payer FHIR directories |
/magic-scanner |
AI-augmented provider discovery |
/npd |
Public search by NPI, name, organization, state, city |
Static JSON, CDN-cached, safe to depend on across releases:
/api/v1/stats.json— site-wide counters, methodology version, commit SHA/api/v1/findings/<slug>.json— one per finding (types)
Breaking changes bump the path (/api/v2/), not the shape in place.
Public roadmap lives in GitHub Issues tagged roadmap, grouped into three milestones:
| Milestone | Scope |
|---|---|
v1.1 |
Data refinements on existing findings — phonetic name match, dual-board atlas, per-state drill-downs, methodology v1.0 prose |
v1.2 |
New findings — H16 address geocoding, H17 USPS drift, H19/H20 state scale, weekly endpoint-crawl host |
v2.0 |
Expansion beyond NPD's current 6 resources — blocked until CMS ships InsurancePlan / HealthcareService / Network / Verification |
Contributions welcome on any issue. File a new one using the issue templates.
| Repo | Scope |
|---|---|
FHIR-IQ/ainpi-probe |
FHIR endpoint liveness crawler (L0–L7). Runs separately from the site so operators can audit the code that hits their endpoints. |
FHIR-IQ/ainpi-examples |
Python + DuckDB usage examples for the /api/v1/* contract. |
frontend/ Next.js 14 app — routes, API, charts, tests
pipeline/ DuckDB-over-Parquet validation pipeline (shard, edges, NPI Luhn, temporal)
docs/methodology/ Versioned methodology doc, rendered at /methodology
.github/ CI, CodeQL, dependabot, issue + PR templates, release workflow
┌────────────────────────────────┐
│ directory.cms.gov │
│ 6 NDJSON.zst files, 2.8 GB │
└──────────────┬─────────────────┘
│ scripts/ingest-cms-npd.ts
▼
┌────────────────────────────────┐
│ BigQuery (cms_npd dataset) │
│ resource:JSON + _* flat fields │
│ 27.2M rows + 5 analytics views │
└──────┬─────────────────────┬───┘
│ │
live query │ │ scripts/sync-bq-to-supabase.ts
│ │ (nightly aggregation)
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ Next.js API │ │ Supabase Postgres│
│ routes │◄────┤ Prisma ORM │
│ on Vercel │ │ pre-agg metrics │
└──────┬───────┘ │ user auth │
│ └──────────────────┘
▼
┌────────────────────────────────┐
│ React + D3 dashboard │
│ FilterContext cross-filtering │
└────────────────────────────────┘
Why this split: BigQuery costs <$1/mo to hold 40 GB of FHIR JSON and gives free-tier-friendly analytics. Supabase is where the app's hot-path queries and auth data live. Pre-aggregations are synced nightly so the dashboard doesn't hit BigQuery on every page load.
cd frontend
npm install
cp .env.example .env.local # fill in Supabase + GCP values
npm run db:push # push Prisma schema to Supabase
npm run dev # http://localhost:3000To reload the NPD warehouse (only needed when CMS publishes a new release):
npm run bq:setup # Create dataset + tables + views (idempotent)
npm run bq:ingest # Download from directory.cms.gov, stream into BigQuery
npm run bq:sync # Aggregate BigQuery → Supabase metricsnpm run test # Vitest — 62 unit tests
npm run test:e2e # Playwright — 15 E2E specsCovers FHIR reference extraction, API parameter parsing, validation contract, filter context hierarchy, NPI/URL regex, BigQuery schema, dashboard dropdown interactions, and search.
- CLAUDE.md — Architecture + developer reference
- DATABASE_SETUP.md — Supabase + Prisma + BigQuery setup walkthrough
- CMS National Provider Directory
- HTE Data Release Specifications
- NDH FHIR IG STU1 v1.0.0 (published) · STU2 CI build (tracked for upcoming changes; not authoritative)