Skip to content

Comments

Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131

Open
jmandel wants to merge 7 commits intoHealthIntersections:mainfrom
jmandel:incremental-provider-enhancements
Open

Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131
jmandel wants to merge 7 commits intoHealthIntersections:mainfrom
jmandel:incremental-provider-enhancements

Conversation

@jmandel
Copy link
Contributor

@jmandel jmandel commented Feb 21, 2026

Summary

Adds one optional method to the CodeSystemProvider interface — expandForValueSet(spec) — that lets SQL-backed providers handle ValueSet expansion in a single query instead of the per-code iterator loop.

The worker groups compose includes/excludes by code system and passes the full hull (includes, excludes, activeOnly, paging) to the provider. Providers that implement this method return an async iterable of fully-resolved entries; the worker feeds them through includeCode() for dedup, import filtering, and FHIR construction. Providers that don't implement it (e.g., SNOMED) are completely unaffected — the existing iterator-oracle path runs unchanged.

What changed

Interface (tx/cs/cs-api.js)

  • New optional expandForValueSet(spec) method with JSDoc contract
  • Returns AsyncIterable<ExpandedEntry> or null (fall back to existing path)

Worker (tx/workers/expand.js)

  • Groups includes/excludes by code system before processing
  • Calls expandForValueSet first; marks handled systems to skip framework's manual include/exclude loops
  • Paging offset/count only passed for single-system composes (multi-system gets null — framework applies paging in finalization)

RxNorm (tx/cs/cs-rxnorm.js)

  • expandForValueSet using better-sqlite3 lazy cursors
  • TTY filter → WHERE IN, STY filter → JOIN rxnsty, concepts → WHERE RXCUI IN
  • GROUP BY RXCUI when JOINs cause row multiplication
  • Excludes, activeOnly, searchText pushed into SQL

LOINC (tx/cs/cs-loinc.js)

  • expandForValueSet using better-sqlite3 lazy cursors
  • Relationship filters → JOIN on Relationships table
  • STATUS → direct WHERE, CLASSTYPE → JOIN through Properties+PropertyValues
  • LIST/answers-for → reversed relationship JOIN
  • Multiple includes → UNION ALL with GROUP BY dedup

Other

  • Perf counters (tx/perf-counters.js) for timing expandForValueSet paths
  • locateMany / filterPage stubs on SNOMED, LOINC (may be useful for future IPC)
  • Bug fixes: searchFilter arg order, validate.js messages, TxParameters.assign paging fields

Benchmark results

RxNorm (13 tests, all pass, median 37x speedup):

Test Opt Base Speedup
filter-tty-sbd 6.8ms 249ms 37x
filter-tty-in-multi 1.4ms 476ms 342x
paged-offset-100 1.3ms 228ms 177x
text-aspirin 1.8ms TIMEOUT
filter-sty-t200 217ms 1522ms 7x

LOINC (14 tests, all pass, median 25x speedup):

Test Opt Base Speedup
filter-status-active 50ms 5325ms 106x
filter-class-chem 14ms 941ms 68x
filter-scale-qn 51ms 2611ms 51x
filter-classtype-lab 82ms 2021ms 25x

LOINC jest: 37/37 pass. Replay: no regressions on captured production queries.

Design doc

See docs/PROVIDER_INTERFACE_EVOLUTION.md for full rationale, SQL strategies, paging contract, and index decisions.

…nsion

Add one optional method to the CodeSystemProvider interface that lets
SQL-backed providers handle ValueSet expansion in a single query with
LIMIT/OFFSET instead of the per-code iterator loop.

The worker groups compose includes/excludes by code system and passes
the full hull to the provider. Providers returning an iterable skip the
framework's manual include/exclude loops entirely. Providers returning
null (e.g., SNOMED) fall back to the existing path unchanged.

RxNorm implementation: TTY/STY filter mapping to SQL, GROUP BY for JOIN
dedup, exclude/activeOnly/searchText push-down via better-sqlite3 cursors.

LOINC implementation: Relationship/Property/Status filter mapping, UNION
per include with GROUP BY dedup, all using existing indexes.

Paging offset/count are only passed for single-system composes where they
are exact. Multi-system composes still get filter/exclude push-down but
the framework handles paging in finalization.

Also includes: eager context loading in RxNorm (eliminates 3x redundant
SQL per code), searchFilter arg order fixes, TxParameters.assign fix,
perf counters, and props-only-when-requested optimization.

Benchmarks: RxNorm 13/13 pass (median 37x), LOINC 14/14 pass (median 25x).
No regressions on captured production query replay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jmandel and others added 6 commits February 20, 2026 20:10
- scripts/test-expand-for-valueset.js: RxNorm benchmark (13 tests)
- scripts/test-loinc-expand.js: LOINC benchmark (14 tests)
- scripts/replay-sampled-terminology.js: replay test harness
- test-expand-results.txt: RxNorm results (7-341x speedup)
- test-loinc-expand-results.txt: LOINC results (2-106x speedup)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the monolithic WHERE (inc1 OR inc2) AND NOT excl pattern with
UNION for includes and EXCEPT for excludes. This fixes a critical bug
where STY-based filter excludes (NOT IN with 241K-row subquery) would
block the Node.js event loop indefinitely.

Key changes:
- cs-rxnorm.js: each include → its own SELECT, combined with UNION;
  each exclude → EXCEPT SELECT. Same #buildFilterSql for both sides.
- cs-loinc.js: add filter-based exclude support (NOT IN subquery)
- import-rxnorm.module.js: add missing idx_rxnsty_rxcui index
- New test script: test-expand-cross-system.js with 20 tests covering
  richer include/exclude patterns and cross-system (RxNorm+LOINC)
  ValueSet expansions. All 20 pass, no hangs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…SOURCE env var

- RxNorm: archive lookup for retired concept codes now in SQL via UNION
  against RXNATOMARCHIVE (no JS-side conceptCodes tracking)
- Concept excludes EXCEPT against both rxnconso and archive tables
- LOINC: filter values use LP codes, exclude filters use EXCEPT
- TX_LIBRARY_SOURCE env var overrides config.json librarySource
- Test scripts use env var instead of patching config.json
- All 25 tests pass (5 RxNorm-only + 20 cross-system)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force X_RXNCONSO_1 index on NOT EXISTS subquery to avoid partial
covering index scan that caused 25ms overhead per concept query.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EXCEPT materializes both sides before applying LIMIT, making STY
excludes 200ms+. NOT EXISTS on the outer query lets SQLite short-circuit.

- STY-only excludes: direct rxnsty correlation (no rxnconso scan)
- TTY-only excludes: rxnconso with INDEXED BY X_RXNCONSO_1
- Mixed excludes: rxnconso JOIN rxnsty with index hint
- Concept excludes: simple NOT IN (parameterized values)
- Multi-filter excludes are conjunctive (AND), not independent

STY exclude: 21ms (was 278ms EXCEPT, infinite NOT IN)
Cross-system filter exclude: 64ms (was 315ms)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…y skipping

When an exclude has an unsupported filter property, return null to
fall back to the framework's baseline path. Previously 'continue'
silently dropped the exclude, producing wrong results.

Add two fallback test cases verifying ~1.0x speedup on unknown filters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant