Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131
Open
jmandel wants to merge 7 commits intoHealthIntersections:mainfrom
Open
Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131jmandel wants to merge 7 commits intoHealthIntersections:mainfrom
jmandel wants to merge 7 commits intoHealthIntersections:mainfrom
Conversation
…nsion Add one optional method to the CodeSystemProvider interface that lets SQL-backed providers handle ValueSet expansion in a single query with LIMIT/OFFSET instead of the per-code iterator loop. The worker groups compose includes/excludes by code system and passes the full hull to the provider. Providers returning an iterable skip the framework's manual include/exclude loops entirely. Providers returning null (e.g., SNOMED) fall back to the existing path unchanged. RxNorm implementation: TTY/STY filter mapping to SQL, GROUP BY for JOIN dedup, exclude/activeOnly/searchText push-down via better-sqlite3 cursors. LOINC implementation: Relationship/Property/Status filter mapping, UNION per include with GROUP BY dedup, all using existing indexes. Paging offset/count are only passed for single-system composes where they are exact. Multi-system composes still get filter/exclude push-down but the framework handles paging in finalization. Also includes: eager context loading in RxNorm (eliminates 3x redundant SQL per code), searchFilter arg order fixes, TxParameters.assign fix, perf counters, and props-only-when-requested optimization. Benchmarks: RxNorm 13/13 pass (median 37x), LOINC 14/14 pass (median 25x). No regressions on captured production query replay. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
417330e to
e300d0d
Compare
- scripts/test-expand-for-valueset.js: RxNorm benchmark (13 tests) - scripts/test-loinc-expand.js: LOINC benchmark (14 tests) - scripts/replay-sampled-terminology.js: replay test harness - test-expand-results.txt: RxNorm results (7-341x speedup) - test-loinc-expand-results.txt: LOINC results (2-106x speedup) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the monolithic WHERE (inc1 OR inc2) AND NOT excl pattern with UNION for includes and EXCEPT for excludes. This fixes a critical bug where STY-based filter excludes (NOT IN with 241K-row subquery) would block the Node.js event loop indefinitely. Key changes: - cs-rxnorm.js: each include → its own SELECT, combined with UNION; each exclude → EXCEPT SELECT. Same #buildFilterSql for both sides. - cs-loinc.js: add filter-based exclude support (NOT IN subquery) - import-rxnorm.module.js: add missing idx_rxnsty_rxcui index - New test script: test-expand-cross-system.js with 20 tests covering richer include/exclude patterns and cross-system (RxNorm+LOINC) ValueSet expansions. All 20 pass, no hangs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…SOURCE env var - RxNorm: archive lookup for retired concept codes now in SQL via UNION against RXNATOMARCHIVE (no JS-side conceptCodes tracking) - Concept excludes EXCEPT against both rxnconso and archive tables - LOINC: filter values use LP codes, exclude filters use EXCEPT - TX_LIBRARY_SOURCE env var overrides config.json librarySource - Test scripts use env var instead of patching config.json - All 25 tests pass (5 RxNorm-only + 20 cross-system) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force X_RXNCONSO_1 index on NOT EXISTS subquery to avoid partial covering index scan that caused 25ms overhead per concept query. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EXCEPT materializes both sides before applying LIMIT, making STY excludes 200ms+. NOT EXISTS on the outer query lets SQLite short-circuit. - STY-only excludes: direct rxnsty correlation (no rxnconso scan) - TTY-only excludes: rxnconso with INDEXED BY X_RXNCONSO_1 - Mixed excludes: rxnconso JOIN rxnsty with index hint - Concept excludes: simple NOT IN (parameterized values) - Multi-filter excludes are conjunctive (AND), not independent STY exclude: 21ms (was 278ms EXCEPT, infinite NOT IN) Cross-system filter exclude: 64ms (was 315ms) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…y skipping When an exclude has an unsupported filter property, return null to fall back to the framework's baseline path. Previously 'continue' silently dropped the exclude, producing wrong results. Add two fallback test cases verifying ~1.0x speedup on unknown filters. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds one optional method to the
CodeSystemProviderinterface —expandForValueSet(spec)— that lets SQL-backed providers handle ValueSet expansion in a single query instead of the per-code iterator loop.The worker groups compose includes/excludes by code system and passes the full hull (includes, excludes, activeOnly, paging) to the provider. Providers that implement this method return an async iterable of fully-resolved entries; the worker feeds them through
includeCode()for dedup, import filtering, and FHIR construction. Providers that don't implement it (e.g., SNOMED) are completely unaffected — the existing iterator-oracle path runs unchanged.What changed
Interface (
tx/cs/cs-api.js)expandForValueSet(spec)method with JSDoc contractAsyncIterable<ExpandedEntry>ornull(fall back to existing path)Worker (
tx/workers/expand.js)expandForValueSetfirst; marks handled systems to skip framework's manual include/exclude loopsoffset/countonly passed for single-system composes (multi-system getsnull— framework applies paging in finalization)RxNorm (
tx/cs/cs-rxnorm.js)expandForValueSetusingbetter-sqlite3lazy cursorsWHERE IN, STY filter →JOIN rxnsty, concepts →WHERE RXCUI INGROUP BY RXCUIwhen JOINs cause row multiplicationLOINC (
tx/cs/cs-loinc.js)expandForValueSetusingbetter-sqlite3lazy cursorsOther
tx/perf-counters.js) for timing expandForValueSet pathslocateMany/filterPagestubs on SNOMED, LOINC (may be useful for future IPC)searchFilterarg order,validate.jsmessages,TxParameters.assignpaging fieldsBenchmark results
RxNorm (13 tests, all pass, median 37x speedup):
LOINC (14 tests, all pass, median 25x speedup):
LOINC jest: 37/37 pass. Replay: no regressions on captured production queries.
Design doc
See
docs/PROVIDER_INTERFACE_EVOLUTION.mdfor full rationale, SQL strategies, paging contract, and index decisions.