💡 Description
Addresses a cluster of performance and memory bottlenecks that cause validate to fail or run unacceptably slowly on large PDS4 bundles (23,000+ products) and table products with millions of records. Each sub-issue targets a specific hot spot identified through profiling and user-reported failures.
Acceptance Criteria
- Large bundles (23,000+ products) complete validation without `OutOfMemoryError`
- Schematron XSLT compiled once per unique schematron, not once per label
- No `Pattern` object allocated more than once per unique pattern string during field validation
- Table products with millions of records validate using buffered I/O, not byte-at-a-time reads
- `0xFF` data bytes pass through `RawTableReader` correctly and are not treated as EOF
Sub-Issues
NASA-PDS/validate
- `HashMap` growth and an O(n) `getChildTargets()` scan in `InMemoryRegistrar`. Fixed by adding a parent→children index and `AtomicInteger` count caches. PR #1594, "Fix OutOfMemoryError when validating large bundles (23,000+ products)", is in review.
- The Schematron `Transformer` is recompiled from scratch for every label. Fixed by caching the compiled `Templates` object; estimated 30–50% reduction in validation time for large bundles.
- `FieldValueValidator.checkFormat()` recompiles its regular expressions via `String.matches()` on every field of every record. Fix: promote them to `static final Pattern` constants.
NASA-PDS/pds4-jparser
- `RawTableReader.readNextLine()` reads one byte at a time via `ByteWiseFileAccessor`, resulting in hundreds of millions of individual `ByteBuffer.get()` calls for large tables. Fix: 8 KB internal read buffer with bulk `readBytes()`.
- `0xFF` data bytes sign-extend to `int` −1 and are silently treated as EOF, truncating table reads.
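For the `InMemoryRegistrar` item under NASA-PDS/validate, here is a minimal sketch of how a parent→children index and `AtomicInteger` count caches replace a linear scan. It assumes targets are identified by location strings and have at most one parent. `IndexedRegistrar`, `addTarget`, and `getTypeCount` are hypothetical names. Only `getChildTargets()` comes from the sub-issue, and its signature here is assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registrar; names and signatures are illustrative, not the InMemoryRegistrar API.
class IndexedRegistrar {
    // parent location -> its direct children, maintained on every insert
    private final Map<String, List<String>> childrenByParent = new ConcurrentHashMap<>();
    // product type -> running count, so totals never need a full scan
    private final Map<String, AtomicInteger> countsByType = new ConcurrentHashMap<>();

    void addTarget(String parent, String location, String type) {
        if (parent != null) {
            childrenByParent
                .computeIfAbsent(parent, k -> Collections.synchronizedList(new ArrayList<>()))
                .add(location);
        }
        countsByType.computeIfAbsent(type, k -> new AtomicInteger()).incrementAndGet();
    }

    // Cost is proportional to the number of children, not to the total number of registered targets.
    List<String> getChildTargets(String parent) {
        List<String> children = childrenByParent.get(parent);
        if (children == null) {
            return List.of();
        }
        synchronized (children) { // hold the list's lock while taking an immutable snapshot
            return List.copyOf(children);
        }
    }

    int getTypeCount(String type) {
        AtomicInteger count = countsByType.get(type);
        return count == null ? 0 : count.get();
    }
}
```

The index costs one extra list entry per target. In exchange, each `getChildTargets()` call becomes a map lookup instead of a pass over every registered target.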
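The `Templates` caching fix relies on standard JAXP behavior. `TransformerFactory.newTemplates()` performs the expensive XSLT compilation once, and `Templates.newTransformer()` then hands out cheap per-use `Transformer` instances. The sketch below assumes two things: the Schematron rules have already been converted to an XSLT string, and a string key (for example the Schematron URL) identifies each unique schematron. `TemplatesCache`, `transformerFor`, and `apply` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical cache keyed by a schematron identifier; not the validate API.
class TemplatesCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();
    private int compileCount = 0; // demonstration counter, guarded by 'this'

    // Returns a fresh Transformer; XSLT compilation happens at most once per key.
    Transformer transformerFor(String key, String xslt) {
        Templates templates = compiled.get(key);
        if (templates == null) {
            synchronized (this) { // TransformerFactory is not guaranteed to be thread-safe
                templates = compiled.get(key);
                if (templates == null) {
                    try {
                        templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
                    } catch (TransformerConfigurationException e) {
                        throw new IllegalArgumentException("XSLT does not compile: " + key, e);
                    }
                    compiled.put(key, templates);
                    compileCount++;
                }
            }
        }
        try {
            // Templates may be shared across threads; each Transformer is cheap but single-threaded.
            return templates.newTransformer();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    synchronized int compileCount() {
        return compileCount;
    }

    // Runs one XML document through a transformer and returns the text output.
    static String apply(Transformer transformer, String xml) {
        StringWriter out = new StringWriter();
        try {
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

A new `Transformer` is still created for each label because `Transformer` instances are not thread-safe. The compiled `Templates` object is what gets reused.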
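For `FieldValueValidator.checkFormat()`, the cost comes from `String.matches()`, which compiles its regex argument into a new `Pattern` on each call. The sketch below hoists two fixed formats into `static final Pattern` constants; the regexes are illustrative, not the ones validate uses. It also adds a `ConcurrentHashMap` cache for patterns only known at run time, which is one way to allocate at most one `Pattern` per unique pattern string. `FormatChecks` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical checks; the real FieldValueValidator formats differ.
final class FormatChecks {
    // Compiled once at class initialization instead of once per field value.
    private static final Pattern INTEGER_FORMAT = Pattern.compile("[+-]?\\d+");
    private static final Pattern REAL_FORMAT =
            Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?");

    // Regexes known only at run time: each distinct string is compiled once and reused.
    private static final Map<String, Pattern> DYNAMIC = new ConcurrentHashMap<>();

    private FormatChecks() {}

    static boolean isInteger(String value) {
        return INTEGER_FORMAT.matcher(value.trim()).matches();
    }

    static boolean isReal(String value) {
        return REAL_FORMAT.matcher(value.trim()).matches();
    }

    static Pattern patternFor(String regex) {
        return DYNAMIC.computeIfAbsent(regex, Pattern::compile);
    }
}
```

`Pattern` instances are immutable and thread-safe, so sharing them as constants is safe. The `Matcher` objects created per call are not shared.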
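The `RawTableReader.readNextLine()` fix can be sketched as a reader that pulls up to 8 KB per bulk `InputStream.read()` call and scans the in-memory array for the record terminator. This is a minimal sketch assuming LF- or CRLF-terminated records read from a plain `InputStream`. `BufferedRecordReader` is a hypothetical class, and it does not use `ByteWiseFileAccessor` or the jparser `readBytes()` API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical reader; not the pds4-jparser RawTableReader API.
class BufferedRecordReader {
    private static final int BUFFER_SIZE = 8 * 1024; // 8 KB, the size named in the sub-issue
    private final InputStream in;
    private final byte[] buf = new byte[BUFFER_SIZE];
    private int pos = 0;   // index of the next unread byte in buf
    private int limit = 0; // number of valid bytes currently in buf

    BufferedRecordReader(InputStream in) {
        this.in = in;
    }

    // Returns the next record without its LF or CRLF terminator, or null at end of stream.
    byte[] readNextLine() {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean consumedAny = false;
        while (true) {
            if (pos == limit && !refill()) {
                return consumedAny ? stripTrailingCr(line.toByteArray()) : null;
            }
            consumedAny = true;
            // Scan the buffered bytes as bytes, so 0xFF is never confused with an EOF value.
            int start = pos;
            while (pos < limit && buf[pos] != '\n') {
                pos++;
            }
            line.write(buf, start, pos - start); // copy the whole run in one call
            if (pos < limit) { // found '\n'
                pos++;        // consume the terminator
                return stripTrailingCr(line.toByteArray());
            }
        }
    }

    // One bulk read of up to 8 KB instead of one call per byte.
    private boolean refill() {
        try {
            int n;
            do {
                n = in.read(buf, 0, buf.length);
            } while (n == 0);
            if (n < 0) {
                return false;
            }
            pos = 0;
            limit = n;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] stripTrailingCr(byte[] record) {
        int len = record.length;
        return (len > 0 && record[len - 1] == '\r') ? Arrays.copyOf(record, len - 1) : record;
    }
}
```

Bytes are compared directly as `byte` values and never returned as an `int` read result. As a result, a `0xFF` byte inside a record is copied like any other byte.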
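The `0xFF` sub-issue comes from Java's widening conversion. Because `byte` is signed, the byte `0xFF` widens to the `int` −1, the same value commonly used as an EOF sentinel. The minimal sketch below reads from an in-memory array rather than the actual jparser readers, and uses hypothetical names (`SignExtensionDemo`, `buggyRead`, `fixedRead`). Masking with `& 0xFF` (equivalently `Byte.toUnsignedInt`) keeps data bytes in the range 0 to 255, so only the real end of data returns −1.

```java
// Hypothetical illustration of the sign-extension bug; not the pds4-jparser code.
class SignExtensionDemo {
    static final int EOF = -1;

    // Buggy: byte -> int widening sign-extends, so the data byte 0xFF comes back as -1 (== EOF).
    static int buggyRead(byte[] data, int index) {
        return index < data.length ? data[index] : EOF;
    }

    // Fixed: mask to the unsigned range 0..255 so only the end of data yields -1.
    static int fixedRead(byte[] data, int index) {
        return index < data.length ? (data[index] & 0xFF) : EOF;
    }

    // Number of bytes a read loop sees before it reports EOF.
    static int countUntilEof(byte[] data, boolean fixed) {
        int n = 0;
        while ((fixed ? fixedRead(data, n) : buggyRead(data, n)) != EOF) {
            n++;
        }
        return n;
    }
}
```

With the input `{0x41, (byte) 0xFF, 0x42}`, the buggy loop stops after one byte while the fixed loop sees all three. This is the silent truncation described in the sub-issue.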