Checked for duplicates
Yes - I've already checked
User Persona(s)
Data Engineer, Node Operator
Motivation
...so that I can validate large bundles faster by eliminating the redundant I/O and XML parsing caused by re-reading every label file from disk during the referential integrity phase.
Additional Details
Impact: Medium
Relevant files:
src/main/java/gov/nasa/pds/tools/util/ReferentialIntegrityUtil.java (lines 711-784)
src/main/java/gov/nasa/pds/tools/validate/CrossLabelFileAreaReferenceChecker.java (lines 37-57)
Problem:
ReferentialIntegrityUtil.additionalReferentialIntegrityChecks() creates a new DocumentBuilder and re-parses every label file from disk for every label in the bundle, even though these labels were already fully parsed during the label validation pass. The DOM tree is discarded after initial validation and rebuilt from scratch. Similarly, CrossLabelFileAreaReferenceChecker.add() re-parses label XML from scratch.
Recommendation:
Cache the parsed DOM or extracted identifiers (LIDs, LIDVIDs, file references) from the initial label validation pass and reuse them during referential integrity checking.
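The recommendation above could take roughly the following shape. This is a minimal sketch, not validate's actual design. `LabelIdentifiers`, `LabelIdentifierCache`, and `putFromValidatedDom` are hypothetical names. The extraction is simplified to the PDS4 elements `logical_identifier`, `version_id`, `lid_reference`, `lidvid_reference`, and `file_name`, and it takes the first occurrence in document order as the label's own LID and version. The sketch assumes the label validation pass can hand over the DOM it already built, so that the referential integrity phase never goes back to disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical value object: only the identifiers referential integrity needs. */
final class LabelIdentifiers {
  final String lid;                    // first <logical_identifier> in document order, or null
  final String versionId;              // first <version_id> in document order, or null
  final List<String> lidReferences;    // every <lid_reference> value
  final List<String> lidvidReferences; // every <lidvid_reference> value
  final List<String> fileNames;        // every <file_name> value

  LabelIdentifiers(Document doc) {
    this.lid = first(doc, "logical_identifier");
    this.versionId = first(doc, "version_id");
    this.lidReferences = all(doc, "lid_reference");
    this.lidvidReferences = all(doc, "lidvid_reference");
    this.fileNames = all(doc, "file_name");
  }

  private static List<String> all(Document doc, String localName) {
    // "*" matches the local name in any namespace (e.g. the PDS4 default namespace).
    NodeList nodes = doc.getElementsByTagNameNS("*", localName);
    List<String> values = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      values.add(nodes.item(i).getTextContent().trim());
    }
    return List.copyOf(values);
  }

  private static String first(Document doc, String localName) {
    List<String> values = all(doc, localName);
    return values.isEmpty() ? null : values.get(0);
  }
}

/** Hypothetical cache: each label's identifiers are computed at most once per run. */
final class LabelIdentifierCache {
  private final Map<Path, LabelIdentifiers> byLabel = new ConcurrentHashMap<>();
  private final AtomicInteger parses = new AtomicInteger();
  private final DocumentBuilderFactory factory;

  LabelIdentifierCache() {
    factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true); // required for getElementsByTagNameNS
  }

  /** Hook for the label validation pass: reuse the DOM it already built. */
  void putFromValidatedDom(Path label, Document doc) {
    byLabel.putIfAbsent(key(label), new LabelIdentifiers(doc));
  }

  /** Returns cached identifiers, parsing from disk only if nothing was cached. */
  LabelIdentifiers get(Path label) {
    // computeIfAbsent runs the parse atomically, at most once per key.
    return byLabel.computeIfAbsent(key(label), this::parseFromDisk);
  }

  int parseCount() {
    return parses.get();
  }

  private static Path key(Path label) {
    return label.toAbsolutePath().normalize();
  }

  private DocumentBuilder newBuilder() throws ParserConfigurationException {
    // DocumentBuilderFactory is not guaranteed to be thread-safe, so guard it.
    synchronized (factory) {
      return factory.newDocumentBuilder();
    }
  }

  private LabelIdentifiers parseFromDisk(Path label) {
    parses.incrementAndGet();
    try {
      return new LabelIdentifiers(newBuilder().parse(label.toFile()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ParserConfigurationException | SAXException e) {
      throw new IllegalStateException("Cannot parse label " + label, e);
    }
  }
}
```

Caching only the extracted identifiers, rather than whole DOM trees, keeps memory roughly proportional to the number of identifiers instead of the total size of every label, which matters for bundles with many products. If the validation pass does not call the hypothetical `putFromValidatedDom` hook, `get` falls back to one parse inside the cache: that removes the repeated parsing during referential integrity checks but, on its own, still reads each label twice in total.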
For Internal Dev Team To Complete
Acceptance Criteria
Given a bundle with many product labels
When I perform validation including referential integrity checks
Then I expect each label file to be read and parsed from disk only once
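To make this criterion checkable in a regression test, every phase could obtain parsed labels from one memoizing source, with the real parser wrapped in a counting spy. This is a sketch under assumptions: `ParsedLabelSource` and `CountingParser` are hypothetical names, and the parser is injected as a `Function<Path, T>` so that `T` can be either a DOM `Document` or an identifier summary. In a real test, the spy would wrap the actual DOM parse and a full validation run would execute before the counts are checked.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing source: parses each label at most once and shares the result. */
final class ParsedLabelSource<T> {
  private final Function<Path, T> parser;
  private final Map<Path, T> byLabel = new ConcurrentHashMap<>();

  ParsedLabelSource(Function<Path, T> parser) {
    this.parser = parser;
  }

  T get(Path label) {
    // Normalize so "bundle/./a.xml" and "bundle/a.xml" share one cache entry.
    return byLabel.computeIfAbsent(label.toAbsolutePath().normalize(), parser);
  }
}

/** Hypothetical test spy: delegates to the real parser and counts calls per path. */
final class CountingParser<T> implements Function<Path, T> {
  private final Function<Path, T> delegate;
  private final Map<Path, AtomicInteger> counts = new ConcurrentHashMap<>();

  CountingParser(Function<Path, T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T apply(Path label) {
    // ParsedLabelSource already passes a normalized absolute path.
    counts.computeIfAbsent(label, k -> new AtomicInteger()).incrementAndGet();
    return delegate.apply(label);
  }

  int count(Path label) {
    AtomicInteger c = counts.get(label.toAbsolutePath().normalize());
    return c == null ? 0 : c.get();
  }

  /** The acceptance criterion: every label that was parsed was parsed exactly once. */
  boolean eachLabelParsedOnce() {
    return counts.values().stream().allMatch(c -> c.get() == 1);
  }
}
```

The spy only observes parses routed through the injected parser, so any remaining direct `DocumentBuilder` use in `ReferentialIntegrityUtil` or `CrossLabelFileAreaReferenceChecker` would need to be removed or routed through the same source for the check to be meaningful.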
Engineering Details
I&T