Code improvements re: value.trim() and Table_Delimited calls #1597

@jordanpadams

Checked for duplicates

Yes - I've already checked

πŸ§‘β€πŸ”¬ User Persona(s)

Data Engineer, Node Operator

💪 Motivation

...so that I can validate large Table_Character and Table_Delimited products with millions of records without multi-minute wall-clock runtimes, by reducing the per-record and per-field overhead that currently dominates validation time.

📖 Additional Details

Profiling a 354 MB Table_Character product (1M records, 22 ASCII_Real fields) shows ~15.8s end-to-end validation time. Raw I/O accounts for less than 10% of that; the bottleneck is per-record field validation in FieldValueValidator and TableValidator. The specific sources of waste identified in the code are:

1. value.trim() called 3–6 times per field
In FieldValueValidator.validate(), the same value.trim() expression appears in length checks, empty checks, checkType(), checkSpecialMinMax(), and error message strings. Each call that actually strips whitespace allocates a new String. For 1M records × 22 fields = 22M fields, this means 66–132M trim calls, of which 44–110M are redundant. Fix: call trim() once at the top of the field loop and reuse the result.
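
A minimal sketch of the trim-once pattern described above. This is not the actual FieldValueValidator code: the class `TrimOnceSketch`, the methods `validateRecord` and `isAsciiReal`, and the message strings are hypothetical, and the type check is simplified to whatever `Double.parseDouble` accepts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the per-field checks; not the real FieldValueValidator. */
final class TrimOnceSketch {

  /** Validates one record's raw field values, trimming each value exactly once. */
  static List<String> validateRecord(String[] rawValues, int maxLength) {
    List<String> problems = new ArrayList<>();
    for (int i = 0; i < rawValues.length; i++) {
      // Single trim per field; every check and message below reuses this reference.
      final String trimmed = rawValues[i].trim();

      if (trimmed.isEmpty()) {
        problems.add("field " + (i + 1) + ": empty value");
        continue;
      }
      if (trimmed.length() > maxLength) {
        problems.add("field " + (i + 1) + ": '" + trimmed + "' exceeds "
            + maxLength + " characters");
      }
      if (!isAsciiReal(trimmed)) {
        problems.add("field " + (i + 1) + ": '" + trimmed
            + "' is not a valid real number");
      }
    }
    return problems;
  }

  /** Simplified type check (assumption): accepts whatever Double.parseDouble accepts. */
  static boolean isAsciiReal(String trimmed) {
    try {
      Double.parseDouble(trimmed);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}
```

The point of the sketch is only that `trimmed` is computed once per field and passed to every downstream check, so the number of trim calls drops to one per field regardless of how many checks run.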

2. Double-read for Table_Delimited
In TableValidator.validateTableDelimitedContent(), each iteration calls both readNextLine() (raw line string) and getRecord(currentRow) (parsed record) for the same row. The raw line is only used for delimiter/EOL checks; the record is used for field validation. These could be unified to avoid the dual object allocation per record.
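
One way the two reads could be unified is to run the delimiter/EOL checks and the field parsing off the same raw line. The sketch below illustrates that idea only; `SinglePassDelimitedSketch`, `ParsedLine`, and `handleLine` are hypothetical names that exist in neither validate nor pds4-jparser. It also assumes CRLF is the required record terminator and uses a naive split that ignores quoted delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; these names are not part of validate or pds4-jparser. */
final class SinglePassDelimitedSketch {

  /** Result of handling one raw line: structural problems plus parsed field values. */
  static final class ParsedLine {
    final List<String> problems = new ArrayList<>();
    String[] fields = new String[0];
  }

  /**
   * Checks the record terminator and field count, then exposes the parsed fields,
   * all derived from the one raw line that was already read.
   */
  static ParsedLine handleLine(String rawLine, char delimiter, int expectedFields) {
    ParsedLine result = new ParsedLine();

    // EOL check on the raw line (assumption: CRLF is the required terminator).
    String body = rawLine;
    if (rawLine.endsWith("\r\n")) {
      body = rawLine.substring(0, rawLine.length() - 2);
    } else {
      result.problems.add("record does not end in CRLF");
    }

    // Parse once from the same string; limit -1 keeps trailing empty fields.
    // Simplification: quoted fields containing the delimiter are not handled.
    result.fields = body.split(Pattern.quote(String.valueOf(delimiter)), -1);
    if (result.fields.length != expectedFields) {
      result.problems.add("expected " + expectedFields + " fields but found "
          + result.fields.length);
    }
    return result;
  }
}
```

With this shape, the per-row loop would call one method and hand `fields` to field validation, instead of requesting the raw line and the parsed record separately. Whether the real reader can expose both from a single read is something the implementation would need to confirm.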

Relationship to pds4-jparser#197:
The readNextLine() buffering fix in pds4-jparser (4.7× raw I/O throughput improvement) has measurable impact for Table_Delimited validation but zero impact for Table_Character validation, because TableValidator uses readNextFixedLine() for fixed-length tables, bypassing readNextLine() entirely. Per-record field validation is the dominant cost in both cases.

Relevant files:

  • src/main/java/gov/nasa/pds/tools/validate/content/table/FieldValueValidator.java
  • src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableValidator.java

🦄 Related requirements

NASA-PDS/pds4-jparser#197


For Internal Dev Team To Complete

Acceptance Criteria

Given a Table_Character or Table_Delimited product with ≥ 1M records
When I run validation
Then I expect wall-clock time is measurably reduced compared to the current baseline, with no change in validation correctness

βš™οΈ Engineering Details

🎉 I&T

🤖 Generated with Claude Code