Background
src/picmaker/io.py::read_one_image_array runs the format cascade by trying each reader in turn (pickle → numpy → VICAR → FITS → PIL → PDS3). Each reader either opens the file itself or relies on its own magic-byte sniff. The FITS branch additionally re-reads the first 9 bytes (io.py:142-144) to sniff b'SIMPLE  =' (SIMPLE, two spaces, then =) before calling astropy.io.fits.open.
For a file whose true format is at the end of the cascade (e.g. a PDS3 label), every preceding probe opens and partially reads the file before giving up. On a slow filesystem or for a directory tree of thousands of files this adds measurable wall-clock cost.
See CODEBASE_CRITIQUE.md §5 (Performance and resource use, "Finding (Low) — io.read_one_image_array re-opens the file up to five times").
Suggested approach
Read the first 32 bytes of the file once, then dispatch based on the magic bytes:
| Format | Magic bytes |
|---|---|
| pickle | \x80 (protocol 2+) or \x28\x6c\x70 (protocol 0/1) |
| numpy .npy | \x93NUMPY |
| VICAR | LBLSIZE= (within the first ~16 bytes) |
| FITS | SIMPLE  = (SIMPLE, two spaces, then =; 9 bytes) |
| PNG | \x89PNG |
| TIFF | II\x2a\x00 (LE) or MM\x00\x2a (BE) |
| JPEG | \xff\xd8\xff |
Sketch:
    def read_one_image_array(filename, labelfile, obj=None, hst=False):
        filename_str = str(filename)
        cascade_errors = []
        try:
            # Read the header once; every dispatch test below reuses it.
            with open(filename_str, 'rb') as f:
                header = f.read(32)
        except OSError as exc:
            cascade_errors.append(exc)
            header = b''
        if header.startswith(b'\x80') or header.startswith(b'(lp'):
            return _read_pickle(filename_str)
        if header.startswith(b'\x93NUMPY'):
            return _read_npy(filename_str)
        if b'LBLSIZE=' in header[:16]:
            return _read_vicar(filename_str)
        # FITS pads the SIMPLE keyword to eight characters, hence two spaces.
        if header.startswith(b'SIMPLE  ='):
            return _read_fits(filename_str, obj, hst)
        if header.startswith((b'\x89PNG', b'II\x2a\x00', b'MM\x00\x2a', b'\xff\xd8\xff')):
            return _read_pil(filename_str)
        # No magic bytes matched: fall back to the PDS3 label, if one was given.
        if labelfile:
            result = read_pds_labeled_image_array(labelfile, obj)
            if result is not None:
                return result
        raise OSError(...) from ExceptionGroup(...)
Each per-format helper is the body of the corresponding branch in the current cascade. The factored helpers are easier to test in isolation (one of the things _process_one_image in #12 would consume directly).
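The dispatch decision itself could also be factored into a pure function over the header bytes, so it can be tested without touching the filesystem. The following is only a sketch under assumptions: sniff_format, read_header, and the returned format names are hypothetical and not part of the current codebase; the prefixes follow the magic-byte table above, with the FITS signature written with two spaces because the FITS standard pads the keyword to eight characters.

```python
from typing import Optional


def sniff_format(header: bytes) -> Optional[str]:
    """Map the leading bytes of a file to a (hypothetical) format name.

    Returns None when no magic bytes match, in which case the caller
    would fall back to the PDS3 label path.
    """
    # Checks run in the same order as the existing cascade.
    if header.startswith((b"\x80", b"(lp")):
        return "pickle"
    if header.startswith(b"\x93NUMPY"):
        return "npy"
    if b"LBLSIZE=" in header[:16]:
        return "vicar"
    if header.startswith(b"SIMPLE  ="):
        return "fits"
    if header.startswith((b"\x89PNG", b"II\x2a\x00", b"MM\x00\x2a", b"\xff\xd8\xff")):
        return "pil"
    return None


def read_header(path, size: int = 32) -> bytes:
    """Open the file once and return its first `size` bytes."""
    with open(path, "rb") as f:
        return f.read(size)
```

With this split, read_one_image_array would call read_header once and route on the result of sniff_format, and unit tests could feed byte strings straight to sniff_format.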
Why now
- The cascade-end OSError is already chained from ExceptionGroup (see codebase priority 4); the magic-byte rewrite preserves that semantics.
- The reader cascade is the bottleneck for picmaker -r --pattern '*' <dir> on directory trees with many files.
Acceptance criteria
- read_one_image_array reads each input file's first 32 bytes once for format dispatch.
- Each format's read logic lives in its own helper (_read_pickle, _read_npy, _read_vicar, _read_fits, _read_pil).
- Unrecognized files still raise OSError('Unrecognized image file format: ...') chained from ExceptionGroup.
- Existing tests (test_io.py, test_io_cascade.py, test_pds3_reader.py, test_pds3_reader_branches.py, test_warning_elevation.py) pass unchanged.
Related
- CODEBASE_CRITIQUE.md §5: original finding.
- #12: function-split refactor; the per-format helpers from this issue feed the _process_one_image helper described there.