Skip to content

Add support for hdf5 files#169

Merged
erykoff merged 42 commits into
LSSTDESC:mainfrom
elvinpoole:hdf5
Jan 29, 2026
Merged

Add support for hdf5 files#169
erykoff merged 42 commits into
LSSTDESC:mainfrom
elvinpoole:hdf5

Conversation

@elvinpoole
Copy link
Copy Markdown
Contributor

Added read/write to a group in an hdf5 file

Useful for saving maps in TXPipe/ceci where we want N maps with different masks saved to the same file (e.g. galaxy density maps in different tomo bins)

Currently it only allows the whole map to the read (cannot specify list of coverage pixels with pixels) but this can be added

@elvinpoole elvinpoole marked this pull request as draft January 20, 2026 20:31
@elvinpoole elvinpoole marked this pull request as ready for review January 20, 2026 20:43
Copy link
Copy Markdown
Collaborator

@erykoff erykoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to go through but not the full way. The map should be saved and not the valid pixels. This is a key part of the healsparse format.

Comment thread healsparse/healSparseMap.py Outdated
use_threads : `bool`, optional
Use multithreaded reading for parquet files.
group : `str`, optional
if file is hdf5, read from this group
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> If file is hdf5, read from this group. (Please format as a sentence).
The next line is not necessary; the default is in the signature.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking further, I think that this should be hdf_group instead of group everywhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/healSparseMap.py Outdated
coverage information.
group: `str`, optional
If file format is ``hdf5``, save to this group
otheriwse unused
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> If file format is ``hdf5``, save to this group; otherwise unused.
(Can fit on one line; next line is not necessary)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, make this hdf5_group.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/healSparseMap.py Outdated
header=header, degrade_nside=degrade_nside,
weightfile=weightfile, reduction=reduction, use_threads=use_threads)
weightfile=weightfile, reduction=reduction, use_threads=use_threads,
hdf5_group=group)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> hdf_group=hdf5_group.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/healSparseMap.py Outdated
"""
_write_map(self, filename, clobber=clobber, nocompress=nocompress, format=format,
nside_io=nside_io)
nside_io=nside_io, hdf5_group=group)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> hdf_group=hdf5_group.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map.py Outdated
use_threads : `bool`, optional
Use multithreaded reading for parquet files.
hdf5_group: `str`, optional
hdf5 group to read from (only used if reading from an hdf5 file)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Read from this hdf5 group (only used reading from an hdf5 file).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map.py Outdated
raise IOError("Filename %s could not be found." % (filename))
else:
raise NotImplementedError("HealSparse only supports fits and parquet files (with pyarrow).")
raise NotImplementedError("HealSparse only supports fits, hdf5 and parquet files (with pyarrow).")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> ... hdf5 (with h5py), and parquet ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map_hdf5.py Outdated
Parameters
----------
hsp_map : HealSparseMap
Map to save
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Map to save.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map_hdf5.py Outdated
filepath : str
HDF5 file path
group : str, optional
Name of the HDF5 group to store the map (default 'map')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to put the default in the docstring. Here and below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map_hdf5.py Outdated
Overwrite the file/group if it exists (default False)
"""
if os.path.isfile(filepath) and not clobber:
raise RuntimeError("Filename %s exists and clobber is False." % (filepath))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this can work as is. clobber is the same as overwrite which isn't happening here. I think we can demand that the HDF5 already exists (document this!) and therefore (a) clobber does nothing and (b) we can demand that the file exists.

Or ... I see now that you're using clobber to overwrite an existing group? In which case, that needs to be documented but the check here is not what you want.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. the clobber logic should now work. Yes, the intended behaviour is:
clobber=False only prevents overwriting the group. If the file already exists but the group is new, we can still save to a new group in that file (even if clobber=F)

Comment thread healsparse/io_map_hdf5.py Outdated

# Sparse map - only save valid pixels
valid_pixels = hsp_map.valid_pixels
grp.create_dataset("pixel", data=valid_pixels, compression="gzip")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not going to work, and in fact may explode well beyond your existing memory. The healsparse format is extremely efficient without recording pixel values (which can sometimes fail and is often slow). You want to save the map itself, and not valid pixels. Same thing with the coverage map.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now save the map

Copy link
Copy Markdown
Collaborator

@erykoff erykoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is looking a lot better but there are still a couple places that will not work in general (packed bool maps and recarrays). Also, please add to the documentation about this new format.

Comment thread healsparse/healSparseMap.py Outdated
the ``healpix`` EXPLICIT format does not maintain all metadata and
coverage information.
hdf5_group: `str`, optional
If file format is ``hdf5``, save to this group, otheriwse unused.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> otherwise

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map.py
is_fits_file = True
fits.close()
except OSError:
except (OSError, UnicodeDecodeError):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the UnicodeDecodeError coming from?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is what happens when you try to load the hdf5 file as if its fits. there might be a cleaner way of checking

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh! Okay, could you add a comment here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map_hdf5.py
Comment thread healsparse/io_map_hdf5.py Outdated
)
elif hsp_map.is_bit_packed_map:
# save as bool array rather than packed to keep the same pixel as other maps when reading
sparse_map_reshape = np.asarray(hsp_map._sparse_map).reshape(ncov_in_sparse, nfine_per_cov)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to blow up the memory and crash in the cases that a packed map is being used. Please use the packed format. In this case you just need to reshape with a divide-by-8 on nfine_per_cov e.g. https://github.com/LSSTDESC/healsparse/blob/main/healsparse/io_map_fits.py#L614-L616

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map_hdf5.py
Comment thread healsparse/io_map_hdf5.py Outdated

sparse_map = np.zeros(sparse_size, dtype=dtype)
for name, _ in dtype:
sparse_map[name][:] = sentinel
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to work because the sentinel only applies to primary. You'll need to read the overflow coverage pixel in all cases (not just here)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i read the overflow pixel from file anyway so this line was doing nothing. removed

Comment thread healsparse/io_map_hdf5.py Outdated
)
elif is_bit_packed:
sparse_map = grp["sparse_map"][cov_index_in_sparse_ordered, :][inv].reshape(-1)
sparse_map = _PackedBoolArray.from_boolean_array(sparse_map)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to leave this stored as packed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread tests/test_io_hdf5.py
Copy link
Copy Markdown
Collaborator

@erykoff erykoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments. The only substantive one is that it looks like the "pixels read" tests are turned off, but it seems like these should work. So these tests should be turned on (and working) or specifying pixels needs to raise a NotImplementedError.

Comment thread docs/filespec.rst Outdated
A :code:`HealSparseMap` may also be serialized to an HDF5 file.
Multiple :code:`HealSparseMap` objects can be stored in the same HDF5 file.
Each :code:`HealSparseMap` is stored in a different HDF5 group.
They are not required to have the same mask, :code:`nside_coverage` or :code:`nside_sparse`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oxford comma, please!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/healSparseMap.py Outdated
use_threads : `bool`, optional
Use multithreaded reading for parquet files.
hdf5_group : `str`, optional
If file format is ``hdf5``, read from this group, otheriwse unused.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> otherwise

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread healsparse/io_map.py
is_fits_file = True
fits.close()
except OSError:
except (OSError, UnicodeDecodeError):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh! Okay, could you add a comment here?

Comment thread healsparse/io_map_hdf5.py Outdated
compression="gzip",
)
elif hsp_map.is_bit_packed_map:
# save as bool array rather than packed to keep the same pixel as other maps when reading
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer operative.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment thread healsparse/io_map_hdf5.py
Comment thread tests/test_io_hdf5.py Outdated
import healsparse

# read specific coverage pixels not yet implemented
test_pixels_read = False
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this as an option and made the pixel tests always run

@erykoff erykoff changed the title Hdf5 Add support for hdf5 files Jan 29, 2026
@erykoff erykoff merged commit ef36eb1 into LSSTDESC:main Jan 29, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants