Added HDF5 compression to EmpCylSL disk cache files#183
Added HDF5 compression to EmpCylSL disk cache files#183
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds HDF5 compression support to EmpCylSL disk cache files to reduce file sizes. The compression is achieved using the HDF5 Deflate algorithm with a default compression level of 7, along with optional shuffle filter and Szip compression support.
- Adds three protected member variables for configuring HDF5 compression settings (H5compress, H5shuffle, H5szip)
- Implements compression in the WriteH5Cache() method by creating a DataSetCreateProps object with appropriate compression settings
- Applies compression to all 8 cache datasets (potC, rforceC, zforceC, densC, potS, rforceS, zforceS, densS)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/EmpCylSL.H | Adds three protected member variables for HDF5 compression configuration (H5compress=7, H5shuffle=true, H5szip=false) |
| exputil/EmpCylSL.cc | Implements compression logic in WriteH5Cache() by configuring DataSetCreateProps with chunking, shuffle filter, and Deflate/Szip compression, then applies it to all 8 cache datasets |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Add public setter for HDF5 compression parameters in EmpCylSL
…YAML configuration parameter; I can't see why someone would want to turn this off, but it's an optin
|
The last commit allows the user to change the compression level. E.g. setting HDF5 compression for |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Set HDF5 compression parameters for cache files | ||
| if (conf["compress"]) { | ||
| unsigned compress = conf["compress"].as<unsigned>(); | ||
| ortho->setH5Params(compress); | ||
| } |
There was a problem hiding this comment.
The Cylinder class only exposes the compress parameter, while the BiorthBasis Cylindrical class exposes both compress and shuffle parameters. This creates an inconsistent API between the two interfaces. Consider either adding shuffle support to Cylinder.cc (by adding "shuffle" to valid_keys and parsing it), or documenting why this parameter is only available in one interface.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| // Set HDF5 compression parameters for cache files | ||
| if (conf["compress"]) { | ||
| unsigned compress = conf["compress"].as<unsigned>(); | ||
| ortho->setH5Params(compress); | ||
| } |
There was a problem hiding this comment.
The new compression functionality lacks test coverage. Consider adding tests that verify: (1) compression can be enabled/disabled via the compress parameter, (2) compressed cache files can be read back correctly, (3) the shuffle parameter works as expected, and (4) validation of the compression level (0-9 range). The existing test file tests/Disk/cyl_basis.py could be extended to include these cases.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| //! Set HDF5 compression parameters for cache files | ||
| //! @param compress Compression level (valid range: 0-9, where 0=no compression, 9=max compression) | ||
| //! @param shuffle Enable byte shuffle filter (default: true) | ||
| //! @param szip Enable szip compression (default: false) | ||
| void setH5Params(unsigned compress, bool shuffle=true, bool szip=false) | ||
| { | ||
| // Validate compression level (unsigned type prevents negative values) | ||
| if (compress > 9) { | ||
| if (myid==0) { | ||
| std::cout << "EmpCylSL: compression level " << compress | ||
| << " out of range [0,9], using 9" << std::endl; | ||
| } | ||
| compress = 9; | ||
| } | ||
| H5compress = compress; | ||
| H5shuffle = shuffle; | ||
| H5szip = szip; | ||
| } |
There was a problem hiding this comment.
The current implementation makes szip and deflate compression mutually exclusive (only one can be active at a time), but this is not documented. The setH5Params method documentation should clarify that when szip=true, the compress parameter (which controls deflate/gzip compression level) is ignored. Consider adding documentation or a warning when both are configured.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
exputil/EmpCylSL.cc
Outdated
| // Chunking | ||
| unsigned long nx = NUMX + 1, ny = NUMY + 1; | ||
| dcpl.add(HighFive::Chunking({nx, ny})); | ||
| if (H5shuffle) dcpl.add(HighFive::Shuffle()); |
There was a problem hiding this comment.
The shuffle filter is applied even when H5compress=0 (no compression). The shuffle filter reorders bytes to improve compressibility, but has no benefit and adds overhead when compression is disabled. Consider only applying the shuffle filter when compression is actually enabled (i.e., when H5compress > 0 or H5szip is true).
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
[WIP] Address feedback on HDF5 compression implementation
…tion Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Add test coverage for HDF5 compression in EmpCylSL disk cache
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
…pression Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Only apply HDF5 shuffle filter when compression is enabled
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Only apply HDF5 shuffle filter when compression is enabled
Change setH5Params to use deflate when both szip and deflate compression are configured
|
This PR is done for now and ready for review. Main new features are:
|
|
@michael-petersen No hurry on this. But work on this PR is completed on my end. |
The code changes are very simple. On small builds, the compression factor is ~1.2. Might be better for larger cache files.