
Proposal to restructure hdf5 #2403

Draft
adperezm wants to merge 23 commits into ExtremeFLOW:develop from adperezm:feature/hdf5writer

Conversation


@adperezm adperezm commented Mar 20, 2026

Hi,

@vbaconnet wanted to add HDF5 as an output for probes, and while checking the code we found that the current version is quite specific. So the idea came up to restructure how HDF5 is managed.

In my opinion, one should have an HDF5 file manager that is very flexible, with different users of it: vtkhdf and the current field files would then simply be normal HDF5 files with a specific structure.

So I have here a draft of how I would like things to be, but I would like to know your opinions and suggestions on how one can manage it nicely.

In this example:

    u => neko_registry%get_field("u") ! a field
    call res%init(10, name="res") ! a vector
    call mat%init(3, 10, name="mat") ! a matrix

    
    call hdf5_output%init("test.hdf")
    
    ! open the file
    call hdf5_output%open("w")
    
    ! Create or open a group
    call hdf5_output%set_active_group(["fluid", "field"])
    ! Write data to it
    call hdf5_output%write_dataset(res)  
    
    ! new group
    call hdf5_output%set_active_group(["testing"])
    ! write again
    call hdf5_output%write_dataset(mat)
    call hdf5_output%write_dataset(u)
    
    ! close
    call hdf5_output%close()

The file written would look like:

/                        Group
/fluid                   Group
/fluid/field             Group
/fluid/field/res         Dataset {40}
/testing                 Group
/testing/mat             Dataset {40, 3}
/testing/u               Dataset {486, 6, 6, 6}

I believe this would allow it to be used nicely in any routine that needs IO, without the developer having to fully learn the parallel HDF5 API.

As the NEKO API grows, I can imagine that we will do most of our post-processing in it, and having a good, flexible way to output files would be very nice.

So please let me know what types of features and arrays you think are absolutely needed for this sort of interface to be usable in the parts of the code that currently use the format.

You can also just let me know if you think this should be scrapped in favor of more specific uses of HDF5.

#==========================

To be specific, what I want is that vtkhdf and what lies in the current hdf5_file_t stop managing the HDF5 details themselves and move them to another type. Then they can just arrange the data in whichever way they want and write it using a new hdf5 file manager.

@adperezm adperezm added the "help wanted" (Extra attention is needed) label Mar 20, 2026

@vbaconnet vbaconnet left a comment


I really like this way of doing things, because it really brings out the strengths of hdf5 as you were saying!

I have many questions scattered a bit everywhere, I hope they are somewhat understandable! It could also be that I misunderstood some things, sorry in advance if that's the case...

Some general questions from my side:

  • (see my comment below about matrix/vector names) How should we deal with naming? Right now it throws an error if the dataset path/name (unsure what the correct nomenclature is) already exists. Should we add the option to (i) overwrite existing datasets, or (ii) if "matrix" exists, create a unique name on the fly like "matrix_0"? Just for discussion purposes, I don't know what is best; as a user I'm not sure I would even know what I want. Below is an illustrated example.
a%init(3) ! a vector
b%init(10) ! another vector

call h5output%init("my.hdf5")
call h5output%open("w") ! can you also open with "a"?

! Case 1
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! Should this throw an error? Or write under "/Vector_0"? Or should it depend on if we open with "w" or "a"?

! Case 2
call h5output%set_overwrite(.true.) ! <-- what is the meaning of "overwrite"? Overwrite the file in general, or its datasets? Is that the same?
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! I would say this should overwrite the previous "/Vector". But then there is the question of sizes etc.
  • I think being able to make separate calls to open, write, close is nice. I wonder if we also want to provide some ready-to-go routines that are closer to what we currently have for csv, fld, etc.? I don't know if you already have it planned out, but I thought it would be cool to have the generic file write do the classic open, write, close and leave the specific write_vector, write_matrix routines to require opening and closing a file manually.
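The on-the-fly naming from the first bullet could be sketched like this. Python is used here only because the logic is language-agnostic; `unique_name` is a hypothetical helper, not part of the proposed Fortran interface.

```python
def unique_name(existing, name, default="Vector"):
    """Pick a dataset name that does not collide with existing ones.

    Falls back to `default` when the object was never named, and
    appends _0, _1, ... on collision, mimicking the "matrix_0" idea.
    """
    base = name or default
    if base not in existing:
        return base
    i = 0
    while f"{base}_{i}" in existing:
        i += 1
    return f"{base}_{i}"

# Writing two unnamed vectors into the same group:
group = set()
for _ in range(2):
    group.add(unique_name(group, ""))
print(sorted(group))  # → ['Vector', 'Vector_0']
```

Whether the suffixing kicks in could then be tied to the overwrite flag discussed below: replace the dataset when overwrite is true, suffix (or error) when it is false.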

! ===================
! Create the data set
! ===================
call h5lexists_f(this%active_group_id, trim(mat%name), dset_exists, ierr)
Collaborator

Here with mat%name, could it happen that the name is empty, since both vector and matrix initialize their names as empty strings by default? What would happen then inside the HDF5 file? Would it still be able to write something under an "empty" dataset name?

I think we could change the default name of vector and matrix to "Vector" and "Matrix" as we do for fields; that would make things easier perhaps?

Collaborator Author

It could indeed be empty, so we should have a guardrail there. Perhaps simply assigning "vector", "field", or "matrix" as the name if it is empty, and then, depending on overwrite/append, doing whatever is needed.

@adperezm
Collaborator Author

Hi,

Regarding the overwrite: I think we already have a parameter for this by default, so we should probably just respect it. If it is set to true, whatever has the same name would be replaced. If not, it would either throw an error or potentially append (maybe that's another option?). So I would need to add that.

  • I think being able to make separate calls to open, write, close is nice. I wonder if we also want to provide some ready-to-go routines that are closer to what we currently have for csv, fld, etc.? I don't know if you already have it planned out, but I thought it would be cool to have the generic file write do the classic open, write, close and leave the specific write_vector, write_matrix routines to require opening and closing a file manually.

Yes, that is what I would like to do with %write: depending on the data, the subroutines in here are used to write with a predetermined structure. That is what is already done in the current HDF5 version in Neko. But the individual routines should also exist on their own in case they are needed for something else.
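The split between the one-shot path and the manual path could look roughly like this. A toy Python model, not the actual interface: `H5Output` and its methods are hypothetical stand-ins for the Fortran type, with a dict standing in for the HDF5 file.

```python
class H5Output:
    """Toy model of the proposed manager: the generic write() is
    self-contained, while write_vector() requires an open file."""

    def __init__(self, fname):
        self.fname = fname
        self.is_open = False
        self.data = {}  # stands in for the actual HDF5 file

    def open(self, mode="w"):
        self.is_open = True

    def close(self):
        self.is_open = False

    def write_vector(self, name, vec):
        # manual path: caller is responsible for open/close
        if not self.is_open:
            raise RuntimeError("write_vector needs an open file")
        self.data[name] = list(vec)

    def write(self, name, vec):
        # ready-to-go path: classic open, write, close in one call
        self.open("w")
        self.write_vector(name, vec)
        self.close()

out = H5Output("my.hdf5")
out.write("res", [1.0, 2.0])  # one-shot, like csv/fld today
```

This matches the bullet above: casual users get the csv/fld-style one-shot call, while code that writes many datasets can hold the file open across several write_vector/write_matrix calls.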

@adperezm
Collaborator Author

@vbaconnet, when you have time, check more or less how it would work with the probes (right now it is very rough, but just to see the functionality).

In the advecting cone, interpolating u, v, w, p, s we get a file that looks like this:

/                        Group
/probes                  Group
/probes/coordinates      Dataset {200, 3}
/probes/t_0              Group
/probes/t_0/interpolated_fields Dataset {200, 5}
/probes/t_1              Group
/probes/t_1/interpolated_fields Dataset {200, 5}
/probes/t_10             Group
/probes/t_10/interpolated_fields Dataset {200, 5}
/probes/t_11             Group
/probes/t_11/interpolated_fields Dataset {200, 5}
/probes/t_12             Group
/probes/t_12/interpolated_fields Dataset {200, 5}

So every time it interpolates, the result is added as a group. Please check if you have preferences, i.e., should each interpolated field be named too? Or is it better to leave it as a 2D array?

There are many things missing here, for example an attribute writer. But we can go from here.

@timofeymukha
Collaborator

I think having some wrapper for handling HDF5 in a sane manner is a great idea. Basically, something like a small subset of h5py, but for us in Fortran and pinned to Neko's needs. Having that in the user file would be great.

Not sure the heavy-duty stuff like vtkhdf has to use it necessarily. That is very low level and looks scary ahahah. But something that works for 80% of the needs at a fraction of the complexity of the HDF5 API would be great!!

@adperezm
Collaborator Author

Not sure the heavy-duty stuff like vtkhdf has to use it necessarily. That is very low level and looks scary ahahah. But something that works for 80% of the needs at a fraction of the complexity of the HDF5 API would be great!!

Yeah! I'm confident vtkhdf split across different files can be done this way, because I've done so in Python and I see no technical barrier to reproducing that: in that case it is just a matter of writing your data with the extra connectivity, types, etc. But the current one-file version is indeed out of scope.

For the moment I think I will just put this in the current hdf5 file and limit myself to using these routines for writing probes.

Then we can see later, once we converge with the other HDF5 files, whether it makes sense to unify. I don't like the idea of having multiple types doing very similar things; already many routines repeat. But one thing at a time 😊

@timfelle
Collaborator

I like the idea of simplifying things here, and I agree that we should probably unify most things at some point. I think the way to do it is exactly what you have started here: creating an interface for the most common operations we deal with, which can be shared between different file writers.

I do not foresee a common writer for vtkhdf and probes, but they could interface with the same backend.

Comment on lines +337 to +339
call h5dwrite_f(dset_id, precision_hdf, mat%x, dcount, ierr, &
file_space_id = filespace, mem_space_id = memspace, &
xfer_prp = xf_id)
Collaborator

Careful, you are writing a matrix of type rp to an HDF5 object of base type this%precision. If these do not match, you will get garbage; you must do a type conversion if they differ. Check the VTKHDF write_pointdata for an example of how to handle it. My idea is to build a subroutine to handle this at some point; at the moment it is still part of the writer.
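The failure mode can be demonstrated without HDF5 at all. Reinterpreting the raw bytes of a 64-bit real as a 32-bit one, which is in effect what a mismatched write does, yields garbage, whereas an explicit conversion preserves the value. A minimal Python sketch using the standard struct module:

```python
import struct

x = 1.0  # a double-precision value, like rp = dp

# Wrong: hand the raw 64-bit bytes to something expecting 32-bit
raw = struct.pack("<d", x)
garbage = struct.unpack("<f", raw[:4])[0]

# Right: explicitly convert to 32-bit before writing
converted = struct.unpack("<f", struct.pack("<f", x))[0]

print(garbage, converted)  # → 0.0 1.0
```

The same idea applies in the Fortran writer: copy the rp array into a buffer of the file's precision before calling h5dwrite_f whenever the two kinds differ.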

Collaborator Author

Ohh, that's a good point. I will check that one.

! Create the corresponding memory space (buffer) for my local data
call h5screate_simple_f(dset_rank, dcount, memspace, ierr)
! Write the data
call h5dwrite_f(dset_id, precision_hdf, field%x, dcount, ierr, &
Collaborator

Same issue here.


select case(precision)
case(sp)
H5T_NEKO_REAL = H5T_NATIVE_REAL

@timfelle timfelle Mar 23, 2026


The built-in HDF5 kind-to-type mapping is a bit safer.

Suggested change
H5T_NEKO_REAL = H5T_NATIVE_REAL
H5T_NEKO_REAL = h5kind_to_type(sp, H5_REAL_KIND)

@adperezm
Collaborator Author

Excellent @timfelle! I will start putting these things in the main hdf5_file_t and focus on writing probes. And then we can use/modify things as we need them.

@vbaconnet
Collaborator

should each interpolated field be named too? Is it better to leave it as a 2D array?

I think it would be cool to have it like (with only 2 fields instead of 5 in your example):

/                        Group
/probes                  Group
/probes/coordinates      Dataset {200, 3}
/probes/t_0              Group
/probes/t_0/field_name1  Dataset {200} # <-- or {200, 1}, whatever is correct
/probes/t_0/field_name2  Dataset {200}
/probes/t_1              Group
/probes/t_1/field_name1  Dataset {200}
/probes/t_1/field_name2  Dataset {200}
# ...
/probes/time             Dataset {Ntimesteps} # <- at the end of the simulation? Or can this be updated at runtime?

And an attribute of /probes could be the names of the fields as well, but maybe that's redundant? I don't really know what is best with HDF5, but I think it would be nice to have it all organized like that... though it's quite personal.

Having the final time signal written at the end is also maybe redundant; you could build the array by casting f["/probes"].keys() into floats, I guess.

Also, in your example, are t_0, t_1, etc. the time? How do we feel about that? To me it feels weird to have floating-point numbers as names of things, but again it's personal preference... I would suggest an index, for example? Honestly, I don't know what is best. As long as the probes are in HDF5 I'm happy :D

@timfelle
Collaborator

timfelle commented Mar 23, 2026

I think we should take a leaf out of VTKHDF's book in terms of temporal management.
Times are stored in an array under Steps/values. Temporal data is then stored as [N_points * N_timesteps], and a set of offset values is stored separately to indicate where each timestep begins.

/                           Group
/probes                     Group
/probes/coordinates         Dataset {200, 3}
/probes/time                Dataset {Ntimesteps}     # This can be written at run time
/probes/offsets             Dataset {Ntimesteps}     # 200, 400, ....
/probes/Fields              Group
/probes/Fields/field_name1  Dataset {200*Ntimesteps} # [0,...,199] T0, [200,...,399] T1
/probes/Fields/field_name2  Dataset {200*Ntimesteps}
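With that layout, reading back timestep i is just a slice between consecutive offsets. A small Python sketch, assuming (as in the "200, 400, ..." example above) that offsets[i] holds the end index of timestep i; `timestep_slice` is a hypothetical helper:

```python
def timestep_slice(data, offsets, i):
    """Return the samples of timestep i from the flat temporal array.

    offsets[i] is taken as the (exclusive) end index of timestep i,
    matching the '200, 400, ...' convention sketched above.
    """
    start = offsets[i - 1] if i > 0 else 0
    return data[start:offsets[i]]

# 3 probes, 2 timesteps, flattened as [T0 samples..., T1 samples...]
field = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
offsets = [3, 6]
print(timestep_slice(field, offsets, 1))  # → [1.1, 1.2, 1.3]
```

A nice property of this layout is that appending a timestep only extends the flat dataset and the time/offsets arrays, so it can be written incrementally at run time.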
