Proposal to restructure hdf5 (#2403)
Conversation
I really like this way of doing things, because it really brings out the strengths of hdf5 as you were saying!
I have many questions scattered a bit everywhere, I hope they are somewhat understandable! It could also be that I misunderstood some things, sorry in advance if that's the case...
Some general questions from my side:
- (see my comment below about matrix/vector names) how should we deal with naming? Right now it throws an error if the dataset path/name (unsure what the correct nomenclature is) already exists. Should we add the option to: (i) overwrite existing datasets, or (ii) if, say, "matrix" exists, create a unique name on the fly like "matrix_0"? Just for discussion purposes, I don't know what is best. As a user, I'm not sure I would even know what I want. Below is an illustrated example.
a%init(3) ! a vector
b%init(10) ! another vector
call h5output%init("my.hdf5")
call h5output%open("w") ! can you also open with "a"?
! Case 1
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! Should this throw an error? Or write under "/Vector_0"? Or should it depend on if we open with "w" or "a"?
! Case 2
call h5output%set_overwrite(.true.) ! <-- what is the meaning of "overwrite"? Overwrite the file in general, or its datasets? Is that the same?
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! I would say this should overwrite the previous "/Vector". But then there is the question of sizes etc.

I think being able to make separate calls to `open`, `write`, `close` is nice. I wonder if we also want to provide some ready-to-go routines that are closer to what we currently have for `csv`, `fld`, etc.? I don't know if you already have it planned out, but I thought it would be cool to have the generic file `write` do the classic `open`, `write`, `close` and leave the specific `write_vector`, `write_matrix` routines to require opening and closing a file manually.
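Option (ii) above could be sketched like this; a minimal Python sketch for discussion, where `unique_dataset_name` and the `existing` set are hypothetical illustrations, not actual Neko or HDF5 API:

```python
def unique_dataset_name(name, existing):
    """Return `name` if it is free, otherwise append a running
    suffix: matrix_0, matrix_1, ... until a free name is found."""
    if name not in existing:
        return name
    i = 0
    while f"{name}_{i}" in existing:
        i += 1
    return f"{name}_{i}"

existing = {"Vector"}
print(unique_dataset_name("Vector", existing))  # Vector_0
existing.add("Vector_0")
print(unique_dataset_name("Vector", existing))  # Vector_1
```

In the real writer, the `existing` membership test would presumably be replaced by an `h5lexists_f` check against the open group.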
src/io/hdf5_file_2.F90
! ===================
! Create the data set
! ===================
call h5lexists_f(this%active_group_id, trim(mat%name), dset_exists, ierr)
Here with mat%name, could it happen that the name is empty? Since both vector and matrix initialize their names as empty strings by default? What would happen then inside the hdf5 file, would it still be able to write something under an "empty" dataset name?
I think we could change the default name of vector and matrix to "Vector" and "Matrix" as we do for fields, that would make things easier perhaps?
It could indeed be empty, so we should have a guardrail there. Perhaps simply assigning "Vector", "Field" or "Matrix" if empty. And then, depending on overwrite/append, do whatever it needs.
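The guardrail could be as simple as substituting a type-based default when the stored name is empty; a Python sketch with a hypothetical helper name:

```python
def effective_name(name, fallback):
    """Fall back to a type-based default ("Vector", "Matrix", ...)
    when the object's name was left as an empty string."""
    name = name.strip()
    return name if name else fallback

print(effective_name("", "Vector"))          # Vector
print(effective_name("velocity", "Vector"))  # velocity
```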
Hi, regarding the overwrite: I think we already have a parameter for it by default, so we should probably just respect it. If it is set to true, whatever has the same name would be replaced. If not, it would either throw an error or potentially append (maybe that's another option?). So I would need to add that.
Yes, that is what I would like to do with %write: depending on the data, the subroutines that are in here are used to write with a predetermined structure. That is what is already done in the current hdf5 version in Neko. But the individual routines should also exist in case they are needed for something else.
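To make the overwrite/append/error semantics concrete, here is a small decision-table sketch in Python; the function name and return labels are illustrative only, not the proposed API:

```python
def resolve_collision(exists, overwrite, append=False):
    """Decide what a write should do when a dataset with the same
    name already exists. Mirrors the policy discussed above:
    overwrite -> replace, append option -> append, otherwise error."""
    if not exists:
        return "create"
    if overwrite:
        return "replace"
    if append:
        return "append"
    return "error"

print(resolve_collision(exists=False, overwrite=False))  # create
print(resolve_collision(exists=True, overwrite=True))    # replace
print(resolve_collision(exists=True, overwrite=False))   # error
```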
@vbaconnet, when you have time, check more or less how it would be with the probes (right now it is very rough, but just to see functionality). In the advecting cone, interpolating u, v, w, p, s, we get a file that looks like this:
/ Group
/probes Group
/probes/coordinates Dataset {200, 3}
/probes/t_0 Group
/probes/t_0/interpolated_fields Dataset {200, 5}
/probes/t_1 Group
/probes/t_1/interpolated_fields Dataset {200, 5}
/probes/t_10 Group
/probes/t_10/interpolated_fields Dataset {200, 5}
/probes/t_11 Group
/probes/t_11/interpolated_fields Dataset {200, 5}
/probes/t_12 Group
/probes/t_12/interpolated_fields Dataset {200, 5}

So every time it interpolates, it is added as a group. Please check if you have preferences, i.e., should each interpolated field be named too? Is it better to leave it as a 2D array? There are many things missing here, for example an attribute writer. But we go from here.
I think the idea of having some wrapper for handling hdf5 in a sane manner is a great idea. Basically, something like a small subset of h5py, but for us in Fortran and pinned to Neko's needs. Having that in the user file would be great. I'm not sure heavy-duty stuff like vtkhdf necessarily has to use it; that is very low level and looks scary ahahah. But something that works for 80% of the needs at a fraction of the complexity of the hdf5 API would be great!!
Yeah! I'm confident vtkhdf on different files can be done this way, because I've done so in Python and I see no technical barrier in reproducing that; in that case it is just a matter of writing your data with the extra connectivity and types, etc. But the current one-file version is indeed out of scope. For the moment I think I will just put this in the current hdf5 file and limit these routines to writing probes. Then, after we converge with the other hdf5 files, we can see if it makes sense to unify. I don't like the idea of having multiple types doing very, very similar things; already many routines repeat. But one thing at a time 😊
I like the idea of simplifying things here. And I agree that we should probably unify most things at some point. But I think the way to do it is exactly what you have started here: creating an interface for the most common operations we deal with, which can be shared between different file writers. I do not foresee a common writer for vtkhdf and probes, but they could interface with the same backend.
src/io/hdf5_file_2.F90
call h5dwrite_f(dset_id, precision_hdf, mat%x, dcount, ierr, &
     file_space_id = filespace, mem_space_id = memspace, &
     xfer_prp = xf_id)
Careful, you are writing a matrix of type rp to an HDF5 object of base type this%precision. If these do not match, you will get garbage, so you must do a type conversion in that case. Check the VTKHDF write_pointdata for an example of how to handle it. My idea is to build a subroutine to handle it at some point; at the moment it is still part of the writer.
Ohh, that's a good point. I will check that one.
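To see why a mismatched precision needs an explicit conversion rather than a raw write, here is a stdlib-only Python sketch that narrows doubles to IEEE 32-bit floats, roughly what the writer would have to do before handing rp (double) data to a single-precision dataset; the helper name is illustrative, not the VTKHDF code:

```python
import struct

def to_single(values):
    """Round-trip a list of doubles through IEEE 32-bit floats,
    mimicking the narrowing conversion the writer must apply when
    rp (double) data goes into a single-precision dataset."""
    packed = struct.pack(f"{len(values)}f", *values)
    return list(struct.unpack(f"{len(values)}f", packed))

print(to_single([1.0, 0.1]))  # 1.0 survives exactly, 0.1 only approximately
```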
src/io/hdf5_file_2.F90
! Create the corresponding memory space (buffer) for my local data
call h5screate_simple_f(dset_rank, dcount, memspace, ierr)
! Write the data
call h5dwrite_f(dset_id, precision_hdf, field%x, dcount, ierr, &
src/io/hdf5_file_2.F90
select case (precision)
case (sp)
   H5T_NEKO_REAL = H5T_NATIVE_REAL
The built-in HDF5 system is a bit safer:

- H5T_NEKO_REAL = H5T_NATIVE_REAL
+ H5T_NEKO_REAL = h5kind_to_type(sp, H5_REAL_KIND)
Excellent @timfelle! I will start putting these things in the main hdf5_file_t and focus on writing probes. And then we can use/modify things as we need them.
I think it would be cool to have it like this (with only 2 fields instead of 5 in your example):
/ Group
/probes Group
/probes/coordinates Dataset {200, 3}
/probes/t_0 Group
/probes/t_0/field_name1 Dataset {200} # <-- or {200, 1}, whatever is correct
/probes/t_0/field_name2 Dataset {200}
/probes/t_1 Group
/probes/t_1/field_name1 Dataset {200}
/probes/t_1/field_name2 Dataset {200}
# ...
/probes/time Dataset {Ntimesteps} # <- at the end of the simulation? Or can this be updated at runtime?

And an attribute of ...? Having the final time signal written at the end is also maybe redundant; you could build the array by casting. Also, in your example, are t_0, t_1, etc. the time? How do we feel about that? To me it feels weird to have floating point numbers as names of things, but again it's personal preference... I would suggest an index, for example? Honestly I don't know what is best. As long as the probes are in hdf5 I'm happy :D
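One way to sidestep floating-point group names, and the lexicographic ordering quirk visible in the earlier listing (t_10 sorting before t_2), would be a zero-padded index plus a separate time dataset; a hypothetical Python sketch, not the proposed Neko naming scheme:

```python
def probe_group(step):
    """Zero-padded, index-based group name, so that plain string
    sorting matches temporal order (t_000002 < t_000010)."""
    return f"/probes/t_{step:06d}"

times = [0.0, 0.05, 0.1, 0.5]  # illustrative time values
paths = [probe_group(i) for i in range(len(times))]
print(paths[0])  # /probes/t_000000
# the actual times would then live once in a dataset such as /probes/time
```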
I think we should take a leaf out of VTKHDF's book in terms of temporal management.
Hi,
@vbaconnet wanted to add hdf5 as an output for probes, and while checking the code, we found that the current version seems a bit specific. So the idea came up to try to restructure how hdf5 is managed.
In my opinion, one should have an hdf5 file manager that is very flexible, and then there are users of it: like vtkhdf and the current field files, which are simply normal hdf5 files with a specific structure.
So I have here a draft of how I would like things to be, but I would like to know your opinions and suggestions on how one can manage it nicely.
In this example:
The file written would look like:
/ Group
/fluid Group
/fluid/field Group
/fluid/field/res Dataset {40}
/testing Group
/testing/mat Dataset {40, 3}
/testing/u Dataset {486, 6, 6, 6}

I believe that this would then allow it to be used nicely in any routine that might need IO, without the developer needing to fully get the parallel hdf5 API.
As the NEKO API grows, I can imagine that we will make most of our post processing in it, and having a good way to output files in a flexible way would be very nice.
So please let me know what types of features and arrays you think are absolutely needed to be able to use this sort of interface in the current parts of the code that use the format.
You can also just let me know if you think that this should be scrapped in favor of having more specific uses of hdf5.
#==========================
To be specific, what I want is that vtkhdf and what lies in the current hdf5_file_t stop managing the HDF5 things themselves and move them to another type. Then they can just arrange the data in whichever way they want and write it using a new hdf_file manager.
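As a discussion aid, here is a toy Python model of how such a manager could look from the user side; the class name, methods, and in-memory "file" are all hypothetical, mirroring the init/open/write/close flow from the Fortran example above:

```python
class H5FileManager:
    """Toy stand-in for the proposed hdf_file manager: users (vtkhdf,
    probes, field files) choose paths and arrange data; the manager
    owns the file handle and the collision policy."""

    def __init__(self, fname):
        self.fname = fname
        self.datasets = {}  # path -> data, standing in for the HDF5 file
        self.mode = None

    def open(self, mode):
        self.mode = mode
        if mode == "w":     # "w" truncates; "a" would keep existing content
            self.datasets.clear()

    def write(self, path, data, overwrite=False):
        if path in self.datasets and not overwrite:
            raise KeyError(f"dataset {path!r} already exists")
        self.datasets[path] = data

    def close(self):
        self.mode = None

f = H5FileManager("my.hdf5")
f.open("w")
f.write("/fluid/field/res", [0.0] * 40)
f.write("/testing/mat", [[0.0] * 3 for _ in range(40)])
f.close()
print(sorted(f.datasets))  # ['/fluid/field/res', '/testing/mat']
```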