Proposal to restructure hdf5 (#2403)
Conversation
I really like this way of doing things, because it really brings out the strengths of hdf5 as you were saying!
I have many questions scattered a bit everywhere, I hope they are somewhat understandable! It could also be that I misunderstood some things, sorry in advance if that's the case...
Some general questions from my side:
- (see my comment below about matrix/vector names) how should we deal with naming? Right now it throws an error if the dataset path/name (unsure what the correct nomenclature is) already exists. Should we add the option to: (i) overwrite existing datasets, or (ii) if, say, "matrix" exists, create a unique name on the fly like "matrix_0"? Just for discussion purposes, I don't know what is best. As a user, I'm not sure I would even know what I want. Below is an illustrated example.
a%init(3) ! a vector
b%init(10) ! another vector
call h5output%init("my.hdf5")
call h5output%open("w") ! can you also open with "a"?
! Case 1
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! Should this throw an error? Or write under "/Vector_0"? Or should it depend on if we open with "w" or "a"?
! Case 2
call h5output%set_overwrite(.true.) ! <-- what is the meaning of "overwrite"? Overwrite the file in general, or its datasets? Is that the same?
call h5output%write(a) ! writes under "/Vector"
call h5output%write(b) ! I would say this should overwrite the previous "/Vector". But then there is the question of sizes etc.

I think being able to make separate calls to `open`, `write`, `close` is nice. I wonder if we also want to provide some ready-to-go routines that are closer to what we currently have for `csv`, `fld`, etc.? I don't know if you already have it planned out, but I thought it would be cool to have the generic file `write` do the classic `open`, `write`, `close` and leave the specific `write_vector`, `write_matrix` routines to require opening and closing a file manually.
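Option (ii) above could be sketched like this; a minimal Python sketch for discussion, where `unique_dataset_name` and the `existing` set are hypothetical illustrations, not actual Neko or HDF5 API:

```python
def unique_dataset_name(name, existing):
    """Return `name` if it is free, otherwise append a running
    suffix: matrix_0, matrix_1, ... until a free name is found."""
    if name not in existing:
        return name
    i = 0
    while f"{name}_{i}" in existing:
        i += 1
    return f"{name}_{i}"

existing = {"Vector"}
print(unique_dataset_name("Vector", existing))  # Vector_0
existing.add("Vector_0")
print(unique_dataset_name("Vector", existing))  # Vector_1
```

In the real writer, the `existing` membership test would presumably be replaced by an `h5lexists_f` check against the open group.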
src/io/hdf5_file_2.F90
! ===================
! Create the data set
! ===================
call h5lexists_f(this%active_group_id, trim(mat%name), dset_exists, ierr)
Here with mat%name, could it happen that the name is empty? Since both vector and matrix initialize their names as empty strings by default? What would happen then inside the hdf5 file, would it still be able to write something under an "empty" dataset name?
I think we could change the default name of vector and matrix to "Vector" and "Matrix" as we do for fields, that would make things easier perhaps?
It could indeed be empty, so we should have a guardrail there. Perhaps simply assigning "Vector", "Field" or "Matrix" if empty. And then, depending on overwrite/append, do whatever it needs.
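The guardrail could be as simple as substituting a type-based default when the stored name is empty; a Python sketch with a hypothetical helper name:

```python
def effective_name(name, fallback):
    """Fall back to a type-based default ("Vector", "Matrix", ...)
    when the object's name was left as an empty string."""
    name = name.strip()
    return name if name else fallback

print(effective_name("", "Vector"))          # Vector
print(effective_name("velocity", "Vector"))  # velocity
```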
Hi, regarding the overwrite: I think we already have a parameter for it by default, so we should probably just respect it. If it is set to true, whatever has the same name would be replaced. If not, it would either throw an error or potentially append (maybe that's another option?). So I would need to add that.
Yes, that is what I would like to do with %write: depending on the data, the subroutines that are in here are used to write with a predetermined structure. That is what is already done in the current hdf5 version in Neko. But the individual routines should also exist in case they are needed for something else.
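To make the overwrite/append/error semantics concrete, here is a small decision-table sketch in Python; the function name and return labels are illustrative only, not the proposed API:

```python
def resolve_collision(exists, overwrite, append=False):
    """Decide what a write should do when a dataset with the same
    name already exists. Mirrors the policy discussed above:
    overwrite -> replace, append option -> append, otherwise error."""
    if not exists:
        return "create"
    if overwrite:
        return "replace"
    if append:
        return "append"
    return "error"

print(resolve_collision(exists=False, overwrite=False))  # create
print(resolve_collision(exists=True, overwrite=True))    # replace
print(resolve_collision(exists=True, overwrite=False))   # error
```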
@vbaconnet, when you have time, check more or less how it would be with the probes (right now it is very rough, but just to see functionality). In the advecting cone, interpolating u, v, w, p, s, we get a file that looks like this:
/ Group
/probes Group
/probes/coordinates Dataset {200, 3}
/probes/t_0 Group
/probes/t_0/interpolated_fields Dataset {200, 5}
/probes/t_1 Group
/probes/t_1/interpolated_fields Dataset {200, 5}
/probes/t_10 Group
/probes/t_10/interpolated_fields Dataset {200, 5}
/probes/t_11 Group
/probes/t_11/interpolated_fields Dataset {200, 5}
/probes/t_12 Group
/probes/t_12/interpolated_fields Dataset {200, 5}

So every time it interpolates, it is added as a group. Please check if you have preferences, i.e., should each interpolated field be named too? Is it better to leave it as a 2D array? There are many things missing here, for example an attribute writer. But we go from here.
I think the idea of having some wrapper for handling hdf5 in a sane manner is a great idea. Basically, something like a small subset of h5py, but for us in Fortran and pinned to Neko's needs. Having that in the user file would be great. I'm not sure heavy-duty stuff like vtkhdf necessarily has to use it; that is very low level and looks scary ahahah. But something that works for 80% of the needs at a fraction of the complexity of the hdf5 API would be great!!
Yeah! I'm confident vtkhdf on different files can be done this way, because I've done so in Python and I see no technical barrier in reproducing that; in that case it is just a matter of writing your data with the extra connectivity and types, etc. But the current one-file version is indeed out of scope. For the moment I think I will just put this in the current hdf5 file and limit these routines to writing probes. Then, after we converge with the other hdf5 files, we can see if it makes sense to unify. I don't like the idea of having multiple types doing very, very similar things; already many routines repeat. But one thing at a time 😊
I like the idea of simplifying things here. And I agree that we should probably unify most things at some point. But I think the way to do it is exactly what you have started here: creating an interface for the most common operations we deal with, which can be shared between different file writers. I do not foresee a common writer for vtkhdf and probes, but they could interface with the same backend.
src/io/hdf5_file_2.F90
call h5dwrite_f(dset_id, precision_hdf, mat%x, dcount, ierr, &
     file_space_id = filespace, mem_space_id = memspace, &
     xfer_prp = xf_id)
Careful, you are writing a matrix of type rp to an HDF5 object of base type this%precision. If these do not match, you will get garbage, so you must do a type conversion in that case. Check the VTKHDF write_pointdata for an example of how to handle it. My idea is to build a subroutine to handle it at some point; at the moment it is still part of the writer.
Ohh, that's a good point. I will check that one.
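To see why a mismatched precision needs an explicit conversion rather than a raw write, here is a stdlib-only Python sketch that narrows doubles to IEEE 32-bit floats, roughly what the writer would have to do before handing rp (double) data to a single-precision dataset; the helper name is illustrative, not the VTKHDF code:

```python
import struct

def to_single(values):
    """Round-trip a list of doubles through IEEE 32-bit floats,
    mimicking the narrowing conversion the writer must apply when
    rp (double) data goes into a single-precision dataset."""
    packed = struct.pack(f"{len(values)}f", *values)
    return list(struct.unpack(f"{len(values)}f", packed))

print(to_single([1.0, 0.1]))  # 1.0 survives exactly, 0.1 only approximately
```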
src/io/hdf5_file_2.F90
! Create the corresponding memory space (buffer) for my local data
call h5screate_simple_f(dset_rank, dcount, memspace, ierr)
! Write the data
call h5dwrite_f(dset_id, precision_hdf, field%x, dcount, ierr, &
src/io/hdf5_file_2.F90
select case (precision)
case (sp)
   H5T_NEKO_REAL = H5T_NATIVE_REAL
The built-in HDF5 system is a bit safer:

- H5T_NEKO_REAL = H5T_NATIVE_REAL
+ H5T_NEKO_REAL = h5kind_to_type(sp, H5_REAL_KIND)
Excellent @timfelle! I will start putting these things in the main hdf5_file_t and focus on writing probes. And then we can use/modify things as we need them.
I think it would be cool to have it like this (with only 2 fields instead of 5 in your example):
/ Group
/probes Group
/probes/coordinates Dataset {200, 3}
/probes/t_0 Group
/probes/t_0/field_name1 Dataset {200} # <-- or {200, 1}, whatever is correct
/probes/t_0/field_name2 Dataset {200}
/probes/t_1 Group
/probes/t_1/field_name1 Dataset {200}
/probes/t_1/field_name2 Dataset {200}
# ...
/probes/time Dataset {Ntimesteps} # <- at the end of the simulation? Or can this be updated at runtime?

And an attribute of ...? Having the final time signal written at the end is also maybe redundant; you could build the array by casting. Also, in your example, are t_0, t_1, etc. the time? How do we feel about that? To me it feels weird to have floating point numbers as names of things, but again it's personal preference... I would suggest an index, for example? Honestly I don't know what is best. As long as the probes are in hdf5 I'm happy :D
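One way to sidestep floating-point group names, and the lexicographic ordering quirk visible in the earlier listing (t_10 sorting before t_2), would be a zero-padded index plus a separate time dataset; a hypothetical Python sketch, not the proposed Neko naming scheme:

```python
def probe_group(step):
    """Zero-padded, index-based group name, so that plain string
    sorting matches temporal order (t_000002 < t_000010)."""
    return f"/probes/t_{step:06d}"

times = [0.0, 0.05, 0.1, 0.5]  # illustrative time values
paths = [probe_group(i) for i in range(len(times))]
print(paths[0])  # /probes/t_000000
# the actual times would then live once in a dataset such as /probes/time
```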
I think we should take a leaf out of VTKHDF's book in terms of temporal management.
Hi,
@vbaconnet wanted to add hdf5 as an output for probes, and while checking the code, we found that the current version seems a bit specific. So the idea came up to try to restructure how hdf5 is managed.
In my opinion, one should have an hdf5 file manager that is very flexible, and then there are users of it: like vtkhdf and the current field files, which are simply normal hdf5 files with a specific structure.
So I have here a draft of how I would like things to be, but I would like to know your opinions and suggestions on how one can manage it nicely.
In this example:
The file written would look like:
/ Group
/fluid Group
/fluid/field Group
/fluid/field/res Dataset {40}
/testing Group
/testing/mat Dataset {40, 3}
/testing/u Dataset {486, 6, 6, 6}

I believe that this would then allow it to be used nicely in any routine that might need IO, without the developer needing to fully get the parallel hdf5 API.
As the NEKO API grows, I can imagine that we will make most of our post processing in it, and having a good way to output files in a flexible way would be very nice.
So please let me know what types of features and arrays you think are absolutely needed to be able to use this sort of interface in the current parts of the code that use the format.
You can also just let me know if you think that this should be scrapped in favor of having more specific uses of hdf5.
#==========================
To be specific, what I want is that vtkhdf and what lies in the current hdf5_file_t stop managing the HDF5 things themselves and move them to another type. Then they can just arrange the data in whichever way they want and write it using a new hdf_file manager.
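As a discussion aid, here is a toy Python model of how such a manager could look from the user side; the class name, methods, and in-memory "file" are all hypothetical, mirroring the init/open/write/close flow from the Fortran example above:

```python
class H5FileManager:
    """Toy stand-in for the proposed hdf_file manager: users (vtkhdf,
    probes, field files) choose paths and arrange data; the manager
    owns the file handle and the collision policy."""

    def __init__(self, fname):
        self.fname = fname
        self.datasets = {}  # path -> data, standing in for the HDF5 file
        self.mode = None

    def open(self, mode):
        self.mode = mode
        if mode == "w":     # "w" truncates; "a" would keep existing content
            self.datasets.clear()

    def write(self, path, data, overwrite=False):
        if path in self.datasets and not overwrite:
            raise KeyError(f"dataset {path!r} already exists")
        self.datasets[path] = data

    def close(self):
        self.mode = None

f = H5FileManager("my.hdf5")
f.open("w")
f.write("/fluid/field/res", [0.0] * 40)
f.write("/testing/mat", [[0.0] * 3 for _ in range(40)])
f.close()
print(sorted(f.datasets))  # ['/fluid/field/res', '/testing/mat']
```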