HDF5 data formatter #179
Conversation
…nto data_structures
Added some formatting to the __repr__ of the dataset Added force_write option (untested for GNUplot format)
Working test for writing a simple 1D file Got a failing test for the read/loading of data
Loading from file (basic ) Saving of array metadata
Todo incremental write (does not work yet with a loop) Also set_points reference in dataset is not saved correctly
(not robust yet)
|
@AdriaanRol fantastic, I will take a look at this tomorrow! 💯 🌟 😍 for making tests that don't work yet! |
|
@AdriaanRol great to see progress on this!
Could you explain this? The setpoint array behaves like the measurement arrays, no?
Could you explain this too? What is the issue?
Units not in units?
did you see #142 ?
There is a wildcard version of |
Yes it does, however the problem is that the setpoint requires the entire array upon initialization, the consequence of this is that the order in which I extract the data matters. Additionally (and I don't know how this works) if the setpoint has it's own setpoint array, as I can imagine in the future for some form of nested loop this will also require a reference to this. You are right, just storing the id is easy and something I will probably add, however just hacking it in there if I don't know yet how to extract it seems a bit pointless to me.
Just a typo, I meant units are not included in the dataset even though they are part of the parameter, I have included them in the formatter so that it should be forwards compatible.
I guess this is a grammer thing, if you have a parameter, say a thermometer you ask what "units" it has, not what "unit" it has. It is confusing though as you would expect the singular form just like label, name etc. I think "unit" would just look plain weird.
I did, however I find that To achieve this I need the following code (lines 79 - 82 in hdf5_format.py) location = data_set.location
self.filepath = io_manager.join(
io_manager.base_location,
data_set.location_provider(io_manager)+'/'+location+'.hdf5')This seems a bit convoluted to me (but most importantly is something that should happen in the location provider). What I would like is to have a formatter that provides the following for me The io.list or wildcard function seems to be what I am looking for, I should replace this in the test but I'll wait for #142 to be merged first. |
|
Interesting with the setpoint arrays, now this also confuses me.
The unit is part of the
But we use
Maybe this is easier? location = data_set.location
filename = io_manager.join(location, location.split('/')[-1] + '.hdf5')But it could also be useful to extend the
The |
|
@alexcjohnson do you have test cases/test datasets for more complex datasets?
I think the confusion comes from the fact that I have one column per name and am combining multiple parameters in one datagroup, the list of names then contains each name for every column (being either a parameter or multiple columns for a composite parameter (containing multiple parameters). |
|
Based on the discussions today (Friday QCodes meeting) the (currently functional) hdf5 formatter needs to be rewritten. The formatter will follow the following three rules
Additionally (not in this PR) direct access to the dataset from the main process will be required. |
|
I guess this is v 0.1 right ? |
started as a style cleanup, can you find the bug it fixes?
Another fix due to switching to a nicer pattern...
The same symbol (s) had three meanings on a single line!
looks for a "close_file" method in the formatter, so this should be extensible to other formatters we may write.
| self.write() | ||
|
|
||
| if hasattr(self.formatter, 'close_file'): | ||
| self.formatter.close_file(self) |
There was a problem hiding this comment.
@AdriaanRol re: closing the file at the end of a loop: the loop already calls DataSet.finalize() at the end, which seems like a pretty sensible place to close the file, before we go to the trouble of implementing a context manager. Seem reasonable?
There was a problem hiding this comment.
I agree, however do you think it reasonable to create an issue for this once this is merged? It is mostly important when interrupting loops (intended or due to exception)
There was a problem hiding this comment.
I believe that is already taken care of, as finalize gets called in a try / finally block. Which is exactly how a context manager works under the hood, though I agree that it would be cleaner to implement it that way instead.
|
I haven't looked through the tests yet, but I fixed a few bugs in untested parts of the code. Currently I get this for coverage on the new formatter: Which is a good start! But we will do ourselves a big favor if we can increase it. |
| if data_name not in data_set._h5_base_group.keys(): | ||
| arr_group = data_set._h5_base_group.create_group(data_name) | ||
| else: | ||
| arr_group = data_set._h5_base_group[data_name] |
There was a problem hiding this comment.
@AdriaanRol this is the only actual code change I had to make to get rid of all the other attributes stashed on the formatter. OK?
There was a problem hiding this comment.
yes, looks good :)
|
@alexcjohnson , updated todolist in comment above to include increasing coverage. Thanks for the changes. p.s. feel free to give it a go as I will not be able to get to it on a short (days) timescale and I think this should be merged sooner rather than later (the merge frequency has gone down quite a bit recently). |
|
@AdriaanRol @alexcjohnson Not my call, but I would be happy with merging as well. |
Removed a function that was not used (from older version)
|
All tests pass (locally), 100% coverage in hdf5_formatter.py as far as I can tell people are happy with this formatter. |
| # general unsupported type in dict | ||
| # reading metadata for dataset that does not have a metadata attribute | ||
| # unrecognized list type when reading in dict | ||
| # boolean string that is not True or False raised Value Error |
There was a problem hiding this comment.
No they are not, removing now
|
@AdriaanRol thanks for completing the HDF5 tests. I added tests for |
|
@AdriaanRol I meant to point out earlier this change I made - I've seen this various places , and not just from you but other contributors as well. I suppose you get the type This also applies to using qcodes, even from within drivers and tests (although I notice I haven't been consistent about that myself... but it would actually make the tests better if they used the top-level names when possible!). The objects people are generally expected to use directly are aliased in That said, for linkages within the core of the package itself it's better to point to the item you want directly, as that can avoid circular imports. But it's only within the core that this applies, and the core is all tested as a unit so we'll catch it if a reorganization breaks anything. |
|
...and that last comment reminds me, we should include |
|
Wait a sec, this merge never passed travis right ? |
|
@giulioungaretti , All tests that were part of the formatter passed on Travis and all tests passed locally. I think the final failure (most of the Travis runs did pass) has to do with the stochastic bugs on travis. Just so you can sleep well, there is no untested code in the master. |
|
@AdriaanRol, cool! I did not check the code yet, but there's a new error with the dataset. May have not come from this either but just checking if it was something you knew :D I will look deeper into it soon! |


First version of the HDF5 data formatter.
@alexcjohnson , @MerlinSmiles @giulioungaretti
Data saving and loading works, including incremenatal writes.
I have added tests for the HDF5 format which still fail (as it is not done yet).
As a sidenote, I am not 100% satisfied yet but I think it is a good start.
I have added a notebook that shows examples and runs the test suite for just the hdf5 format.
If the above issues have been resolved I think this can be merged, let me know what else should be on that list.

A screenshot of how the data is saved (using the 1D example)
While making this I ran into the following other issues, I think addressing these is out of the scope of this issue but definately part of #62
And there is a bunch more which I cannot think of now.