Nicer file location formatter#142
Conversation
|
Just as a note, the default format is unchanged, and I added this so it wouldnt break anything, it could of course also replace the original version, and any |
|
@MerlinSmiles cool, this looks like a good generalization of what I started with
Currently the way to alter global data storage settings is by changing the DataSet.location_provider = Whatever(...) # sets auto-generated locations
DataSet.default_io = DiskIO('c:/path/to/my/data') # sets base directory
DataSet.default_formatter = MorseCode(...) # sets file formatI can certainly see the appeal of collecting all config info in one place, and we could make a |
| data_manager=None, mode=DataMode.LOCAL, **kwargs): | ||
| def new_data(location=None, loc_fmt=None, name=None, | ||
| overwrite=False, io=None, data_manager=None, | ||
| mode=DataMode.LOCAL, loc_record={}, **kwargs): |
There was a problem hiding this comment.
FormatLocation can do everything TimestampLocation can do and more, so no reason to keep both.
Then with loc_record we don't need name, it can just be a key in the record. Also I don't think we need loc_fmt here, if you want to change that you just provide location=FormatLocation(fmt='...'), loc_record={...}
There was a problem hiding this comment.
From a user standpoint when writing .run(...) it makes it very unreadable if one has to write that whole line into the arguments, on the other hand, one could make different FormatLocation()s and pass them in the run, great :)
I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a \{date\} string in there.
About the name, it is handy to do a .run(name=xx) easy and simple.
There was a problem hiding this comment.
From a user standpoint when writing
.run(...)it makes it very unreadable if one has to write that whole line into the arguments, on the other hand, one could make differentFormatLocation()s and pass them in the run, great :)
I imagine it would be fairly infrequent that someone needs to change the location_provider for just one measurement... mostly this will be set in experiment config and never changed, and if you want one specific run saved elsewhere you would provide an explicit location for it, rather than going through this formatting machinery. But yes, if for some reason there are a couple of location formats you're switching between (one for tuning data, another for final data, something like that?) then you're exactly right, just make a few FormatLocation objects and pass in the one you want.
I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a
\{date\}string in there.
not sure what you mean by this, can you give an example?
About the name, it is handy to do a
.run(name=xx)easy and simple.
True, if people are using name a lot it would get cumbersome to use .run(loc_record={'name': '...'}) - so I'd be OK with keeping name as a special input to the location_provider. Then we should probably allow it to exist in loc_record too, and not necessarily get overridden by the provided name, by doing something like
if name is not None:
loc_record['name'] = nameI'd probably do this inside new_data, so that this does not need to happen separately in every location_provider we create, and it simplified the call signature to just location_provider(io, loc_record)
There was a problem hiding this comment.
I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a {date} string in there.
not sure what you mean by this, can you give an example?
What I am thinking of is ..run(location='{counter}/{date}_{time} {name}') the only reason to not pass this location string to the formatter is when someone really wants a location including {}.
There was a problem hiding this comment.
the only reason to not pass this location string to the formatter is when someone really wants a location including
{}.
You have to double curly brackets to get them through .format as literals... '{}{{{}}}'.format(3,4) -> '3{4}'
I still feel like we'd be better off making people explicitly create a new FormatLocation. For example, I'm imagining one of the cases this would get used is if multiple people are working on the same experiment... what if the person you're overriding uses a different fmt_date, or even has a completely different class for their location_provider?
There was a problem hiding this comment.
Ok, will not add this then.
|
@alexcjohnson thanks for all your comments, im on holiday now, so wont fix this before next week. |
| with `fmt_date='%Y-%m-%d'` and `fmt_time='%H-%M-%S'` | ||
| ''' | ||
| def __init__(self, fmt='{date}/{time}', fmt_date='%Y-%m-%d', | ||
| fmt_time='%H-%M-%S', fmt_counter='{:03}', record={}): |
There was a problem hiding this comment.
It's dangerous to have mutable default args... better to do record=None here and below self.base_record = record or {}
otherwise someone can later do DataSet.location_provider.base_record.update(name='chickens') and that will be the default base record for any FormatLocation you make thereafter. It's OK in __call__ as loc_record doesn't get stored anywhere or modified during the call.
There was a problem hiding this comment.
Yes, already fixed that here after @giulioungaretti taught me this weekend :)
…ocation-formatter data_set is messy due to merge conflicts :(
|
Now I merged master into my local updated branch, this became a huge mess to understand the changes unfortunately :( sorry @alexcjohnson I added all the points from our discussion above. Could you take another look? |
| head, tail = base_location.split('{counter}', 1) | ||
|
|
||
| subdir, counter_prefix = os.path.split(head) | ||
| if os.path.isdir(subdir): |
There was a problem hiding this comment.
Oh interesting, I guess io.list should work this way already!
There was a problem hiding this comment.
yes, I thought so too, but something failed in the comparison after that. This was a bit confusing so I reverted to os
There was a problem hiding this comment.
OK, maybe I'll take a look at fixing it tonight. We're going to want to get rid of all the os calls, I think, but also we should write tests of all this, and I suspect this is going to be a little annoying to test especially since I haven't tested DiskIO yet.
|
@MerlinSmiles I think this is ready now, care to take a look? |
| class TimestampLocation: | ||
| class SafeFormatter(string.Formatter): | ||
| def get_value(self, key, args, kwargs): | ||
| '''Overrides string.Formatter.get_value''' |
There was a problem hiding this comment.
actually, that docstring isn't really saying anything... it should probably say why it's overriding the default.
|
@MerlinSmiles that seems reasonable, and a good way to go about it. Can you check on the notebook? It looks like it's still referencing |
|
@alexcjohnson how can I run a single test file? like the |
|
@MerlinSmiles I think this will do |
|
@giulioungaretti nice, going from 60 to 0.03 seconds :D |
|
@MerlinSmiles I just linted and changed a few docstrings. 💃 from me! |
|
@alexcjohnson the only issue with the docstrings like that is that they don't follow any of the conventions for docstrings. Which are not in any pep, but there's a few options to choose from. |
|
I mean like line here But idk if it's better to repeat or just stick to " ... plus: " . |
|
@giulioungaretti could you give a full example of what you mean? I cant really follow. |
|
@giulioungaretti alright, I guess there's no time like the present to get started on this transformation. I'm fine with reST, if that's the most common flavor then let's just go with it. One piece the community seems a bit divided on but I have a strong opinion about (and from your gist you seem to agree: for multiline docstrings, do NOT put text on the line with the opening quotes (or closing, but people don't seem to do that anyway). Yes: """
You don't need a weatherman
to know which way the wind blows.
"""No: """How many roads must a man walk down
before you call him a man?
"""FWIW I have a pep257 linter installed in sublime, doesn't say much about the content but it has many opinions about the form:
"""
My one line docstring.
"""
|
|
I strongly prefer that too , it embodies the classical sense of beauty in its symmetry. I usually use pylint just to get all the possible errors and things, and then I ignore when/where it make sense. Also just realised that my link was pointing to the wrong line. This was the one I'm not like super happy: Because it's not parseable by any documentation generator, I think it's better to repeat ourselves for the sake of clarity and maintainability. All the other points work for me :D |
|
@giulioungaretti how did I do with docstring style? |

Hey,
I added a file location formatter, wich resembles some ideas from the logging library.
This allows the user to do the following:
Usage:
And
What do you think about this?
I'd like to see this as a global thing in the station, but for what I can see, there is no such thing as a global definition for the location, as it is only set in the io-manager...