Skip to content

Nicer file location formatter#142

Merged
alexcjohnson merged 29 commits into
masterfrom
File-location-formatter
Jun 2, 2016
Merged

Nicer file location formatter#142
alexcjohnson merged 29 commits into
masterfrom
File-location-formatter

Conversation

@MerlinSmiles

Copy link
Copy Markdown
Contributor

Hey,
I added a file location formatter, wich resembles some ideas from the logging library.
This allows the user to do the following:

Usage:

loc_provider = FormatLocation(fmt='{date}/{time}_#{counter}_{name}_{label}')
loc = loc_provider(DiskIO('.'), record={'name': 'Rainbow', 'label': 'test'})
loc
> '2016-04-30/13-28-15_#001_Rainbow_test'

And

loop.run(loc_fmt='{date}/{time}_#{counter}_{test} {name}', loc_record={'test':'blah'}, name='xxx')
> DataSet: DataMode.PULL_FROM_SERVER, location='2016-05-03/23-52-29_#001_blah xxx'

What do you think about this?
I'd like to see this as a global thing in the station, but for what I can see, there is no such thing as a global definition for the location, as it is only set in the io-manager...

@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

Just as a note, the default format is unchanged, and I added this so it wouldnt break anything, it could of course also replace the original version, and any location string could be interpreted by this formatting thing.

@alexcjohnson

Copy link
Copy Markdown
Contributor

@MerlinSmiles cool, this looks like a good generalization of what I started with TimestampLocation. I think we can simplify the usage a bit, will comment within the code.

I'd like to see this as a global thing in the station, but for what I can see, there is no such thing as a global definition for the location, as it is only set in the io-manager...

Currently the way to alter global data storage settings is by changing the DataSet class attributes:

DataSet.location_provider = Whatever(...)  # sets auto-generated locations
DataSet.default_io = DiskIO('c:/path/to/my/data')  # sets base directory
DataSet.default_formatter = MorseCode(...)  # sets file format

I can certainly see the appeal of collecting all config info in one place, and we could make a Station instance be that place; that requires making it more obvious which Station instance you're using at any given time, and I'm not quite sure yet whether we'll have use cases for more than one Station simultaneously and what impact that would have on this issue. We could also make a single global qcodes.config object that handles this and other things as they come up.

Comment thread qcodes/data/data_set.py Outdated
data_manager=None, mode=DataMode.LOCAL, **kwargs):
def new_data(location=None, loc_fmt=None, name=None,
overwrite=False, io=None, data_manager=None,
mode=DataMode.LOCAL, loc_record={}, **kwargs):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FormatLocation can do everything TimestampLocation can do and more, so no reason to keep both.

Then with loc_record we don't need name, it can just be a key in the record. Also I don't think we need loc_fmt here, if you want to change that you just provide location=FormatLocation(fmt='...'), loc_record={...}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user standpoint when writing .run(...) it makes it very unreadable if one has to write that whole line into the arguments, on the other hand, one could make different FormatLocation()s and pass them in the run, great :)
I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a \{date\} string in there.

About the name, it is handy to do a .run(name=xx) easy and simple.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user standpoint when writing .run(...) it makes it very unreadable if one has to write that whole line into the arguments, on the other hand, one could make different FormatLocation()s and pass them in the run, great :)

I imagine it would be fairly infrequent that someone needs to change the location_provider for just one measurement... mostly this will be set in experiment config and never changed, and if you want one specific run saved elsewhere you would provide an explicit location for it, rather than going through this formatting machinery. But yes, if for some reason there are a couple of location formats you're switching between (one for tuning data, another for final data, something like that?) then you're exactly right, just make a few FormatLocation objects and pass in the one you want.

I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a \{date\} string in there.

not sure what you mean by this, can you give an example?

About the name, it is handy to do a .run(name=xx) easy and simple.

True, if people are using name a lot it would get cumbersome to use .run(loc_record={'name': '...'}) - so I'd be OK with keeping name as a special input to the location_provider. Then we should probably allow it to exist in loc_record too, and not necessarily get overridden by the provided name, by doing something like

if name is not None:
    loc_record['name'] = name

I'd probably do this inside new_data, so that this does not need to happen separately in every location_provider we create, and it simplified the call signature to just location_provider(io, loc_record)

@MerlinSmiles MerlinSmiles May 10, 2016

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson

I'd like to see location take the loc_fmt string, and when it can be formatted do it, but it might break stuff if someone really wants a {date} string in there.

not sure what you mean by this, can you give an example?

What I am thinking of is ..run(location='{counter}/{date}_{time} {name}') the only reason to not pass this location string to the formatter is when someone really wants a location including {}.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only reason to not pass this location string to the formatter is when someone really wants a location including {}.

You have to double curly brackets to get them through .format as literals... '{}{{{}}}'.format(3,4) -> '3{4}'

I still feel like we'd be better off making people explicitly create a new FormatLocation. For example, I'm imagining one of the cases this would get used is if multiple people are working on the same experiment... what if the person you're overriding uses a different fmt_date, or even has a completely different class for their location_provider?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will not add this then.

@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

@alexcjohnson thanks for all your comments, im on holiday now, so wont fix this before next week.

Comment thread qcodes/data/data_set.py Outdated
with `fmt_date='%Y-%m-%d'` and `fmt_time='%H-%M-%S'`
'''
def __init__(self, fmt='{date}/{time}', fmt_date='%Y-%m-%d',
fmt_time='%H-%M-%S', fmt_counter='{:03}', record={}):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dangerous to have mutable default args... better to do record=None here and below self.base_record = record or {}

otherwise someone can later do DataSet.location_provider.base_record.update(name='chickens') and that will be the default base record for any FormatLocation you make thereafter. It's OK in __call__ as loc_record doesn't get stored anywhere or modified during the call.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, already fixed that here after @giulioungaretti taught me this weekend :)

Triton1 added 2 commits May 10, 2016 14:39
…ocation-formatter

data_set is messy due to merge conflicts :(
@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

Now I merged master into my local updated branch, this became a huge mess to understand the changes unfortunately :( sorry

@alexcjohnson I added all the points from our discussion above. Could you take another look?

Comment thread qcodes/data/data_set.py Outdated
head, tail = base_location.split('{counter}', 1)

subdir, counter_prefix = os.path.split(head)
if os.path.isdir(subdir):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I guess io.list should work this way already!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I thought so too, but something failed in the comparison after that. This was a bit confusing so I reverted to os

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, maybe I'll take a look at fixing it tonight. We're going to want to get rid of all the os calls, I think, but also we should write tests of all this, and I suspect this is going to be a little annoying to test especially since I haven't tested DiskIO yet.

@alexcjohnson

Copy link
Copy Markdown
Contributor

@MerlinSmiles I think this is ready now, care to take a look?

Comment thread qcodes/data/data_set.py Outdated
class TimestampLocation:
class SafeFormatter(string.Formatter):
def get_value(self, key, args, kwargs):
'''Overrides string.Formatter.get_value'''

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

""" please 🐄

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, that docstring isn't really saying anything... it should probably say why it's overriding the default.

@alexcjohnson

Copy link
Copy Markdown
Contributor

@MerlinSmiles that seems reasonable, and a good way to go about it. Can you check on the notebook? It looks like it's still referencing SafeDict. Can you also make a few tests of SafeFormatter to show what it's supposed to do? I'll give you a start - in test_location_provider.py add something like:

class TestSafeFormatter(TestCase):
    def test_normal_formatting(self):
        formatter = SafeFormatter()
        self.assertEqual(formatter.format('{}{{{}}}', 1,2), '1{2}')
        self.assertEqual(formatter.format('{apples}{{{oranges}}}', apples='red', oranges='blue'), 'red{blue}')

    def test_missing(self):
        formatter = SafeFormatter()
        self.assertEqual(formatter.format('{}'), '{0}')
        self.assertEqual(formatter.format('{cheese}', fruit='pears'), '{cheese}')

@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

@alexcjohnson how can I run a single test file? like the test_location_provider.py ?
The whole test takes about a minute 😮

@giulioungaretti

Copy link
Copy Markdown
Contributor

@MerlinSmiles I think this will do

python -m unittest module.class.function 

@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

@giulioungaretti nice, going from 60 to 0.03 seconds :D

@alexcjohnson

Copy link
Copy Markdown
Contributor

@MerlinSmiles I just linted and changed a few docstrings. 💃 from me!

@giulioungaretti

Copy link
Copy Markdown
Contributor

@alexcjohnson the only issue with the docstrings like that is that they don't follow any of the conventions for docstrings. Which are not in any pep, but there's a few options to choose from.
see https://gist.github.com/giulioungaretti/7428aa55c36342a128dcc6879654181f

@giulioungaretti

Copy link
Copy Markdown
Contributor

I mean like line here

But idk if it's better to repeat or just stick to " ... plus: " .

@MerlinSmiles

Copy link
Copy Markdown
Contributor Author

@giulioungaretti could you give a full example of what you mean? I cant really follow.

@alexcjohnson

Copy link
Copy Markdown
Contributor

@giulioungaretti alright, I guess there's no time like the present to get started on this transformation. I'm fine with reST, if that's the most common flavor then let's just go with it.

One piece the community seems a bit divided on but I have a strong opinion about (and from your gist you seem to agree: for multiline docstrings, do NOT put text on the line with the opening quotes (or closing, but people don't seem to do that anyway). Yes:

"""
You don't need a weatherman
to know which way the wind blows.
"""

No:

"""How many roads must a man walk down
before you call him a man?
"""

FWIW I have a pep257 linter installed in sublime, doesn't say much about the content but it has many opinions about the form:

  • docstrings for every public module, function, class, and method. This can get a little redundant, like class docstring and __init__ docstring, or a module with only one class, but in general it seems good.
  • one-line docstrings go on one line """My one line docstring.""" as opposed to
"""
My one line docstring.
"""
  • There's always a period at the end of the first line, and if there are more lines, there's a blank after the first line (hence the one you pointed out)
  • Class docstrings have a blank line both before and after the whole string.

@giulioungaretti

Copy link
Copy Markdown
Contributor

I strongly prefer that too , it embodies the classical sense of beauty in its symmetry.

I usually use pylint just to get all the possible errors and things, and then I ignore when/where it make sense.

Also just realised that my link was pointing to the wrong line. This was the one I'm not like super happy:
" Arguments are the same as DataSet constructor, plus: ""

Because it's not parseable by any documentation generator, I think it's better to repeat ourselves for the sake of clarity and maintainability.

All the other points work for me :D

@alexcjohnson

Copy link
Copy Markdown
Contributor

@giulioungaretti how did I do with docstring style?

@giulioungaretti

Copy link
Copy Markdown
Contributor

💃 image

@alexcjohnson alexcjohnson merged commit 8b7250b into master Jun 2, 2016
@alexcjohnson alexcjohnson deleted the File-location-formatter branch June 2, 2016 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants