bpo-40688: Use the correct parser in the peg_generator scripts by lysnikolaou · Pull Request #20235 · python/cpython

lysnikolaou · 2020-05-19T19:08:50Z

The scripts in Tools/peg_generator/scripts mostly assume that
ast.parse and compile use the old parser, since this was the
state of things, while we were developing them. They need to be
updated to always use the correct parser. _peg_parser is being
extended to support both parsing and compiling with both parsers.

Also deleted the parse_file function from the _peg_parser extension
module, since it wasn't up-to-date with all the things parse_string
supported.

https://bugs.python.org/issue40688

The scripts in `Tools/peg_generator/scripts` mostly assume that `ast.parse` and `compile` use the old parser, since this was the state of things, while we were developing them. They need to be updated to always use the correct parser. `_peg_parser` is being extended to support both parsing and compiling with both parsers.

gvanrossum

Hm, I don’t love that the return type varies based on a flag value.

lysnikolaou · 2020-05-19T23:13:06Z

Hm, I don’t love that the return type varies based on a flag value.

Would you prefer having parse_string and compile_string?

gvanrossum · 2020-05-19T23:17:28Z

Yes, that’s nice.

gvanrossum

I have a bad feeling about the logic in test_parse_directory.py -- it seems to be getting worse every time we think of something new... :-(

lysnikolaou · 2020-05-20T00:28:58Z

I have a bad feeling about the logic in test_parse_directory.py -- it seems to be getting worse every time we think of something new... :-(

That's definitely true. Would rewriting it to be specific to testing again the stdlib be a good idea in your opinion? And maybe some of the arguments can also be removed, since they're no longer really relevant and do not need to be tested.

lysnikolaou · 2020-05-20T01:20:36Z

Closed and reopened for the CI to re-run.

gvanrossum · 2020-05-21T00:31:01Z

 time_stdlib: time_stdlib_compile

-time_stdlib_compile: venv peg_extension/parse.c data/xxl.py
+time_stdlib_compile: venv data/xxl.py
 	$(VENVPYTHON) scripts/benchmark.py --parser=cpython --target=xxl compile

-time_stdlib_parse: venv peg_extension/parse.c data/xxl.py
+time_stdlib_parse: venv data/xxl.py
 	$(VENVPYTHON) scripts/benchmark.py --parser=cpython --target=xxl parse


I find the name stdlib in these targets horribly confusing. Maybe we can change this to old? Assuming these targets can be made to always use the old parser, and the default targets can be made to always use the new parser. (The perf results I reported last week may have been wrong due to this confusion.)

FWIW I'm not too keen on test_local and test_global below either. (I had somehow thought that time_stdlib ran what is actually called test_global.)

Renamed time_stdlib* to time_old*, test_local to time_peg_dir and test_global to time_stdlib. What do you think?

gvanrossum · 2020-05-21T00:33:13Z

-command_check = subcommands.add_parser(
-    "check", help="Benchmark parsing and throwing the tree away"
-)


Deleting this broke the make bench target.

I deleted the make bench target, since if we were to change it to something like $(VENVPYTHON) scripts/benchmark.py --parser=pegen --target=stdlib parse, it would be the same thing with time_stdlib (the new one).

…directory

lysnikolaou · 2020-05-24T18:26:57Z

I pushed fixes for bugs in test_parse_directory, which prevented it from timing parsing correctly. In my local machine, it now needs 24.544 seconds (mean from three trials), while the all-time minimum was around 5.5 seconds. Could you maybe run the timings as well? Should we be alarmed?

gvanrossum · 2020-05-24T19:37:11Z

Let me investigate later today.

pablogsal · 2020-05-24T20:17:47Z

I pushed fixes for bugs in test_parse_directory, which prevented it from timing parsing correctly. In my local machine, it now needs 24.544 seconds (mean from three trials), while the all-time minimum was around 5.5 seconds. Could you maybe run the timings as well? Should we be alarmed?

I did some quick experiment comparing commit c5fc156 with the current master (I copied the version of test_parse_directory.py in this PR to the root to run always the same code):

Commit `c5fc156`

$ ./python test_parse_directory.py --grammar-file Grammar/python.gram --tokens-file Grammar/Tokens -d Lib

Checked 1,689 files, 772,894 lines, 28,464,113 bytes in 4.559 seconds.
That's 169,532 lines/sec, or 6,243,503 bytes/sec.

Current master

$ ./python test_parse_directory.py --grammar-file Grammar/python.gram --tokens-file Grammar/Tokens -d Lib

Checked 1,707 files, 779,893 lines, 28,702,650 bytes in 4.837 seconds.
That's 161,223 lines/sec, or 5,933,523 bytes/sec.

So is a bit slower than it used to be but certainly not that slow as the readings you are getting. Are you by any change using a debug build of Python?

lysnikolaou · 2020-05-24T20:41:30Z

Are you by any change using a debug build of Python?

That must have been it. I'm not back down to around 6 seconds. I was quite sure I was using the correct Python, but it turns out I was using a debug build.

FWIW I think the minor slow-down is reasonable because of all the invalid_* rules and that's why I've been trying to add cut points to the rules, whose last alternative is invalid_*.

Sorry for the noise.

pablogsal · 2020-05-24T20:44:04Z

FWIW I think the minor slow-down is reasonable because of all the invalid_* rules and that's why I've been trying to add cut points to the rules, whose last alternative is invalid_*.

It may also be related to the more correct error handing we have been adding since them. Btw, last time I benchmarked the parser a couple of weeks ago I learned that significant time is spent on the linked list for inspecting the memorization cache (checking if a token is currently memorized).

gvanrossum · 2020-05-24T20:50:19Z

There's no reason to expect the parser to be slower due to this PR -- if it reports different numbers it must be due to a change in the test code.

I happen to have a debug build, and I do see significant slower performance reported with this PR than without it. Current master:

$ time make test_global
../../python.exe scripts/test_parse_directory.py \
		--grammar-file ../../Grammar/python.gram \
		--tokens-file ../../Grammar/Tokens \
		-d ../../Lib \
		--short \
		--exclude "*/test2to3/*" \
		--exclude "*/test2to3/**/*" \
		--exclude "*/bad*" \
		--exclude "*/lib2to3/tests/data/*"
../../Lib/test/test_modulefinder.py:281: DeprecationWarning: invalid escape sequence \u
  b"""\
Checked 1,687 files, 775,179 lines, 28,545,376 bytes in 12.791 seconds.
That's 60,604 lines/sec, or 2,231,688 bytes/sec.

real	0m17.172s
user	0m16.196s
sys	0m0.514s

With this PR, and current master merged back in:

$ time make time_stdlib
./venv/bin/python scripts/test_parse_directory.py \
		--grammar-file ../../Grammar/python.gram \
		--tokens-file ../../Grammar/Tokens \
		-d ../../Lib \
		--short \
		--exclude "*/test2to3/*" \
		--exclude "*/test2to3/**/*" \
		--exclude "*/bad*" \
		--exclude "*/lib2to3/tests/data/*"
../../Lib/test/test_modulefinder.py:281: DeprecationWarning: invalid escape sequence \u
  b"""\
Checked 1,687 files, 775,179 lines, 28,545,376 bytes in 23.936 seconds.
That's 32,386 lines/sec, or 1,192,577 bytes/sec.
Memory stats:
  rss         :         65 MiB
  vms         :       4251 MiB
  maxrss      :         74 MiB

real	0m26.537s
user	0m25.388s
sys	0m0.457s

I haven't looked in the code to try and explain the difference yet. I'll see what the difference is for a non-debug build next.

gvanrossum · 2020-05-24T21:17:26Z

Hm, without debug mode it's still a significant different with the old measuring tool.

Master:

$ time make test_global
../../python.exe scripts/test_parse_directory.py \
		--grammar-file ../../Grammar/python.gram \
		--tokens-file ../../Grammar/Tokens \
		-d ../../Lib \
		--short \
		--exclude "*/test2to3/*" \
		--exclude "*/test2to3/**/*" \
		--exclude "*/bad*" \
		--exclude "*/lib2to3/tests/data/*"
Checked 1,687 files, 775,179 lines, 28,545,376 bytes in 2.969 seconds.
That's 261,064 lines/sec, or 9,613,474 bytes/sec.

real	0m10.069s
user	0m9.451s
sys	0m0.430s

This PR, with master merged:

$ time make time_stdlib
./venv/bin/python scripts/test_parse_directory.py \
		--grammar-file ../../Grammar/python.gram \
		--tokens-file ../../Grammar/Tokens \
		-d ../../Lib \
		--short \
		--exclude "*/test2to3/*" \
		--exclude "*/test2to3/**/*" \
		--exclude "*/bad*" \
		--exclude "*/lib2to3/tests/data/*"
Checked 1,687 files, 775,179 lines, 28,545,376 bytes in 5.705 seconds.
That's 135,877 lines/sec, or 5,003,569 bytes/sec.
Memory stats:
  rss         :         75 MiB
  vms         :       4234 MiB
  maxrss      :         77 MiB

real	0m7.006s
user	0m6.089s
sys	0m0.343s

So nearly double.

Now I'll go read the source.

gvanrossum

Sigh.

gvanrossum · 2020-05-24T21:26:04Z

            short,
-            None,
-            0,
+            1,


Oh. This seems the problem: by passing mode=1 here, we will call parse_string() (on line 174) which returns an AST. This requires converting the C-level AST (which is allocated using an arena) to a Python-usable AST (built out of Python objects) and that's known to be much slower. Originally this passed mode=0, which meant "don't return anything". It seems we no longer have that option, which is fine, but then we should at least pass mode=2.

Again, we should refactor a bunch of stuff here so it's more readable.

Again, we should refactor a bunch of stuff here so it's more readable.

I've been working on this. I'll open a new PR for the refactorings, after we've merged this one, if you're okay with it.

… okay in parse_directory

gvanrossum

Okay, let 'er rip!

lysnikolaou · 2020-05-25T19:50:11Z

Can we merge this, so that I can merge it into the branch with the script refactorings?

miss-islington · 2020-05-25T19:52:01Z

Thanks @lysnikolaou for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

pablogsal · 2020-05-25T19:52:09Z

Can we merge this, so that I can merge it into the branch with the script refactorings?

Granted ✨

…nGH-20235) The scripts in `Tools/peg_generator/scripts` mostly assume that `ast.parse` and `compile` use the old parser, since this was the state of things, while we were developing them. They need to be updated to always use the correct parser. `_peg_parser` is being extended to support both parsing and compiling with both parsers. (cherry picked from commit 9645930) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>

bedevere-bot · 2020-05-25T19:52:14Z

GH-20396 is a backport of this pull request to the 3.9 branch.

) The scripts in `Tools/peg_generator/scripts` mostly assume that `ast.parse` and `compile` use the old parser, since this was the state of things, while we were developing them. They need to be updated to always use the correct parser. `_peg_parser` is being extended to support both parsing and compiling with both parsers. (cherry picked from commit 9645930) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>

lysnikolaou requested a review from pablogsal as a code owner May 19, 2020 19:08

the-knights-who-say-ni added the CLA signed label May 19, 2020

bedevere-bot added the awaiting review label May 19, 2020

lysnikolaou requested a review from gvanrossum May 19, 2020 19:09

lysnikolaou added the skip news label May 19, 2020

Remove unneeded extension variable

e4e960b

gvanrossum reviewed May 19, 2020

View reviewed changes

Remove bytecode kwarg and create a new function compile_string

437d22e

lysnikolaou added the needs backport to 3.9 label May 19, 2020

gvanrossum reviewed May 20, 2020

View reviewed changes

lysnikolaou added 2 commits May 20, 2020 03:16

Remove smelly symbols

2253bea

Address feedback

5178f1c

lysnikolaou closed this May 20, 2020

lysnikolaou reopened this May 20, 2020

gvanrossum reviewed May 21, 2020

View reviewed changes

Address second round of feedback

0ef0643

lysnikolaou requested a review from gvanrossum May 23, 2020 10:19

Accept filename argument in _peg_parser.* and fix bugs in test_parse_…

7281c3c

…directory

gvanrossum reviewed May 24, 2020

View reviewed changes

Address feedback; compute mode arg in the caller and assert that it's…

865a239

… okay in parse_directory

gvanrossum approved these changes May 24, 2020

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting review labels May 24, 2020

pablogsal merged commit 9645930 into python:master May 25, 2020

bedevere-bot removed the awaiting merge label May 25, 2020

bedevere-bot removed the needs backport to 3.9 label May 25, 2020

lysnikolaou deleted the fix-peg-scripts branch May 25, 2020 19:54

                           short,
-                          None,
-,
+,

Uh oh!

Uh oh!

Conversation

lysnikolaou commented May 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum left a comment

Choose a reason for hiding this comment

Uh oh!

lysnikolaou commented May 19, 2020

Uh oh!

gvanrossum commented May 19, 2020

Uh oh!

gvanrossum left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lysnikolaou commented May 20, 2020

Uh oh!

lysnikolaou commented May 20, 2020

Uh oh!

gvanrossum May 21, 2020

Choose a reason for hiding this comment

Uh oh!

lysnikolaou May 21, 2020

Choose a reason for hiding this comment

Uh oh!

gvanrossum May 21, 2020

Choose a reason for hiding this comment

Uh oh!

lysnikolaou May 21, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lysnikolaou commented May 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum commented May 24, 2020

Uh oh!

pablogsal commented May 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commit c5fc156

Current master

Uh oh!

lysnikolaou commented May 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pablogsal commented May 24, 2020

Uh oh!

gvanrossum commented May 24, 2020

Uh oh!

gvanrossum commented May 24, 2020

Uh oh!

gvanrossum left a comment

Choose a reason for hiding this comment

Uh oh!

gvanrossum May 24, 2020

Choose a reason for hiding this comment

Uh oh!

lysnikolaou May 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gvanrossum left a comment

Choose a reason for hiding this comment

Uh oh!

lysnikolaou commented May 25, 2020

Uh oh!

miss-islington commented May 25, 2020

Uh oh!

pablogsal commented May 25, 2020

Uh oh!

lysnikolaou commented May 19, 2020 •

edited

Loading

lysnikolaou commented May 24, 2020 •

edited

Loading

pablogsal commented May 24, 2020 •

edited

Loading

Commit `c5fc156`

lysnikolaou commented May 24, 2020 •

edited

Loading

lysnikolaou May 24, 2020 •

edited

Loading