Skip to content

bpo-45021: Fix a hang in forked children#28007

Merged
gpshead merged 4 commits into
python:mainfrom
0x0L:patch-1
Sep 20, 2021
Merged

bpo-45021: Fix a hang in forked children#28007
gpshead merged 4 commits into
python:mainfrom
0x0L:patch-1

Conversation

@0x0L

@0x0L 0x0L commented Aug 27, 2021

Copy link
Copy Markdown
Contributor

_global_shutdown_lock should be reinitialized in forked children

https://bugs.python.org/issue45021

_global_shutdown_lock should be reinitialized in forked children
@the-knights-who-say-ni

Copy link
Copy Markdown

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@0x0L

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@pitrou

pitrou commented Aug 31, 2021

Copy link
Copy Markdown
Member

It would be nice to add a simple test for this.

@pitrou pitrou requested a review from gpshead August 31, 2021 12:24
@0x0L

0x0L commented Aug 31, 2021

Copy link
Copy Markdown
Contributor Author

I'm not too sure how to test for a freeze: should I just put a minimal reproducer somewhere and let it freeze if run on unpatched version ?

@pitrou

pitrou commented Aug 31, 2021

Copy link
Copy Markdown
Member

Yes, that would work.

Comment thread Lib/test/test_concurrent_futures.py Outdated
Comment thread Lib/test/test_concurrent_futures.py Outdated
Comment thread Lib/concurrent/futures/thread.py Outdated
@0x0L 0x0L requested a review from vstinner September 13, 2021 17:24
@ambv ambv added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 15, 2021
@bedevere-bot

Copy link
Copy Markdown

🤖 New build scheduled with the buildbot fleet by @ambv for commit 02891f9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 15, 2021
@0x0L

0x0L commented Sep 18, 2021

Copy link
Copy Markdown
Contributor Author

Build failures look unrelated to this patch

@vstinner

Copy link
Copy Markdown
Member

buildbot/AMD64 Arch Linux Asan Debug PR

Unrelated and already fixed by: python/buildmaster-config@576b13d

buildbot/x86-64 macOS PR

Unrelated and known issue: https://bugs.python.org/issue43592

@vstinner vstinner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @pitrou: Would you mind to double check the fix?

@gpshead gpshead added needs backport to 3.10 only security fixes type-bug An unexpected behavior, bug, or error needs backport to 3.9 labels Sep 20, 2021
@gpshead

gpshead commented Sep 20, 2021

Copy link
Copy Markdown
Member

because this test is mixing fork and threads which is a posix-nono it will probably run into hanging problems of its own on some platforms or build setups. If/When that happens, we can conditionally enable it only for a a known system and build configs it happens to not hang on. merging anyways, we'll untangle that when it happens.

@gpshead gpshead merged commit 0bfa110 into python:main Sep 20, 2021
@miss-islington

Copy link
Copy Markdown
Contributor

Thanks @0x0L for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@bedevere-bot

Copy link
Copy Markdown

GH-28480 is a backport of this pull request to the 3.10 branch.

@bedevere-bot

Copy link
Copy Markdown

GH-28481 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>
@vstinner

Copy link
Copy Markdown
Member

I'm happy to see my Lock._at_fork_reinit() method being used :-) https://bugs.python.org/issue40089

@vstinner

Copy link
Copy Markdown
Member

Maybe at some point, we should consider to make the method public?

miss-islington added a commit that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>

Automerge-Triggered-By: GH:gpshead
miss-islington added a commit that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>

Automerge-Triggered-By: GH:gpshead
niyas-sait pushed a commit to niyas-sait/cpython that referenced this pull request Sep 21, 2021
_global_shutdown_lock should be reinitialized in forked children
@0x0L

0x0L commented Sep 28, 2021

Copy link
Copy Markdown
Contributor Author

@vstinner @gpshead
It looks like there's an issue with this patch. Running the following in on an unpatched python

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.thread import _global_shutdown_lock

if hasattr(os, 'register_at_fork'):
    os.register_at_fork(before=_global_shutdown_lock.acquire,
                        after_in_child=_global_shutdown_lock._at_fork_reinit,
                        ​after_in_parent=_global_shutdown_lock.release)

with ProcessPoolExecutor() as pool:
    r = list(pool.map(str.upper, ['a']))

fails in some env.... Oddly I have two conda env with the exact same hash for the python package and this example hangs up in one env and not the other.

Calling os.register_at_fork only with the after_in_child looks ok and actually seems more appropriate.
0x0L@dc2d0c9 passes the test

Any thoughts ? Should I open a new issue or can we reopen this one ?

@gpshead

gpshead commented Sep 28, 2021

Copy link
Copy Markdown
Member

I cannot reproduce any hang with that snippet on such a python revision.

While not technically necessary for this issue bug "fix" (quotes because all use of threading+fork in one process is a bad idea), It still seems right to grab the lock across the fork, otherwise the forked child could be in a strange state w.r.t. the locks protected resources.

@0x0L

0x0L commented Sep 29, 2021

Copy link
Copy Markdown
Contributor Author

I also struggle to reproduce it. It only happens - consistently though - in a single one of my envs.
I'll try to find some time to attach a gdb and investigate a bit more.

@0x0L

0x0L commented Sep 30, 2021

Copy link
Copy Markdown
Contributor Author

:) Actually the python env where it didn't work was already patched by hand... Everything's look ok.
My apologies

@vstinner

Copy link
Copy Markdown
Member

Don't worry, it happens to everybody. Thanks for double checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants