Bug report
make_pending_calls uses a mutex and the the handling_thread field to ensure that only one thread per-interpreter is handling pending calls at a time:
|
/* Only one thread (per interpreter) may run the pending calls |
|
at once. In the same way, we don't do recursive pending calls. */ |
|
PyMutex_Lock(&pending->mutex); |
|
if (pending->handling_thread != NULL) { |
|
/* A pending call was added after another thread was already |
|
handling the pending calls (and had already "unsignaled"). |
|
Once that thread is done, it may have taken care of all the |
|
pending calls, or there might be some still waiting. |
|
To avoid all threads constantly stopping on the eval breaker, |
|
we clear the bit for this thread and make sure it is set |
|
for the thread currently handling the pending call. */ |
|
_Py_set_eval_breaker_bit(pending->handling_thread, _PY_CALLS_TO_DO_BIT); |
|
_Py_unset_eval_breaker_bit(tstate, _PY_CALLS_TO_DO_BIT); |
|
PyMutex_Unlock(&pending->mutex); |
|
return 0; |
|
} |
|
pending->handling_thread = tstate; |
|
PyMutex_Unlock(&pending->mutex); |
However, the clearing of handling_thread is done outside of the mutex:
|
pending->handling_thread = NULL; |
|
return 0; |
There are two problems with this (for the free-threaded build):
- It's a data race because there's a read of
handling_thread (in the mutex) concurrently with a write (outside the mutex)
- The logic in that sets
_PY_CALLS_TO_DO_BIT on pending->handling_thread is subject to a time-of-check to time-of-use hazard: pending->handling_thread may be non-NULL when evaluating the if-statement, but then cleared before setting the eval breaker bit.
Relevant unit test
TSan catches this race when running test_subthreads_can_handle_pending_calls from test_capi.test_misc.
Suggested Fix
We should set pending->handling_thread = NULL; while holding pending->mutex, at least in the free-threaded build
Linked PRs
Bug report
make_pending_callsuses a mutex and the thehandling_threadfield to ensure that only one thread per-interpreter is handling pending calls at a time:cpython/Python/ceval_gil.c
Lines 911 to 928 in 41a91bd
However, the clearing of
handling_threadis done outside of the mutex:cpython/Python/ceval_gil.c
Lines 959 to 960 in 41a91bd
There are two problems with this (for the free-threaded build):
handling_thread(in the mutex) concurrently with a write (outside the mutex)_PY_CALLS_TO_DO_BITonpending->handling_threadis subject to a time-of-check to time-of-use hazard:pending->handling_threadmay be non-NULL when evaluating the if-statement, but then cleared before setting the eval breaker bit.Relevant unit test
TSan catches this race when running
test_subthreads_can_handle_pending_callsfromtest_capi.test_misc.Suggested Fix
We should set
pending->handling_thread = NULL;while holdingpending->mutex, at least in the free-threaded buildLinked PRs