gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957
gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957cmaloney wants to merge 20 commits into
Conversation
TextIOWrapper keeps its underlying stream in a member called `self->buffer`. That stream can be detached by user code, such as custom `.flush` implementations resulting in `self->buffer` being set to NULL. The implementation often checked at the start of functions if `self->buffer` is in a good state, but did not always recheck after other Python code was called which could modify `self->buffer`. The cases which need to be re-checked are hard to spot so rather than rely on reviewer effort create better safety by making all self->buffer access go through helper functions. Thank you yihong0618 for the test, NEWS and initial implementation in pythongh-143041. Co-authored-by: yihong0618 <zouzou0208@gmail.com>
|
wow that is cool thanks |
|
@colesbury could you take a look as you've done some recent work on TextIO safety around threading. These cases are generally single threaded but generally the pattern "Implementation usually checked state; then did an operation which could modify it and didn't recheck". |
|
I'm a bit confused about the relationship here between |
|
Less important, but please match the CPython style for C code, e.g.:
|
The more complicated case is during deletion where there is an interaction with the base class. Adding to the complexity here all the objects involved and the multiple operations (
Sorry for missing those bits, doing a review of that and will update for it shortly. |
|
There are a couple cases I'm not sure how to PEP 7, in particular: /* hits 81 cols */
PyObject *bytes = _textiowrapper_buffer_callmethod_noargs(self,
&_Py_ID(read));
/* hits 81 cols */
res = _textiowrapper_buffer_callmethod_onearg(self,
&_Py_ID(_dealloc_warn),
(PyObject *)self);
PyObject *input_chunk = _textiowrapper_buffer_callmethod_onearg(self,
&_Py_ID(read), bytes_to_feed);A couple directions I looked at but don't know how to decide between:
|
|
I think it's fine to make the function names shorter, i.e., In general the preference is for wrapping function calls like the following (from PEP 7 example): PyErr_Format(PyExc_TypeError,
"cannot create '%.100s' instances",
type->tp_name);But if that's not practical due to overly long lines, than just try to fit the style of surround code. For example, from elsewhere in PyObject *incrementalDecoder = PyObject_CallFunctionObjArgs(
(PyObject *)state->PyIncrementalNewlineDecoder_Type,
self->decoder, self->readtranslate ? Py_True : Py_False, NULL); |
|
This should also resolve gh-143007 |
|
I think thish is ready for re-review @colesbury |
| } | ||
| return NULL; | ||
| } | ||
| return self->buffer; |
There was a problem hiding this comment.
I was worried that the function returns a borrowed reference. Another thread can call detach() converting the buffer variable to a dangling pointer. I hacked the code to introduce sleep+sched_yield() between getting the buffer and using the buffer, and I spawned 250 threads calling stream.write() and 1 thread calling stream.detach() with a random sleep.
In fact, it's (currently) safe to use a borrowed reference because all io.TextIOWrapper methods are protected by @critical_section. So it's not possible to call deatch() while another io.TextIOWrapper method is called: method calls are serialized by the critical section.
But it would be interesting to add a comment to explain why it's safe to use a borrowed reference. Something like:
// Returning a borrowed reference is safe since TextIOWrapper
// methods are protected by critical sections.
There was a problem hiding this comment.
Added a _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED to validate + comment to document.
There was a problem hiding this comment.
and found out __init__ wasn't in a critical section in the process so re-initalization was racy... fixed as well
| leading to NULL pointer dereferences (see gh-143008, gh-142594). Protect against | ||
| that by using helpers to check self->buffer validity at callsites. */ | ||
| static PyObject * | ||
| buffer_access_safe(textio *self) |
There was a problem hiding this comment.
I suggest to rename buffer_xxx() functions to textiowrapper_buffer_xxx(), and rename this function to textiowrapper_buffer_get().
| /* During destruction self->ok is 0 but self->buffer is non-NULL and this | ||
| needs to error in that case which the safe buffer wrapper does not. | ||
| Match original behavior by calling CHECK_ATTACHED explicitly. */ |
There was a problem hiding this comment.
textiowrapper_dealloc() sets ok to 0 and almost immediately sets buffer to NULL. So I don't understand well when it's possible that ok is 0 and buffer is non-NULL.
This comment is unclear to me. I suggest rephrasing it to something like:
"Call CHECK_ATTACHED() to raise an exception if ..., when ... happens."
The "Match original behavior" part is unclear to me.
There was a problem hiding this comment.
Moved to checking self->ok directly and just returning true in that case which differs more in this bug fix but I think is a lot more understandable. Described a couple cases this is hit by the test suite.
Co-authored-by: Victor Stinner <vstinner@python.org>
Documentation build overview
104 files changed ·
|
TextIOWrapper keeps its underlying stream in a member called
self->buffer. That stream can be detached by user code, such as custom.flush()implementations, resulting inself->bufferbeing set to NULL. The implementation often checked at the start of functions ifself->bufferis in a good state but did not always recheck after other Python code which could modifyself->buffer.The cases which need to be re-checked are hard to spot so rather than rely on reviewer effort make a better safety net by changing all
self->bufferaccess to go through helper functions.Thank you @yihong0618 for the test, NEWS and an initial implementation in gh-143041.
TextIOWrapper.truncatevia re-entrantflush#143008