Skip to content

gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957

Open
cmaloney wants to merge 20 commits into
python:mainfrom
cmaloney:textio_nullbuffer
Open

gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957
cmaloney wants to merge 20 commits into
python:mainfrom
cmaloney:textio_nullbuffer

Conversation

@cmaloney
Copy link
Copy Markdown
Contributor

@cmaloney cmaloney commented Mar 15, 2026

TextIOWrapper keeps its underlying stream in a member called self->buffer. That stream can be detached by user code, such as custom .flush() implementations, resulting in self->buffer being set to NULL. The implementation often checked at the start of functions if self->buffer is in a good state but did not always recheck after other Python code which could modify self->buffer.

The cases which need to be re-checked are hard to spot so rather than rely on reviewer effort make a better safety net by changing all self->buffer access to go through helper functions.

Thank you @yihong0618 for the test, NEWS and an initial implementation in gh-143041.

TextIOWrapper keeps its underlying stream in a member called
`self->buffer`. That stream can be detached by user code, such as custom
`.flush` implementations resulting in `self->buffer` being set to NULL.
The implementation often checked at the start of functions if
`self->buffer` is in a good state, but did not always recheck after
other Python code was called which could modify `self->buffer`.

The cases which need to be re-checked are hard to spot so rather than
rely on reviewer effort create better safety by making all self->buffer
access go through helper functions.

Thank you yihong0618 for the test, NEWS and initial implementation in
pythongh-143041.

Co-authored-by: yihong0618 <zouzou0208@gmail.com>
@yihong0618
Copy link
Copy Markdown
Contributor

wow that is cool thanks

@cmaloney cmaloney added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Mar 15, 2026
@cmaloney cmaloney changed the title gh-143008: Safer TextIOWrapper underlying stream access gh-143008: Fix Null Pointer Dereferences in TextIOWrapper underlying stream access Mar 17, 2026
@cmaloney cmaloney changed the title gh-143008: Fix Null Pointer Dereferences in TextIOWrapper underlying stream access gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access Mar 17, 2026
Comment thread Lib/test/test_io/test_textio.py
@cmaloney
Copy link
Copy Markdown
Contributor Author

@colesbury could you take a look as you've done some recent work on TextIO safety around threading. These cases are generally single threaded but generally the pattern "Implementation usually checked state; then did an operation which could modify it and didn't recheck".

@colesbury
Copy link
Copy Markdown
Contributor

I'm a bit confused about the relationship here between _textiowrapper_buffer_safe and CHECK_ATTACHED. When can buffer be NULL and CHECK_ATTACHED not trigger?

@colesbury colesbury self-requested a review April 17, 2026 17:57
@colesbury
Copy link
Copy Markdown
Contributor

Less important, but please match the CPython style for C code, e.g.:

  • { on new lines for functions definitions
  • when wrapping function calls, indent continuation lines to align with the call site

@cmaloney
Copy link
Copy Markdown
Contributor Author

I'm a bit confused about the relationship here between _textiowrapper_buffer_safe and CHECK_ATTACHED. When can buffer be NULL and CHECK_ATTACHED not trigger?

CHECK_ATTACHED, by way of CHECK_INITIALIZED, checks self->ok first. The self->ok flag is set to 0 at start of __init__, 1 at the end, and cleared/reset to 0 during .clear() as well as at the start of the Py_tp_dealloc. During __init__ there are calls to check the file is in a good state and get some attributes (remember: self->buf is the underlying file-like object not an internal buffer).

The more complicated case is during deletion where there is an interaction with the base class. TextIOWrapper implements Py_tp_dealloc and its base class (TextIOBase which has a base class IOBase) implements a Py_tp_finalize with the function iobase_finalize. That function tries to ensure all data is written in the full I/O "stack" before any part of it is closed out of order. It tries to have Text I/O flush and close, then Buffered I/O (self->buf), then the Raw I/O. If there is an underlying file (self->buf != NULL) during those operations need to try and write data in the TextIOWrapper even though self->ok is 0.

Adding to the complexity here all the objects involved and the multiple operations (flush(), close(), .closed) can be overridden and could call TextIOWrapper.clear() or TextIOWrapper.detach() causing self->buf to become NULL.

Less important, but please match the CPython style for C code, e.g.:

Sorry for missing those bits, doing a review of that and will update for it shortly.

@cmaloney
Copy link
Copy Markdown
Contributor Author

There are a couple cases I'm not sure how to PEP 7, in particular:

/* hits 81 cols */
        PyObject *bytes = _textiowrapper_buffer_callmethod_noargs(self,
                                                                  &_Py_ID(read));
/* hits 81 cols */
            res = _textiowrapper_buffer_callmethod_onearg(self,
                                                          &_Py_ID(_dealloc_warn),
                                                          (PyObject *)self);


PyObject *input_chunk = _textiowrapper_buffer_callmethod_onearg(self,
            &_Py_ID(read), bytes_to_feed);

A couple directions I looked at but don't know how to decide between:

  1. Make the variable name shorter (increases diff)
  2. Make the function name shorter (ex. _textiowrapper_buf_... or _textio_buffer_, _textiowrapper_buf_callmeth_noargs)
  3. Whats the right indentation to just put on the next line?

@colesbury
Copy link
Copy Markdown
Contributor

I think it's fine to make the function names shorter, i.e., buffer_callmethod_onearg instead of _textiowrapper_buffer_callmethod_onearg. File-scope static functions don't need a prefix or underscore.

In general the preference is for wrapping function calls like the following (from PEP 7 example):

PyErr_Format(PyExc_TypeError,
             "cannot create '%.100s' instances",
             type->tp_name);

But if that's not practical due to overly long lines, than just try to fit the style of surround code. For example, from elsewhere in textio.c:

        PyObject *incrementalDecoder = PyObject_CallFunctionObjArgs(
            (PyObject *)state->PyIncrementalNewlineDecoder_Type,
            self->decoder, self->readtranslate ? Py_True : Py_False, NULL);

@cmaloney
Copy link
Copy Markdown
Contributor Author

cmaloney commented May 6, 2026

This should also resolve gh-143007

@cmaloney
Copy link
Copy Markdown
Contributor Author

I think thish is ready for re-review @colesbury

Comment thread Modules/_io/textio.c
}
return NULL;
}
return self->buffer;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried that the function returns a borrowed reference. Another thread can call detach() converting the buffer variable to a dangling pointer. I hacked the code to introduce sleep+sched_yield() between getting the buffer and using the buffer, and I spawned 250 threads calling stream.write() and 1 thread calling stream.detach() with a random sleep.

In fact, it's (currently) safe to use a borrowed reference because all io.TextIOWrapper methods are protected by @critical_section. So it's not possible to call deatch() while another io.TextIOWrapper method is called: method calls are serialized by the critical section.

But it would be interesting to add a comment to explain why it's safe to use a borrowed reference. Something like:

// Returning a borrowed reference is safe since TextIOWrapper
// methods are protected by critical sections.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED to validate + comment to document.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and found out __init__ wasn't in a critical section in the process so re-initalization was racy... fixed as well

Comment thread Modules/_io/textio.c
leading to NULL pointer dereferences (see gh-143008, gh-142594). Protect against
that by using helpers to check self->buffer validity at callsites. */
static PyObject *
buffer_access_safe(textio *self)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename buffer_xxx() functions to textiowrapper_buffer_xxx(), and rename this function to textiowrapper_buffer_get().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started like that (see: d7f14fc) but it made some of the PEP-7 formatting really bad so moved to a shorter name

Comment thread Modules/_io/textio.c
Comment thread Modules/_io/textio.c Outdated
Comment on lines +3324 to +3327
/* During destruction self->ok is 0 but self->buffer is non-NULL and this
needs to error in that case which the safe buffer wrapper does not.
Match original behavior by calling CHECK_ATTACHED explicitly. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textiowrapper_dealloc() sets ok to 0 and almost immediately sets buffer to NULL. So I don't understand well when it's possible that ok is 0 and buffer is non-NULL.

This comment is unclear to me. I suggest rephrasing it to something like:

"Call CHECK_ATTACHED() to raise an exception if ..., when ... happens."

The "Match original behavior" part is unclear to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to checking self->ok directly and just returning true in that case which differs more in this bug fix but I think is a lot more understandable. Described a couple cases this is hit by the test suite.

@serhiy-storchaka serhiy-storchaka added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 30, 2026
Co-authored-by: Victor Stinner <vstinner@python.org>
@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Jun 5, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #33000961 | 📁 Comparing d7b146e against main (a00b24e)

  🔍 Preview build  

104 files changed · ± 103 modified · - 1 deleted

± Modified

- Deleted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting review needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes needs backport to 3.15 pre-release feature fixes, bugs and security fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants