bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. #14304
Merged
serhiy-storchaka merged 2 commits on Jun 25, 2019
* The UTF-8 incremental decoder now fails fast if it encounters a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoder with the surrogatepass error handler now decodes a lone low surrogate with final=False.
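The second point can be illustrated with a small sketch. It assumes an interpreter that includes this change; the helper name `decode_in_chunks` and the choice of `utf-16-le` (to avoid a BOM) are illustrative and not taken from the PR.

```python
import codecs


def decode_in_chunks(encoding, errors, chunks):
    """Feed each chunk with final=False and collect what each call returns.

    Hypothetical helper for illustration only.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    return [decoder.decode(chunk, False) for chunk in chunks]


# A lone low surrogate, U+DC02, encoded as UTF-16-LE via surrogatepass.
low = "\udc02".encode("utf-16-le", "surrogatepass")  # the two bytes 02 DC

# Split the single code unit across two calls; neither call passes final=True.
results = decode_in_chunks("utf-16-le", "surrogatepass", [low[:1], low[1:]])
print(results)
```

With the fix in place, the first call should return an empty string (only half a code unit has arrived) and the second should return the lone surrogate without waiting for `final=True`.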
Member
Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks the following?

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))
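As a rough sketch of what such a check might look like (not the test that was added in this PR), the snippet below wraps the call in a hypothetical helper, `fails_fast`, and contrasts the invalid input from the question with a merely truncated sequence. It assumes an interpreter that includes this fix.

```python
from codecs import getincrementaldecoder


def fails_fast(data, errors="strict"):
    """Return True if decoding `data` with final=False raises immediately.

    Hypothetical helper for illustration only.
    """
    decoder = getincrementaldecoder("utf-8")(errors)
    try:
        decoder.decode(data, False)
    except UnicodeDecodeError:
        return True
    return False


# 0xF1 starts a four-byte sequence, but 0xF6 is not a continuation byte,
# so no later input can make this valid.
invalid = fails_fast(b"f\xf1\xf6rd")

# The first two bytes of the three-byte encoding of U+20AC: a valid prefix
# that the decoder can buffer until more data arrives.
truncated = fails_fast(b"\xe2\x82")

print(invalid, truncated)
```

The contrast is the point of the example: input that can never become valid should raise right away, while an incomplete but still valid prefix should be held back silently.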
Member
Author
I was not sure that we should guarantee this behavior. But new tests helped to make the fix more limited.
methane approved these changes Jun 25, 2019
Contributor
Thanks @serhiy-storchaka for the PR 🌮🎉. I'm working now to backport this PR to: 3.7, 3.8.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-14368 is a backport of this pull request to the 3.8 branch.
GH-14369 is a backport of this pull request to the 3.7 branch.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington added a commit that referenced this pull request Jun 25, 2019
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
vstinner pushed a commit that referenced this pull request Jun 25, 2019
…-14304) (GH-14369)
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request Jul 2, 2019
…thonGH-14304) (pythonGH-14369)
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…-14304)
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…-14304)
https://bugs.python.org/issue24214