bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. #14304

Merged
serhiy-storchaka merged 2 commits into python:master from serhiy-storchaka:utf8-utf16-incremental-decoder
Jun 25, 2019
Conversation

@serhiy-storchaka

@serhiy-storchaka serhiy-storchaka commented Jun 22, 2019

Member
  • The UTF-8 incremental decoder now fails fast if it encounters
    a sequence that can't be handled by the error handler.
  • The UTF-16 incremental decoder with the surrogatepass error
    handler now decodes a lone low surrogate with final=False.
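
The UTF-16 point can be illustrated with a minimal sketch. It assumes an interpreter that includes this fix; the helper name `feed` and the choice of `utf-16-le` and U+DC02 are illustrative and not taken from the PR.

```python
import codecs


def feed(encoding, chunks, errors="strict"):
    """Decode each chunk with final=False and return the pieces produced."""
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    return [decoder.decode(chunk, False) for chunk in chunks]


# A lone low surrogate (U+DC02) encoded as UTF-16-LE: two bytes.
low = "\udc02".encode("utf-16-le", "surrogatepass")

# Split the two bytes across chunks. The first byte alone is incomplete,
# so the decoder buffers it; once the second byte arrives, the
# surrogatepass handler can emit the surrogate without waiting for final=True.
parts = feed("utf-16-le", [low[:1], low[1:]], "surrogatepass")
print(parts)
```

With the fix applied, the second chunk should yield the surrogate immediately rather than leaving it buffered until the final call.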

https://bugs.python.org/issue24214

@tirkarthi

Member

Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks for UnicodeDecodeError? With this PR, the test below raises UnicodeDecodeError as in the older behavior, whereas on master it returns 'f'.

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))
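
A test along the lines asked about here might look like the sketch below. The class and method names are hypothetical and this is not the test added by the PR; it only assumes an interpreter where the fail-fast behavior is in place.

```python
import codecs
import unittest


class IncrementalUtf8ErrorTest(unittest.TestCase):
    """Hypothetical test case; names are illustrative only."""

    def test_invalid_sequence_fails_with_final_false(self):
        # 0xF1 starts a four-byte sequence, but 0xF6 is not a valid
        # continuation byte, so no later input can make this valid.
        decoder = codecs.getincrementaldecoder("utf-8")()
        with self.assertRaises(UnicodeDecodeError):
            decoder.decode(b"f\xf1\xf6rd", False)

    def test_truncated_sequence_is_buffered(self):
        # Contrast: a lone 0xF1 lead byte may still be completed by a
        # later chunk, so the decoder returns what it has and waits.
        decoder = codecs.getincrementaldecoder("utf-8")()
        self.assertEqual(decoder.decode(b"f\xf1", False), "f")
```

Saved to a file, this can be run with `python -m unittest <file>`.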

@serhiy-storchaka

Member Author

I was not sure that we should guarantee this behavior, but the new tests helped to narrow the scope of the fix.

@miss-islington

Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the utf8-utf16-incremental-decoder branch June 25, 2019 08:54
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-bot


GH-14368 is a backport of this pull request to the 3.8 branch.

@bedevere-bot


GH-14369 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington added a commit that referenced this pull request Jun 25, 2019
* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
vstinner pushed a commit that referenced this pull request Jun 25, 2019
…-14304) (GH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request Jul 2, 2019
…thonGH-14304) (pythonGH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…-14304)

* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
Labels

type-bug An unexpected behavior, bug, or error
