
TextDecoder behavior is wrong since 20.18.3/22.13.0 #60888

@ChALkeR


Testcase:

Array.from({length:256},(x,i)=>[i,new TextDecoder('windows-1252').decode(Uint8Array.of(i)).codePointAt(0)]).filter(([a,b])=>a!==b).map(x=>x.join())

Node.js decodes windows-1252 as if every byte mapped to the Unicode code point with the same value (the code above returns an empty array, i.e. zero differences), which is not correct.
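For readability, the one-liner above can be unrolled roughly as follows. This is only a sketch: `findRemappedBytes` and `platformDecoder` are illustrative names, and the decoder is passed in as a parameter so the same check can be pointed at any single-byte decoder.

```javascript
// Readable form of the testcase. `decodeByte` is a hypothetical parameter:
// any function that takes a byte value and returns the decoded string.
function findRemappedBytes(decodeByte) {
  const remapped = [];
  for (let byte = 0; byte < 256; byte++) {
    const codePoint = decodeByte(byte).codePointAt(0);
    // Keep only bytes whose decoded code point differs from the byte value.
    if (codePoint !== byte) remapped.push([byte, codePoint]);
  }
  return remapped;
}

// Decoder under test: the platform's TextDecoder, as in the original testcase.
const platformDecoder = (byte) =>
  new TextDecoder('windows-1252').decode(Uint8Array.of(byte));
```

Running `findRemappedBytes(platformDecoder).map((pair) => pair.join())` gives the same list as the testcase. An empty list reproduces the bug; a spec-compliant decoder should report the remapped bytes in the 0x80-0x9F range.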

E.g. Node.js:

> new TextDecoder('windows-1252').decode(Uint8Array.of(128)).codePointAt(0)
128
> new TextDecoder('windows-1252').decode(Uint8Array.of(130)).codePointAt(0)
130
> new TextDecoder('windows-1252').decode(Uint8Array.of(131)).codePointAt(0)
131
> new TextDecoder('windows-1252').decode(Uint8Array.of(159)).codePointAt(0)
159

Browsers (expected):

> new TextDecoder('windows-1252').decode(Uint8Array.of(128)).codePointAt(0)
8364
> new TextDecoder('windows-1252').decode(Uint8Array.of(130)).codePointAt(0)
8218
> new TextDecoder('windows-1252').decode(Uint8Array.of(131)).codePointAt(0)
402
> new TextDecoder('windows-1252').decode(Uint8Array.of(159)).codePointAt(0)
376
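The browser values above match the windows-1252 index in the WHATWG Encoding Standard, where only bytes 0x80-0x9F differ from Latin-1. As a rough illustration of the expected mapping (and a possible stopgap on affected versions), a table lookup could look like the sketch below. `WINDOWS_1252_HIGH` and `decodeWindows1252` are hypothetical names, not Node.js APIs.

```javascript
// Code points for bytes 0x80..0x9F, as listed in the WHATWG windows-1252 index.
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to the same-numbered C1 controls.
const WINDOWS_1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hypothetical helper: decode an iterable of byte values as windows-1252.
function decodeWindows1252(bytes) {
  let out = '';
  for (const byte of bytes) {
    const inHighRange = byte >= 0x80 && byte <= 0x9f;
    // Outside 0x80..0x9F the byte value is the code point, as in Latin-1.
    out += String.fromCharCode(inHighRange ? WINDOWS_1252_HIGH[byte - 0x80] : byte);
  }
  return out;
}

console.log(decodeWindows1252(Uint8Array.of(128)).codePointAt(0)); // 8364
console.log(decodeWindows1252(Uint8Array.of(159)).codePointAt(0)); // 376
```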

This also directly contradicts the documentation (which is aware that windows-1252 and Latin1 are different):

node/doc/api/buffer.md

Lines 229 to 234 in 7643c2a

Modern Web browsers follow the [WHATWG Encoding Standard][] which aliases
both `'latin1'` and `'ISO-8859-1'` to `'win-1252'`. This means that while doing
something like `http.get()`, if the returned charset is one of those listed in
the WHATWG specification it is possible that the server actually returned
`'win-1252'`-encoded data, and using `'latin1'` encoding may incorrectly decode
the characters.
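To illustrate the distinction the documentation draws: `Buffer`'s `'latin1'` encoding maps every byte to the code point with the same value, which is exactly what `TextDecoder('windows-1252')` should not do for 0x80-0x9F. A minimal sketch (the variable names are illustrative):

```javascript
// Buffer's 'latin1' encoding maps each byte to the code point of the same value.
const sampleBytes = Buffer.from([0x80, 0x82, 0x83, 0x9f]);
const latin1CodePoints = Array.from(
  sampleBytes.toString('latin1'),
  (ch) => ch.codePointAt(0),
);
console.log(latin1CodePoints); // [ 128, 130, 131, 159 ]
```

These are the same values the affected Node.js versions return for `windows-1252` in the examples above.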

It's also a regression introduced in v20.18.3 and v22.13.0: Node.js <=20.18.2 behaves correctly, and v22 <=22.12.0 also behaves correctly.

This regressed in 20.x and 22.x this year, after they were labeled as LTS. 20.x regressed during Maintenance.

Whatever caused this in 20/22 should be reverted.
