gh-62259: Add support of multi-byte encodings in the XML parser (GH-149860)

Supported encodings: "cp932", "cp949", "cp950", "Big5", "EUC-JP",
"GB2312", "GBK", "johab", and "Shift_JIS".

Partially supported encodings (only BMP characters): "Big5-HKSCS",
"EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004", "Shift_JISX0213",
"utf-8-sig" and non-standard aliases like "UTF8" (without hyphen).

The parser now raises ValueError for known unsupported
multi-byte encodings such as "ISO-2022-JP" or "raw-unicode-escape"
instead of failing later, when it encounters non-ASCII data.
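
A minimal sketch of how the change might be observed from Python code,
assuming an interpreter built with this commit. The helper names
build_document and try_parse are hypothetical and exist only for this
illustration. On interpreters without the change, the multi-byte case is
expected to fail, so the helper returns None instead of raising.

```python
import xml.etree.ElementTree as ET

# Sample non-ASCII text (the three characters of "Nihongo"), written with
# escapes so the snippet does not depend on the source file encoding.
SAMPLE = "\u65e5\u672c\u8a9e"


def build_document(encoding, text):
    """Return an XML document declared and encoded in `encoding`.

    Hypothetical helper for illustration only.
    """
    xml = f'<?xml version="1.0" encoding="{encoding}"?><root>{text}</root>'
    return xml.encode(encoding)


def try_parse(encoding, text):
    """Parse a document in `encoding`; return the root text, or None on failure.

    Hypothetical helper. ValueError covers encodings the parser rejects,
    ParseError covers malformed input, and LookupError covers encoding
    names unknown to Python's codec registry.
    """
    try:
        return ET.fromstring(build_document(encoding, text)).text
    except (ValueError, LookupError, ET.ParseError):
        return None


if __name__ == "__main__":
    for name in ("Shift_JIS", "EUC-JP", "ISO-2022-JP"):
        print(name, "->", try_parse(name, SAMPLE))
```

With this commit applied, "Shift_JIS" and "EUC-JP" should round-trip the
sample text, while "ISO-2022-JP" should be rejected. The exact exception
seen without the commit may vary by Python version.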
Serhiy Storchaka committed
8ab7b43a14bed4780febbd7586a41cfe459aa6d5
Parent: a34edf7
Committed by GitHub <noreply@github.com> on 5/26/2026, 7:40:25 PM