gh-62259: Add support of multi-byte encodings in the XML parser (GH-149860)
Supported encodings: "cp932", "cp949", "cp950", "Big5", "EUC-JP", "GB2312", "GBK", "johab", and "Shift_JIS". Partially supported encodings (only BMP characters): "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004", "Shift_JISX0213", "utf-8-sig", and non-standard aliases like "UTF8" (without hyphen). The parser now raises ValueError for known unsupported multi-byte encodings such as "ISO-2022-JP" or "raw-unicode-escape" instead of failing later, when it encounters non-ASCII data.
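
The behavior described above can be exercised with a short script. This is a minimal sketch and not part of the change itself: the helper names make_document and parse_text are hypothetical, and it assumes the parser is reached through xml.etree.ElementTree. On interpreters without this change, the Shift_JIS document is expected to be rejected as well, so the helper catches the error and returns None rather than assuming a particular outcome.

```python
import xml.etree.ElementTree as ET


def make_document(encoding: str, text: str) -> bytes:
    """Build a tiny XML document declared and encoded in `encoding`.

    Hypothetical helper for illustration only.
    """
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    return f"{declaration}<greeting>{text}</greeting>".encode(encoding)


def parse_text(document: bytes):
    """Return the root element's text, or None if the parser rejects it.

    ValueError is what the commit message says is raised for known
    unsupported multi-byte encodings; ParseError and LookupError are
    caught as well because older interpreters may fail differently.
    """
    try:
        return ET.fromstring(document).text
    except (ValueError, LookupError, ET.ParseError) as exc:
        print(f"rejected: {exc!r}")
        return None


# "Shift_JIS" is listed as supported, "ISO-2022-JP" as known unsupported.
shift_jis_doc = make_document("Shift_JIS", "こんにちは")
iso2022_doc = make_document("ISO-2022-JP", "こんにちは")

print("Shift_JIS  :", parse_text(shift_jis_doc))
print("ISO-2022-JP:", parse_text(iso2022_doc))
```

With this change applied, the first call should return the decoded text and the second should be rejected up front with ValueError. The exact error messages are not specified in the commit message, so the sketch prints whatever the interpreter reports.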
Serhiy Storchaka committed
8ab7b43a14bed4780febbd7586a41cfe459aa6d5
Parent: a34edf7
Committed by GitHub <noreply@github.com>
on 5/26/2026, 7:40:25 PM