[3.10] gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930) (GH-136268) (#136292)
* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces.
  The only whitespaces are `\t\n\r\f `.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>` are now ignored in both start and end tags.
  E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no longer ignored.
  E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None.
* Fix data loss after unclosed script or style tag (gh-86155).

Also backport test.support.subTests() (gh-135120).

---------

(cherry picked from commit 0243f97cbadec8d985e63b1daec5d1cbc850cae3)
(cherry picked from commit c555f889c3558a0a8cd8d8ecc2b493014b88a700)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
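
The cases listed in this commit message can be checked locally with a small event recorder. The following is a minimal sketch, not part of the commit: `EventCollector` and `parse_events` are hypothetical names introduced here for illustration, and only the public `html.parser.HTMLParser` callbacks are used. What it prints depends on whether the running interpreter includes this fix, so no expected output is given.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records each parser callback as a tuple."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def parse_events(source):
    """Feed one complete document and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()  # flush any buffered data
    return parser.events


if __name__ == "__main__":
    # Inputs taken from the examples in the commit message.
    samples = [
        "<script>x</ script>y</script>",
        '<script>x</script/foo=">"/>y',
        "<a foo=bar/ //>",
        "<a foo==bar>",
        "<a foo =bar>",
        "<a foo= bar>",
    ]
    for sample in samples:
        print(repr(sample), "->", parse_events(sample))
```

Running the script on an interpreter with and without this change and comparing the printed events is one way to see which inputs are affected.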
Miss Islington (bot) committed
151e0f00f74df26f0b40fb0ef9a53f353c1d1ab2
Parent: 85766db
Committed by GitHub <noreply@github.com>
on 7/12/2025, 12:26:58 PM