Beautiful Soup

Support case-insensitive DOCTYPE with htmlparser

Bug #1848401 reported by Jibben Nee on 2019-10-17

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Beautiful Soup	Fix Released	Undecided	Unassigned

Bug Description

The current BeautifulSoupHTMLParser.handle_decl in bs4/builder/_htmlparser.py (lines 188-196) only strips the prefix when it is exactly "DOCTYPE " or "DOCTYPE". Many sites write the doctype in lowercase, so the prefix is left in place and the output becomes <!DOCTYPE doctype html>. The method should check data.lower() against "doctype", as html.parser does.
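To see what handle_decl actually receives, here is a minimal sketch that uses only the standard library's html.parser. The DeclCollector class and collect_decls function are names made up for illustration; they are not part of Beautiful Soup.

```python
from html.parser import HTMLParser


class DeclCollector(HTMLParser):
    """Hypothetical helper: records every string passed to handle_decl."""

    def __init__(self):
        super().__init__()
        self.decls = []

    def handle_decl(self, decl):
        # html.parser recognises "<!doctype" case-insensitively, but it hands
        # the declaration text over with its original casing intact.
        self.decls.append(decl)


def collect_decls(markup):
    """Parse markup and return the declarations seen, in order."""
    parser = DeclCollector()
    parser.feed(markup)
    parser.close()
    return parser.decls


for markup in ("<!DOCTYPE html>", "<!doctype html>"):
    print(markup, "->", collect_decls(markup))
```

The lowercase input reaches handle_decl as "doctype html", which is the case an exact match on "DOCTYPE" misses.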

Alternatively, the existing first case can be applied unconditionally: handle_decl is called by html.parser's parse_html_declaration, so the first word is guaranteed to be doctype in some casing. `data = data[len("DOCTYPE "):]` will then always work, with no if/elif required.
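Both suggestions can be sketched as standalone functions. This is only an illustration of the idea, not the change that was committed; strip_doctype_keyword and strip_doctype_unconditionally are hypothetical names, and the second function assumes its input always begins with the doctype keyword.

```python
def strip_doctype_keyword(data):
    """Hypothetical helper: drop a leading doctype keyword, whatever its case."""
    keyword = "doctype"
    if data[:len(keyword)].lower() == keyword:
        # Remove the keyword, then any whitespace separating it from the rest.
        return data[len(keyword):].lstrip()
    # Not a doctype declaration: return it unchanged.
    return data


def strip_doctype_unconditionally(data):
    """Hypothetical helper: the slice proposed above, with no if/elif.

    Assumes the caller only ever passes text that starts with the doctype
    keyword, so the first eight characters can be dropped blindly.
    """
    return data[len("DOCTYPE "):]


for decl in ("DOCTYPE html", "doctype html", "DOCTYPE"):
    print(repr(decl), "->", repr(strip_doctype_keyword(decl)),
          repr(strip_doctype_unconditionally(decl)))
```

The unconditional slice is shorter but depends entirely on that assumption about the caller; the case-insensitive check leaves any other input untouched.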

html.parser for reference: https://github.com/python/cpython/blob/3.8/Lib/html/parser.py#L265

Jibben Nee (ziddey) on 2019-10-17

description:	updated

Leonard Richardson (leonardr) wrote on 2019-11-11:

Fixed in revision 538.

Changed in beautifulsoup:
status:	New → Fix Committed

Leonard Richardson (leonardr) on 2019-12-29

Changed in beautifulsoup:
status:	Fix Committed → Fix Released
