Error using html5lib with multi_valued_attributes=None
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
I get the error "'NoneType' object is not subscriptable" when I try to parse the document
<html xmlns="http://
with html5lib with multi_valued_
I suspect this is a bug in BeautifulSoup. For now, I've been able to work around the issue by passing multi_valued_
Full example in REPL:
>>> from bs4 import BeautifulSoup
>>> test = '<html xmlns="http://
>>> BeautifulSoup(test, 'html.parser', multi_valued_
<html xmlns="http://
>>> BeautifulSoup(test, 'html5lib')
<html xmlns="http://
>>> BeautifulSoup(test, 'html5lib', multi_valued_
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/
self._feed()
File "/Users/
self.
File "/Users/
doc = parser.
File "/Users/
self.
File "/Users/
self.mainLoop()
File "/Users/
new_token = phase.processSt
File "/Users/
return func(token)
File "/Users/
return self.parser.
File "/Users/
return func(token)
File "/Users/
self.
File "/Users/
if (name in list_attr['*']
TypeError: 'NoneType' object is not subscriptable
>>> BeautifulSoup(test, 'html5lib', multi_valued_
<html xmlns="http://
>>> from bs4.diagnose import diagnose
>>> diagnose(test)
Diagnostic running on Beautiful Soup 4.9.3
Python version 3.9.6 (default, Jun 29 2021, 05:25:02)
[Clang 12.0.5 (clang-
I noticed that lxml is not installed. Installing it may help.
Found html5lib version 1.1
Trying to parse your markup with html.parser
Here's what html.parser did with the markup:
<html xmlns="http://
</html>
-------
Trying to parse your markup with html5lib
Here's what html5lib did with the markup:
<html xmlns="http://
<head>
</head>
<body>
</body>
</html>
-------
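The last frame of the traceback fails on `list_attr['*']`, which suggests the multi-valued attribute table is `None` at that point and is being indexed without a check. The following is a minimal sketch of that failure mode and one possible guard. It is not the code from Beautiful Soup or from the released fix; the function names `is_multi_valued_unguarded` and `is_multi_valued` are hypothetical and exist only for illustration, and the dictionary layout (a `'*'` key plus per-tag keys) is an assumption based on the traceback line.

```python
def is_multi_valued_unguarded(tag_name, attr_name, list_attr):
    """Hypothetical: index the table directly, as the traceback line appears to do."""
    # Raises TypeError when list_attr is None, because None is not subscriptable.
    return attr_name in list_attr['*'] or attr_name in list_attr.get(tag_name, ())


def is_multi_valued(tag_name, attr_name, list_attr):
    """Hypothetical: treat None as 'no attribute is multi-valued'."""
    if list_attr is None:
        # Nothing to look up, so no attribute is split into a list.
        return False
    return (attr_name in list_attr.get('*', ())
            or attr_name in list_attr.get(tag_name, ()))


# Assumed table shape: '*' applies to every tag, other keys apply to one tag.
table = {'*': ['class'], 'td': ['headers']}

print(is_multi_valued('html', 'class', table))
print(is_multi_valued('html', 'xmlns', None))

try:
    is_multi_valued_unguarded('html', 'xmlns', None)
except TypeError as exc:
    print('unguarded lookup failed:', exc)
```

Checking for `None` before the lookup keeps the "disable multi-valued attributes" setting meaningful instead of turning it into a crash, which matches the behavior the reporter sees with `html.parser`.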
Changed in beautifulsoup:
status: Fix Committed → Confirmed
Changed in beautifulsoup:
status: Fix Committed → Fix Released
Fixed in revision 617.