Error using html5lib with multi_valued_attributes=None

Bug #1948488 reported by JW
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

I get the error "'NoneType' object is not subscriptable" when I try to parse the document
    <html xmlns="http://www.w3.org/1999/xhtml"></html>
using html5lib and passing multi_valued_attributes=None. The full traceback is below.

I suspect this is a bug in Beautiful Soup. For now, I've worked around the issue by passing multi_valued_attributes={'*': []}. However, the documentation explicitly gives multi_valued_attributes=None as the way to turn off multi-valued attribute handling, so that should work too.
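For context, multi_valued_attributes controls which attributes Beautiful Soup splits into lists of whitespace-separated tokens (by default, `class` comes back as a list). The sketch below models that rule in plain Python so the two settings in this report can be compared. `convert_attr` and `DEFAULT_MULTI_VALUED` are hypothetical names, and the table is an illustrative subset, not the library's real default. Under this model, None and {'*': []} behave identically, which is why the workaround is a reasonable substitute.

```python
import re

# Hypothetical, illustrative subset of a per-tag "multi-valued attribute" table:
# '*' applies to every tag; any other key applies only to that tag.
DEFAULT_MULTI_VALUED = {"*": ["class"], "a": ["rel"]}

_TOKENS = re.compile(r"\S+")


def convert_attr(tag, name, value, multi_valued=DEFAULT_MULTI_VALUED):
    """Return a token list for multi-valued attributes, else the raw string.

    multi_valued=None (or an empty table) disables splitting entirely.
    """
    if not multi_valued:
        return value
    names = multi_valued.get("*", []) + multi_valued.get(tag, [])
    if name in names:
        return _TOKENS.findall(value)
    return value


print(convert_attr("p", "class", "intro lead"))             # ['intro', 'lead']
print(convert_attr("p", "class", "intro lead", None))       # intro lead
print(convert_attr("p", "class", "intro lead", {"*": []}))  # intro lead
```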

Full example in the REPL:

>>> from bs4 import BeautifulSoup
>>> test = '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
>>> BeautifulSoup(test, 'html.parser', multi_valued_attributes=None)
<html xmlns="http://www.w3.org/1999/xhtml"></html>
>>> BeautifulSoup(test, 'html5lib')
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
>>> BeautifulSoup(test, 'html5lib', multi_valued_attributes=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/__init__.py", line 348, in __init__
    self._feed()
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/__init__.py", line 434, in _feed
    self.builder.feed(self.markup)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 87, in feed
    doc = parser.parse(markup, **extra_kwargs)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 284, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 133, in _parse
    self.mainLoop()
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 240, in mainLoop
    new_token = phase.processStartTag(new_token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 680, in startTagHtml
    return self.parser.phases["inBody"].processStartTag(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 478, in startTagHtml
    self.tree.openElements[0].attributes[attr] = value
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 246, in __setitem__
    if (name in list_attr['*']
TypeError: 'NoneType' object is not subscriptable
>>> BeautifulSoup(test, 'html5lib', multi_valued_attributes={'*': []})
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
>>> from bs4.diagnose import diagnose
>>> diagnose(test)
Diagnostic running on Beautiful Soup 4.9.3
Python version 3.9.6 (default, Jun 29 2021, 05:25:02)
[Clang 12.0.5 (clang-1205.0.22.9)]
I noticed that lxml is not installed. Installing it may help.
Found html5lib version 1.1

Trying to parse your markup with html.parser
Here's what html.parser did with the markup:
<html xmlns="http://www.w3.org/1999/xhtml">
</html>
--------------------------------------------------------------------------------
Trying to parse your markup with html5lib
Here's what html5lib did with the markup:
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 </head>
 <body>
 </body>
</html>
--------------------------------------------------------------------------------
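The last frame of the traceback points at the membership test in the html5lib tree builder's `__setitem__`, which begins `if (name in list_attr['*']`. With multi_valued_attributes=None, `list_attr` is presumably `None`, so the subscript fails before any membership check runs, while {'*': []} hands it a real dictionary. The sketch below reproduces that shape in isolation. `is_multi_valued_unguarded` is a hypothetical name, and the second half of the condition is an assumption because the traceback only shows the first clause.

```python
def is_multi_valued_unguarded(list_attr, tag, name):
    # Same shape as the check in the traceback: `name in list_attr['*']`.
    # The `or ...` clause is an assumed continuation (the traceback cuts off).
    return name in list_attr["*"] or (tag in list_attr and name in list_attr[tag])


# The workaround passes a real dict, so the subscript succeeds.
print(is_multi_valued_unguarded({"*": []}, "html", "xmlns"))  # False

# Passing None reproduces the reported error.
try:
    is_multi_valued_unguarded(None, "html", "xmlns")
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```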

Leonard Richardson (leonardr) wrote:

Fixed in revision 617.

Changed in beautifulsoup:
status: New → Fix Committed
JW (jw-00000) wrote:

Thanks for the quick fix!

Leonard Richardson (leonardr) wrote:

Fix released in version 4.11.0.

JW (jw-00000) wrote:

Sorry, I still get this problem in the latest version (4.11.1). Only the error message has changed; it is now:

    TypeError: argument of type 'NoneType' is not iterable

---

Full trace:

Python 3.9.12 (main, Mar 26 2022, 15:51:15)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> test = '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
>>> BeautifulSoup(test, 'html5lib', multi_valued_attributes=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/__init__.py", line 333, in __init__
    self._feed()
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/__init__.py", line 451, in _feed
    self.builder.feed(self.markup)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 93, in feed
    doc = parser.parse(markup, **extra_kwargs)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 284, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 133, in _parse
    self.mainLoop()
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 240, in mainLoop
    new_token = phase.processStartTag(new_token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 680, in startTagHtml
    return self.parser.phases["inBody"].processStartTag(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/html5lib/html5parser.py", line 478, in startTagHtml
    self.tree.openElements[0].attributes[attr] = value
  File "/Users/jw/.local/share/virtualenvs/project-y5mhfRg2/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 252, in __setitem__
    if (name in list_attr.get('*')
TypeError: argument of type 'NoneType' is not iterable
>>> from bs4.diagnose import diagnose
>>> diagnose(test)
Diagnostic running on Beautiful Soup 4.11.1
Python version 3.9.12 (main, Mar 26 2022, 15:51:15)
[Clang 13.1.6 (clang-1316.0.21.2)]
Found lxml version 4.6.3.0
Found html5lib version 1.1
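The second traceback shows the check rewritten as `name in list_attr.get('*')`. Because `.get` itself succeeded, `list_attr` was presumably a mapping by this point rather than `None`; but `dict.get` with no default returns `None` for a missing key, and `name in None` raises the new error. A minimal reproduction follows, again with a hypothetical function name and an assumed second clause.

```python
def is_multi_valued_get(list_attr, tag, name):
    # Shape of the check in the 4.11.1 traceback: `name in list_attr.get('*')`.
    # dict.get() with no default returns None when '*' is missing.
    return name in list_attr.get("*") or (
        tag in list_attr and name in list_attr.get(tag)
    )


print({}.get("*"))  # None

try:
    is_multi_valued_get({}, "html", "xmlns")
except TypeError as exc:
    # The report shows this message on Python 3.9.
    print(exc)
```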

Changed in beautifulsoup:
status: Fix Committed → Confirmed
Leonard Richardson (leonardr) wrote:

The issue was present in three places, and the test I wrote only caught the first one. All three are fixed in revision 641.
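The thread doesn't say which three places were affected, but the point of this comment (a test that exercises one code path can miss the others) suggests checking every shape the setting can take. Below is a sketch of a guarded check that treats `None`, an empty dict and missing keys alike, plus a loop that runs it over each configuration shape. `is_multi_valued`, `CONFIG_SHAPES` and `CASES` are hypothetical; this is not the code from revision 641.

```python
def is_multi_valued(list_attr, tag, name):
    """Guarded membership check: None, {} and missing keys all mean 'no'."""
    list_attr = list_attr or {}  # normalise None to an empty mapping
    return name in list_attr.get("*", []) or name in list_attr.get(tag, [])


# Shapes the setting can plausibly take, including the two from this report.
CONFIG_SHAPES = [None, {}, {"*": []}, {"*": ["class"]}, {"a": ["rel"]}]
CASES = [("html", "xmlns"), ("p", "class"), ("a", "rel")]


def check_all_shapes():
    results = {}
    for i, cfg in enumerate(CONFIG_SHAPES):
        for tag, name in CASES:
            # Must not raise for any shape; record the answer for inspection.
            results[(i, tag, name)] = is_multi_valued(cfg, tag, name)
    return results


results = check_all_shapes()
print(results[(0, "p", "class")])  # False (None: nothing is multi-valued)
print(results[(3, "p", "class")])  # True ('class' listed under '*')
print(results[(4, "a", "rel")])    # True ('rel' listed only for <a>)
```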

Changed in beautifulsoup:
status: Confirmed → Fix Committed
Leonard Richardson (leonardr) wrote:

Fix released in version 4.11.2.

Changed in beautifulsoup:
status: Fix Committed → Fix Released
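Per this thread, multi_valued_attributes=None works with html5lib from 4.11.2 onward, and the {'*': []} workaround avoided the crash on 4.9.3. Code that must run on both sides of the fix could choose the setting from the installed version. `multi_valued_setting` and `version_tuple` are hypothetical helpers, and the report only confirms the workaround on 4.9.3, so relying on it for other pre-4.11.2 releases is an assumption.

```python
import re

# From this report: the complete fix shipped in Beautiful Soup 4.11.2.
FIXED_IN = (4, 11, 2)


def version_tuple(version):
    """Leading numeric components of a version string, e.g. "4.11.1" -> (4, 11, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def multi_valued_setting(bs4_version):
    """Value to pass as multi_valued_attributes for the given bs4 version."""
    if version_tuple(bs4_version) >= FIXED_IN:
        return None       # documented way to disable multi-valued attributes
    return {"*": []}      # workaround from the report for older releases


print(multi_valued_setting("4.9.3"))   # {'*': []}
print(multi_valued_setting("4.11.2"))  # None
```

In practice the version string would come from `bs4.__version__`, e.g. `BeautifulSoup(markup, 'html5lib', multi_valued_attributes=multi_valued_setting(bs4.__version__))`.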