HTML causes 'TypeError: expected string or buffer' in _html5lib.AttrList
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
When I feed this document to BS4.4, a TypeError is raised; I expected to get data to work with. I extracted this from a larger document and reduced it to this minimal example. If I change p to z, or remove p, it works fine. Please note I'm running Python 3.4.
Document:
<a class="
Example TestCase:
class TestCase(
def test_soup(self):
# should not raise an exception
data = '<a class="
Stacktrace:
Error
Traceback (most recent call last):
File "/Users/
BeautifulSo
File "/Users/
self._feed()
File "/Users/
self.
File "/Users/
doc = parser.
File "/Users/
parseMeta=
File "/Users/
self.mainLoop()
File "/Users/
new_token = phase.processEn
File "/Users/
return self.endTagHand
File "/Users/
clone = formattingEleme
File "/Users/
node.
File "/Users/
value = whitespace_
TypeError: expected string or buffer
Library versions on OSX:
beautifulsoup4=
html5lib==0.999999
description: | updated |
Changed in beautifulsoup: | |
status: | Fix Committed → Fix Released |
A possible fix for this problem:
--- bs4/builder/_html5lib.py 2015-06-28 19:39:36 +0000
+++ bs4/builder/_html5lib.py 2015-08-27 13:03:59 +0000
@@ -120,7 +120,10 @@
         if (name in list_attr['*']
             or (self.element.name in list_attr
                 and name in list_attr[self.element.name])):
-            value = whitespace_re.split(value)
+            # Node that is being cloned possibly already has
+            # attributes with list values
+            if not isinstance(value, list):
+                value = whitespace_re.split(value)
         self.element[name] = value
     def items(self):
         return list(self.attrs.items())
However, with this patch applied, the "<a />" element is duplicated in the output.
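The guard in the proposed patch can be illustrated in isolation. The following is only a minimal sketch, not the Beautiful Soup code: `normalize_multi_valued` is a hypothetical helper, and `whitespace_re` is assumed here to be a plain whitespace-splitting pattern standing in for the one the patch calls. It shows why splitting an attribute value a second time (as happens when a node is cloned) raises a TypeError, and how the `isinstance` check avoids it.

```python
import re

# Assumption: stands in for the whitespace-splitting pattern used in the patch.
whitespace_re = re.compile(r"\s+")


def normalize_multi_valued(value):
    """Hypothetical helper: split a string into tokens, leave a list untouched."""
    if not isinstance(value, list):
        value = whitespace_re.split(value)
    return value


# First pass: the attribute value arrives as a plain string.
first_pass = normalize_multi_valued("btn primary")

# Second pass: a cloned node hands the already-split list back to the same code.
second_pass = normalize_multi_valued(first_pass)

# Without the guard, the pattern is applied to a list, which raises TypeError.
try:
    whitespace_re.split(first_pass)
    unguarded_failed = False
except TypeError:
    unguarded_failed = True
```

The sketch only covers the type error itself; it says nothing about the duplicated element mentioned above, which would have to be investigated in the tree builder's cloning logic.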