Merge lp:~facelessuser/beautifulsoup/beautifulsoup into lp:beautifulsoup

Proposed by Isaac Muse
Status: Merged
Merge reported by: Leonard Richardson
Merged at revision: not available
Proposed branch: lp:~facelessuser/beautifulsoup/beautifulsoup
Merge into: lp:beautifulsoup
Diff against target: 87 lines (+19/-8)
2 files modified
bs4/formatter.py (+13/-8)
bs4/tests/test_html5lib.py (+6/-0)
To merge this branch: bzr merge lp:~facelessuser/beautifulsoup/beautifulsoup
Reviewer            Review Type   Date Requested   Status
Leonard Richardson                                 Pending
Review via email: mp+398038@code.launchpad.net

Description of the change

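Add bare-attribute logic to the html5 formatter, as discussed in https://bugs.launchpad.net/beautifulsoup/+bug/1915424. When an attribute's value is an empty string, the html5 formatter now writes the attribute name alone (<span foo>) instead of an empty assignment (<span foo="">).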

Leonard Richardson (leonardr) wrote :

This looks good. I'm going to make a few changes on top of this:

1. Call these "boolean attributes" rather than "bare attributes", because that's how the HTML5 spec refers to them (https://www.w3.org/TR/html50/infrastructure.html#boolean-attributes).

2. Update the documentation where it discusses the 'html5' formatter.

3. Add a test to verify that the 'html' formatter keeps the same behavior as before.
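
For reference, the spec rule behind the name in item 1 can be sketched as follows. This is an illustrative check only, not code from Beautiful Soup: the function names are hypothetical, and it assumes the HTML rule that a boolean attribute's value must be either the empty string or an ASCII case-insensitive match for the attribute's name.

```python
import string

# Translation table that lowercases only ASCII letters, matching the
# spec's "ASCII case-insensitive" comparison.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text):
    """Lowercase A-Z only; leave every other character untouched."""
    return text.translate(_ASCII_LOWER)


def is_valid_boolean_attribute_value(name, value):
    """Hypothetical helper: True if `value` is allowed for the boolean
    attribute `name` (empty string, or the attribute's own name)."""
    return value == "" or ascii_lower(value) == ascii_lower(name)


print(is_valid_boolean_attribute_value("disabled", ""))          # True
print(is_valid_boolean_attribute_value("disabled", "DISABLED"))  # True
print(is_valid_boolean_attribute_value("disabled", "true"))      # False
```

Note that the proposed diff does not consult a list of boolean attributes: it treats any attribute whose value is the empty string as eligible for the short form.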

Leonard Richardson (leonardr) wrote :

Adapted into revision 601.

Preview Diff

=== modified file 'bs4/formatter.py'
--- bs4/formatter.py 2019-12-29 15:48:30 +0000
+++ bs4/formatter.py 2021-02-13 22:42:06 +0000
@@ -48,6 +48,7 @@
     def __init__(
             self, language=None, entity_substitution=None,
             void_element_close_prefix='/', cdata_containing_tags=None,
+            bare_attributes=False
     ):
         """Constructor.

@@ -64,6 +65,8 @@
            as containing CDATA in this dialect. For example, in HTML,
            <script> and <style> tags are defined as containing CDATA,
            and their contents should not be formatted.
+        :param bare_attributes: Enable bare attributes when the attribute
+           has an empty value. This is only valid syntax for HTML.
         """
         self.language = language
         self.entity_substitution = entity_substitution
@@ -71,7 +74,8 @@
         self.cdata_containing_tags = self._default(
             language, cdata_containing_tags, 'cdata_containing_tags'
         )
-
+        self.bare_attributes = bare_attributes
+
     def substitute(self, ns):
         """Process a string that needs to undergo entity substitution.
         This may be a string encountered in an attribute value or as
@@ -100,26 +104,26 @@
         or numeric entities.
         """
         return self.substitute(value)
-
+
     def attributes(self, tag):
         """Reorder a tag's attributes however you want.
-
+
         By default, attributes are sorted alphabetically. This makes
         behavior consistent between Python 2 and Python 3, and preserves
         backwards compatibility with older versions of Beautiful Soup.
         """
         if tag.attrs is None:
             return []
-        return sorted(tag.attrs.items())
-
-
+        return sorted((k, (None if self.bare_attributes and v == '' else v)) for k, v in tag.attrs.items())
+
+
 class HTMLFormatter(Formatter):
     """A generic Formatter for HTML."""
     REGISTRY = {}
     def __init__(self, *args, **kwargs):
         return super(HTMLFormatter, self).__init__(self.HTML, *args, **kwargs)

-
+
 class XMLFormatter(Formatter):
     """A generic Formatter for XML."""
     REGISTRY = {}
@@ -133,7 +137,8 @@
 )
 HTMLFormatter.REGISTRY["html5"] = HTMLFormatter(
     entity_substitution=EntitySubstitution.substitute_html,
-    void_element_close_prefix = None
+    void_element_close_prefix = None,
+    bare_attributes = True
 )
 HTMLFormatter.REGISTRY["minimal"] = HTMLFormatter(
     entity_substitution=EntitySubstitution.substitute_xml

=== modified file 'bs4/tests/test_html5lib.py'
--- bs4/tests/test_html5lib.py 2020-04-05 19:43:58 +0000
+++ bs4/tests/test_html5lib.py 2021-02-13 22:42:06 +0000
@@ -188,3 +188,9 @@
         # because there's no way of knowing, when a string is created,
         # where in the tree it will eventually end up.
         pass
+
+    def test_bare_attribute(self):
+        # Test that HTML5 output will render bare attributes when value is an empty string.
+        markup = "<span foo>test</span>"
+        soup = self.soup(markup)
+        self.assertEqual('<body><span foo>test</span></body>', soup.body.decode(formatter="html5"))
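
The change to Formatter.attributes() in the diff above can be tried in isolation. The sketch below does not import Beautiful Soup: formatted_attributes() copies the expression from the diff, while render_open_tag() is a hypothetical stand-in serializer that assumes a None value is written as a bare attribute name. It skips entity substitution, so it is only meant to show how the flag changes the output.

```python
def formatted_attributes(attrs, bare_attributes=False):
    """Mirror the patched Formatter.attributes(): sort by name and, when
    bare_attributes is on, replace empty-string values with None."""
    if attrs is None:
        return []
    return sorted(
        (k, (None if bare_attributes and v == '' else v))
        for k, v in attrs.items()
    )


def render_open_tag(name, attrs, bare_attributes=False):
    """Hypothetical serializer (not bs4 code): None means 'name only'."""
    parts = [name]
    for key, value in formatted_attributes(attrs, bare_attributes):
        if value is None:
            parts.append(key)  # bare form, e.g. foo
        else:
            parts.append('%s="%s"' % (key, value))  # no escaping in this sketch
    return "<%s>" % " ".join(parts)


attrs = {"foo": "", "class": "note"}
html5_output = render_open_tag("span", attrs, bare_attributes=True)
html_output = render_open_tag("span", attrs, bare_attributes=False)
print(html5_output)  # <span class="note" foo>
print(html_output)   # <span class="note" foo="">
```

Sorting stays safe with None values mixed in because attribute names are unique, so tuple comparison never falls through to the values.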
