Handling attributes whose value is the empty string as HTML boolean attributes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
Recently, a new issue was opened in Soup Sieve. At first I asserted that attribute values should all be strings, but I noticed that BS goes to some length to resolve non-string attribute values to strings rather than choke on them, so I relented and made Soup Sieve normalize non-string attribute values as well. `None` is of particular interest.
Soup Sieve now handles arbitrary types in attribute values, mainly to keep things from crashing, which I assume is also why BS does it. I'm fine with handling things like `None`, even though I'd argue the user should never explicitly set an attribute to `None`. My interest is more in how BS handles an attribute value of `None` vs an empty string in HTML, and why it handles them differently.
This brings me to my question. In HTML, `foo=""` and `foo` are essentially treated the same. Using CSS selectors in any browser, `[foo=""]` will match both attributes with explicit empty strings (`foo=""`) and attributes with implied empty strings (`foo`).
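To illustrate the equivalence, here is a minimal sketch using only Python's standard `html.parser`. It is not how browsers, BS, or Soup Sieve implement matching; `AttrCollector` and `matches_empty` are hypothetical names made up for this example. The standard parser reports a bare attribute as `None`, and once that is normalized to an empty string, a check standing in for `[foo=""]` treats `foo` and `foo=""` alike.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.raw = []         # attributes exactly as html.parser reports them
        self.normalized = []  # bare attributes (None) mapped to ""

    def handle_starttag(self, tag, attrs):
        self.raw.append(dict(attrs))
        self.normalized.append({k: "" if v is None else v for k, v in attrs})


def matches_empty(attrs, name):
    """Hypothetical stand-in for the selector [name=""]."""
    return attrs.get(name) == ""


collector = AttrCollector()
collector.feed('<p foo></p><p foo=""></p><p foo="bar"></p>')

# One result per <p>: bare attribute, explicit empty string, non-empty value.
results = [matches_empty(a, "foo") for a in collector.normalized]
```

Before normalization the two spellings differ (`None` vs `''`); after it, both satisfy the empty-string check, which is the behavior described above for CSS selectors.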
The whole reason the Soup Sieve issue was opened is that the user, in order to force BS to output a bare attribute in the form `foo`, had to assign the attribute a value of `None`. Interestingly, when BS parses a bare attribute, it stores it in its dictionary as `attrs['foo'] = ''`, which I think is correct. But on output it does not write `foo`; it writes `foo=""`. That is why the user set the value to `None`: BS treats that differently and outputs a bare attribute `foo`.
Why are these (an attribute value of `None` vs an empty string) treated differently in output? In HTML they are the same, so I would expect an attribute assigned an empty string or `None` to output the same. Most HTML users would probably prefer the bare attribute. So can the HTML formatter, by default, output bare attributes when the value is an empty string?
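As a rough sketch of the output rule I'm asking for (this is not BS's formatter code; `render_attributes` and its `empty_as_boolean` flag are hypothetical names invented for illustration), the serializer would write the bare form whenever the value is `None` or, in HTML mode, an empty string:

```python
from html import escape


def render_attributes(attrs, empty_as_boolean=True):
    """Serialize an attribute dict the way the request proposes.

    Hypothetical helper: with empty_as_boolean=True, a value of "" is
    written as a bare attribute, the same as None.
    """
    parts = []
    for name, value in attrs.items():
        if value is None or (empty_as_boolean and value == ""):
            parts.append(name)  # bare attribute: foo
        else:
            parts.append(f'{name}="{escape(str(value), quote=True)}"')
    return " ".join(parts)


attrs = {"foo": "", "bar": None, "id": "x"}
proposed = render_attributes(attrs)                         # empty string treated like None
current = render_attributes(attrs, empty_as_boolean=False)  # empty string written as foo=""
```

With the flag off, the sketch reproduces the asymmetry described above: `None` produces a bare attribute while the empty string produces `foo=""`.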
Related branches
- Leonard Richardson: Pending requested
- Diff: 87 lines (+19/-8), 2 files modified
  - bs4/formatter.py (+13/-8)
  - bs4/tests/test_html5lib.py (+6/-0)
summary:
- Handling of bare attributes in HTML
+ Handling attributes whose value is the empty string as HTML boolean attributes
I do realize that XML should treat this case differently, and this would only be for HTML.