xmlns attribute issue with XML parser with lxml 4.4.0
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
beautifulsoup4 (Ubuntu) | Fix Released | High | Andreas Hasenack |
Bug Description
I imported an SVG file using the BeautifulSoup XML parser in order to adjust some attribute values, which worked just fine. When I tried to open the processed SVG in Firefox, it complained about a "not well-formed XML/SVG file". The reason: after processing with BeautifulSoup, the <svg> root element contained an attribute named "xmlns:" instead of "xmlns". This already occurs when I'm calling
>>> BeautifulSoup(
in my code (tested in the interactive python shell).
It seems there is a bug in the XML parser that appends a colon to the xmlns attribute for some reason. I believe this can be reproduced by importing any SVG file that contains an xmlns attribute with the XML parser.
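The symptom described above can be checked for mechanically. The following is a minimal sketch, not part of the original report: it scans serialized markup for a namespace declaration with an empty prefix (`xmlns:=`). The function name `has_empty_prefix_xmlns` and the sample strings are hypothetical, and the regular expression is a heuristic that would also match the same text inside character data.

```python
import re

# Heuristic: whitespace, then "xmlns:", then "=" with no prefix name between
# the colon and the equals sign. This is the malformed form from the report.
_EMPTY_PREFIX_XMLNS = re.compile(r"\sxmlns:\s*=")


def has_empty_prefix_xmlns(markup: str) -> bool:
    """Return True if the markup appears to contain an 'xmlns:' attribute
    with no prefix name (hypothetical helper, for illustration only)."""
    return _EMPTY_PREFIX_XMLNS.search(markup) is not None


# Hypothetical sample inputs, not taken from the reporter's file.
GOOD_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink"/>'
)
BAD_SVG = '<svg xmlns:="http://www.w3.org/2000/svg"/>'

if __name__ == "__main__":
    print("good sample flagged:", has_empty_prefix_xmlns(GOOD_SVG))
    print("bad sample flagged:", has_empty_prefix_xmlns(BAD_SVG))
```

To test an actual Beautiful Soup result, one would presumably pass the serialized document, for example `str(soup)`, to this helper after parsing with the XML parser.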
Related branches
- Leonard Richardson: Pending requested
- Diff: 12 lines (+1/-1), 1 file modified: bs4/element.py (+1/-1)
- Christian Ehrhardt (community): Approve
- Canonical Server: Pending requested
- Diff: 77 lines (+45/-1), 4 files modified: debian/changelog (+7/-0), debian/control (+2/-1), debian/patches/fix-definition-default-xml-namespace.patch (+35/-0), debian/patches/series (+1/-0)
Changed in beautifulsoup4 (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
status: Triaged → In Progress
Changed in beautifulsoup:
status: Fix Committed → Fix Released
This appears to be caused by a change in lxml 4.4.0, the latest release. If you downgrade lxml to 4.3.5, the problem goes away.
So this is either a bug in lxml 4.4.0, or we need to adjust to how lxml 4.4.0 does things. This will require some more investigation.
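To see which side of that boundary a given environment is on, the installed lxml version can be compared against the two versions named in this comment. This is only a sketch: the helper names `classify_lxml_version` and `installed_lxml_version` are hypothetical, and the classification covers only 4.4.0 and 4.3.5 because the thread says nothing about other releases.

```python
import re
from importlib import metadata


def classify_lxml_version(version: str) -> str:
    """Classify a version string using only what this thread reports:
    4.4.0 shows the problem, 4.3.5 does not, anything else is unknown."""
    # Take the leading numeric components, e.g. "4.4.0" -> (4, 4, 0).
    numbers = tuple(int(n) for n in re.findall(r"\d+", version)[:3])
    if numbers == (4, 4, 0):
        return "reported affected"
    if numbers == (4, 3, 5):
        return "reported working"
    return "unknown"


def installed_lxml_version():
    """Return the installed lxml version string, or None if lxml is absent."""
    try:
        return metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_lxml_version()
    if found is None:
        print("lxml is not installed")
    else:
        print(found, "->", classify_lxml_version(found))
```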