Beautiful Soup

XMLParsedAsHTMLWarning does not include stacklevel

Bug #2034451 reported by Leonard Richardson on 2023-09-05

This bug affects 1 person

Affects         Status         Importance  Assigned to  Milestone
Beautiful Soup  Fix Committed  Undecided   Unassigned

Bug Description

Original report from Marc Müller:

I’ve been noticing a warning in one of my test suites recently. The warning itself is probably fine; however, it’s attributed to bs4 itself rather than emitted at the right stacklevel, which made it quite difficult to debug and find the actual culprit.

```
venv/lib/python3.11/site-packages/bs4/builder/__init__.py:545
  /.../venv/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning:
      It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), […].
    warnings.warn(
```
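
When a warning is attributed to library code like this, one way to locate the real caller is to escalate that warning category to an exception and read the traceback. The following is a minimal sketch using only the standard library; `DemoWarning`, `library_function`, and `application_code` are hypothetical stand-ins, not part of bs4.

```python
import traceback
import warnings


class DemoWarning(UserWarning):
    """Hypothetical stand-in for a library warning category."""


def library_function():
    # Stand-in for library code that warns without a stacklevel,
    # so the warning is attributed to this line.
    warnings.warn("looks like XML parsed as HTML", DemoWarning)
    return "parsed"


def application_code():
    # Stand-in for the real culprit somewhere in the application.
    return library_function()


def find_culprit():
    """Return the function names on the stack when the warning fired."""
    with warnings.catch_warnings():
        # Escalate only this category to an exception inside this block.
        warnings.simplefilter("error", DemoWarning)
        try:
            application_code()
        except DemoWarning as exc:
            frames = traceback.extract_tb(exc.__traceback__)
            return [frame.name for frame in frames]
    return []


# The returned names include application_code, the frame that the
# warning's default attribution hides.
print(find_culprit())
```

In a test suite, the same effect can usually be had by running Python with `-W error`, which turns every warning into an exception with a full traceback.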

Setting a stacklevel of 10 in `builder/__init__.py` does resolve it. The warning is then reported for `enocean/protocol/eep.py`, which is the correct location in my case. I would appreciate it if this could be added to bs4.

```
diff --git a/bs4/builder/__init__.py b/bs4/builder/__init__.py
index 2e39745..76c16ad 100644
--- a/bs4/builder/__init__.py
+++ b/bs4/builder/__init__.py
@@ -543,7 +543,8 @@ class DetectsXMLParsedAsHTML(object):
     def _warn(cls):
         """Issue a warning about XML being parsed as HTML."""
         warnings.warn(
-            XMLParsedAsHTMLWarning.MESSAGE, XMLParsedAsHTMLWarning
+            XMLParsedAsHTMLWarning.MESSAGE, XMLParsedAsHTMLWarning,
+            stacklevel=10
         )

     def _initialize_xml_detector(self):
```
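
For readers unfamiliar with the `stacklevel` argument of `warnings.warn`: it selects which stack frame the warning is attributed to, with 1 (the default) meaning the line that calls `warnings.warn` and 2 meaning that function's caller. The sketch below illustrates the difference using line numbers; the function names are made up for the illustration and are not taken from bs4.

```python
import inspect
import warnings


def warn_default():
    # stacklevel defaults to 1: the warning is attributed to this line.
    warnings.warn("default attribution", UserWarning)


def warn_caller():
    # stacklevel=2 attributes the warning to whoever called this function.
    warnings.warn("caller attribution", UserWarning, stacklevel=2)


def record(func):
    """Call func and return (line the warning reports, line of the call)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # The call to func() is on the line directly after this one.
        call_line = inspect.currentframe().f_lineno + 1
        func()
    return caught[0].lineno, call_line


reported, call_line = record(warn_default)
print(reported == call_line)  # False: reported inside warn_default

reported, call_line = record(warn_caller)
print(reported == call_line)  # True: reported at the call site
```

A fixed value such as 10 only points at the right frame when exactly that many frames sit between the `warnings.warn` call and the user's code.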

Leonard Richardson (leonardr) wrote on 2023-09-05 (last edit on 2023-09-05):

Since _warn() is called from two different places (warn_if_markup_looks_like_xml and _root_tag_encountered) it probably needs a different stacklevel at each place. And warn_if_markup_looks_like_xml is potentially called at a different place by each TreeBuilder.
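
The point about call depth can be illustrated with a small sketch in which a shared helper accepts a `stacklevel` from each call site. This is an assumption-laden illustration, not the actual bs4 fix: the `Detector` class, its methods, and the specific offsets are hypothetical.

```python
import inspect
import warnings


class Detector:
    """Hypothetical class with one warning helper and two call paths."""

    @classmethod
    def _warn(cls, stacklevel=2):
        # Add 1 for this helper's own frame, so each caller can count
        # frames as if it had called warnings.warn directly.
        warnings.warn(
            "XML parsed as HTML", UserWarning, stacklevel=stacklevel + 1
        )

    @classmethod
    def check_markup(cls, markup):
        # Shallow path: user code -> check_markup -> _warn.
        if markup.lstrip().startswith("<?xml"):
            cls._warn(stacklevel=2)

    def feed(self, markup):
        self._root_tag_seen(markup)

    def _root_tag_seen(self, markup):
        # Deep path: user code -> feed -> _root_tag_seen -> _warn,
        # so one more frame has to be skipped.
        if markup.lstrip().startswith("<?xml"):
            self._warn(stacklevel=3)


def reported_and_call_line(func, *args):
    """Return (line the warning reports, line where func was called)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # The call to func() is on the line directly after this one.
        call_line = inspect.currentframe().f_lineno + 1
        func(*args)
    return caught[0].lineno, call_line


xml = '<?xml version="1.0" encoding="utf-8"?><root/>'
print(reported_and_call_line(Detector.check_markup, xml))
print(reported_and_call_line(Detector().feed, xml))
```

Both calls report the line of the call site, even though the two paths reach `_warn` at different depths. On Python 3.12 and later, the `skip_file_prefixes` argument of `warnings.warn` offers a way to skip all frames from a given package without counting them by hand.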

Leonard Richardson (leonardr) on 2023-09-05

Changed in beautifulsoup:
status:	New → Triaged

Leonard Richardson (leonardr) wrote on 2024-01-17:

Fixed in revision ceee991.

Changed in beautifulsoup:
status:	Triaged → Fix Committed

Leonard Richardson (leonardr) wrote on 2024-01-17:

Released in 4.12.3.

Changed in beautifulsoup:
status:	Fix Committed → Fix Released

Leonard Richardson (leonardr) wrote on 2024-02-04:

Follow-up email from Marc:

I just saw that you released version 4.12.3 a week ago with a fix for the issue.
Unfortunately, it seems it doesn’t quite do it. There is one more `_warn` call that
lacks the stacklevel argument it needs, in this case `10`.

```
diff --git a/bs4/builder/__init__.py b/bs4/builder/__init__.py
index ffb31fc..30d7ca1 100644
--- a/bs4/builder/__init__.py
+++ b/bs4/builder/__init__.py
@@ -588,7 +588,7 @@ class DetectsXMLParsedAsHTML(object):
             # We encountered an XML declaration and then a tag other
             # than 'html'. This is a reliable indicator that a
             # non-XHTML document is being parsed as XML.
-            self._warn()
+            self._warn(stacklevel=10)

 def register_treebuilders_from(module):
```

If you want to test it, the library that causes the issue tries to read an XML file with `html.parser`.
Unfortunately, it’s unmaintained so I can’t fix it there, but for the time being it still works.

```py
from bs4 import BeautifulSoup
BeautifulSoup(xml_file.read(), "html.parser")
```

The start of the XML file:
```
<?xml version="1.0" encoding="utf-8"?>
…
```

Changed in beautifulsoup:
status:	Fix Released → Confirmed
status:	Confirmed → In Progress

Leonard Richardson (leonardr) on 2024-02-12

Changed in beautifulsoup:
status:	In Progress → Fix Committed
