XMLParsedAsHTMLWarning does not include stacklevel

Bug #2034451 reported by Leonard Richardson
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Beautiful Soup | Fix Committed | Undecided | Unassigned | |

Bug Description

Original report from Marc Müller:

I’ve been noticing a warning in one of my test suites recently. The warning itself is probably fine; however, it’s attributed to bs4 itself rather than to the code that triggered it, because it isn’t issued with the right `stacklevel`. That made it quite difficult to debug and actually find the culprit.

```
venv/lib/python3.11/site-packages/bs4/builder/__init__.py:545
  /.../venv/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning:
      It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), […].
    warnings.warn(
```

Setting `stacklevel=10` in the `warnings.warn()` call in `builder/__init__.py` resolves it: the warning is then reported for `enocean/protocol/eep.py`, which is the correct location in my case. I’d appreciate it if this could be added to bs4.

```
diff --git a/bs4/builder/__init__.py b/bs4/builder/__init__.py
index 2e39745..76c16ad 100644
--- a/bs4/builder/__init__.py
+++ b/bs4/builder/__init__.py
@@ -543,7 +543,8 @@ class DetectsXMLParsedAsHTML(object):
     def _warn(cls):
         """Issue a warning about XML being parsed as HTML."""
         warnings.warn(
-            XMLParsedAsHTMLWarning.MESSAGE, XMLParsedAsHTMLWarning
+            XMLParsedAsHTMLWarning.MESSAGE, XMLParsedAsHTMLWarning,
+            stacklevel=10
         )

     def _initialize_xml_detector(self):
```
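`stacklevel` tells `warnings.warn()` which frame to blame: `1` (the default) reports the line that calls `warn()` itself, `2` reports that function's caller, and each further step moves one frame up the call stack. That is why, without it, the warning pointed at `bs4/builder/__init__.py:545`. The minimal sketch below uses hypothetical function names (not bs4's code) to mimic a library with two internal frames between the user's call and `warn()`.

```python
import warnings


class DemoWarning(UserWarning):
    """Hypothetical stand-in for bs4's XMLParsedAsHTMLWarning."""


# Library internals: stacklevel=1 blames the warn() line below,
# 2 blames library_entry_point(), 3 blames user_code().
def library_warn(stacklevel):
    warnings.warn("markup looks like XML", DemoWarning, stacklevel=stacklevel)


def library_entry_point(stacklevel):
    library_warn(stacklevel)


# Stands in for the user's code (eep.py in the report).
def user_code(stacklevel):
    library_entry_point(stacklevel)


# Each function above makes its single call on the line right after its
# ``def`` line, so co_firstlineno + 1 identifies that call site.
CALL_SITES = {
    fn.__code__.co_firstlineno + 1: fn.__name__
    for fn in (library_warn, library_entry_point, user_code)
}


def attributed_to(stacklevel):
    """Return the name of the function whose line the warning is reported at."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        user_code(stacklevel)
    return CALL_SITES.get(caught[0].lineno, "somewhere else")


for level in (1, 2, 3):
    print(level, attributed_to(level))
# prints:
# 1 library_warn
# 2 library_entry_point
# 3 user_code
```

With two library frames in the way, `stacklevel=3` lands on the user's line. The path from `BeautifulSoup(...)` down to `_warn()` in bs4 is presumably much deeper, which would explain why the report needed a value as large as `10`; a hard-coded number is only correct for one particular call path.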

Leonard Richardson (leonardr) wrote:

Since `_warn()` is called from two different places (`warn_if_markup_looks_like_xml` and `_root_tag_encountered`), it probably needs a different stacklevel at each one. On top of that, each TreeBuilder may call `warn_if_markup_looks_like_xml` from a different place, so the call depth can differ between builders too.
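One way to cope with several call paths is to give the warning helper a `stacklevel` parameter and have each layer pass along its own value plus the number of frames it adds. The sketch below assumes that design; `Detector`, `check_markup`, `builder_prepare`, and the warning class are hypothetical names, and this is not necessarily the fix that was actually committed.

```python
import warnings

XML_DOC = '<?xml version="1.0" encoding="utf-8"?>\n<root/>'


class LooksLikeXMLWarning(UserWarning):
    """Hypothetical stand-in for bs4's XMLParsedAsHTMLWarning."""


class Detector:
    """Hypothetical detector whose warning can be reached by two call paths."""

    @classmethod
    def _warn(cls, stacklevel):
        # stacklevel counts from this frame: 1 would blame this very line.
        warnings.warn("markup looks like XML", LooksLikeXMLWarning,
                      stacklevel=stacklevel)

    @classmethod
    def check_markup(cls, markup, stacklevel=2):
        """Warn if markup looks like XML, blaming our caller by default."""
        if markup.lstrip().startswith("<?xml"):  # deliberately simplified test
            # _warn() adds one frame between this method and warnings.warn().
            cls._warn(stacklevel=stacklevel + 1)
            return True
        return False


def builder_prepare(markup):
    """Hypothetical builder that reaches check_markup() via one extra frame."""
    # Passing 3 instead of the default 2 skips this wrapper frame as well.
    return Detector.check_markup(markup, stacklevel=3)


def user_calls_detector():
    return Detector.check_markup(XML_DOC)


def user_calls_builder():
    return builder_prepare(XML_DOC)


def blames_caller(user_fn):
    """True if the warning is attributed to user_fn's single call line."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        user_fn()
    # Each user_* function makes its call on the line right after ``def``.
    return [w.lineno for w in caught] == [user_fn.__code__.co_firstlineno + 1]


print(blames_caller(user_calls_detector), blames_caller(user_calls_builder))
```

Each wrapper adjusts only the value it passes down, so adding or removing a layer doesn't require recounting a single global constant such as `10`.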

Changed in beautifulsoup:
status: New → Triaged
Leonard Richardson (leonardr) wrote:

Fixed in revision ceee991.

Changed in beautifulsoup:
status: Triaged → Fix Committed
Leonard Richardson (leonardr) wrote:

Released in 4.12.3.

Changed in beautifulsoup:
status: Fix Committed → Fix Released
Leonard Richardson (leonardr) wrote:

Follow-up email from Marc:

I just saw that you released version 4.12.3 a week ago with a fix for the issue.
Unfortunately, it doesn’t seem to fully fix it. There is one more `self._warn()` call
that still lacks the `stacklevel` argument it needs; in this case, `10`.

```
diff --git a/bs4/builder/__init__.py b/bs4/builder/__init__.py
index ffb31fc..30d7ca1 100644
--- a/bs4/builder/__init__.py
+++ b/bs4/builder/__init__.py
@@ -588,7 +588,7 @@ class DetectsXMLParsedAsHTML(object):
             # We encountered an XML declaration and then a tag other
             # than 'html'. This is a reliable indicator that a
             # non-XHTML document is being parsed as XML.
-            self._warn()
+            self._warn(stacklevel=10)

 def register_treebuilders_from(module):
```
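The comment in the diff describes the heuristic behind this warning: an XML declaration followed by a root tag other than `html`. Below is a stdlib-only sketch of that idea built on `html.parser.HTMLParser`, the parser underneath bs4's `"html.parser"` builder. The class and function names are hypothetical, and this is not bs4's implementation, which differs in detail.

```python
from html.parser import HTMLParser


class XMLDeclarationSniffer(HTMLParser):
    """Hypothetical sketch: flag an XML declaration followed by a root tag
    other than <html>."""

    def __init__(self):
        super().__init__()
        self.saw_xml_declaration = False
        self.root_tag = None
        self.looks_like_xml = False

    def handle_pi(self, data):
        # html.parser reports "<?xml ...?>" as a processing instruction
        # whose data starts with "xml ".
        if self.root_tag is None and data.startswith("xml "):
            self.saw_xml_declaration = True

    def handle_starttag(self, tag, attrs):
        # Only the first (root) start tag matters for this check.
        # Self-closing tags like <root/> also arrive here, because the
        # default handle_startendtag() calls handle_starttag().
        if self.root_tag is not None:
            return
        self.root_tag = tag
        if self.saw_xml_declaration and tag != "html":
            self.looks_like_xml = True


def looks_like_xml_parsed_as_html(markup):
    sniffer = XMLDeclarationSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.looks_like_xml


print(looks_like_xml_parsed_as_html('<?xml version="1.0" encoding="utf-8"?>\n<root/>'))
```

This only flags the situation. A real parser would still need to issue the warning, with a `stacklevel` chosen for whichever callback detected it.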

If you want to test it: the library that triggers the issue reads an XML file with `html.parser`.
Unfortunately, it’s unmaintained, so I can’t fix it there, but for the time being it still works.

```py
from bs4 import BeautifulSoup
BeautifulSoup(xml_file.read(), "html.parser")
```

The start of the XML file:
```
<?xml version="1.0" encoding="utf-8"?>

```

Changed in beautifulsoup:
status: Fix Released → Confirmed
status: Confirmed → In Progress
Changed in beautifulsoup:
status: In Progress → Fix Committed