lxml test fails with libxml2 2.12

Bug #2045481 reported by Jan Tojnar
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
Beautiful Soup   Fix Released   Undecided    Unassigned

Bug Description

I am trying to update libxml2 to 2.12.1 on NixOS and noticed that with that version the lxml test fails. Apparently, it cannot find any namespaced tags:

beautifulsoup: 4.12.2
lxml: 4.9.3-3
Python: 3.11.6
Downstream PR: https://github.com/NixOS/nixpkgs/pull/269060

=================================== FAILURES ===================================
______________ TestLXMLXMLTreeBuilder.test_find_by_prefixed_name _______________

self = <bs4.tests.test_lxml.TestLXMLXMLTreeBuilder object at 0x7ffff52ead50>

        def test_find_by_prefixed_name(self):
            doc = """<?xml version="1.0" encoding="utf-8"?>
    <Document xmlns="http://example.com/ns0"
        xmlns:ns1="http://example.com/ns1"
        xmlns:ns2="http://example.com/ns2"
        <ns1:tag>foo</ns1:tag>
        <ns1:tag>bar</ns1:tag>
        <ns2:tag key="value">baz</ns2:tag>
    </Document>
    """
            soup = self.soup(doc)

            # There are three <tag> tags.
> print(len(soup.find_all('tag'))); assert 3 == len(soup.find_all('tag'))
E AssertionError

bs4/tests/__init__.py:1117: AssertionError
----------------------------- Captured stdout call -----------------------------
0
=========================== short test summary info ============================
FAILED bs4/tests/test_lxml.py::TestLXMLXMLTreeBuilder::test_find_by_prefixed_name - AssertionError
=================== 1 failed, 649 passed, 6 skipped in 2.82s ===================

Terje Røsten (terjeros) wrote :
Leonard Richardson (leonardr) wrote :

There is a typo in the document used in the test; the opening <Document> tag is missing the final right angle bracket, making the XML invalid. If I'm right, that typo is causing the problem. The method of handling invalid documents is something that can legitimately change between one version of a dependency and another, and it's not what I'm trying to test with that test case.

Please try the attached diff (or the head of the 'master' branch) and see if that solves the problem.
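The diagnosis above can be checked without Beautiful Soup or lxml. The following is a minimal sketch, not the attached diff: it uses Python's standard-library `xml.etree.ElementTree` (an assumption made for illustration; the test itself goes through lxml) to show that the test document is not well-formed as written, and that adding the missing `>` yields a document containing three `tag` elements. The helper names `is_well_formed` and `count_local_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The document from the failing test, with the typo preserved:
# the opening <Document> tag never gets its closing ">".
BROKEN = """<?xml version="1.0" encoding="utf-8"?>
<Document xmlns="http://example.com/ns0"
    xmlns:ns1="http://example.com/ns1"
    xmlns:ns2="http://example.com/ns2"
    <ns1:tag>foo</ns1:tag>
    <ns1:tag>bar</ns1:tag>
    <ns2:tag key="value">baz</ns2:tag>
</Document>
"""

# Same document with the missing right angle bracket added.
FIXED = BROKEN.replace('ns2"\n', 'ns2">\n', 1)


def is_well_formed(doc: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(doc.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def count_local_name(doc: str, local: str) -> int:
    """Count elements whose local (namespace-stripped) name equals `local`."""
    root = ET.fromstring(doc.encode("utf-8"))
    # ElementTree spells qualified names as "{namespace-uri}local".
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == local)


if __name__ == "__main__":
    print("broken is well-formed:", is_well_formed(BROKEN))
    print("fixed is well-formed:", is_well_formed(FIXED))
    print("tag elements in fixed:", count_local_name(FIXED, "tag"))
```

A strict parser rejects the broken document outright, whereas lxml in recovery mode tries to salvage it, and how it does so is where different libxml2 versions may legitimately differ.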

Leonard Richardson (leonardr) wrote :
Jan Tojnar (jtojnar) wrote :

Thanks. Can confirm that fixes the issue.

Changed in beautifulsoup:
status: New → Fix Committed
Tomasz Kloczko (kloczek) wrote :

I've tested that patch, and with libxml2 2.12.3 pytest still reports a failing unit test:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-beautifulsoup4-4.12.2-6.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-beautifulsoup4-4.12.2-6.
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.8.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/tkloczko/rpmbuild/BUILD/beautifulsoup4-4.12.2
collected 656 items

bs4/tests/test_builder.py ..... [ 0%]
bs4/tests/test_builder_registry.py ........... [ 2%]
bs4/tests/test_css.py .................................................. [ 10%]
.......... [ 11%]
bs4/tests/test_dammit.py .................................. [ 16%]
bs4/tests/test_element.py ..... [ 17%]
bs4/tests/test_formatter.py .............. [ 19%]
bs4/tests/test_fuzz.py .....ssssss [ 21%]
bs4/tests/test_html5lib.py ............................................. [ 28%]
...................................... [ 33%]
bs4/tests/test_htmlparser.py ........................................... [ 40%]
...........F.................... [ 45%]
bs4/tests/test_lxml.py ................................................. [ 52%]
.................................................... [ 60%]
bs4/tests/test_navigablestring.py ........ [ 62%]
bs4/tests/test_pageelement.py .................................... [ 67%]
bs4/tests/test_soup.py ................................................. [ 75%]
......... [ 76%]
bs4/tests/test_tag.py ....................... [ 79%]
bs4/tests/test_tree.py ................................................. [ 87%]
........................................................................ [ 98%]
........... [100%]

=================================== FAILURES ===================================
_____ TestHTMLParserTreeBuilder.test_smart_quotes_converted_on_the_way_in ______

self = <bs4.tests.test_htmlparser.TestHTMLParserTreeBuilder object at 0x7f1aa6965c40>

    def test_smart_quotes_converted_on_the_way_in(self):
        # Microsoft smart quotes are converted to Unicode characters during
        # parsing.
        quote = b"<p>\x91Foo\x92</p>"
        soup = self.soup(quote)
> assert soup.p.string == "\N{LEFT SINGLE QUOTATION MARK}Foo\N{RIGHT SINGLE QUOTATION MARK}"
E AttributeError: 'NoneType' object has no attribute 'string'

bs4/tests/__init__.py:808: AttributeError
=========================== short test summary info ============================
SKIPPED [6] bs4/tests/test_fuzz.py:60: html5lib problems
FAILED bs4/tests/test_htmlparser.py::TestHTMLParserTreeBuilder::test_smart_quotes_converted_...


Jan Tojnar (jtojnar) wrote :

@kloczek I am unable to reproduce that failure on NixOS with Python 3.11.6, libxml2 2.12.3 and libxslt 1.1.39.

Leonard Richardson (leonardr) wrote :

It's very unlikely that the error you found in test_htmlparser.TestHTMLParserTreeBuilder is related to this patch, for two reasons:

1. The test that fails is testing Python's built-in HTML parser, not lxml. No lxml code should be running.
2. The patch only changes a test (in XMLTreeBuilderSmokeTest), not any code that would be run by the failing test.

In addition, I'm unable to reproduce the issue in my own Python 3.8 environment.

I'd be interested to know:

* Whether the TestHTMLParserTreeBuilder failure is reliably repeatable in your Python 3.8 environment, or whether it's intermittent.
* Whether backing out the patch fixes the TestHTMLParserTreeBuilder failure (at the expense of potentially introducing an lxml test failure).
* Whether installing lxml (and thus libxml2) into an environment that doesn't have the TestHTMLParserTreeBuilder failure introduces the failure.
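To help isolate the failure, the expectation in the failing test can be checked with the standard library alone, with neither lxml nor Beautiful Soup involved. This is a rough sketch under stated assumptions: it decodes the bytes explicitly as Windows-1252 (where 0x91 and 0x92 are the left and right single quotation marks) and feeds the result to `html.parser.HTMLParser`. It does not reproduce Beautiful Soup's own encoding detection, and the `Collector` class is hypothetical.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record the start tags and text that the built-in parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


# Same input as test_smart_quotes_converted_on_the_way_in.
raw = b"<p>\x91Foo\x92</p>"

# Assumption: decode explicitly as Windows-1252 instead of relying on
# any automatic encoding detection.
decoded = raw.decode("windows-1252")

parser = Collector()
parser.feed(decoded)
parser.close()

if __name__ == "__main__":
    print("tags seen:", parser.tags)
    print("text seen:", "".join(parser.text))
```

If this sketch behaves as expected in the affected environment, the built-in parser and the Windows-1252 codec are probably fine there, which would point toward the encoding-detection step or the environment rather than lxml.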

Changed in beautifulsoup:
status: Fix Committed → Fix Released