Passing a Tag into Tag.extend() affects only half of the original tag's children.

Bug #1885710 reported by alex
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Python 3.8
Beautiful Soup 4.9.1

Code to reproduce:
```
>>> from bs4 import BeautifulSoup, Tag
>>> soup = BeautifulSoup('<html><body>' + "".join(f'<p id="{i}"></p>' for i in range(10)) + '</body></html>', 'html.parser')
>>> soup
<html><body><p id="0"></p><p id="1"></p><p id="2"></p><p id="3"></p><p id="4"></p><p id="5"></p><p id="6"></p><p id="7"></p><p id="8"></p><p id="9"></p></body></html>
>>> fakebody = Tag(name='body')
>>> fakebody.extend(soup.body)
>>> fakebody
<body><p id="0"></p><p id="2"></p><p id="4"></p><p id="6"></p><p id="8"></p><p id="1"></p><p id="5"></p><p id="9"></p><p id="0"></p><p id="2"></p><p id="4"></p><p id="6"></p><p id="8"></p></body>
>>> soup.body
<body><p id="1"></p><p id="3"></p><p id="5"></p><p id="7"></p><p id="9"></p></body>
```

The docs say (https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=tag#extend):

Starting in Beautiful Soup 4.7.0, Tag also supports a method called .extend(), which works just like calling .extend() on a Python list:
But a list doesn't work this way; list.extend() leaves the source list untouched:

```
>>> a = [1, 2, 3, 4]
>>> b = []
>>> b.extend(a)
>>> b
[1, 2, 3, 4]
>>> a
[1, 2, 3, 4]
```

Maybe copying the elements should be non-destructive?

Idea for a fix in bs4/element.py:

# Use of this source code is governed by the MIT license.
__license__ = "MIT"

from copy import deepcopy
...
def extend(self, tags):
    """Appends the given PageElements to this one's contents.

    :param tags: A list of PageElements.
    """
    for tag in tags:
        self.append(tag if tag.parent is None else deepcopy(tag))
...

Or maybe Tag.insert should be patched instead?

alex (asiaron)
description: updated
Leonard Richardson (leonardr) wrote :

This is fixed in revision 587.

Tag.extend() iterates over the given list and calls Tag.append() on each element of the list. When the 'list' it was given is another Tag (let's call it t1), this means iterating over the PageElement objects found in the list "t1.contents".

The problem you found stems from the fact that if Tag.append() is given a PageElement already associated with a Beautiful Soup parse tree, it will uproot that PageElement and move it to a new location. This changes t1.contents, causing the iterator to skip the next item. That's why, in your example, only half of the <p> tags are re-homed.
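
The skipping described above can be reproduced without Beautiful Soup at all. The following is a minimal sketch using a plain Python list as a stand-in for t1.contents; the helper name move_all_while_iterating is hypothetical and exists only for this illustration.

```python
def move_all_while_iterating(source):
    """Move every item out of `source` while iterating over it directly."""
    moved = []
    for item in source:
        # Stand-in for append() uprooting the element from its old parent:
        # the list shrinks under the iterator, so the next item is skipped.
        source.remove(item)
        moved.append(item)
    return moved


source = list(range(10))
moved = move_all_while_iterating(source)
print(moved)   # [0, 2, 4, 6, 8]
print(source)  # [1, 3, 5, 7, 9]
```

The items left behind in source match the <p> tags left in soup.body in the report.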

You're right to point out that this behavior differs from Python's list.extend(), which works by copying references. Tag objects can only exist in one place in a tree, so copying references doesn't work for them. I've changed the documentation to spell out what is happening rather than making a comparison to list.extend().
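
To illustrate the "one place in a tree" constraint, here is a small self-contained model. It is a sketch under stated assumptions: the Node class is hypothetical and is not Beautiful Soup's Tag, and this comment does not say whether revision 587 takes the same approach. It shows one way to re-home every child safely, which is to snapshot the children before moving any of them.

```python
class Node:
    """Hypothetical stand-in for a tree element; not Beautiful Soup's Tag."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []

    def append(self, child):
        # A node can live in only one place, so detach it from its old parent.
        if child.parent is not None:
            child.parent.contents.remove(child)
        child.parent = self
        self.contents.append(child)

    def extend(self, children):
        # Accept either a Node (use its children) or any iterable of Nodes.
        if isinstance(children, Node):
            children = children.contents
        # Iterate over a snapshot so that detaching children from their old
        # parent cannot disturb the iteration.
        for child in list(children):
            self.append(child)


old_body = Node("body")
for i in range(10):
    old_body.append(Node(f"p{i}"))

new_body = Node("body")
new_body.extend(old_body)

print([child.name for child in new_body.contents])  # p0 through p9, in order
print(old_body.contents)                            # []
```

Unlike list.extend(), the move is still destructive for the source: every child ends up under new_body and old_body is left empty.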

Changed in beautifulsoup:
status: New → Fix Committed
summary: - extend Tag borrow not all Tags from Tag.children
+ Passing a Tag into Tag.extend() affects only half of the original tag's
+ children.
Changed in beautifulsoup:
status: Fix Committed → Fix Released