Merge lp:~arthur-darcet/beautifulsoup/beautifulsoup into lp:beautifulsoup

Proposed by Arthur Darcet
Status: Merged
Merged at revision: 556
Proposed branch: lp:~arthur-darcet/beautifulsoup/beautifulsoup
Merge into: lp:beautifulsoup
Diff against target: 36 lines (+6/-4)
1 file modified
bs4/element.py (+6/-4)
To merge this branch: bzr merge lp:~arthur-darcet/beautifulsoup/beautifulsoup
Reviewer           | Review Type | Date Requested | Status
Leonard Richardson |             |                | Pending
Review via email: mp+323231@code.launchpad.net

Description of the change

Hi,

We use BeautifulSoup in a program that calls PageElement.unwrap and PageElement.replace_with heavily. Each of those two functions triggers two calls to `self.parent.index(self)`: one directly, and a second inside `self.extract()`. Only one is needed.
I amended `PageElement.extract` to take an optional, internal-only parameter for the case where the element's index in its parent is already known. For our use case, all the processing time is spent in `list.index`, so this patch is a 2x speedup.

Let me know if this isn't the right place to submit a patch.
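
The redundant lookup can be illustrated with a minimal sketch. This is not the bs4 code: `Node`, `replace_with_old`, `replace_with_new`, `build_parent` and the `index_calls` counter are hypothetical names introduced only to count how many linear scans each variant performs. Only the `_self_index` parameter name is taken from the patch.

```python
class Node:
    """Hypothetical stand-in for a bs4 PageElement; not the real class."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []
        self.index_calls = 0  # number of linear scans done on this node

    def append(self, child):
        child.parent = self
        self.contents.append(child)

    def insert(self, position, child):
        child.parent = self
        self.contents.insert(position, child)

    def index(self, child):
        # Linear scan by identity, counted so the cost is visible.
        self.index_calls += 1
        for i, candidate in enumerate(self.contents):
            if candidate is child:
                return i
        raise ValueError("child not found")

    def extract(self, _self_index=None):
        # Reuse the caller's index when given; otherwise scan for it.
        if self.parent is not None:
            if _self_index is None:
                _self_index = self.parent.index(self)
            del self.parent.contents[_self_index]
            self.parent = None
        return self

    def replace_with_old(self, other):
        old_parent = self.parent
        my_index = old_parent.index(self)    # first scan
        self.extract()                       # second scan, inside extract
        old_parent.insert(my_index, other)
        return self

    def replace_with_new(self, other):
        old_parent = self.parent
        my_index = old_parent.index(self)    # the only scan
        self.extract(_self_index=my_index)   # index is passed through
        old_parent.insert(my_index, other)
        return self


def build_parent(n):
    """Build a parent node with n children named child0..child{n-1}."""
    parent = Node("parent")
    for i in range(n):
        parent.append(Node(f"child{i}"))
    return parent


old_tree = build_parent(5)
old_tree.contents[3].replace_with_old(Node("x"))

new_tree = build_parent(5)
new_tree.contents[3].replace_with_new(Node("x"))

print(old_tree.index_calls, new_tree.index_calls)  # prints "2 1"
```

Both variants leave the tree in the same state. The only difference is the number of scans over the parent's children, which is where a 2x factor would come from when the scan dominates the running time.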

Arthur Darcet

petrescu (paul-bogdan) wrote:

Needs reviewer's ack,

Paul

Preview Diff

=== modified file 'bs4/element.py'
--- bs4/element.py 2016-07-26 17:03:59 +0000
+++ bs4/element.py 2017-04-26 13:36:26 +0000
@@ -240,7 +240,7 @@
             raise ValueError("Cannot replace a Tag with its parent.")
         old_parent = self.parent
         my_index = self.parent.index(self)
-        self.extract()
+        self.extract(_self_index=my_index)
         old_parent.insert(my_index, replace_with)
         return self
     replaceWith = replace_with # BS3
@@ -252,7 +252,7 @@
                 "Cannot replace an element with its contents when that"
                 "element is not part of a tree.")
         my_index = self.parent.index(self)
-        self.extract()
+        self.extract(_self_index=my_index)
         for child in reversed(self.contents[:]):
             my_parent.insert(my_index, child)
         return self
@@ -264,10 +264,12 @@
         wrap_inside.append(me)
         return wrap_inside
 
-    def extract(self):
+    def extract(self, _self_index=None):
         """Destructively rips this element out of the tree."""
         if self.parent is not None:
-            del self.parent.contents[self.parent.index(self)]
+            if _self_index is None:
+                _self_index = self.parent.index(self)
+            del self.parent.contents[_self_index]
 
         #Find the two elements that would be next to each other if
         #this element (and any children) hadn't been parsed. Connect
