Fix performance with recent html5lib linkage fixes

Bug #1810617 reported by Isaac Muse
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Timing is a bit excessive with my recent changes. The following merge request addresses the performance issues while still retaining the fixes for known, reported linkage issues: https://code.launchpad.net/~facelessuser/beautifulsoup/linkage-performance-fix/+merge/361399

Jeffrey Breen (jeffreybreen) wrote:

This patch addresses the slowdown I was seeing.

4.6.3 (good performance):

(env-old) $ python ./python/self_contained.py
version: 4.6.3
Downloading https://www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
Got 992141 bytes in 0.2 s
Parsed with lxml: 0.6 s
      get_text(): 0.0 s

4.7.0 before patch (50X slower):

(env-new) $ python ./python/self_contained.py
version: 4.7.0
Downloading https://www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
Got 992141 bytes in 0.1 s
Parsed with lxml: 31.0 s
      get_text(): 0.0 s

4.7.0 + patch (back to normal):

(env-r492) $ python python/self_contained.py
version: 4.7.0
Downloading https://www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
Got 992141 bytes in 0.5 s
Parsed with lxml: 0.4 s
      get_text(): 0.0 s
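
The source of self_contained.py is not included in this report. For anyone wanting to repeat the comparison, the following is only a rough sketch of how such a timing check might look. The helper names `timed` and `benchmark` are hypothetical, and the sketch assumes the markup has already been downloaded and that bs4 and lxml are installed in the environment under test.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func, print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f} s")
    return result, elapsed


def benchmark(markup, parser="lxml"):
    """Time parsing and text extraction for one document (hypothetical helper)."""
    # Imported here so that timed() stays usable without bs4 installed.
    import bs4

    print("version:", bs4.__version__)
    soup, parse_seconds = timed(
        f"Parsed with {parser}", bs4.BeautifulSoup, markup, parser
    )
    _, text_seconds = timed("      get_text()", soup.get_text)
    return parse_seconds, text_seconds
```

Running `benchmark(markup)` on the same saved document in each virtualenv (4.6.3, 4.7.0, and 4.7.0 with the patch) should make a regression of the size reported above easy to spot, since only the parse step differs between the runs.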

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released