Fix performance with recent html5lib linkage fixes
Bug #1810617 reported by Isaac Muse
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
Parse times are a bit excessive with my recent changes. The following merge request addresses the performance issues while retaining the fixes for the known, reported linkage issues: https:/
Related branches
lp:~facelessuser/beautifulsoup/linkage-performance-fix merged into lp:beautifulsoup
- Leonard Richardson: Approve
Diff: 158 lines (+41/-98), 1 file modified: bs4/__init__.py (+41/-98)
Changed in beautifulsoup: status: New → Fix Committed
Changed in beautifulsoup: status: Fix Committed → Fix Released
This patch addresses the slowdown I was seeing.
4.6.3 (good performance):
(env-old) $ python ./python/self_contained.py /www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
version: 4.6.3
Downloading https:/
Got 992141 bytes in 0.2 s
Parsed with lxml: 0.6 s
get_text(): 0.0 s
4.7.0 before patch (about 50x slower to parse):
(env-new) $ python ./python/self_contained.py /www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
version: 4.7.0
Downloading https:/
Got 992141 bytes in 0.1 s
Parsed with lxml: 31.0 s
get_text(): 0.0 s
4.7.0 + patch (back to normal):
(env-r492) $ python python/self_contained.py /www.sec.gov/Archives/edgar/data/315374/000114420419000709/tv509497_10k.htm
version: 4.7.0
Downloading https:/
Got 992141 bytes in 0.5 s
Parsed with lxml: 0.4 s
get_text(): 0.0 s
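For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness. It is not the self_contained.py script used above, which is not shown in this report. The helper names time_call, benchmark, and format_report are hypothetical, and the sketch assumes bs4 and lxml are installed in the environment being measured.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(markup, parser="lxml"):
    """Time BeautifulSoup construction and get_text() for one document.

    Hypothetical helper: `markup` is the already-downloaded page, as str or bytes.
    """
    # Imported lazily so time_call() stays usable without bs4 installed.
    import bs4

    soup, parse_s = time_call(bs4.BeautifulSoup, markup, parser)
    text, text_s = time_call(soup.get_text)
    return {
        "version": bs4.__version__,
        "parse_s": parse_s,
        "get_text_s": text_s,
        "chars": len(text),
    }


def format_report(stats, parser="lxml"):
    """Render the stats in roughly the shape of the logs in this comment."""
    return "\n".join([
        f"version: {stats['version']}",
        f"Parsed with {parser}: {stats['parse_s']:.1f} s",
        f"get_text(): {stats['get_text_s']:.1f} s",
    ])
```

Calling print(format_report(benchmark(markup))) in each virtualenv, with the same markup, gives numbers that can be compared across versions. Timing only the constructor and get_text() keeps the download time out of the measurement.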