Merge lp:~adeuring/launchpad/bug-904461 into lp:launchpad
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Approved by: | Abel Deuring on 2012-01-19 | ||||
| Approved revision: | no longer in the source branch. | ||||
| Merged at revision: | 14712 | ||||
| Proposed branch: | lp:~adeuring/launchpad/bug-904461 | ||||
| Merge into: | lp:launchpad | ||||
| Diff against target: |
606 lines (+375/-61) 8 files modified
lib/lp/bugs/browser/cvereport.py (+67/-34) lib/lp/bugs/browser/tests/test_cve.py (+176/-0) lib/lp/bugs/doc/bugtask-search.txt (+2/-2) lib/lp/bugs/doc/cve.txt (+4/-4) lib/lp/bugs/interfaces/cve.py (+7/-4) lib/lp/bugs/model/cve.py (+39/-9) lib/lp/bugs/templates/bugtask-macros-cve.pt (+3/-8) lib/lp/bugs/tests/test_cve.py (+77/-0) |
||||
| To merge this branch: | bzr merge lp:~adeuring/launchpad/bug-904461 | ||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| j.c.sackett (community) | 2012-01-19 | Approve on 2012-01-19 | |
| Richard Harding (community) | code* | 2012-01-19 | Approve on 2012-01-19 |
|
Review via email:
|
|||
Commit Message
[r=jcsackett,
Description of the Change
This branch fixes bug 904461: Timeout viewing CVE reports in an Ubuntu series
This timeout is not caused by slow SQL queries, but by several bottlenecks
on Python level. Related OOPS reports show SQL times of less than one second.
Thanks to lifeless, who gave me a number good hints where to look and
how to reproduce the timeout locally.
I created test data that should be roughly similar to real world data:
a few hundred bugs/bug tasks and ca 250 CVE, often linked to more than
one bug: The +cve page for Oneiric (or Natty -- I can't remeber where I
looked...) contains approximately 4800 links to CVEs, but only 240
or 250 _different_ CVEs: CVEs are often linked to several bugs.
Probably the worst bottleneck is the query
in lp.bugs.
diff), where one CVE is on average loaded 20 times. The SQL itself is
quite fast, but Storm spends much time in methods like load_object().
I changed this in r14627 so that only the tuples (bug_id, cve_id) are
loaded from the table BugCve; the related Bug and Cve instances are
retrieved via load_related().
Another serious bottleneck is the function canonical_url(): profile
data showed that several seconds are spent there when a page is
rendered.
The number of calls can be considerably reduced: Most calls are for
Cve instances, so we can avoid a large part of them, if we call
the function just once for each Cve. I implemented this by adding
an optional parameter cve_mapper to getBugCvesForBu
the function get_cve_
is now used as cve_mapper when the +cve view is rendered.
This saved 2.5 seconds in my test setup (r14629).
Antoher noticeable improvement was to retrieve the (bug, cve) tuples
for open and resolved CVEs in one query (r14628). I am not sure though
if the improvement in my test data resembles the real world: The number
of CVEs that are linked to open _and_ resovled bug tasks was probably
too big in the test data.
The last big bottleneck was the TAL loop that renders the CVE links.
Replacing it with simple Python list and string operations (r14630,
method CVEReportView.
A final, smaller improvemet was to bulk load the Bugs and the Person
records for bug task assignees (r14631).
Timing data from this test:
def test_open_
from time import time
total = 0.0
LOOPS = 3
for i in range(LOOPS):
start = time()
view = create_
delta = time() - start
total += delta
print "render time", delta
print "avg", total/LOOPS
(Note that this test is not included in the branch: Running it makes
sense with the right test data: The script that created my test data
ran more than one hour -- doing this in a regular unit test as a part
of the complete test suite does not make sense. And the sample data
file current.sql has a size of 9MB with the new test data...)
r14626 (original timeout) 14.3 sec
r14627 more efficent retrieval of (Bug, Cve) tuples 8.4 sec
r14628 one query to retrieve open and resolved CVEs 8.1 sec
r14629 efficient generation of CVE related display data 5.6 sec
r14630 render CVE links and join them in a browser method 2.9 sec
r14631 bulk load bugs and bugtask assignees 2.9 sec
(In r14631 and r14630, caching is noticable: The first loop execution
needed ca 4 seconds, the two following loop run needed ca 2.4 seconds.)
I did _not_ test that the number of SQL queries is constant, regardless
of the number of bugs/bug tasks/CVEs because it is not, unfortunately:
IPersonSet.
for ValidPersonCache records of single Person records. Bulk loading
Person records works fine though.
tests:
./bin/test bugs -vvt test_cve
no lint.
| j.c.sackett (jcsackett) wrote : | # |
> Should we add a docstring to the getBugCvesForBu
> cve_mapper param as a callable method meant for that mapping?
I see that you have a docstring for this method in the interface; given the complexity of what's going on, I think in the model code it may be worthwhile to simply add the usual "See `ICveSet`." docstring so people needing help quickly know where to go for it. I'll still approve, but would like this added before it lands.
Your tests look pretty extensive; I concur that adding multiple bugs per cve might be useful, but as you've got reasonable testing of performance and behavior of that situation elsewhere and the cve set tests verify the implementation sufficiently Rick is correct to say it's not a blocker.
One very minor point in the diff below:
> === modified file 'lib/lp/
> --- lib/lp/
> +++ lib/lp/
> @@ -178,15 +182,41 @@
> raise AssertionError(
> return sorted(cves, key=lambda a: a.sequence)
>
> - def getBugCvesForBu
> - bug_ids = set(task.bug.id for task in bugtasks)
> + def getBugCvesForBu
> + bugs = load_related(Bug, bugtasks, ('bugID', ))
> + if len(bugs) == 0:
> + return []
> + bug_ids = [bug.id for bug in bugs]
> assert bug_ids, "bugtasks must be non-empty, received %r" % bugtasks
I think you can remove the assert--you return if bugs is len 0, and the
only other thing the assert would catch would be NoneType, which would blow up
on the len check. Obviously it's not hurting anything, but it's not really
necessary either.
| Abel Deuring (adeuring) wrote : | # |
Hi Rick,
thanks for your review. I made all changes you suggested, though the doc string for getBugCvesForBu() is just the usual terse "see `ICveSet`".
The test data generated in TestCVEReportVi
I also forgot to mention in my original MP that I think that test_open_
| Abel Deuring (adeuring) wrote : | # |
On 19.01.2012 15:45, j.c.sackett wrote:
> Review: Approve
>
>> Should we add a docstring to the getBugCvesForBu
>> cve_mapper param as a callable method meant for that mapping?
>
> I see that you have a docstring for this method in the interface; given the complexity of what's going on, I think in the model code it may be worthwhile to simply add the usual "See `ICveSet`." docstring so people needing help quickly know where to go for it. I'll still approve, but would like this added before it lands.
>
> Your tests look pretty extensive; I concur that adding multiple bugs per cve might be useful, but as you've got reasonable testing of performance and behavior of that situation elsewhere and the cve set tests verify the implementation sufficiently Rick is correct to say it's not a blocker.
>
> One very minor point in the diff below:
>
>> === modified file 'lib/lp/
>> --- lib/lp/
>> +++ lib/lp/
>> @@ -178,15 +182,41 @@
>> raise AssertionError(
>> return sorted(cves, key=lambda a: a.sequence)
>>
>> - def getBugCvesForBu
>> - bug_ids = set(task.bug.id for task in bugtasks)
>> + def getBugCvesForBu
>> + bugs = load_related(Bug, bugtasks, ('bugID', ))
>> + if len(bugs) == 0:
>> + return []
>> + bug_ids = [bug.id for bug in bugs]
>> assert bug_ids, "bugtasks must be non-empty, received %r" % bugtasks
>
> I think you can remove the assert--you return if bugs is len 0, and the
> only other thing the assert would catch would be NoneType, which would blow up
> on the len check. Obviously it's not hurting anything, but it's not really
> necessary either.
good spot! I noticed the assert too when working on the branch but then
simple forgot it...
| j.c.sackett (jcsackett) wrote : | # |
Thanks for removing the assert, Abel.

Thanks Abel, this looks like it'll save a bunch of cpu cycles. I have a few questions/notes.
In the html generation, should the string that defines the template be moved outside of the method so that the variable instance isn't created each call (potentially thousands?) and instead be placed as some sort of class level constant?
Should we add a docstring to the getBugCvesForBu gTasks and make note of the cve_mapper param as a callable method meant for that mapping?
I don't think it matters, but in looking at the TestCveSet test cases, I wonder if a test case with multiple bugs per cve should be exercised as well since much of the code is about dealing with that condition.
I don't think any of these are deal breakers so approving.