Merge lp:~jtv/launchpad/bug-884649-branch-4 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Merged at revision: 14256
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-4
Merge into: lp:launchpad
Diff against target: 0 lines
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-4
Reviewer Review Type Date Requested Status
Brad Crittenden (community) code Approve
Review via email: mp+81265@code.launchpad.net

Commit message

Speed up 2nd-pass binary domination through caching.

Description of the change

= Summary =

The second pass of binary package domination has been taking far too long. It repeatedly queried for active architecture-specific binary package publications on given source package releases. This was a necessary change that we chose to implement naïvely-but-correctly at first, as per Knuth's Law.

Generally, the query will be repeated for the same source package release across all architectures, with (after some other branches I landed for this bug) the same answer every time.

== Proposed fix ==

This branch picks the fruit from the tree planted and watered in the preceding branches for the same bug. It caches the result of that repetitive query between iterations.

== Pre-implementation notes ==

See the notes on bug 884649 for the plan we're following here. A few more low-risk optimizations remain to be added after this branch, if necessary:

 * More bulk-loading of references to BinaryPackageBuild, SourcePackageRelease etc.

 * As Raphaël suggested: use BinaryPackagePublishingHistory.binarypackagename now that it's finally been fully initialized.

== Implementation details ==

It might have been possible to pre-populate the entire cache in one huge query. Surprisingly, memory footprint is probably not a limiting factor there. But now that the second domination pass skips packages where it's known that it won't do any good, it might be costly to figure out in advance exactly for which SourcePackageReleases the information would be needed.

I replaced BinaryPackagePublishingHistory.getOtherPublicationsForSameSource with SourcePackageRelease.getActiveArchSPecificPublications. Quite a mouthful, eh? The change followed naturally from extracting the part where we find the relevant SourcePackageRelease. The BPPH itself isn't relevant as such; only a few of its attributes are relevant as search criteria, and those are now passed as parameters.

== Tests ==

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
./bin/test -vvc lp.soyuz.tests.test_publishing
./bin/test -vvc lp.soyuz.tests.test_sourcepackagerelease
}}}

== Demo and Q/A ==

Upload & process a few revisions of a source package with both arch-specific and arch-independent binary packages. Maybe some pure arch-specific and arch-indep ones as well.

Sabotage a binary package for one architecture: reset the BPPH for one of the arch-specific binary packages (from a mixed source package) to Pending.

Run the dominator.

See that it functioned normally:

 * Of the architecture-dependent binary packages, all revisions are superseded except for the latest revision of each.

 * For the architecture you “sabotaged,” the last Published BPPH stays Published and the Pending one stays Pending.

 * Architecture-independent BPPHs are Superseded, except for the very latest revision of each _and_ except for the revision matching the currently Published revision of the binary package you sabotaged.

In other words, there should be two live versions of that arch-indep binary package. Its other releases should be superseded as normal though.

= Launchpad lint =

One pre-existing lint item that seems to be just cleverness confusing the linter. I'm not touching it.

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/soyuz/tests/test_publishing.py
  lib/lp/soyuz/model/publishing.py
  lib/lp/soyuz/tests/test_sourcepackagerelease.py
  lib/lp/soyuz/model/sourcepackagerelease.py
  lib/lp/archivepublisher/tests/test_dominator.py

./lib/lp/soyuz/model/sourcepackagerelease.py
     204: redefinition of function 'copyright' from line 195

To post a comment you must log in.
Revision history for this message
Brad Crittenden (bac) wrote :

Hi Jeroen,

Lots of optimization here -- I know that warms your innards.

* I'm a little confused by find_live_source_version and find_live_binary_version_pass_1. Both take an iterable called 'sorted_pubs'. In the former the "latest" is element [0] and in the later it is [-1] even though the comments say they are both sorted in descending order. Is that notion of "latest" really different?

* "This class looks up whether there are any." Could you complete that thought?

* I like your ArchSpecificPublicationsCache. I'm a bit (paranoidly) worried about the fragility of the order of elements returned by getKey and those expected by _lookUp ... but can't think of anything smart to suggest.

In find_live_binary_version_pass_2 how long do you expect the sorted_pubs list to be? Would it be reasonable to use set-based methods to determine arch_indep_pubs after you already figured out arch_specific_pubs? BTW, I should pay more attention to itertools -- I learned about ifilter* from this review. Thanks!

* In test_hasArchSpecificPublications_is_consistent_and_correct, I don't understand the value of testing each entry in the cache twice.

Otherwise it looks good. Thanks.

review: Approve (code)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks Brad!

> * I'm a little confused by find_live_source_version and
> find_live_binary_version_pass_1. Both take an iterable called 'sorted_pubs'.
> In the former the "latest" is element [0] and in the later it is [-1] even
> though the comments say they are both sorted in descending order. Is that
> notion of "latest" really different?

I don't see the [-1] you're referring to! Maybe you're thinking of the pop(0)? Without the argument that would take the last item from the list, but with the “0” it takes the first item instead.

> * "This class looks up whether there are any." Could you complete that
> thought?

It was a back-reference to the publications described at the end of the previous paragraph. I just ran out of succinct ways of mentioning them.

I just reworded this text to be clearer.

> * I like your ArchSpecificPublicationsCache. I'm a bit (paranoidly) worried
> about the fragility of the order of elements returned by getKey and those
> expected by _lookUp ... but can't think of anything smart to suggest.

The fun of getKey is that it leaves only one data path. It minimizes the opportunities for this to go wrong.

If the arguments passed to SPR.getActiveArchSpecificPublications ever gets messed up, at worst, the search can never be satisfied and thus the positive tests will fail.

> In find_live_binary_version_pass_2 how long do you expect the sorted_pubs list
> to be? Would it be reasonable to use set-based methods to determine
> arch_indep_pubs after you already figured out arch_specific_pubs? BTW, I
> should pay more attention to itertools -- I learned about ifilter* from this
> review. Thanks!

Generally those lists would be quite short. I noncommittally considered using tuples instead of lists, frozensets etc. but none of those changes looked to me as if they would even be a net performance improvement, let alone a worthwhile one.

> * In test_hasArchSpecificPublications_is_consistent_and_correct, I don't
> understand the value of testing each entry in the cache twice.

Consistency of subsequent calls (which of course return cached results). A confession: I first wrote a very detailed unit test for this class, using only fakes. Ran though vast amounts of detail, such as the keys' sensitivity to each individual item. Thanks to the fakes, it all ran in a minuscule fraction of a second. Then I decided that these tests didn't carry their maintenance weight and deleted the entire unit test case, leaving the integration test case and porting perhaps one or two of the original unit tests to it.

I clarified this by bolting the word “repeatably” onto the test's description.

Jeroen

Preview Diff

Empty