Comment 1 for bug 884649

Jeroen T. Vermeulen (jtv) wrote on 2011-11-01:

See lib/lp/archivepublisher/domination.py for background, and especially the comment in dominateBinaries.

Some notes about our planned approach follow, from IRC (times in UTC):

(11:54:23) jtv:
But the big change I'm hoping for is that we might find that the first binary domination pass can simply keep all arch-all pubs alive.

(11:54:55) bigjools:
That's how I originally did it.

(11:55:47) jtv:
Doesn't work?

(11:56:43) bigjools:
The two corner cases mentioned in the comment ...
The arch-all pubs were never getting dominated.

(12:03:59) jtv:
Looking at the comment, I see why the second pass is needed. Accepting that we need the second pass anyway, there's nothing per se against further increasing the need for it. Which means: legroom for optimization!

About the "keep arch-all bpphs alive" thing: are you saying you tried that with, or without the double domination?

(12:09:18) bigjools:
Without.
Well — 2 sets of queries
rather than a full domination run
so it did arch-any first then arch-all
but that has the schizo problem.

(12:15:30) jtv:
Quite.

(12:15:53) jtv:
So that means that _with_ double domination, this might just work.

Dominate twice, but on the first pass, don't supersede arch-all at all.
(So consider them, but keep them live)
I wonder if we could do the first pass for all architectures, before doing the second pass.
Because that way, we get to supersede all non-live arch-specific pubs before we even start looking at the other-publications-from-same-source.

The second pass could group by SPRs. There'd be only one getOtherPublicationsForSameSource for each.
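A rough sketch of the first pass as proposed here, not Launchpad's actual code. `Pub` and `first_pass` are made-up names, and versions are plain integers where real Debian versions would need proper comparison. Arch-all pubs take part in deciding the latest version, but are never superseded in this pass:

```python
from dataclasses import dataclass


@dataclass
class Pub:
    """Hypothetical stand-in for a binary publication (BPPH)."""
    package: str
    version: int         # simplified; not a real Debian version string
    das: str             # distro arch series, e.g. "i386"
    arch_specific: bool
    live: bool = True


def first_pass(pubs):
    """Supersede outdated arch-specific pubs; keep every arch-all pub live."""
    # Find the latest version per (package, DAS), considering all pubs.
    latest = {}
    for pub in pubs:
        key = (pub.package, pub.das)
        latest[key] = max(latest.get(key, pub.version), pub.version)

    # Only arch-specific pubs may be superseded in this pass.
    for pub in pubs:
        if pub.arch_specific and pub.version < latest[(pub.package, pub.das)]:
            pub.live = False
    return pubs
```

Running this over every architecture before starting the second pass means all non-live arch-specific pubs are already out of the way when the arch-all pubs get looked at.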

(12:20:44) bigjools:
Isn't this what we do right now?

(12:20:59) jtv:
Slightly different loop nesting.
Right now we loop over DASes, and for each, do 2 domination passes.

(12:21:15) bigjools:
Actually one way to speed it is to get the list of sources that had arch-all binaries that we left live.

(12:21:23) jtv:
Indeed.
Should be very easy to collect that information in the per-package domination loop.
That's a nice touch.
In fact, it eliminates a whole lot of work that I thought we were going to need!

I guess for the second pass, the algorithm could be something like:
Keep the latest version alive. Keep the remaining arch-specific versions alive. Keep any arch-indep versions alive if there are still arch-specific pubs for their SPR.
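Spelled out as code, those three rules might look roughly like this. It is only a sketch under assumptions: `Pub`, `second_pass` and `sprs_with_live_arch_specific` are illustrative names, versions are plain integers, and the set of SPRs that still have live arch-specific pubs is assumed to be computed elsewhere:

```python
from dataclasses import dataclass


@dataclass
class Pub:
    """Hypothetical stand-in for one publication of a package in one DAS."""
    version: int          # simplified; not a real Debian version string
    arch_specific: bool
    spr: str              # identifies the source package release
    live: bool = True


def second_pass(pubs, sprs_with_live_arch_specific):
    """Apply the three keep-alive rules to one package's publications."""
    live = [pub for pub in pubs if pub.live]
    if not live:
        return
    latest = max(pub.version for pub in live)
    for pub in live:
        if pub.version == latest:
            continue  # Rule 1: keep the latest version alive.
        if pub.arch_specific:
            continue  # Rule 2: arch-specific pubs are left alone here.
        if pub.spr in sprs_with_live_arch_specific:
            continue  # Rule 3: its SPR still has arch-specific pubs.
        pub.live = False  # Outdated arch-indep pub with nothing left to serve.
```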

“What's the difference with what we do now?” I hear you ask.

(12:26:32) bigjools:
:)

(12:26:57) jtv:
The difference is that there's no need to dominate arch-specific BPPHs at all in that pass.

(12:27:06) bigjools:
Indeed.

(12:27:48) jtv:
AFAICS we can pass just the arch-specific BPPHs to the inner domination method.

Och aye, no we can't quite do that.

Because if the latest version is arch-specific, we'd end up deleting the last arch-all version instead of superseding it.
(Unless we allow the caller to "pre-seed" the dominant)

Anyway, that's going into needless detail.

First I'll apply some loop fission to separate the two passes into separate DAS loops.

Then I add code to collect the SPRs-with-live-arch-indep-BPPHs.

Then I rearrange the second-pass loop to iterate over those, with an inner loop that iterates DAS.
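The three steps could be pictured like this. It is a sketch of the loop structure only; `first_pass` and `second_pass` are hypothetical callables, and `first_pass` is assumed to return the SPRs whose arch-indep BPPHs it left live:

```python
def dominate_interleaved(dases, first_pass, second_pass_for_das):
    """The nesting described as current: per DAS, both passes back to back."""
    for das in dases:
        first_pass(das)
        second_pass_for_das(das)


def dominate_with_fission(dases, first_pass, second_pass):
    """The planned nesting: all first passes, then SPR-driven second passes."""
    # Steps 1 and 2: first pass over every DAS, collecting the SPRs
    # that still have live arch-indep BPPHs.
    sprs_to_revisit = set()
    for das in dases:
        sprs_to_revisit.update(first_pass(das))

    # Step 3: second pass iterates over those SPRs, with DAS as inner loop.
    for spr in sorted(sprs_to_revisit):
        for das in dases:
            second_pass(spr, das)
```

SPRs that left no arch-indep BPPHs live in the first pass never reach the second loop at all.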

(12:33:15) bigjools:
Nice.

(12:33:37) jtv:
And somewhere along the way, I cut computations that duplicate the outer loops out of the domination method.

In particular, the call to getOtherPublicationsForSameSource goes into the outer second-pass loop. Its results get re-used for every architecture.

(Thus we query it once for every SPR, apply the result to each architecture, then move on to the next SPR)
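To illustrate hoisting the query out of the per-architecture work: in this sketch `query_other_publications` merely stands in for the same-source lookup mentioned above (its real signature is not shown in this log), and `dominate` is a hypothetical inner domination step:

```python
def run_second_pass(sprs, dases, query_other_publications, dominate):
    """Query once per SPR, then re-use the result for each architecture."""
    for spr in sprs:
        # One lookup per SPR, done in the outer loop.
        others = query_other_publications(spr)
        for das in dases:
            # The same result is applied to every architecture.
            dominate(spr, das, others)
```

With S SPRs and D architectures this issues S lookups rather than S × D.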

(12:36:47) bigjools:
ok
