See lib/lp/archivepublisher/domination.py for background, and especially the comment in dominateBinaries.
Some notes about our planned approach follow, from IRC (times in UTC):
(11:54:23) jtv:
But the big change I'm hoping for is that we might find that the first binary domination pass can simply keep all arch-all pubs alive.
(11:54:55) bigjools:
That's how I originally did it.
(11:55:47) jtv:
Doesn't work?
(11:56:43) bigjools:
The two corner cases mentioned in the comment ...
The arch-all pubs were never getting dominated.
(12:03:59) jtv:
Looking at the comment, I see why the second pass is needed. Accepting that we need the second pass anyway, there's nothing per se against further increasing the need for it. Which means: legroom for optimization!
About the "keep arch-all BPPHs alive" thing: are you saying you tried that with or without the double domination?
(12:09:18) bigjools:
Without.
Well — 2 sets of queries
rather than a full domination run
so it did arch-any first then arch-all
but that has the schizo problem.
(12:15:30) jtv:
Quite.
(12:15:53) jtv:
So that means that _with_ double domination, this might just work.
Dominate twice, but on the first pass, don't supersede arch-all at all.
(So consider them, but keep them live)
I wonder if we could do the first pass for all architectures, before doing the second pass.
Because that way, we get to supersede all non-live arch-specific pubs before we even start looking at the other-publications-from-same-source.
The second pass could group by SPRs. There'd be only one getOtherPublicationsForSameSource for each.
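Here is a minimal sketch of the proposed first-pass rule for one binary package in one DAS. It assumes a hypothetical `Pub` record and `first_pass` helper, which are not Launchpad's BPPH model or the `Dominator` API. It also uses plain integer versions instead of Debian version strings. The newest publication stays live and older arch-specific ones are superseded. Older arch-all ones are considered but kept live.

```python
from dataclasses import dataclass


@dataclass
class Pub:
    """Hypothetical stand-in for one binary publication (BPPH)."""
    version: int         # simplified: an int, not a Debian version string
    arch_specific: bool  # False for arch-all (arch-indep) binaries
    live: bool = True


def first_pass(pubs):
    """Dominate one binary package in one DAS, keeping arch-all pubs live.

    The newest publication is the dominant and stays live. Older
    arch-specific publications are superseded; older arch-all ones are
    considered but left live for the second pass to decide.
    """
    ordered = sorted(pubs, key=lambda p: p.version, reverse=True)
    for pub in ordered[1:]:       # ordered[0] is the dominant
        if pub.arch_specific:
            pub.live = False      # superseded by the dominant
    return ordered


pubs = [Pub(1, True), Pub(3, True), Pub(2, False)]
first_pass(pubs)
print([(p.version, p.live) for p in sorted(pubs, key=lambda p: p.version)])
# → [(1, False), (2, True), (3, True)]
```

Version 2 survives only because it is arch-all. Nothing in this pass will ever supersede it, which is why the rule only makes sense combined with a second pass.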
(12:20:44) bigjools:
Isn't this what we do right now?
(12:20:59) jtv:
Slightly different loop nesting.
Right now we loop over DASes, and for each, do 2 domination passes.
(12:21:15) bigjools:
Actually one way to speed it is to get the list of sources that had arch-all binaries that we left live.
(12:21:23) jtv:
Indeed.
Should be very easy to collect that information in the per-package domination loop.
That's a nice touch.
In fact, it eliminates a whole lot of work that I thought we were going to need!
I guess for the second pass, the algorithm could be something like:
Keep the latest version alive. Keep the remaining arch-specific versions alive. Keep any arch-indep versions alive if there are still arch-specific pubs for their SPR.
“What's the difference with what we do now?” I hear you ask.
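The second-pass rule jtv outlines can be sketched for one binary package in one DAS. This is a minimal illustration under stated assumptions. `Pub`, `second_pass`, and the `sprs_with_live_arch_specific` set are hypothetical names. That set holds the SPRs that still have live arch-specific publications on any architecture. Versions are again plain integers.

```python
from dataclasses import dataclass


@dataclass
class Pub:
    """Hypothetical stand-in for one binary publication (BPPH)."""
    version: int         # simplified: an int, not a Debian version string
    spr: str             # source package release the binary was built from
    arch_specific: bool  # False for arch-all (arch-indep) binaries
    live: bool = True


def second_pass(pubs, sprs_with_live_arch_specific):
    """Apply the second-pass rule to one binary package in one DAS.

    Keep the latest version; leave arch-specific pubs alone; keep an
    arch-indep pub only while its SPR still has live arch-specific pubs.
    Any other live arch-indep pub is superseded.
    """
    live = [p for p in pubs if p.live]
    if not live:
        return
    latest = max(live, key=lambda p: p.version)
    for pub in live:
        if pub is latest or pub.arch_specific:
            continue  # latest stays; arch-specific pubs are not touched here
        if pub.spr not in sprs_with_live_arch_specific:
            pub.live = False


pubs = [Pub(4, "src-4", True), Pub(3, "src-3", False), Pub(2, "src-2", False)]
second_pass(pubs, {"src-3"})
print([p.live for p in pubs])
# → [True, True, False]
```

The loop never supersedes an arch-specific publication. In this scheme, the first pass has already dealt with those.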
(12:26:32) bigjools:
:)
(12:26:57) jtv:
The difference is that there's no need to dominate arch-specific BPPHs at all in that pass.
(12:27:06) bigjools:
Indeed.
(12:27:48) jtv:
AFAICS we can pass just the arch-specific BPPHs to the inner domination method.
Och aye, no, we can't quite do that.
Because if the latest version is arch-specific, we'd end up deleting the last arch-all version instead of superseding it.
(Unless we allow the caller to "pre-seed" the dominant)
Anyway, that's going into needless detail.
First I'll apply some loop fission to separate the two passes into separate DAS loops.
Then I'll add code to collect the SPRs-with-live-arch-indep-BPPHs.
Then I'll rearrange the second-pass loop to iterate over those, with an inner loop that iterates over DASes.
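Here is a minimal sketch of the restructured driver after loop fission. It assumes hypothetical `first_pass(das)` and `second_pass(spr, das)` callables supplied by the caller, which are not Launchpad functions. The first pass runs over every DAS and returns the SPRs whose arch-indep publications it left live. The second pass then iterates over just those SPRs, with DAS as the inner loop.

```python
def dominate_binaries(das_names, first_pass, second_pass):
    """Hypothetical driver: run the first pass everywhere, then the second.

    first_pass(das) dominates every package in one DAS and returns the set
    of SPRs whose arch-indep publications it left live.
    second_pass(spr, das) revisits one of those SPRs in one DAS.
    """
    # Pass 1, split out by loop fission: every DAS before any second-pass work.
    deferred_sprs = set()
    for das in das_names:
        deferred_sprs |= first_pass(das)

    # Pass 2: only the collected SPRs, with DAS as the inner loop.
    for spr in sorted(deferred_sprs):  # sorted only for a stable order
        for das in das_names:
            second_pass(spr, das)
    return deferred_sprs


# Usage with stub passes that just record the order of calls.
log = []


def stub_first(das):
    log.append(("first", das))
    return {"src-1"} if das == "amd64" else {"src-1", "src-2"}


def stub_second(spr, das):
    log.append(("second", spr, das))


dominate_binaries(["amd64", "i386"], stub_first, stub_second)
# Both ("first", ...) entries come before any ("second", ...) entry, and
# each collected SPR is visited once per DAS.
```

Collecting the SPRs during the first pass is what lets the second pass skip every source that had no live arch-all binaries left.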
(12:33:15) bigjools:
Nice.
(12:33:37) jtv:
And somewhere along the way, I'll cut computations that duplicate the outer loops out of the domination method.
In particular, the call to getPublicationsForSameSource goes into the outer second-pass loop. Its results get re-used for every architecture.
(Thus we query it once for every SPR, apply the result to each architecture, then move on to the next SPR)
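To illustrate the saving, this sketch counts how often the per-SPR same-source query runs under each loop shape. Everything here is hypothetical. `make_query` stands in for the real query and `noop` for the per-DAS domination step. The DAS-outer function is a simplified model of the current nesting, not the actual code.

```python
def make_query():
    """Hypothetical stand-in for the per-SPR same-source query.

    Returns a query function plus a list that records every call, so the
    number of queries each loop shape issues can be compared.
    """
    calls = []

    def query(spr):
        calls.append(spr)
        return [f"{spr}/other-pub"]  # dummy result

    return query, calls


def second_pass_das_outer(sprs, das_names, query, dominate):
    """Current nesting (simplified): the query runs once per (DAS, SPR)."""
    for das in das_names:
        for spr in sprs:
            dominate(das, spr, query(spr))


def second_pass_spr_outer(sprs, das_names, query, dominate):
    """Proposed nesting: query once per SPR, reuse the result for each DAS."""
    for spr in sprs:
        others = query(spr)
        for das in das_names:
            dominate(das, spr, others)


def noop(das, spr, others):
    pass  # placeholder for the per-DAS domination step


sprs, das_names = ["src-1", "src-2"], ["amd64", "armel", "i386"]
for loop in (second_pass_das_outer, second_pass_spr_outer):
    query, calls = make_query()
    loop(sprs, das_names, query, noop)
    print(loop.__name__, len(calls))
# → second_pass_das_outer 6
# → second_pass_spr_outer 2
```

With the query hoisted, the count scales with the number of SPRs rather than SPRs times architectures.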
(12:36:47) bigjools:
ok