-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andrew Bennetts wrote: > John A Meinel wrote: > [...] >> Thanks for doing this. I did the work in Packer, but it is nice to have >> it here. > > Ah! I thought we had something like this. I'll see what code and ideas > I can reuse. > >> I would tend to set "auto_reorder" to always on. I don't see it ever >> making things worse, and there should only be a small overhead for the >> reordering. As such, I would also get rid of the constructor flag. > > Hmm, ok. If I was just targetting bzr.dev I would have done this > already, but if you're confident it won't cause unexpected issues I'm > happy to make it unconditional (including in backports). > >> Doing it by names is ok, I think Index already has _name for most >> indices. You could also order by "access_tuple()" which I believe is a >> public API and returns (transport, name) to the original .pack file. The >> *nice* thing about doing that is that you don't have to introduce any >> new apis here. But if it isn't as clean, then don't worry too much about it. > > _name is really a path for _transport, and as such includes the suffix. > So I don't think it's as clean either code-wise or conceptually to rely > on that. And as you noticed later access_tuple is on AggregateIndex, so > it's not readily available either. > >> If it isn't too much overhead, you might consider: >> >> keys_remove = keys.remove >> # Be careful, if keys starts empty, we would break, and get a name error >> index_hit_count = 0 >> for index in self._indices: >> if not keys: >> break >> index_hit_count = 0 >> for node in index.iter_entries(keys): >> keys_remove(node[1]) >> yield node >> index_hit_count += 1 >> >> if index_hit_count: >> hit_indices.append((index_hit_count, index)) >> >> hit_indices.sort(reverse=True) >> >> self._move_index_order(hit_indices) >> >> >> The nice thing about it, is that it is adaptive. In that if you start a >> search, and hit it in index 1, then you keep looking there, but if the >> keys start showing up in index 2, then you'll switch. The only other bit >> of logic that I could think of is: > > I don't quite follow you here. Wouldn't my logic switch to preferring > index 2 anyway? > > It's not clear to me that ordering the hit_indices by hit count is > better. > The point is that if the first index in the list gets 1 key hit, but the second gets 10, and the third gets 5, then it will get sorted into (index2, index3, index1) rather than (index1, index2, index3). At least, what I saw with your code was that it is just 'preserve existing order, but move the ones hit to the front'. Put another way, say the hits are: 5 index1 10 index2 0 index3 4 index5 0 index6 your version would sort to index1 index2 index5 index3 index6 mine would sort to index2 index1 index5 index3 index6 I think the latter is slightly preferred. I don't have any numbers to back that up, just a feeling. (Most likely in this case index2 is a 'bigger' index, and thus has more to *be* hit.) As for small indices getting rejected quickly, it is true they won't encounter any I/O overhead, which may be sufficient. It is a function call, but that is probably tiny overhead, and they'll get sorted late quickly anyway. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkug6fEACgkQJdeBCYSNAAMFDwCeNFunVP9tcpRfDfJBVMki8W7O pe0AoJRHXzjuKaU0a0I98H7GKuplAaBu =gpIB -----END PGP SIGNATURE-----