Robert Collins wrote:
> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
[...]
> [debug flags] 
> > I hope we don't need to ask people to use them, but they are a cheap insurance
> > policy.
> > 
> > More usefully, they are helpful for benchmarking and testing.  I'm ok with
> > hiding them if you like.
> 
> I don't have a strong opinion. The flags are in our docs though as it
> stands, so they should be clear to people reading them rather than
> perhaps causing confusion.

I'll leave them in, assuming I can find a way to make the ReST markup cope with
them...

[inventory_delta.py API]
> Anyhow, at the moment it seems unclear and confusing, rather than
> clearer. I would find it clearer as either being on the __init__, I
> think, or perhaps with two seperate constructors? No biggy, not enough
> to block the patch on, but it is a very awkward halfway house at the
> moment - and I wouldn't want to see it left that way - so I'm concerned
> that you might switch focus away from this straight after landing it.

I've now split the class into separate serializer and deserializer, which gives
“two separate constructors”.  I think it's an improvement, I hope you'll think
so too!

[repacking single pack bug]
> > I don't understand why that case isn't covered.  Perhaps you should file the bug
> > instead of me?
> 
> I'd like to make sure I explain it well enough first; once you
> understand either of us can file it - and I'll be happy enough to be
> that person. 
[...]
> What you need to do is change the test from len(self.packs) == 1, to
> len(packs being combined) == 1 and that_pack has the same hash.

But AIUI “self.packs” on a Packer *is* “packs being combined”.  If it's not then
your explanation makes sense, but my reading of the code says otherwise.

[...]
> > > > -            if not hint or pack.name in hint:
> > > > +            if hint is None or pack.name in hint:
> > > 
> > > ^- the original form is actually faster, AFAIK. because it skips the in
> > > test for a hint of []. I'd rather we didn't change it, for all that its
> > > not in a common code path.
> > 
> > But if hint is None, then “pack.name in hint” will fail.
> 
> It would, but it won't execute. As we discussed, if you could change it
> back, but add a comment - its clearly worthy of expanding on (in either
> form the necessity for testing hint isn't obvious, apparently).

Ok.  (I still find it cleaner to have the hint is None case handled distinctly
from the hint is a list case, it seems to express the intent more clearly to me,
rather than making it appear to simply be a micro-optimisation.  But I
understand that tastes vary.)

Oh, actually, it does matter.  When hint == [], no packs should be repacked.
When hint is None, all packs should be repacked.

I've added this comment:

                # Either no hint was provided (so we are packing everything),
                # or this pack was included in the hint.

[...]
> > > > +    def _should_fake_unknown(self):
[...]
> 
> As we discussed, I think the method should say what it does; the fact
> that the base class encodes more complex policy than a child is a smell
> (not one that you need to fix), but it would be very nice to have pydoc
> help make it more obvious that the base class does this special
> handling.

Made a docstring.

> > > > +class SmartServerRepositoryInsertStream_1_18(SmartServerRepositoryInsertStreamLocked):
[...]
> 
> So, if the lock_token was required, there would be:
>  - just one verb
>  - no backwards compatibility issues
> 
> Up to you; I do think less code is better, all things considered.

I'll leave it as is.  I don't think the costs of doing it this way are
insignificant, and it doesn't lock us into assuming future clients/formats must
be repos in advance before inserting a stream.

[interrepo tests]
> > I considered leaving it there simply for test selection, but that's not the
> > right way to do that, because it can be (and was!) out of sync with what the
> > test was actually doing.
> 
> You proposed putting class names in literally, that would be fine by me.
> Its no less likely to skew, but I think the issue goes deeper than 'it
> can skew' - we have many potentially expensive tests being executed in
> the same situation, AIUI. So we may well want to revisit this.

I've put the class names in literally.  I agree that the test structure here
doesn't seem optimal, and is probably worth revisiting some day.

> > > > +    def assertCanStreamRevision(self, repo, revision_id):
> > > > +        exclude_keys = set(repo.all_revision_ids()) - set([revision_id])
> > > > +        search = SearchResult([revision_id], exclude_keys, 1, [revision_id])
> > > > +        source = repo._get_source(repo._format)
> > > > +        for substream_kind, substream in source.get_stream(search):
> > > > +            # Consume the substream
> > > > +            list(substream)
> > 
> > > ^ - this assertion, while named nice, isn't strong enough for me. I'd
> > > like it to assert that either a delta matching the local changes, or a
> > > full inventory, are streamed - and that the newly referenced file texts
> > > are present.
> > 
> > Hmm, fair enough.  That may happen after landing, though, but not long after.
> 
> If this is opening a whole, I'd really rather it be done before landing;
> especially if you agree its worth doing. If its an existing whole that
> refactoring made obvious, then leaving it till later is fine with me.

Well, before it was only checking for some has_revisions and
.inventories.keys(), which turned out to be insufficient to notice some bugs.
e.g. not copying all the chk_bytes for the inventories.  This assertion caught
those.  So I think it is an incremental improvement, not opening a hole.

I think John has made some overlapping changes in bzr.dev based on the bugs I
noticed; I'll make sure this change fits his.  Oh ho!  Merging bzr.dev gets some
failures from assertCanStreamRevision that just the assertions in bzr.dev don't.
Hmm... oh!  It appears that when insert_stream tries a pack, and it turns out
the single pack is already optimal, it was accidentally causing that pack to be
removed from the pack-names list!  (Because it removes the unneeded pack, but
it's unneeded because it has the same name as the original pack, so it forgets
the original pack...).  Ok, I've fixed that now, but again it's something this
assertion catches that none of the assertions in this test in bzr.dev did :)

And for now I've kept all the assertions in these tests from bzr.dev, so the
changes here are strictly increasing coverage, so definitely no hole added.

A tangent: when we transfer parent inventories to a stacked repo, should we
transfer the texts introduced by those inventories or not?  I'm seeing a
worrisome half-and-half behaviour from the streaming code, where the synthesised
rich root, but no other texts, are transferred for inventories requested via
get_stream_for_missing_keys.  I think the intention is not (if those parent
revisions are ever filled in later, then at that time the corresponding texts
would be backfilled too).  I've now fixed this too.

The net result of this is that the stacking tests in
per_interrepository/test_fetch.py are starting to get pretty large.  I suspect
we can do better here... (but after this branch lands, I hope!)

> > > > === modified file 'bzrlib/tests/test_remote.py'
> > > > --- bzrlib/tests/test_remote.py 2009-07-30 04:27:05 +0000
> > > > +++ bzrlib/tests/test_remote.py 2009-08-05 01:41:03 +0000
> > > > 
> > > > +    """Tests for using Repository.insert_stream verb when the _1.18 variant is
> > > > +    not available.
> > > > +
> > > > +    This test case is very similar to TestRepositoryInsertStream_1_18.
> > > > +    """
> > > 
> > > Perhaps a base class would make this smaller - it is nearly identical.
> > 
> > I don't see any elegant ways to cut the duplication at this point.  I haven't
> > tried very hard though, because it's just small enough that it fits under my
> > threshold for tolerating the ugly...
> 
> It went past mine :). TemplateMethod perhaps?

The thing is that the important part of the tests is the expected network
conversation, and that's also the bulk of the near-duplication.  So I'm finding
it hard to see a way to reduce that duplication without obscuring the key part
of the test.

I've refactored out a bit of duplication that isn't the network conversation,
and it does help a bit.

> > > I"m not sure how to say 'needs fixing' in review mail, but thats what
> > > I'd say.

(It's “review needs-fixing”, btw.  Not sure when that got added.)

I've attached a psuedo-incremental diff (I merged latest bzr.dev into the tip of
this branch, and separately bzr.dev into the last reviewed revision, and then
produced the diff of those two).  So if anything looks funny in the diff you
might want to check the actual branch.

-Andrew.