John A Meinel wrote:
[...]
> I think some of it stems from, do you want a WT in the target or not,
> etc, etc. I would be happier with something that you can compose. This
> is also cluttered because you can't just open the WT from a Branch, it
> will re-open the branch, etc. (Note that there was also bugs with
> branch.sprout() which didn't happen with bzrdir.sprout(), something with
> Trees and merge, and double-locking the WT, IIRC)

Perhaps we need an object that explicitly composes together all the components
of a controldir?  i.e. rather than getting a controldir that you can call
open_branch etc on, and but then to go from the branch to the working tree you
need to reopen the working tree, you grab all the objects at once.

Without thinking about it too hard, maybe something like:

   bag_of_things = open_bag(location)   # strawman name
   branch = bag_of_things.get_branch()
   wt = bag_of_things.get_workingtree() # does not reopen branch; it's opened
                                        # once and remembered for the lifetime
                                        # of the bag.

   [etc etc]

I'm not sure that's better, but I am sure the existing design isn't
great :)

The current API design makes it hard to avoid reopening the same object
over and over.  Ideally the design would require you to go out of your way to
get two different in-memory objects representing the same on-disk
object, because that's usually not what you want.

> Things we want to keep in mind:
> 
> a) In the ideal interface, we would have a way to copy any number of
>    branches simultaneously into a target. This gives us a way to
>    implement some form of Repository cloning that people would like.
>    (Being able to grab the stack of a loom, or trunk+main features,
>    etc.)

We at least have a workable way to do the fetch part of that operation,
but we certainly lack nice ways to find, open, copy or create sets of
branch objects.

[...]
> +class FetchSpecFactory(object):
[...]
> ^- I really prefer calling them "source_branch_stop_revision_id". or
> "source_tip_revision_id". I'd like the id to be in there, to make it
> clear that this is a string rev_id, not a tuple key, not a Revision
> object, etc.

Good point, I'll make that change.

> However, shouldn't it be "source_tip_revision_ids" (a list of strings).
> To allow more flexibility? We've often been hampered by having a single
> revision_id to work with.

Hmm.  Probably, although I'm inclined to delay complicating this further
until we have concrete use cases to drive the design.  YAGNI, in other
words :)

I wouldn't be surprised to find at the point someone refactors to add
that sort of feature that the result turns out to be a deeper change.

> I see that you have "explicit_rev_ids = set()" which isn't documented.

Hmm, it's class-private variable really, I'll prefix it with an
underscore.

> Is there a reason to keep "source_branch_stop_revision" at all?

Yes.  It's used by
<https://code.launchpad.net/~spiv/bzr/fetch-tags-from-non-sprout-too/+merge/42911>,
to support operations like “bzr pull -r 99”.  That fetch should not pull
revno 100.  So source_branch_stop_revision is the variable that
overrides the default “fetch the branch tip” behaviour, which is
hopefully fairly clear when you look at how that variable is used by
make_fetch_spec:

            if self.source_branch_stop_revision is not None:
                heads_to_fetch.add(self.source_branch_stop_revision)
            else:
                heads_to_fetch.add(self.source_branch.last_revision())

> +            try:
> +                tags_to_fetch.update(
> +                    self.source_branch.tags.get_reverse_tag_dict())
> 
> I'm worried about the performance of this, especially in the case of
> stuff like Emacs. IIRC the conversion from CVS brought something like
> 2000 tags.
> 
> While it probably isn't critical for "bzr branch", if "bzr pull"
> suddenly has to download and query against 2000 tips, that could get
> expensive.

Hmm.  It would be good to find out what the performance impact is for
branches like Emacs.  I'll do some experiments.  If/when this patch
lands, the common case for pull will be that the 2000 tips are already
in the local repo, so it'll depend on how fast it is to make that query
on a local repo.

On the other hand, if you have 2000 tags, and a large fraction of them
don't work because the revisions are missing, then that's also pretty
bad :)

> (Note that if we had versioned tags, then you could find the diff of the
> tag definition, and then only fetch those tags. As a fairly reasonable
> way to get incremental fetch only thinking about incremental changes.)

I suppose another possibility for pull is to compare the source and
target's tags dict, and only consider tips from new or changed tags.

This would have the interesting result that "bzr pull" in a branch made
from 2.2 will leave broken tags broken — but also that the first pull
made with 2.3 (assuming this patch lands) will not suddenly do an
unexpectedly large fetch as it fills in the broken tags.  I'm not sure
if that's a net positive or not :)

If we did this we'd probably want something like fetch-ghosts could use
if they did want to fill in broken tags in an existing branch.  I guess
deleting all tags and then pulling again would be potential clumsy
workaround.

> +        fetch_spec_factory = FetchSpecFactory()
> +        if revision_id is not None:
> +            fetch_spec_factory.add_revision_ids([revision_id])
> +            fetch_spec_factory.source_branch_stop_revision = revision_id
> 
> ^- This structure is a bit confusing. I'd rather see:
> 
>    fetch_spec_factory.set_tip_revision(revision_id)
> 
> And then have that add_revision_ids() && set the member.
> 
> Otherwise it is just a hard-to-get-right api. But see my earlier
> discussion about wanting the ability to fetch multiple branches. So maybe:
>   fetch_spec_factory.add_head() or add_tip_revision_id() or ...

Ok I'll try that, and see how it looks in this branch and the follow-on
branch fetch-tags-from-non-sprout-too.

> +            for path, file_id in subtrees:
> +                target = urlutils.join(url, urlutils.escape(path))
> +                sublocation = source_branch.reference_parent(file_id, path)
> +                sublocation.bzrdir.sprout(target,
> +                    basis.get_reference_revision(file_id, path),
> +                    force_new_repo=force_new_repo, recurse=recurse,
> +                    stacked=stacked)
> 
> ^- This is also a hint for a case where we would probably like to do one
> large fetch across all the subtrees. Though it is a bit muddled because
> Aaron didn't want to enforce that subtrees must share a repository.

This hunk is unchanged in my patch apart from indentation level, so I'm
not too concerned with tweaking it :)

> The tests seem ok, I wasn't very thorough.
> 
> The idea that sprout will use branch.last_revision() if none is supplied
> is ok. Though having a way to explicitly request all revisions for a
> repository, even if it has a branch, might also be useful...

It *might* be useful.  Citation needed ;)

>  review: needs_fixing
> 
> The big thing is to clarify some of the new fetch spec interface, the
> rest is all stuff we can discuss to figure out what will be the nicest
> way forward.

Agreed.  If you have the energy, take a look at how
fetch-tags-from-non-sprout-too uses it too.

-Andrew.