Merge lp:~spiv/bzr/fetch-spec-everything-not-in-other into lp:bzr
| Status: | Merged |
|---|---|
| Merged at revision: | 5648 |
| Proposed branch: | lp:~spiv/bzr/fetch-spec-everything-not-in-other |
| Merge into: | lp:bzr |
| Diff against target: |
247 lines (+32/-36) 10 files modified
bzrlib/fetch.py (+10/-15) bzrlib/graph.py (+6/-5) bzrlib/remote.py (+3/-3) bzrlib/repofmt/knitrepo.py (+1/-1) bzrlib/repofmt/weaverepo.py (+1/-1) bzrlib/repository.py (+2/-2) bzrlib/tests/per_interrepository/test_interrepository.py (+1/-1) bzrlib/tests/test_remote.py (+3/-3) doc/en/release-notes/bzr-2.3.txt (+0/-5) doc/en/release-notes/bzr-2.4.txt (+5/-0) |
| To merge this branch: | bzr merge lp:~spiv/bzr/fetch-spec-everything-not-in-other |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Martin Pool | 2010-12-07 | Approve on 2011-02-04 | |
| John A Meinel | 2010-12-06 | Pending | |
|
Review via email:
|
|||
This proposal supersedes a proposal from 2010-11-29.
Commit Message
Add EverythingResult, EverythingNotIn
Description of the Change
Like <https:/
This patch defines some new “fetch specs” to be passed as the fetch_spec argument of fetch: EverythingResult, EmptySearchResult, EverythingNotIn
Separately I'm not 100% sure we need the ability to have fetch specs that aren't resolved into search results immediately (i.e. how AbstractSearch instances are used). This happens to be effectively how the existing code works: e.g. code that does "target.
A nice thing about adding these objects is that it shifts logic to handle these different cases out of the innards of fetch.py to somewhere more modular. This in turn means we can support these things better over HPSS, and so this patch does that: the 'everything' search is now added to the network protocol. Happily new verb was needed, we can safely interpret a BadSearch error in this case as meaning we need to fallback.
Another change in this patch is the deprecation of search_
| Andrew Bennetts (spiv) wrote : | # |
John A Meinel wrote:
> Review: Needs Fixing
> + if not isinstance(search, graph.Everythin
> ^- can we add an attribute, rather than an isinstance check?
>
> if not search.
Hmm, I can do that, although currently there's no common base class for these
objects, and my gut feeling is that it would be good to keep their interface as
small and simple as possible. So I'm unsure about whether or not to do this. I
think it's probably easier to add this later than it is to remove it later.
> 490 + def _present_
> 491 + """Returns set of all revisions in ancestry of revision_ids present in
> 492 + the source repo.
> 493 +
> 494 + :param revision_ids: if None, all revisions in source are returned.
> 495 + """
> 496 + if revision_ids is not None:
> 497 + # First, ensure all specified revisions exist. Callers expect
> 498 + # NoSuchRevision when they pass absent revision_ids here.
> 499 + revision_ids = set(revision_ids)
> 500 + graph = self.source.
> 501 + present_revs = set(graph.
> 502 + missing = revision_
> 503 + if missing:
> 504 + raise errors.
> 505 + source_ids = [rev_id for (rev_id, parents) in
> 506 + self.source.
> 507 + if rev_id != _mod_revision.
> 508 + and parents is not None]
> ^- you have "graph = self.source.
Fixed, thanks for spotting that. I think it occurred because I initially
copy-and-pasted the list comprehension from graph.py.
> 145 + def get_recipe(self):
> 146 + raise NotImplementedE
> 147 +
> 148 + def get_network_
> 149 + return ('everything',)
>
> Why don't EverythingNotIn
Because they are a different kind of object. Obviously that's not clear enough
in the current code!
The names could stand to be improved, but there are basically two kinds of
“fetch spec” in this patch:
* a network-ready search result: SearchResult, PendingAncestry
EverythingRe
(I'm not certain that's the perfect design, but it's not too far off.)
* a lightweight object that can be resolved into a network-ready search result
later: EverythingNotIn
method that inspects the repository/ies to generate the search result.
> Overall, I think these changes seem good, but I don't really see how this gets
> us closer to computing a search without having one end doing a step-by-step
> search. (Certainly you're still using the same 'search_
> and you don't seem to be serializing anything over the wire...)
Right, this change isn't meant to solve that problem. The concrete performance
issue they help with is in the next patch, fetch-all-
and easy for the sprout code to request a fetch of all tags at the
same time as the tip.
Separately...
- 5544. By Andrew Bennetts on 2010-12-07
-
Merge fetch-spec-
everything- not-in- other. - 5545. By Andrew Bennetts on 2010-12-15
-
Merge latest fetch-spec-
everything- not-in- other.
| Martin Pool (mbp) wrote : | # |
I think I like the concept of splitting these out into something more like objects.
I don't want to hold this hostage but it seems like this would be a good moment to add a little developer documentation to the architectural guide describing at a high level how fetch works.
+ def get_network_
+ start_keys = ' '.join(
+ stop_keys = ' '.join(
+ count = str(self.
+ return (self._recipe[0], '\n'.join(
+
rather than adhoc encoding, could you send it as bencode?
+ medium.
out of scope for this, but concluding it's older than 2.3 because it doesn't support this particular operation is really not accurate.
+ """See InterRepository
typo.
+ def test_fetch_
+ """Can fetch with EverythingResult even when the server does not have
+ the Repository.
+ """
+ local = self.make_
+ builder = self.make_
+ builder.
+ remote_branch_url = self.smart_
+ remote_branch = bzrdir.
+ self.hpss_calls = []
+ local.repositor
+ fetch_spec=
+
I don't understand how this tests that the server doesn't have this verb.
=== modified file 'doc/en/
--- doc/en/
+++ doc/en/
@@ -158,6 +158,11 @@
crashes when encountering private bugs (they are just displayed as such).
(Vincent Ladeuil, #354985)
+* The ``revision_id`` parameter of
+ ``Repository.
+ ``InterReposito
+ replaced by the ``revision_ids`` parameter. (Andrew Bennetts)
+
Internals
*********
spiv said on irc that he's testing this; if it turns out to be faster in practice that should be mentioned too. But I guess this patch alone won't change performance.
- 5546. By Andrew Bennetts on 2010-12-21
-
Fix docstring typo.
| Andrew Bennetts (spiv) wrote : | # |
Martin Pool wrote:
> Review: Needs Information
> I think I like the concept of splitting these out into something more like
> objects.
>
> I don't want to hold this hostage but it seems like this would be a good
> moment to add a little developer documentation to the architectural guide
> describing at a high level how fetch works.
Good idea. I'll see what I can write up without spending ages on it.
> + def get_network_
> + start_keys = ' '.join(
> + stop_keys = ' '.join(
> + count = str(self.
> + return (self._recipe[0], '\n'.join(
> +
>
> rather than adhoc encoding, could you send it as bencode?
For this struct: no, because this is how existing clients encode SearchResult in
get_stream requests. (And get_parent_map requests, but that's handled
separately in remote.py. See RemoteRepositor
> + medium.
>
> out of scope for this, but concluding it's older than 2.3 because it doesn't
> support this particular operation is really not accurate.
Yeah. But that is definitely out of scope :)
> + """See InterRepository
>
> typo.
Fixed, thanks.
> + def test_fetch_
> + """Can fetch with EverythingResult even when the server does not have
> + the Repository.
> + """
> + local = self.make_
> + builder = self.make_
> + builder.
> + remote_branch_url = self.smart_
> + remote_branch = bzrdir.
> + self.hpss_calls = []
> + local.repositor
> + fetch_spec=
> +
>
> I don't understand how this tests that the server doesn't have this verb.
Oops, out-of-date docstring and test. I didn't need a new verb in the end, I
just added more search types to the existing get_stream_1.19 verb. I'll fix
that: instead I should be testing that a BadSearch reply to the new search
triggers the old requests (i.e. VFS). Probably that's a bit too hard to mock
(because it involves VFS) so I might have to monkeypatch or add a knob to the
get_stream handler class...
[...]
> spiv said on irc that he's testing this; if it turns out to be faster in
> practice that should be mentioned too. But I guess this patch alone won't
> change performance.
I don't expect it to improve performance (unless compared to fetching all the
tags by hand with the existing bzr CLI!) either.
The concern jam had is that querying for all the tags on every update and other
operations might have a noticeable negative impact on branches with many tags
such as emacs. My guess and preliminary results are that this isn't a problem,
but I'll make sure before landing this.
-Andrew.
| Andrew Bennetts (spiv) wrote : | # |
In the case of lp:emacs there's no significant performance difference. The HPSS call count, memory consumption and wall clock time are unchanged (aside from noise in measurement). The HPSS request for the initial 'bzr branch' is larger, as it requests every rev named in the tags, not just tip, but that's a drop in the bucket vs. fetching the 239M of lp:emacs.
Also, lp:emacs has no tags pointing to missing revs (or even non-ancestry revs), so "bzr pull" works out identically with the new code vs. bzr.dev: all the revisions named in the source's tags are already present, so nothing changes. The wall clock time for the new code is again the same (within the noise of my measurements).
The main case I'd be concerned about is when the source branch has tags that refer to missing revisions. The target will copy those tags, and then on every update/pull re-request those tags from the source repository. Because those lookups in the revisions index will be misses, the server will end up opening every *.rix file. In the HPSS case that will happen server side, so it probably isn't a big concern there, but it might be an issue if using a dumb remote transport. The solution for affected branches is to delete the broken tags, or fill in the missing revisions. That's probably worth the benefit this gives, which is making tags Just Work in more cases. Branches and tags made with this new bzr version will of course fetch all tags by default, so that scenario won't be a concern for them.
In short: I'm confident. If it copes with a branch with as much history and tags as lp:emacs, it probably copes with everything else.
- 5547. By Andrew Bennetts on 2011-01-04
-
Fix test_fetch_
everything_ backwards_ compat to actually test what it is intended to test.
| Andrew Bennetts (spiv) wrote : | # |
> > I don't want to hold this hostage but it seems like this would be a good
> > moment to add a little developer documentation to the architectural guide
> > describing at a high level how fetch works.
>
> Good idea. I'll see what I can write up without spending ages on it.
Done: <https:/
I believe all the points/questions raised so far have now been answered.
- 5548. By Andrew Bennetts on 2011-01-12
-
Rename get_search_result to execute, require a SearchResult (and not a search) be passed to fetch.
- 5549. By Andrew Bennetts on 2011-01-12
-
Merge lp:bzr
| Martin Pool (mbp) wrote : | # |
I think it would be worth adding docstrings on AbstractSearchR
+ if symbol_
+ symbol_
+ 'search_
+ 'deprecated in 2.3. Use revision_ids=[...] instead.',
+ DeprecationWarning, stacklevel=2)
That will now be 2.4. It seems like we ought to have common handling of deprecated parameters. (https:/
merge-with-tweaks
Thanks, and sorry for the delay.
- 5550. By Andrew Bennetts on 2011-02-07
-
Merge lp:bzr.
- 5551. By Andrew Bennetts on 2011-02-07
-
Update an occurrence of '2.3' that should be '2.4'.

+ if not isinstance(search, graph.Everythin gResult) :
^- can we add an attribute, rather than an isinstance check?
if not search. is_everything( ):
490 + def _present_ source_ revisions_ for(self, revision_ids): get_graph( ) get_parent_ map(revision_ ids)) ids.difference( present_ revs) NoSuchRevision( self.source, missing.pop()) get_graph( ).iter_ ancestry( revision_ ids) NULL_REVISION get_graph( )" and then you inline "self.source. get_graph( )" into the source_ids list comprehension. Better to re-use the graph object.
491 + """Returns set of all revisions in ancestry of revision_ids present in
492 + the source repo.
493 +
494 + :param revision_ids: if None, all revisions in source are returned.
495 + """
496 + if revision_ids is not None:
497 + # First, ensure all specified revisions exist. Callers expect
498 + # NoSuchRevision when they pass absent revision_ids here.
499 + revision_ids = set(revision_ids)
500 + graph = self.source.
501 + present_revs = set(graph.
502 + missing = revision_
503 + if missing:
504 + raise errors.
505 + source_ids = [rev_id for (rev_id, parents) in
506 + self.source.
507 + if rev_id != _mod_revision.
508 + and parents is not None]
^- you have "graph = self.source.
145 + def get_recipe(self): rror(self. get_recipe) struct( self):
146 + raise NotImplementedE
147 +
148 + def get_network_
149 + return ('everything',)
Why don't EverythingNotIn Other and NotInOtherForRevs implement these functions?
Overall, I think these changes seem good, but I don't really see how this gets us closer to computing a search without having one end doing a step-by-step search. (Certainly you're still using the same 'search_ missing_ revision_ ids', and you don't seem to be serializing anything over the wire...)