Merge lp:~spiv/bzr/redundant-missing-keys-check into lp:bzr

Proposed by Andrew Bennetts on 2010-12-14
Status: Merged
Approved by: John A Meinel on 2010-12-14
Approved revision: 5568
Merged at revision: 5569
Proposed branch: lp:~spiv/bzr/redundant-missing-keys-check
Merge into: lp:bzr
Diff against target: 75 lines (+9/-16)
4 files modified
bzrlib/fetch.py (+0/-15)
bzrlib/tests/blackbox/test_push.py (+1/-1)
doc/en/release-notes/bzr-2.3.txt (+5/-0)
doc/en/whats-new/whats-new-in-2.3.txt (+3/-0)
To merge this branch: bzr merge lp:~spiv/bzr/redundant-missing-keys-check
Reviewer Review Type Date Requested Status
John A Meinel 2010-12-14 Approve on 2010-12-14
Review via email: mp+43623@code.launchpad.net

Commit Message

Remove redundant parent inventories calculation during fetch.

Description of the Change

This removes a parent inventories calculation from fetch.py. The reasons why I am confident this is safe are:

 * repo.insert_stream already performs this calculation to generate the "missing_keys" set it returns, and RepoFetcher._parent_inventories was adding to that same set.
 * it was only called for repositories with fallbacks, so there's no way that old formats with could be affected, so there's no risk to strange corner cases involving e.g. insert_stream on a knit repo.
 * when there are missing keys, a send insert_stream is performed, and the missing_keys result from that second call is asserted to be empty as a sanity check — yet that sanity check never calls self._parent_inventories. So this code was already trusting insert_stream to be sufficient.
 * by inspection, the parent inventories logic in insert_stream is identical (in fact, stricter: it performs the calculation if the repo supports stacking, even if it isn't stacked).
 * all the tests pass as is, except for one HPSS acceptance test, which improves by one round trip :)

As a result of deleting this redundant code the push-to-stacked HPSS ratchet tightens by 1 call: it removes a get_parent_map call that used to happen immediately after an insert_stream call.

I added a What's New entry that covers both this and the "Branch.lock_read caches tags" improvement made earlier in the 2.3 cycle.

To post a comment you must log in.
John A Meinel (jameinel) wrote :
Download full text (3.5 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/14/2010 3:38 AM, Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/redundant-missing-keys-check into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> This removes a parent inventories calculation from fetch.py. The reasons why I am confident this is safe are:
>
> * repo.insert_stream already performs this calculation to generate the "missing_keys" set it returns, and RepoFetcher._parent_inventories was adding to that same set.
> * it was only called for repositories with fallbacks, so there's no way that old formats with could be affected, so there's no risk to strange corner cases involving e.g. insert_stream on a knit repo.
> * when there are missing keys, a send insert_stream is performed, and the missing_keys result from that second call is asserted to be empty as a sanity check — yet that sanity check never calls self._parent_inventories. So this code was already trusting insert_stream to be sufficient.
> * by inspection, the parent inventories logic in insert_stream is identical (in fact, stricter: it performs the calculation if the repo supports stacking, even if it isn't stacked).
> * all the tests pass as is, except for one HPSS acceptance test, which improves by one round trip :)
>
> As a result of deleting this redundant code the push-to-stacked HPSS ratchet tightens by 1 call: it removes a get_parent_map call that used to happen immediately after an insert_stream call.
>
> I added a What's New entry that covers both this and the "Branch.lock_read caches tags" improvement made earlier in the 2.3 cycle.

=== modified file 'bzrlib/fetch.py'
- --- bzrlib/fetch.py 2010-09-17 04:35:23 +0000
+++ bzrlib/fetch.py 2010-12-14 09:38:10 +0000
@@ -125,9 +125,6 @@
             pb.update("Inserting stream")
             resume_tokens, missing_keys = self.sink.insert_stream(
                 stream, from_format, [])
- - if self.to_repository._fallback_repositories:
- - missing_keys.update(
- - self._parent_inventories(search.get_keys()))
             if missing_keys:
                 pb.update("Missing keys")
                 stream = source.get_stream_for_missing_keys(missing_keys)

^- IIRC, this was added because we *weren't* sending missing keys
properly in the stream. So when we fixed it, we had both the client and
the server double check for extra missing keys.

I'm not positive that this is the fix, but in bzr-1.14.txt:
* Pushing a new stacked branch will also push the parent inventories for
  revisions at the stacking boundary. This makes sure that the stacked
  branch has enough data to calculate inventory deltas for all of its
  revisions (without requiring the fallback branch). This avoids
  "'AbsentContentFactory' object has no attribute 'get_bytes_as'" errors
  when fetching the stacked branch from a 1.13 (or later) smart server.
  This partially fixes #354036. (Andrew Bennetts, Robert Collins)

So we've only really had the bug in 1.6/1.9/1.14 era repos. It doesn't
look like it has been a problem for 2a repos.

I'm not as worried about maintaining strict workarounds for bzr-1.13 vs
not...

Read more...

review: Approve
Andrew Bennetts (spiv) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/fetch.py'
2--- bzrlib/fetch.py 2010-09-17 04:35:23 +0000
3+++ bzrlib/fetch.py 2010-12-14 09:38:10 +0000
4@@ -125,9 +125,6 @@
5 pb.update("Inserting stream")
6 resume_tokens, missing_keys = self.sink.insert_stream(
7 stream, from_format, [])
8- if self.to_repository._fallback_repositories:
9- missing_keys.update(
10- self._parent_inventories(search.get_keys()))
11 if missing_keys:
12 pb.update("Missing keys")
13 stream = source.get_stream_for_missing_keys(missing_keys)
14@@ -163,18 +160,6 @@
15 self.from_repository, self._last_revision,
16 find_ghosts=self.find_ghosts)
17
18- def _parent_inventories(self, revision_ids):
19- # Find all the parent revisions referenced by the stream, but
20- # not present in the stream, and make sure we send their
21- # inventories.
22- parent_maps = self.to_repository.get_parent_map(revision_ids)
23- parents = set()
24- map(parents.update, parent_maps.itervalues())
25- parents.discard(NULL_REVISION)
26- parents.difference_update(revision_ids)
27- missing_keys = set(('inventories', rev_id) for rev_id in parents)
28- return missing_keys
29-
30
31 class Inter1and2Helper(object):
32 """Helper for operations that convert data from model 1 and 2
33
34=== modified file 'bzrlib/tests/blackbox/test_push.py'
35--- bzrlib/tests/blackbox/test_push.py 2010-12-09 04:26:53 +0000
36+++ bzrlib/tests/blackbox/test_push.py 2010-12-14 09:38:10 +0000
37@@ -231,7 +231,7 @@
38 # being too low. If rpc_count increases, more network roundtrips have
39 # become necessary for this use case. Please do not adjust this number
40 # upwards without agreement from bzr's network support maintainers.
41- self.assertLength(14, self.hpss_calls)
42+ self.assertLength(13, self.hpss_calls)
43 remote = branch.Branch.open('public')
44 self.assertEndsWith(remote.get_stacked_on_url(), '/parent')
45
46
47=== modified file 'doc/en/release-notes/bzr-2.3.txt'
48--- doc/en/release-notes/bzr-2.3.txt 2010-12-07 16:27:49 +0000
49+++ doc/en/release-notes/bzr-2.3.txt 2010-12-14 09:38:10 +0000
50@@ -26,6 +26,11 @@
51 .. Improvements to existing commands, especially improved performance
52 or memory usage, or better results.
53
54+* A redundant parent inventories calculation was removed from
55+ ``fetch.py``, as ``Repository.insert_stream`` already reports any
56+ missing inventories. This removes at least one network roundtrip when
57+ pushing to a stacked branch. (Andrew Bennetts)
58+
59 * ``bzr resolve`` now accepts ``--take-this`` and ``--take-other`` actions
60 for text conflicts. This *replace* the whole file with the content
61 designated by the action. This will *ignore* all differences that would
62
63=== modified file 'doc/en/whats-new/whats-new-in-2.3.txt'
64--- doc/en/whats-new/whats-new-in-2.3.txt 2010-12-03 07:51:28 +0000
65+++ doc/en/whats-new/whats-new-in-2.3.txt 2010-12-14 09:38:10 +0000
66@@ -69,6 +69,9 @@
67 * ``bzr send`` uses less memory.
68 (John Arbash Meinel, #614576)
69
70+* Fetches involving stacked branches and branches with tags are now faster.
71+ Some redundant network reads were removed. (Andrew Bennetts)
72+
73 * Inventory entries now consume less memory (on 32-bit Ubuntu file entries
74 have dropped from 68 bytes to 40, and directory entries from 120 bytes
75 to 48). This affects most operations, and depending on the size of the