Merge lp:~lifeless/bzr/bug-402652 into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Rejected
Rejected by: John A Meinel
Proposed branch: lp:~lifeless/bzr/bug-402652
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 94 lines
To merge this branch: bzr merge lp:~lifeless/bzr/bug-402652
Reviewer: John A Meinel
Status: Needs Resubmitting
Review via email: mp+10960@code.launchpad.net
Robert Collins (lifeless) wrote:

[MERGE] Fix bug 402652 by recompressing all texts that are streamed -
slightly slower at fetch, substantially faster and more compact at read.
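
In outline, the branch requests the stream in 'groupcompress' order and then inserts it without reusing the source's compressed blocks (both calls appear in the diff below). A minimal sketch; source_vf, target_vf and keys are illustrative placeholders, not names from the branch:

    # Ask for records in an order optimised for recompression, including
    # the delta closure, then recompress on insert rather than copying
    # existing groupcompress blocks verbatim.
    stream = source_vf.get_record_stream(keys, 'groupcompress', True)
    for _ in target_vf._insert_record_stream(stream, random_id=False,
                                             reuse_blocks=False):
        pass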


John A Meinel (jameinel) wrote:


Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/bug-402652 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> [MERGE] Fix bug 402652 by recompressing all texts that are streamed -
> slightly slower at fetch, substantially faster and more compact at read.
>
>

 review: abstain

I personally feel that this is going overboard. If you are doing
something like branching all of Launchpad you will have:
60,000 revisions of perfectly packed data
and
1,000 revisions of poorly packed data (some of which is probably autopacked)

Telling the system that it must repack all 61,000 revisions on every
fetch is (IMO) significant overkill.

I think the current behaviour (unordered fetching) is a better tradeoff
than that, until we get to the point where we can do a partial packing
on the fly.
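
For context, the two behaviours being weighed are values of the ordering
argument to get_record_stream. Roughly, with source_vf standing in for
any versioned-files object:

    # 'unordered' lets the source return records in whatever order is
    # cheapest - often streaming existing compressed blocks as-is - so
    # fetch is fast but the target inherits the source's grouping.
    stream = source_vf.get_record_stream(keys, 'unordered', True)

    # 'groupcompress' orders records for recompression, so the target
    # can rebuild tight groups: a slower fetch, but faster later reads.
    stream = source_vf.get_record_stream(keys, 'groupcompress', True)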

By the way, is there an open bug for asking Launchpad to pack the
mirrored area? (You could force a pack of hosted bzr+ssh://
repositories, but that wouldn't do anything for the http side.)

John
=:->


review: Abstain
John A Meinel (jameinel) wrote:


Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/bug-402652 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> [MERGE] Fix bug 402652 by recompressing all texts that are streamed -
> slightly slower at fetch, substantially faster and more compact at read.
>
>

A few actual review comments, while I'm looking at the code:

1) The test needs a self.addCleanup(target.unlock)
2) The change breaks some of the direct tests of insert_record_stream in
bzrlib/tests/test_groupcompress.py

3) I'm not sure what the 'if not inserted_keys' check is for (see the
sketch below). Maybe it is there to make those tests pass? inserted_keys
is only filled if random_id is set, to ensure that the ids sent are
genuinely not repeated. (I've run into some problems with the chk
fetching code in the past when I set random_id = True.)

If it is there because of those tests, I think that is incorrect. But
I'd like to understand what you are trying to do.

John
=:->
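
For reference, point 1 refers to the new test in the diff, which calls
target.lock_read() without a matching self.addCleanup(target.unlock).
The logic point 3 asks about looks roughly like this after the change
(simplified from bzrlib/groupcompress.py in the diff; the net effect
noted in the comments is a deduction, not something the branch states):

    inserted_keys = set()
    for record in stream:
        if random_id:
            # inserted_keys is only populated when random_id=True, to
            # catch callers that claim unique ids but repeat a key.
            if record.key in inserted_keys:
                trace.note('Insert claimed random_id=True,'
                           ' but then inserted %r two times', record.key)
                continue
            inserted_keys.add(record.key)
        # With random_id=False the set stays empty, so 'not inserted_keys'
        # is always true and this reduces to plain 'reuse_blocks'. With
        # random_id=True the key is added before this check runs, so
        # block reuse is disabled for every record.
        if not inserted_keys and reuse_blocks:
            # Try to copy an existing groupcompress block as-is.
            ...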

John A Meinel (jameinel) wrote:

I'm pretty sure this is superseded by the change to streaming fetch that I did. So I'm marking this as rejected. If I'm missing something, please re-open/resubmit.

review: Needs Resubmitting
Robert Collins (lifeless) wrote:

On Tue, 2009-10-06 at 18:01 +0000, John A Meinel wrote:
> Review: Resubmit
> I'm pretty sure this is superseded by the change to streaming fetch that I did. So I'm marking this as rejected. If I'm missing something, please re-open/resubmit.

It was the same thing, just an earlier version of it :P

So reject is fine.

-Rob

Preview Diff

=== modified file 'NEWS'
--- NEWS	2009-08-30 23:51:10 +0000
+++ NEWS	2009-09-01 07:35:08 +0000
@@ -69,6 +69,13 @@
   revisions that are in the fallback repository. (Regressed in 2.0rc1).
   (John Arbash Meinel, #419241)
 
+* Fetches from 2a to 2a are now again requested in 'groupcompress' order,
+  and all texts are recombined appropriately. This doesn't reuse existing
+  groups, which will be measurable in some specific circumstances - an
+  approximately 25% overhead. However, doing this ensures high performance
+  reads subsequent to the fetch operation, which is the most common
+  operation: write once read many. (Robert Collins, #402652)
+
 * Fix a segmentation fault when computing the ``merge_sort`` of a graph
   that has a ghost in the mainline ancestry.
   (John Arbash Meinel, #419241)
 
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py	2009-08-30 21:34:42 +0000
+++ bzrlib/groupcompress.py	2009-09-01 07:35:08 +0000
@@ -1516,7 +1516,8 @@
         # test_insert_record_stream_existing_keys fail for groupcompress and
         # groupcompress-nograph, this needs to be revisited while addressing
         # 'bzr branch' performance issues.
-        for _ in self._insert_record_stream(stream, random_id=False):
+        for _ in self._insert_record_stream(stream, random_id=False,
+            reuse_blocks=False):
             pass
 
     def _insert_record_stream(self, stream, random_id=False, nostore_sha=None,
@@ -1580,7 +1581,7 @@
                                ' but then inserted %r two times', record.key)
                     continue
                 inserted_keys.add(record.key)
-            if reuse_blocks:
+            if not inserted_keys and reuse_blocks:
                 # If the reuse_blocks flag is set, check to see if we can just
                 # copy a groupcompress block as-is.
                 if record.storage_kind == 'groupcompress-block':
 
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py	2009-08-24 19:34:13 +0000
+++ bzrlib/repofmt/groupcompress_repo.py	2009-09-01 07:35:08 +0000
@@ -932,7 +932,7 @@
         super(GroupCHKStreamSource, self).__init__(from_repository, to_format)
         self._revision_keys = None
         self._text_keys = None
-        # self._text_fetch_order = 'unordered'
+        self._text_fetch_order = 'groupcompress'
         self._chk_id_roots = None
         self._chk_p_id_roots = None
 
@@ -949,7 +949,7 @@
         p_id_roots_set = set()
         source_vf = self.from_repository.inventories
         stream = source_vf.get_record_stream(inventory_keys,
-                                             'unordered', True)
+                                             'groupcompress', True)
         for record in stream:
             if record.storage_kind == 'absent':
                 if allow_absent:
 
=== modified file 'bzrlib/tests/test_repository.py'
--- bzrlib/tests/test_repository.py	2009-08-17 23:15:55 +0000
+++ bzrlib/tests/test_repository.py	2009-09-01 07:35:08 +0000
@@ -683,6 +683,27 @@
 
 class Test2a(TestCaseWithTransport):
 
+    def test_fetch_combines_groups(self):
+        builder = self.make_branch_builder('source', format='2a')
+        builder.start_series()
+        builder.build_snapshot('1', None, [
+            ('add', ('', 'root-id', 'directory', '')),
+            ('add', ('file', 'file-id', 'file', 'content\n'))])
+        builder.build_snapshot('2', ['1'], [
+            ('modify', ('file-id', 'content-2\n'))])
+        builder.finish_series()
+        source = builder.get_branch()
+        target = self.make_repository('target', format='2a')
+        target.fetch(source.repository)
+        target.lock_read()
+        details = target.texts._index.get_build_details(
+            [('file-id', '1',), ('file-id', '2',)])
+        file_1_details = details[('file-id', '1')]
+        file_2_details = details[('file-id', '2')]
+        # The index, and what to read off disk, should be the same for both
+        # versions of the file.
+        self.assertEqual(file_1_details[0][:3], file_2_details[0][:3])
+
     def test_format_pack_compresses_True(self):
         repo = self.make_repository('repo', format='2a')
         self.assertTrue(repo._format.pack_compresses)