Merge lp:~lifeless/bzr/autopack-cross-format-fetch-1 into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/autopack-cross-format-fetch-1
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~lifeless/bzr/autopack-cross-format-fetch-1
Reviewer: John A Meinel
Review status: Approve
Review via email: mp+7748@code.launchpad.net
Robert Collins (lifeless) wrote:

This branch causes a partial pack to happen when fetching across
serialisers for both the IDS and streaming code paths.

On the way, I encountered bug 365615 and had to fix it. I'd like John in
particular to review the gc repo changes accordingly: while textually
small, their implications are not, and he wrote the GCCHK packer.

-Rob
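
The flow being added is, roughly, the following; a schematic sketch
distilled from the repository.py hunks in the preview diff below, not the
literal code (the batch loop is illustrative only):

    hints = []
    for batch in batches:  # IDS copies revisions across in batches
        target.start_write_group()
        # ... convert and insert one batch of revisions ...
        hint = target.commit_write_group()  # now returns new pack names, or None
        if hint:
            hints.extend(hint)
    if hints and target._format.pack_compresses:
        target.pack(hint=hints)  # partial pack covering only the new packs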

John A Meinel (jameinel) wrote:


Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/autopack-cross-format-fetch-1 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This branch causes a partial pack to happen when fetching across
> serialisers for both the IDS and streaming code paths.
>
> On the way, I encountered bug 365615 and had to fix it. I'd like John in
> particular to review the gc repo changes accordingly: while textually
> small, their implications are not, and he wrote the GCCHK packer.
>
> -Rob
>
>

...

I assume this is your fix for 365615:

                 for record in stream:
+                    if record.storage_kind == 'absent':
+                        # An absent CHK record: we assume that the missing
+                        # record is in a different pack - e.g. a page not
+                        # altered by the commit we're packing.
+                        continue

Missing pages could be handled in two ways. What you have written
should work, and so would this:

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-06-18 19:19:49 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-06-22 14:07:01 +0000
@@ -320,6 +320,7 @@
                 cur_keys = []
                 for prefix in sorted(keys_by_search_prefix):
                     cur_keys.extend(keys_by_search_prefix.pop(prefix))
+                cur_keys = remaining_keys.intersection(cur_keys)
         for stream in _get_referenced_stream(self._chk_id_roots,
                                              self._gather_text_refs):
             yield stream

In a lot of ways I'd like to get rid of "remaining_keys", though. (It is
very useful for progress indication, but we should be able to use
Index.key_count() for progress.) It just depends on whether there are any
failure modes that would give us AbsentContentFactory where
'remaining_keys' might say something different.

I'm pretty sure at one point I had something like intersection() but it
probably got factored out, and we didn't have specific tests for it.

Specifically, to trigger this you have to have a "big" inventory:
something bigger than, say, 200 items, so that it triggers a split
(otherwise you only have a root node, and no references). Then you create
one pack file with most of the chk pages, create another group of pack
files containing only the deltas, and autopack only those new ones.

It happens all the time in the real world, and never in our test suite...

I also didn't see you add a test that would trigger this, though
certainly I may just have missed it.

Anyway the fix seems fine as is, but a test case would be great to
prevent it from regressing.
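
A rough sketch of what such a test might look like, following the recipe
above; the format name, tree size, and pack bookkeeping here are
illustrative assumptions rather than code from this branch:

    def test_pack_hint_with_absent_chk_pages(self):
        # ~300 files forces the chk inventory to split into multiple
        # pages; the follow-up commit only adds delta pages, so packing
        # just the new packs streams 'absent' records for shared pages.
        tree = self.make_branch_and_tree('tree', format='2a')
        names = ['file%03d' % i for i in range(300)]
        self.build_tree(['tree/' + name for name in names])
        tree.add(names)
        tree.commit('big base commit')
        to_keep = tree.branch.repository._pack_collection.names()
        tree.commit('delta-only commit')
        all_names = tree.branch.repository._pack_collection.names()
        combine = list(set(all_names) - set(to_keep))
        # Without the 'absent' handling the partial pack raised an error.
        tree.branch.repository.pack(hint=combine)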

As for the rest of the fix... I noticed you track "hints" as a list, but
then in the inner pack loop you do:
+            if not hint or pack.name in hint:
+                pack_operations[-1][0] += pack.get_revision_count()
+                pack_operations[-1][1].append(pack)

^- This has moderate implications for IDS when copying an entire
repository. As you can have 100s (1000s?) of pack files c...

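The concern being raised: hint is a plain list, so "pack.name in hint" is
a linear scan for every pack, which goes quadratic when an IDS copy of a
large repository accumulates hundreds of new pack names. A sketch of the
obvious remedy, which Robert agrees to below as "setify hint":

    # Hypothetical tweak inside RepositoryPackCollection.pack(): set
    # membership is O(1) per lookup instead of O(len(hint)).
    if hint is not None:
        hint = set(hint)
    pack_operations = [[0, []]]
    for pack in self.all_packs():
        if not hint or pack.name in hint:
            pack_operations[-1][0] += pack.get_revision_count()
            pack_operations[-1][1].append(pack)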

review: Approve
Robert Collins (lifeless) wrote:

On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:

> I assume this is your fix for 365615:
>
>                  for record in stream:
> +                    if record.storage_kind == 'absent':
> +                        # An absent CHK record: we assume that the missing
> +                        # record is in a different pack - e.g. a page not
> +                        # altered by the commit we're packing.
> +                        continue

Yes.

> Missing pages could be handled in two ways. What you have written
> should work, and so would this:
>
> === modified file 'bzrlib/repofmt/groupcompress_repo.py'
> --- bzrlib/repofmt/groupcompress_repo.py 2009-06-18 19:19:49 +0000
> +++ bzrlib/repofmt/groupcompress_repo.py 2009-06-22 14:07:01 +0000
> @@ -320,6 +320,7 @@
>                  cur_keys = []
>                  for prefix in sorted(keys_by_search_prefix):
>                      cur_keys.extend(keys_by_search_prefix.pop(prefix))
> +                cur_keys = remaining_keys.intersection(cur_keys)
>          for stream in _get_referenced_stream(self._chk_id_roots,
>                                               self._gather_text_refs):
>              yield stream
>

I prefer the above, and will discard the change I made if the above
works - I'll give it a shot asap.

> I'm pretty sure at one point I had something like intersection() but it
> probably got factored out, and we didn't have specific tests for it.
>
> Specifically, to trigger this you have to have a "big" inventory:
> something bigger than, say, 200 items, so that it triggers a split
> (otherwise you only have a root node, and no references). Then you create
> one pack file with most of the chk pages, create another group of pack
> files containing only the deltas, and autopack only those new ones.

Actually, you can test it by doing a simple loop on tree.commit('m'),
because all the nodes will be common. Twenty no-change commits in a row will
trigger it (the first autopack will include the chk pages, but the
second won't). My test for partial explicit packs was what triggered it
for me.
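
As a sketch, that trigger is tiny; the '2a' format name is assumed here,
and the exact autopack threshold is an implementation detail:

    def test_autopack_after_no_change_commits(self):
        # Every commit reuses the existing chk pages, so by the second
        # autopack the referenced pages live in an already-packed pack
        # and arrive as 'absent' records.
        tree = self.make_branch_and_tree('tree', format='2a')
        for i in range(20):
            tree.commit('m')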

> It happens all the time in the real world, and never in our test suite...
>
> I also didn't see you add a test that would trigger this, though
> certainly I may just have missed it.

Not as a dedicated thing; I'll add one for clarity. I'll also setify
hint.

-Rob

Robert Collins (lifeless) wrote:

On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:
>
>                      cur_keys.extend(keys_by_search_prefix.pop(prefix))
> +                cur_keys = remaining_keys.intersection(cur_keys)
>          for stream in _get_referenced_stream(self._chk_id_roots,
>                                               self._gather_text_refs):

Just to note: this doesn't work.

I think it's because the root pages can be inaccessible too, and the
filtering happens too late.

For now I've gone with continue on 'absent' records.

-Rob

John A Meinel (jameinel) wrote:


Robert Collins wrote:
> On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:
>>                      cur_keys.extend(keys_by_search_prefix.pop(prefix))
>> +                cur_keys = remaining_keys.intersection(cur_keys)
>>          for stream in _get_referenced_stream(self._chk_id_roots,
>>                                               self._gather_text_refs):
>
> Just to note: this doesn't work.
>
> I think it's because the root pages can be inaccessible too, and the
> filtering happens too late.
>
> For now I've gone with continue on 'absent' records.
>
> -Rob
>

Well, it would work for cases other than the 20-no-op case :), since that
is the only time the root page is unchanged.

Certainly we can live with this for now, but I think you could actually
filter earlier with a similar .intersection check. Namely:

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-06-22 15:13:45 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-06-23 03:14:58 +0000
@@ -320,6 +320,10 @@
                 cur_keys = []
                 for prefix in sorted(keys_by_search_prefix):
                     cur_keys.extend(keys_by_search_prefix.pop(prefix))
+                cur_keys = [key for key in cur_keys if key in remaining_keys]
+                remaining_keys.intersection(cur_keys)
+                self._chk_id_roots = [key for key in self._chk_id_roots
+                                      if key in remaining_keys]
         for stream in _get_referenced_stream(self._chk_id_roots,
                                              self._gather_text_refs):
             yield stream

Looking at the code, I saw that I actually did this for 'pid_roots'
since they are likely to not have been changed.

The reason you don't actually want ".intersection" is that it returns a
set, which breaks the ordering that we had carefully crafted to get
optimal compression (namely, grouping by the search key prefix).
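
A quick illustration of the ordering hazard, independent of bzrlib (the
printed set order here is arbitrary):

    >>> cur_keys = [('c',), ('a',), ('b',)]           # carefully ordered
    >>> remaining_keys = set(cur_keys)
    >>> remaining_keys.intersection(cur_keys)         # ordering lost
    set([('a',), ('c',), ('b',)])
    >>> [k for k in cur_keys if k in remaining_keys]  # ordering kept
    [('c',), ('a',), ('b',)]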

John
=:->


Preview Diff

=== modified file 'NEWS'
--- NEWS 2009-06-18 19:19:49 +0000
+++ NEWS 2009-06-22 06:14:38 +0000
@@ -42,6 +42,10 @@
   ``BZR_PROGRESS_BAR`` is set to ``none``.
   (Martin Pool, #339385)
 
+* Repositories using CHK pages (which includes the new 2a format) will no
+  longer error during commit or push operations when an autopack operation
+  is triggered. (Robert Collins, #365615)
+
 Internals
 *********
 
@@ -58,11 +62,32 @@
   for files with long ancestry and 'cherrypicked' changes.)
   (John Arbash Meinel)
 
+* ``GroupCompress`` repositories now take advantage of the pack hints
+  parameter to permit cross-format fetching to incrementally pack the
+  converted data. (Robert Collins)
+
 * pack <=> pack fetching is now done via a ``PackStreamSource`` rather
   than the ``Packer`` code. The user visible change is that we now
   properly fetch the minimum number of texts for non-smart fetching.
   (John Arbash Meinel)
 
+* ``Repository.commit_write_group`` now returns opaque data about what
+  was committed, for passing to the ``Repository.pack``. Repositories
+  without atomic commits will still return None. (Robert Collins)
+
+* ``Repository.pack`` now takes an optional ``hint`` parameter
+  which will support doing partial packs for repositories that can do
+  that. (Robert Collins)
+
+* RepositoryFormat has a new attribute 'pack_compresses' which is True
+  when doing a pack operation changes the compression of content in the
+  repository. (Robert Collins)
+
+* ``StreamSink`` and ``InterDifferingSerialiser`` will call
+  ``Repository.pack`` with the hint returned by
+  ``Repository.commit_write_group`` if the formats were different and the
+  repository can increase compression by doing a pack operation.
+  (Robert Collins, #376748)
 
 Improvements
 ************
 
=== modified file 'bzrlib/remote.py'
--- bzrlib/remote.py 2009-06-17 03:53:51 +0000
+++ bzrlib/remote.py 2009-06-21 23:51:17 +0000
@@ -566,6 +566,11 @@
         return self._creating_repo._real_repository._format.network_name()
 
     @property
+    def pack_compresses(self):
+        self._ensure_real()
+        return self._custom_format.pack_compresses
+
+    @property
     def _serializer(self):
         self._ensure_real()
         return self._custom_format._serializer
@@ -1491,13 +1496,13 @@
         return self._real_repository.inventories
 
     @needs_write_lock
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This is not currently implemented within the smart server.
         """
         self._ensure_real()
-        return self._real_repository.pack()
+        return self._real_repository.pack(hint=hint)
 
     @property
     def revisions(self):
 
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-06-18 19:19:49 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-06-22 04:56:21 +0000
@@ -218,6 +218,7 @@
         p_id_roots_set = set()
         stream = source_vf.get_record_stream(keys, 'groupcompress', True)
         for idx, record in enumerate(stream):
+            # Inventories should always be with revisions; assume success.
             bytes = record.get_bytes_as('fulltext')
             chk_inv = inventory.CHKInventory.deserialise(None, bytes,
                                                          record.key)
@@ -294,6 +295,11 @@
                 stream = source_vf.get_record_stream(cur_keys,
                                                      'as-requested', True)
                 for record in stream:
+                    if record.storage_kind == 'absent':
+                        # An absent CHK record: we assume that the missing
+                        # record is in a different pack - e.g. a page not
+                        # altered by the commit we're packing.
+                        continue
                     bytes = record.get_bytes_as('fulltext')
                     # We don't care about search_key_func for this code,
                     # because we only care about external references.
@@ -558,11 +564,6 @@
     pack_factory = GCPack
     resumed_pack_factory = ResumedGCPack
 
-    def _already_packed(self):
-        """Is the collection already packed?"""
-        # Always repack GC repositories for now
-        return False
-
     def _execute_pack_operations(self, pack_operations,
                                  _packer_class=GCCHKPacker,
                                  reload_func=None):
@@ -1048,6 +1049,7 @@
     _fetch_order = 'unordered'
     _fetch_uses_deltas = False # essentially ignored by the groupcompress code.
     fast_deltas = True
+    pack_compresses = True
 
     def _get_matching_bzrdir(self):
         return bzrdir.format_registry.make_bzrdir('development6-rich-root')
 
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2009-06-17 17:57:15 +0000
+++ bzrlib/repofmt/pack_repo.py 2009-06-22 04:56:21 +0000
@@ -1459,12 +1459,12 @@
         in synchronisation with certain steps. Otherwise the names collection
         is not flushed.
 
-        :return: True if packing took place.
+        :return: Something evaluating true if packing took place.
         """
         while True:
             try:
                 return self._do_autopack()
-            except errors.RetryAutopack, e:
+            except errors.RetryAutopack:
                 # If we get a RetryAutopack exception, we should abort the
                 # current action, and retry.
                 pass
@@ -1474,7 +1474,7 @@
         total_revisions = self.revision_index.combined_index.key_count()
         total_packs = len(self._names)
         if self._max_pack_count(total_revisions) >= total_packs:
-            return False
+            return None
         # determine which packs need changing
         pack_distribution = self.pack_distribution(total_revisions)
         existing_packs = []
@@ -1502,10 +1502,10 @@
             'containing %d revisions. Packing %d files into %d affecting %d'
             ' revisions', self, total_packs, total_revisions, num_old_packs,
             num_new_packs, num_revs_affected)
-        self._execute_pack_operations(pack_operations,
+        result = self._execute_pack_operations(pack_operations,
                                       reload_func=self._restart_autopack)
         mutter('Auto-packing repository %s completed', self)
-        return True
+        return result
 
     def _execute_pack_operations(self, pack_operations, _packer_class=Packer,
                                  reload_func=None):
@@ -1513,7 +1513,7 @@
 
         :param pack_operations: A list of [revision_count, packs_to_combine].
         :param _packer_class: The class of packer to use (default: Packer).
-        :return: None.
+        :return: The new pack names.
         """
         for revision_count, packs in pack_operations:
             # we may have no-ops from the setup logic
@@ -1535,10 +1535,11 @@
             self._remove_pack_from_memory(pack)
         # record the newly available packs and stop advertising the old
         # packs
-        self._save_pack_names(clear_obsolete_packs=True)
+        result = self._save_pack_names(clear_obsolete_packs=True)
         # Move the old packs out of the way now they are no longer referenced.
         for revision_count, packs in pack_operations:
             self._obsolete_packs(packs)
+        return result
 
     def _flush_new_pack(self):
         if self._new_pack is not None:
@@ -1554,29 +1555,26 @@
 
     def _already_packed(self):
         """Is the collection already packed?"""
-        return len(self._names) < 2
+        return not (self.repo._format.pack_compresses or (len(self._names) > 1))
 
-    def pack(self):
+    def pack(self, hint=None):
         """Pack the pack collection totally."""
         self.ensure_loaded()
         total_packs = len(self._names)
         if self._already_packed():
-            # This is arguably wrong because we might not be optimal, but for
-            # now lets leave it in. (e.g. reconcile -> one pack. But not
-            # optimal.
             return
         total_revisions = self.revision_index.combined_index.key_count()
         # XXX: the following may want to be a class, to pack with a given
        # policy.
         mutter('Packing repository %s, which has %d pack files, '
-            'containing %d revisions into 1 packs.', self, total_packs,
-            total_revisions)
+            'containing %d revisions with hint %r.', self, total_packs,
+            total_revisions, hint)
         # determine which packs need changing
-        pack_distribution = [1]
         pack_operations = [[0, []]]
         for pack in self.all_packs():
-            pack_operations[-1][0] += pack.get_revision_count()
-            pack_operations[-1][1].append(pack)
+            if not hint or pack.name in hint:
+                pack_operations[-1][0] += pack.get_revision_count()
+                pack_operations[-1][1].append(pack)
         self._execute_pack_operations(pack_operations, OptimisingPacker)
 
     def plan_autopack_combinations(self, existing_packs, pack_distribution):
@@ -1938,6 +1936,7 @@
 
         :param clear_obsolete_packs: If True, clear out the contents of the
             obsolete_packs directory.
+        :return: A list of the names saved that were not previously on disk.
         """
         self.lock_names()
         try:
@@ -1958,6 +1957,7 @@
             self._unlock_names()
         # synchronise the memory packs list with what we just wrote:
         self._syncronize_pack_names_from_disk_nodes(disk_nodes)
+        return [new_node[0][0] for new_node in new_nodes]
 
     def reload_pack_names(self):
         """Sync our pack listing with what is present in the repository.
@@ -2097,7 +2097,7 @@
             if not self.autopack():
                 # when autopack takes no steps, the names list is still
                 # unsaved.
-                self._save_pack_names()
+                return self._save_pack_names()
 
     def _suspend_write_group(self):
         tokens = [pack.name for pack in self._resumed_packs]
@@ -2348,13 +2348,13 @@
         raise NotImplementedError(self.dont_leave_lock_in_place)
 
     @needs_write_lock
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This will pack all the data to a single pack. In future it may
         recompress deltas or do other such expensive operations.
         """
-        self._pack_collection.pack()
+        self._pack_collection.pack(hint=hint)
 
     @needs_write_lock
     def reconcile(self, other=None, thorough=False):
 
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-06-17 17:57:15 +0000
+++ bzrlib/repository.py 2009-06-22 06:14:38 +0000
@@ -1413,8 +1413,9 @@
             raise errors.BzrError('mismatched lock context %r and '
                 'write group %r.' %
                 (self.get_transaction(), self._write_group))
-        self._commit_write_group()
+        result = self._commit_write_group()
         self._write_group = None
+        return result
 
     def _commit_write_group(self):
         """Template method for per-repository write group cleanup.
@@ -2427,7 +2428,7 @@
         keys = tsort.topo_sort(parent_map)
         return [None] + list(keys)
 
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This operation only makes sense for some repository types. For other
@@ -2436,6 +2437,13 @@
         This stub method does not require a lock, but subclasses should use
         @needs_write_lock as this is a long running call its reasonable to
         implicitly lock for the user.
+
+        :param hint: If not supplied, the whole repository is packed.
+            If supplied, the repository may use the hint parameter as a
+            hint for the parts of the repository to pack. A hint can be
+            obtained from the result of commit_write_group(). Out of
+            date hints are simply ignored, because concurrent operations
+            can obsolete them rapidly.
         """
 
     def get_transaction(self):
@@ -2844,6 +2852,11 @@
     # Does this format have < O(tree_size) delta generation. Used to hint what
     # code path for commit, amongst other things.
     fast_deltas = None
+    # Does doing a pack operation compress data? Useful for the pack UI command
+    # (so if there is one pack, the operation can still proceed because it may
+    # help), and for fetching when data won't have come from the same
+    # compressor.
+    pack_compresses = False
 
     def __str__(self):
         return "<%s>" % self.__class__.__name__
@@ -3675,6 +3688,7 @@
         cache = lru_cache.LRUCache(100)
         cache[basis_id] = basis_tree
         del basis_tree # We don't want to hang on to it here
+        hints = []
         for offset in range(0, len(revision_ids), batch_size):
             self.target.start_write_group()
             try:
@@ -3686,7 +3700,11 @@
                 self.target.abort_write_group()
                 raise
             else:
-                self.target.commit_write_group()
+                hint = self.target.commit_write_group()
+                if hint:
+                    hints.extend(hint)
+        if hints and self.target._format.pack_compresses:
+            self.target.pack(hint=hints)
         pb.update('Transferring revisions', len(revision_ids),
                   len(revision_ids))
 
@@ -4034,7 +4052,10 @@
                 # missing keys can handle suspending a write group).
                 write_group_tokens = self.target_repo.suspend_write_group()
                 return write_group_tokens, missing_keys
-        self.target_repo.commit_write_group()
+        hint = self.target_repo.commit_write_group()
+        if (to_serializer != src_serializer and
+            self.target_repo._format.pack_compresses):
+            self.target_repo.pack(hint=hint)
         return [], set()
 
     def _extract_and_insert_inventories(self, substream, serializer):
 
=== modified file 'bzrlib/tests/per_repository/test_pack.py'
--- bzrlib/tests/per_repository/test_pack.py 2009-03-23 14:59:43 +0000
+++ bzrlib/tests/per_repository/test_pack.py 2009-06-21 23:51:17 +0000
@@ -24,3 +24,14 @@
     def test_pack_empty_does_not_error(self):
         repo = self.make_repository('.')
         repo.pack()
+
+    def test_pack_accepts_opaque_hint(self):
+        # For requesting packs of a repository where some data is known to be
+        # unoptimal we permit packing just some data via a hint. If the hint is
+        # illegible it is ignored.
+        tree = self.make_branch_and_tree('tree')
+        rev1 = tree.commit('1')
+        rev2 = tree.commit('2')
+        rev3 = tree.commit('3')
+        rev4 = tree.commit('4')
+        tree.branch.repository.pack(hint=[rev3, rev4])
 
=== modified file 'bzrlib/tests/per_repository/test_repository.py'
--- bzrlib/tests/per_repository/test_repository.py 2009-06-17 21:33:03 +0000
+++ bzrlib/tests/per_repository/test_repository.py 2009-06-19 04:19:22 +0000
@@ -66,29 +66,29 @@
 
 class TestRepository(TestCaseWithRepository):
 
+    def assertFormatAttribute(self, attribute, allowed_values):
+        """Assert that the format has an attribute 'attribute'."""
+        repo = self.make_repository('repo')
+        self.assertSubset([getattr(repo._format, attribute)], allowed_values)
+
     def test_attribute__fetch_order(self):
         """Test the the _fetch_order attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_order in ('topological', 'unordered'))
+        self.assertFormatAttribute('_fetch_order', ('topological', 'unordered'))
 
     def test_attribute__fetch_uses_deltas(self):
         """Test the the _fetch_uses_deltas attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_uses_deltas in (True, False))
+        self.assertFormatAttribute('_fetch_uses_deltas', (True, False))
 
     def test_attribute_fast_deltas(self):
         """Test the format.fast_deltas attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format.fast_deltas in (True, False))
+        self.assertFormatAttribute('fast_deltas', (True, False))
 
     def test_attribute__fetch_reconcile(self):
         """Test the the _fetch_reconcile attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_reconcile in (True, False))
+        self.assertFormatAttribute('_fetch_reconcile', (True, False))
+
+    def test_attribute_format_pack_compresses(self):
+        self.assertFormatAttribute('pack_compresses', (True, False))
 
     def test_attribute_inventories_store(self):
         """Test the existence of the inventories attribute."""
 
=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- bzrlib/tests/per_repository/test_write_group.py 2009-06-10 03:56:49 +0000
+++ bzrlib/tests/per_repository/test_write_group.py 2009-06-22 02:25:09 +0000
@@ -68,11 +68,14 @@
         repo.commit_write_group()
         repo.unlock()
 
-    def test_commit_write_group_gets_None(self):
+    def test_commit_write_group_does_not_error(self):
         repo = self.make_repository('.')
         repo.lock_write()
         repo.start_write_group()
-        self.assertEqual(None, repo.commit_write_group())
+        # commit_write_group can either return None (for repositories without
+        # isolated transactions) or a hint for pack(). So we only check it
+        # works in this interface test, because all repositories are exercised.
+        repo.commit_write_group()
         repo.unlock()
 
     def test_unlock_in_write_group(self):
 
=== modified file 'bzrlib/tests/test_pack_repository.py'
--- bzrlib/tests/test_pack_repository.py 2009-06-17 17:57:15 +0000
+++ bzrlib/tests/test_pack_repository.py 2009-06-22 02:25:09 +0000
@@ -238,6 +238,35 @@
         pack_names = [node[1][0] for node in index.iter_all_entries()]
         self.assertTrue(large_pack_name in pack_names)
 
+    def test_commit_write_group_returns_new_pack_names(self):
+        format = self.get_format()
+        tree = self.make_branch_and_tree('foo', format=format)
+        tree.commit('first post')
+        repo = tree.branch.repository
+        repo.lock_write()
+        try:
+            repo.start_write_group()
+            try:
+                inv = inventory.Inventory(revision_id="A")
+                inv.root.revision = "A"
+                repo.texts.add_lines((inv.root.file_id, "A"), [], [])
+                rev = _mod_revision.Revision(timestamp=0, timezone=None,
+                    committer="Foo Bar <foo@example.com>", message="Message",
+                    revision_id="A")
+                rev.parent_ids = ()
+                repo.add_revision("A", rev, inv=inv)
+            except:
+                repo.abort_write_group()
+                raise
+            else:
+                old_names = repo._pack_collection._names.keys()
+                result = repo.commit_write_group()
+                cur_names = repo._pack_collection._names.keys()
+                new_names = list(set(cur_names) - set(old_names))
+                self.assertEqual(new_names, result)
+        finally:
+            repo.unlock()
+
     def test_fail_obsolete_deletion(self):
         # failing to delete obsolete packs is not fatal
         format = self.get_format()
 
=== modified file 'bzrlib/tests/test_repository.py'
--- bzrlib/tests/test_repository.py 2009-06-18 18:00:01 +0000
+++ bzrlib/tests/test_repository.py 2009-06-22 06:15:41 +0000
@@ -673,10 +673,14 @@
         self.assertFalse(repo._format.supports_external_lookups)
 
 
-class TestDevelopment6(TestCaseWithTransport):
+class Test2a(TestCaseWithTransport):
+
+    def test_format_pack_compresses_True(self):
+        repo = self.make_repository('repo', format='2a')
+        self.assertTrue(repo._format.pack_compresses)
 
     def test_inventories_use_chk_map_with_parent_base_dict(self):
-        tree = self.make_branch_and_tree('repo', format="development6-rich-root")
+        tree = self.make_branch_and_tree('repo', format="2a")
         revid = tree.commit("foo")
         tree.lock_read()
         self.addCleanup(tree.unlock)
@@ -688,14 +692,33 @@
         self.assertEqual(65536,
             inv.parent_id_basename_to_file_id._root_node.maximum_size)
 
+    def test_pack_with_hint(self):
+        tree = self.make_branch_and_tree('tree', format='2a')
+        # 1 commit to leave untouched
+        tree.commit('1')
+        to_keep = tree.branch.repository._pack_collection.names()
+        # 2 to combine
+        tree.commit('2')
+        tree.commit('3')
+        all = tree.branch.repository._pack_collection.names()
+        combine = list(set(all) - set(to_keep))
+        self.assertLength(3, all)
+        self.assertLength(2, combine)
+        tree.branch.repository.pack(hint=combine)
+        final = tree.branch.repository._pack_collection.names()
+        self.assertLength(2, final)
+        self.assertFalse(combine[0] in final)
+        self.assertFalse(combine[1] in final)
+        self.assertSubset(to_keep, final)
+
     def test_stream_source_to_gc(self):
-        source = self.make_repository('source', format='development6-rich-root')
-        target = self.make_repository('target', format='development6-rich-root')
+        source = self.make_repository('source', format='2a')
+        target = self.make_repository('target', format='2a')
         stream = source._get_source(target._format)
         self.assertIsInstance(stream, groupcompress_repo.GroupCHKStreamSource)
 
     def test_stream_source_to_non_gc(self):
-        source = self.make_repository('source', format='development6-rich-root')
+        source = self.make_repository('source', format='2a')
         target = self.make_repository('target', format='rich-root-pack')
         stream = source._get_source(target._format)
         # We don't want the child GroupCHKStreamSource
@@ -703,7 +726,7 @@
 
     def test_get_stream_for_missing_keys_includes_all_chk_refs(self):
         source_builder = self.make_branch_builder('source',
-                            format='development6-rich-root')
+                            format='2a')
         # We have to build a fairly large tree, so that we are sure the chk
         # pages will have split into multiple pages.
         entries = [('add', ('', 'a-root-id', 'directory', None))]
@@ -726,7 +749,7 @@
         source_branch = source_builder.get_branch()
         source_branch.lock_read()
         self.addCleanup(source_branch.unlock)
-        target = self.make_repository('target', format='development6-rich-root')
+        target = self.make_repository('target', format='2a')
         source = source_branch.repository._get_source(target._format)
         self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource)
 
@@ -1354,3 +1377,83 @@
         self.assertTrue(new_pack.inventory_index._optimize_for_size)
         self.assertTrue(new_pack.text_index._optimize_for_size)
         self.assertTrue(new_pack.signature_index._optimize_for_size)
+
+
+class TestCrossFormatPacks(TestCaseWithTransport):
+
+    def log_pack(self, hint=None):
+        self.calls.append(('pack', hint))
+        self.orig_pack(hint=hint)
+        if self.expect_hint:
+            self.assertTrue(hint)
+
+    def run_stream(self, src_fmt, target_fmt, expect_pack_called):
+        self.expect_hint = expect_pack_called
+        self.calls = []
+        source_tree = self.make_branch_and_tree('src', format=src_fmt)
+        source_tree.lock_write()
+        self.addCleanup(source_tree.unlock)
+        tip = source_tree.commit('foo')
+        target = self.make_repository('target', format=target_fmt)
+        target.lock_write()
+        self.addCleanup(target.unlock)
+        source = source_tree.branch.repository._get_source(target._format)
+        self.orig_pack = target.pack
+        target.pack = self.log_pack
+        search = target.search_missing_revision_ids(
+            source_tree.branch.repository, tip)
+        stream = source.get_stream(search)
+        from_format = source_tree.branch.repository._format
+        sink = target._get_sink()
+        sink.insert_stream(stream, from_format, [])
+        if expect_pack_called:
+            self.assertLength(1, self.calls)
+        else:
+            self.assertLength(0, self.calls)
+
+    def run_fetch(self, src_fmt, target_fmt, expect_pack_called):
+        self.expect_hint = expect_pack_called
+        self.calls = []
+        source_tree = self.make_branch_and_tree('src', format=src_fmt)
+        source_tree.lock_write()
+        self.addCleanup(source_tree.unlock)
+        tip = source_tree.commit('foo')
+        target = self.make_repository('target', format=target_fmt)
+        target.lock_write()
+        self.addCleanup(target.unlock)
+        source = source_tree.branch.repository
+        self.orig_pack = target.pack
+        target.pack = self.log_pack
+        target.fetch(source)
+        if expect_pack_called:
+            self.assertLength(1, self.calls)
+        else:
+            self.assertLength(0, self.calls)
+
+    def test_sink_format_hint_no(self):
+        # When the target format says packing makes no difference, pack is not
+        # called.
+        self.run_stream('1.9', 'rich-root-pack', False)
+
+    def test_sink_format_hint_yes(self):
+        # When the target format says packing makes a difference, pack is
+        # called.
+        self.run_stream('1.9', '2a', True)
+
+    def test_sink_format_same_no(self):
+        # When the formats are the same, pack is not called.
+        self.run_stream('2a', '2a', False)
+
+    def test_IDS_format_hint_no(self):
+        # When the target format says packing makes no difference, pack is not
+        # called.
+        self.run_fetch('1.9', 'rich-root-pack', False)
+
+    def test_IDS_format_hint_yes(self):
+        # When the target format says packing makes a difference, pack is
+        # called.
+        self.run_fetch('1.9', '2a', True)
+
+    def test_IDS_format_same_no(self):
+        # When the formats are the same, pack is not called.
+        self.run_fetch('2a', '2a', False)
 