Merge lp:~lifeless/bzr/autopack-cross-format-fetch-1 into lp:~bzr/bzr/trunk-old
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/autopack-cross-format-fetch-1
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~lifeless/bzr/autopack-cross-format-fetch-1
Related bugs: (none listed)
Reviewer: John A Meinel (status: Approve)
Review via email: mp+7748@code.launchpad.net
Commit message
Description of the change

This branch causes a partial pack to happen when fetching across
serialisers for both the IDS and streaming code paths.

On the way, I encountered bug 365615 and had to fix it. I'd like John in
particular to review the gc repo changes accordingly - while textually
small the implications are not, and he wrote the GCCHK packer.

-Rob
Robert Collins (lifeless) wrote:

John A Meinel (jameinel) wrote:
Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/autopack-cross-format-fetch-1 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This branch causes a partial pack to happen when fetching across
> serialisers for both the IDS and streaming code paths.
>
> On the way, I encountered bug 365615 and had to fix it. I'd like John in
> particular to review the gc repo changes accordingly - while textually
> small the implications are not, and he wrote the GCCHK packer.
>
> -Rob
>
>
...
I assume this is your fix for 365615:

+                if record.storage_kind == 'absent':
+                    # An absent CHK record: we assume that the missing
+                    # record is in a different pack - e.g. a page not
+                    # altered by the commit we're packing.
+                    continue
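The skip can be illustrated standalone. This is a minimal sketch of the pattern, not bzrlib code: `FakeRecord` and the stream below are made-up stand-ins for bzrlib's content-factory objects, which expose `storage_kind` and `get_bytes_as` as used here.

```python
class FakeRecord:
    """Stand-in for a record from get_record_stream (illustrative only)."""

    def __init__(self, key, storage_kind, text=None):
        self.key = key
        self.storage_kind = storage_kind
        self._text = text

    def get_bytes_as(self, kind):
        if self.storage_kind == 'absent':
            raise ValueError('absent record has no bytes')
        return self._text


def copy_present_records(stream):
    """Yield (key, bytes) for present records, skipping absent ones.

    An 'absent' record here models a CHK page that lives in a pack
    outside the set being repacked, so skipping it is safe rather
    than an error.
    """
    for record in stream:
        if record.storage_kind == 'absent':
            continue
        yield record.key, record.get_bytes_as('fulltext')


stream = [
    FakeRecord(('sha1:aaa',), 'fulltext', b'root node'),
    FakeRecord(('sha1:bbb',), 'absent'),
    FakeRecord(('sha1:ccc',), 'fulltext', b'leaf node'),
]
copied = list(copy_present_records(stream))
```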
Missing pages could be handled in two ways. What you have written
should work, but so would:
=== modified file 'bzrlib/
--- bzrlib/
+++ bzrlib/
@@ -320,6 +320,7 @@
+            cur_keys = remaining_
             for stream in _get_referenced
                 yield stream
In a lot of ways I'd like to get rid of "remaining_keys", though. (It is
very useful for progress indication, but we should be able to use
Index.key_count() for progress.) It just depends whether there are any
failure modes that would give us AbsentContentFactory records where
'remaining_keys' might say something different.
I'm pretty sure at one point I had something like intersection() but it
probably got factored out, and we didn't have specific tests for it.
Specifically, to trigger this you have to have a "big" inventory,
something bigger than, say, 200 items, so that it triggers a split
(otherwise you only have a root node, and no references). Then you
create one pack file with most of the chk pages, create another group
of pack files with only the deltas, and then autopack only those new
ones.
It happens all the time in the real world, and never in our test suite...
I also didn't see you add a test that would trigger this. Though
certainly maybe I just missed it.
Anyway the fix seems fine as is, but a test case would be great to
prevent it from regressing.
As for the rest of the fix... I noticed you track "hints" as a list, but
then in the inner pack loop you do:
+            if not hint or pack.name in hint:
+                pack_operations[-1][0] += pack.get_revision_count()
+                pack_operations[-1][1].append(pack)
^- This has moderate implications for IDS when copying an entire
repository. As you can have 100s (1000s?) of pack files c...
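Robert's later reply ("I'll also setify hint") addresses exactly this. A small sketch of the cost difference, with made-up pack names (not real bzrlib pack identifiers): membership tests against a list cost O(len(hint)) per pack, against a set O(1).

```python
# hint as currently returned: a list of new pack names.
hint = ['pack-%04d' % i for i in range(1000)]
all_packs = ['pack-%04d' % i for i in range(1500)]

# List version: up to ~1000 comparisons for every pack examined.
selected_list = [p for p in all_packs if p in hint]

# Set version: same result, constant-time membership tests.
hint_set = set(hint)
selected_set = [p for p in all_packs if p in hint_set]

assert selected_list == selected_set
```

With hundreds or thousands of packs on both sides, the list version is quadratic while the set version stays linear.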
Robert Collins (lifeless) wrote:
On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:
> I assume this is your fix for 365615:
>
>             for record in stream:
> +                if record.storage_kind == 'absent':
> +                    # An absent CHK record: we assume that the missing
> +                    # record is in a different pack - e.g. a page not
> +                    # altered by the commit we're packing.
> +                    continue
Yes.
> Handling missing pages could be handled in 2 ways. What you have written
> should work, and also doing:
>
> === modified file 'bzrlib/
> --- bzrlib/
> +++ bzrlib/
> @@ -320,6 +320,7 @@
>              cur_keys = []
>              for prefix in sorted(
>                  cur_keys.
> +            cur_keys = remaining_
>              for stream in _get_referenced
>                  self._gather_
>                  yield stream
>
I prefer the above, and will discard the change I made if the above
works - I'll give it a shot asap.
> I'm pretty sure at one point I had something like intersection() but it
> probably got factored out, and we didn't have specific tests for it.
>
> Specifically, to trigger this you have to have a "big" inventory.
> Something bigger than say 200 items, so that it triggers a split
> (otherwise you only have a root node, and no references). And then if
> you create 1 pack file with most of the chk pages. Then create another
> group of pack files with only the deltas, and then autopack only those
> new ones.
Actually, you can test it by doing a simple loop on tree.commit('m'),
because all the nodes will be common. Twenty no-change commits in a row
will trigger it (the first autopack will include the chk pages, but the
second won't). My test for partial explicit packs was what triggered it
for me.
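The mechanism can be modelled without bzrlib at all. This is a toy simulation of the failure mode, with invented key names: no-change commits all share the same CHK root page, so the second autopack, which combines only the newer packs, finds the referenced page "absent" from the packs it reads.

```python
packs = []  # each pack is modelled as a dict of key -> bytes


def commit(n):
    # The first commit writes the shared chk root page; subsequent
    # no-change commits only write their revision record and keep
    # referencing that same root page.
    pack = {'rev-%d' % n: b'revision'}
    if n == 1:
        pack[('chk-root',)] = b'chk page'
    packs.append(pack)


def autopack(candidates):
    # Combine the candidate packs. Copying the chk pages referenced by
    # their revisions may look up keys stored outside `candidates`.
    combined = {}
    for pack in candidates:
        combined.update(pack)
    absent = [] if ('chk-root',) in combined else [('chk-root',)]
    return combined, absent


for n in range(1, 21):
    commit(n)

# First autopack (packs 1-10): includes the chk root page.
first, absent1 = autopack(packs[0:10])
# Second autopack (packs 11-20): the root page is 'absent' there.
second, absent2 = autopack(packs[10:20])
```

Before the fix, hitting that absent key during the second autopack was an error; with the fix it is skipped, because the page still exists in the pack the first autopack produced.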
> It happens all the time in the real world, and never in our test suite...
>
> I also didn't see you add a test that would trigger this. Though
> certainly maybe I just missed it.
Not as a dedicated thing; I'll add one for clarity. I'll also setify
hint.
-Rob
Robert Collins (lifeless) wrote:
On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:
>
> cur_keys.
> + cur_keys = remaining_
> for stream in _get_referenced
> self._gather_
Just to note: this doesn't work. I think it's because the root pages
can be inaccessible too, and this point is too late.
For now I've gone with continue on 'absent' records.
-Rob
John A Meinel (jameinel) wrote:
Robert Collins wrote:
> On Mon, 2009-06-22 at 14:21 +0000, John A Meinel wrote:
>> cur_keys.
>> + cur_keys = remaining_
>> for stream in _get_referenced
>> self._gather_
>
> Just to note; this doesn't work.
>
> I think its because the root pages can be inaccessible too, and this is
> too late.
>
> For now I've gone with continue on 'absent' records.
>
> -Rob
>
Well, it would work for cases that aren't 20 no-op cases :), since that
is the only time the root page is unchanged.
Certainly we can live with this for now, but I think you could
actually filter earlier with a similar intersection-style check. Namely:
=== modified file 'bzrlib/
--- bzrlib/
+++ bzrlib/
@@ -320,6 +320,10 @@
+            cur_keys = [key for key in cur_keys if key in
+                        remaining_keys]
+            remaining_
+            self._chk_id_roots = [key for key in self._chk_id_roots
+                                  if key in remaining_keys]
             for stream in _get_referenced
                 yield stream
Looking at the code, I saw that I actually did this for 'pid_roots'
since they are likely to not have been changed.
The reason you don't actually want ".intersection" is because that ends
up returning a set, which ends up breaking the ordering that we had
carefully crafted to get optimal compression. (namely, grouping by the
search key prefix.)
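The ordering point can be shown in a few lines of plain Python, with made-up keys: a list comprehension filters while preserving the carefully crafted order, whereas set.intersection() returns an unordered set and loses the grouping by search-key prefix.

```python
# Keys arranged so entries sharing a prefix sit next to each other,
# which is what gives good compression groups.
cur_keys = [('p1', 'a'), ('p1', 'b'), ('p2', 'c'), ('p0', 'd')]
remaining_keys = {('p0', 'd'), ('p1', 'a'), ('p2', 'c')}

# Order-preserving filter: drops missing keys, keeps the arrangement.
filtered = [key for key in cur_keys if key in remaining_keys]
assert filtered == [('p1', 'a'), ('p2', 'c'), ('p0', 'd')]

# set.intersection selects the same keys, but as an unordered set.
assert set(cur_keys).intersection(remaining_keys) == set(filtered)
```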
John
=:->
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-06-18 19:19:49 +0000 |
3 | +++ NEWS 2009-06-22 06:14:38 +0000 |
4 | @@ -42,6 +42,10 @@ |
5 | ``BZR_PROGRESS_BAR`` is set to ``none``. |
6 | (Martin Pool, #339385) |
7 | |
8 | +* Repositories using CHK pages (which includes the new 2a format) will no |
9 | + longer error during commit or push operations when an autopack operation |
10 | + is triggered. (Robert Collins, #365615) |
11 | + |
12 | Internals |
13 | ********* |
14 | |
15 | @@ -58,11 +62,32 @@ |
16 | for files with long ancestry and 'cherrypicked' changes.) |
17 | (John Arbash Meinel) |
18 | |
19 | +* ``GroupCompress`` repositories now take advantage of the pack hints |
20 | + parameter to permit cross-format fetching to incrementally pack the |
21 | + converted data. (Robert Collins) |
22 | + |
23 | * pack <=> pack fetching is now done via a ``PackStreamSource`` rather |
24 | than the ``Packer`` code. The user visible change is that we now |
25 | properly fetch the minimum number of texts for non-smart fetching. |
26 | (John Arbash Meinel) |
27 | |
28 | +* ``Repository.commit_write_group`` now returns opaque data about what |
29 | + was committed, for passing to the ``Repository.pack``. Repositories |
30 | + without atomic commits will still return None. (Robert Collins) |
31 | + |
32 | +* ``Repository.pack`` now takes an optional ``hint`` parameter |
33 | + which will support doing partial packs for repositories that can do |
34 | + that. (Robert Collins) |
35 | + |
36 | +* RepositoryFormat has a new attribute 'pack_compresses' which is True |
37 | + when doing a pack operation changes the compression of content in the |
38 | + repository. (Robert Collins) |
39 | + |
40 | +* ``StreamSink`` and ``InterDifferingSerialiser`` will call |
41 | + ``Repository.pack`` with the hint returned by |
42 | + ``Repository.commit_write_group`` if the formats were different and the |
43 | + repository can increase compression by doing a pack operation. |
44 | + (Robert Collins, #376748) |
45 | |
46 | Improvements |
47 | ************ |
48 | |
49 | === modified file 'bzrlib/remote.py' |
50 | --- bzrlib/remote.py 2009-06-17 03:53:51 +0000 |
51 | +++ bzrlib/remote.py 2009-06-21 23:51:17 +0000 |
52 | @@ -566,6 +566,11 @@ |
53 | return self._creating_repo._real_repository._format.network_name() |
54 | |
55 | @property |
56 | + def pack_compresses(self): |
57 | + self._ensure_real() |
58 | + return self._custom_format.pack_compresses |
59 | + |
60 | + @property |
61 | def _serializer(self): |
62 | self._ensure_real() |
63 | return self._custom_format._serializer |
64 | @@ -1491,13 +1496,13 @@ |
65 | return self._real_repository.inventories |
66 | |
67 | @needs_write_lock |
68 | - def pack(self): |
69 | + def pack(self, hint=None): |
70 | """Compress the data within the repository. |
71 | |
72 | This is not currently implemented within the smart server. |
73 | """ |
74 | self._ensure_real() |
75 | - return self._real_repository.pack() |
76 | + return self._real_repository.pack(hint=hint) |
77 | |
78 | @property |
79 | def revisions(self): |
80 | |
81 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' |
82 | --- bzrlib/repofmt/groupcompress_repo.py 2009-06-18 19:19:49 +0000 |
83 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-06-22 04:56:21 +0000 |
84 | @@ -218,6 +218,7 @@ |
85 | p_id_roots_set = set() |
86 | stream = source_vf.get_record_stream(keys, 'groupcompress', True) |
87 | for idx, record in enumerate(stream): |
88 | + # Inventories should always be with revisions; assume success. |
89 | bytes = record.get_bytes_as('fulltext') |
90 | chk_inv = inventory.CHKInventory.deserialise(None, bytes, |
91 | record.key) |
92 | @@ -294,6 +295,11 @@ |
93 | stream = source_vf.get_record_stream(cur_keys, |
94 | 'as-requested', True) |
95 | for record in stream: |
96 | + if record.storage_kind == 'absent': |
97 | + # An absent CHK record: we assume that the missing |
98 | + # record is in a different pack - e.g. a page not |
99 | + # altered by the commit we're packing. |
100 | + continue |
101 | bytes = record.get_bytes_as('fulltext') |
102 | # We don't care about search_key_func for this code, |
103 | # because we only care about external references. |
104 | @@ -558,11 +564,6 @@ |
105 | pack_factory = GCPack |
106 | resumed_pack_factory = ResumedGCPack |
107 | |
108 | - def _already_packed(self): |
109 | - """Is the collection already packed?""" |
110 | - # Always repack GC repositories for now |
111 | - return False |
112 | - |
113 | def _execute_pack_operations(self, pack_operations, |
114 | _packer_class=GCCHKPacker, |
115 | reload_func=None): |
116 | @@ -1048,6 +1049,7 @@ |
117 | _fetch_order = 'unordered' |
118 | _fetch_uses_deltas = False # essentially ignored by the groupcompress code. |
119 | fast_deltas = True |
120 | + pack_compresses = True |
121 | |
122 | def _get_matching_bzrdir(self): |
123 | return bzrdir.format_registry.make_bzrdir('development6-rich-root') |
124 | |
125 | === modified file 'bzrlib/repofmt/pack_repo.py' |
126 | --- bzrlib/repofmt/pack_repo.py 2009-06-17 17:57:15 +0000 |
127 | +++ bzrlib/repofmt/pack_repo.py 2009-06-22 04:56:21 +0000 |
128 | @@ -1459,12 +1459,12 @@ |
129 | in synchronisation with certain steps. Otherwise the names collection |
130 | is not flushed. |
131 | |
132 | - :return: True if packing took place. |
133 | + :return: Something evaluating true if packing took place. |
134 | """ |
135 | while True: |
136 | try: |
137 | return self._do_autopack() |
138 | - except errors.RetryAutopack, e: |
139 | + except errors.RetryAutopack: |
140 | # If we get a RetryAutopack exception, we should abort the |
141 | # current action, and retry. |
142 | pass |
143 | @@ -1474,7 +1474,7 @@ |
144 | total_revisions = self.revision_index.combined_index.key_count() |
145 | total_packs = len(self._names) |
146 | if self._max_pack_count(total_revisions) >= total_packs: |
147 | - return False |
148 | + return None |
149 | # determine which packs need changing |
150 | pack_distribution = self.pack_distribution(total_revisions) |
151 | existing_packs = [] |
152 | @@ -1502,10 +1502,10 @@ |
153 | 'containing %d revisions. Packing %d files into %d affecting %d' |
154 | ' revisions', self, total_packs, total_revisions, num_old_packs, |
155 | num_new_packs, num_revs_affected) |
156 | - self._execute_pack_operations(pack_operations, |
157 | + result = self._execute_pack_operations(pack_operations, |
158 | reload_func=self._restart_autopack) |
159 | mutter('Auto-packing repository %s completed', self) |
160 | - return True |
161 | + return result |
162 | |
163 | def _execute_pack_operations(self, pack_operations, _packer_class=Packer, |
164 | reload_func=None): |
165 | @@ -1513,7 +1513,7 @@ |
166 | |
167 | :param pack_operations: A list of [revision_count, packs_to_combine]. |
168 | :param _packer_class: The class of packer to use (default: Packer). |
169 | - :return: None. |
170 | + :return: The new pack names. |
171 | """ |
172 | for revision_count, packs in pack_operations: |
173 | # we may have no-ops from the setup logic |
174 | @@ -1535,10 +1535,11 @@ |
175 | self._remove_pack_from_memory(pack) |
176 | # record the newly available packs and stop advertising the old |
177 | # packs |
178 | - self._save_pack_names(clear_obsolete_packs=True) |
179 | + result = self._save_pack_names(clear_obsolete_packs=True) |
180 | # Move the old packs out of the way now they are no longer referenced. |
181 | for revision_count, packs in pack_operations: |
182 | self._obsolete_packs(packs) |
183 | + return result |
184 | |
185 | def _flush_new_pack(self): |
186 | if self._new_pack is not None: |
187 | @@ -1554,29 +1555,26 @@ |
188 | |
189 | def _already_packed(self): |
190 | """Is the collection already packed?""" |
191 | - return len(self._names) < 2 |
192 | + return not (self.repo._format.pack_compresses or (len(self._names) > 1)) |
193 | |
194 | - def pack(self): |
195 | + def pack(self, hint=None): |
196 | """Pack the pack collection totally.""" |
197 | self.ensure_loaded() |
198 | total_packs = len(self._names) |
199 | if self._already_packed(): |
200 | - # This is arguably wrong because we might not be optimal, but for |
201 | - # now lets leave it in. (e.g. reconcile -> one pack. But not |
202 | - # optimal. |
203 | return |
204 | total_revisions = self.revision_index.combined_index.key_count() |
205 | # XXX: the following may want to be a class, to pack with a given |
206 | # policy. |
207 | mutter('Packing repository %s, which has %d pack files, ' |
208 | - 'containing %d revisions into 1 packs.', self, total_packs, |
209 | - total_revisions) |
210 | + 'containing %d revisions with hint %r.', self, total_packs, |
211 | + total_revisions, hint) |
212 | # determine which packs need changing |
213 | - pack_distribution = [1] |
214 | pack_operations = [[0, []]] |
215 | for pack in self.all_packs(): |
216 | - pack_operations[-1][0] += pack.get_revision_count() |
217 | - pack_operations[-1][1].append(pack) |
218 | + if not hint or pack.name in hint: |
219 | + pack_operations[-1][0] += pack.get_revision_count() |
220 | + pack_operations[-1][1].append(pack) |
221 | self._execute_pack_operations(pack_operations, OptimisingPacker) |
222 | |
223 | def plan_autopack_combinations(self, existing_packs, pack_distribution): |
224 | @@ -1938,6 +1936,7 @@ |
225 | |
226 | :param clear_obsolete_packs: If True, clear out the contents of the |
227 | obsolete_packs directory. |
228 | + :return: A list of the names saved that were not previously on disk. |
229 | """ |
230 | self.lock_names() |
231 | try: |
232 | @@ -1958,6 +1957,7 @@ |
233 | self._unlock_names() |
234 | # synchronise the memory packs list with what we just wrote: |
235 | self._syncronize_pack_names_from_disk_nodes(disk_nodes) |
236 | + return [new_node[0][0] for new_node in new_nodes] |
237 | |
238 | def reload_pack_names(self): |
239 | """Sync our pack listing with what is present in the repository. |
240 | @@ -2097,7 +2097,7 @@ |
241 | if not self.autopack(): |
242 | # when autopack takes no steps, the names list is still |
243 | # unsaved. |
244 | - self._save_pack_names() |
245 | + return self._save_pack_names() |
246 | |
247 | def _suspend_write_group(self): |
248 | tokens = [pack.name for pack in self._resumed_packs] |
249 | @@ -2348,13 +2348,13 @@ |
250 | raise NotImplementedError(self.dont_leave_lock_in_place) |
251 | |
252 | @needs_write_lock |
253 | - def pack(self): |
254 | + def pack(self, hint=None): |
255 | """Compress the data within the repository. |
256 | |
257 | This will pack all the data to a single pack. In future it may |
258 | recompress deltas or do other such expensive operations. |
259 | """ |
260 | - self._pack_collection.pack() |
261 | + self._pack_collection.pack(hint=hint) |
262 | |
263 | @needs_write_lock |
264 | def reconcile(self, other=None, thorough=False): |
265 | |
266 | === modified file 'bzrlib/repository.py' |
267 | --- bzrlib/repository.py 2009-06-17 17:57:15 +0000 |
268 | +++ bzrlib/repository.py 2009-06-22 06:14:38 +0000 |
269 | @@ -1413,8 +1413,9 @@ |
270 | raise errors.BzrError('mismatched lock context %r and ' |
271 | 'write group %r.' % |
272 | (self.get_transaction(), self._write_group)) |
273 | - self._commit_write_group() |
274 | + result = self._commit_write_group() |
275 | self._write_group = None |
276 | + return result |
277 | |
278 | def _commit_write_group(self): |
279 | """Template method for per-repository write group cleanup. |
280 | @@ -2427,7 +2428,7 @@ |
281 | keys = tsort.topo_sort(parent_map) |
282 | return [None] + list(keys) |
283 | |
284 | - def pack(self): |
285 | + def pack(self, hint=None): |
286 | """Compress the data within the repository. |
287 | |
288 | This operation only makes sense for some repository types. For other |
289 | @@ -2436,6 +2437,13 @@ |
290 | This stub method does not require a lock, but subclasses should use |
291 | @needs_write_lock as this is a long running call its reasonable to |
292 | implicitly lock for the user. |
293 | + |
294 | + :param hint: If not supplied, the whole repository is packed. |
295 | + If supplied, the repository may use the hint parameter as a |
296 | + hint for the parts of the repository to pack. A hint can be |
297 | + obtained from the result of commit_write_group(). Out of |
298 | + date hints are simply ignored, because concurrent operations |
299 | + can obsolete them rapidly. |
300 | """ |
301 | |
302 | def get_transaction(self): |
303 | @@ -2844,6 +2852,11 @@ |
304 | # Does this format have < O(tree_size) delta generation. Used to hint what |
305 | # code path for commit, amongst other things. |
306 | fast_deltas = None |
307 | + # Does doing a pack operation compress data? Useful for the pack UI command |
308 | + # (so if there is one pack, the operation can still proceed because it may |
309 | + # help), and for fetching when data won't have come from the same |
310 | + # compressor. |
311 | + pack_compresses = False |
312 | |
313 | def __str__(self): |
314 | return "<%s>" % self.__class__.__name__ |
315 | @@ -3675,6 +3688,7 @@ |
316 | cache = lru_cache.LRUCache(100) |
317 | cache[basis_id] = basis_tree |
318 | del basis_tree # We don't want to hang on to it here |
319 | + hints = [] |
320 | for offset in range(0, len(revision_ids), batch_size): |
321 | self.target.start_write_group() |
322 | try: |
323 | @@ -3686,7 +3700,11 @@ |
324 | self.target.abort_write_group() |
325 | raise |
326 | else: |
327 | - self.target.commit_write_group() |
328 | + hint = self.target.commit_write_group() |
329 | + if hint: |
330 | + hints.extend(hint) |
331 | + if hints and self.target._format.pack_compresses: |
332 | + self.target.pack(hint=hints) |
333 | pb.update('Transferring revisions', len(revision_ids), |
334 | len(revision_ids)) |
335 | |
336 | @@ -4034,7 +4052,10 @@ |
337 | # missing keys can handle suspending a write group). |
338 | write_group_tokens = self.target_repo.suspend_write_group() |
339 | return write_group_tokens, missing_keys |
340 | - self.target_repo.commit_write_group() |
341 | + hint = self.target_repo.commit_write_group() |
342 | + if (to_serializer != src_serializer and |
343 | + self.target_repo._format.pack_compresses): |
344 | + self.target_repo.pack(hint=hint) |
345 | return [], set() |
346 | |
347 | def _extract_and_insert_inventories(self, substream, serializer): |
348 | |
349 | === modified file 'bzrlib/tests/per_repository/test_pack.py' |
350 | --- bzrlib/tests/per_repository/test_pack.py 2009-03-23 14:59:43 +0000 |
351 | +++ bzrlib/tests/per_repository/test_pack.py 2009-06-21 23:51:17 +0000 |
352 | @@ -24,3 +24,14 @@ |
353 | def test_pack_empty_does_not_error(self): |
354 | repo = self.make_repository('.') |
355 | repo.pack() |
356 | + |
357 | + def test_pack_accepts_opaque_hint(self): |
358 | + # For requesting packs of a repository where some data is known to be |
359 | + # unoptimal we permit packing just some data via a hint. If the hint is |
360 | + # illegible it is ignored. |
361 | + tree = self.make_branch_and_tree('tree') |
362 | + rev1 = tree.commit('1') |
363 | + rev2 = tree.commit('2') |
364 | + rev3 = tree.commit('3') |
365 | + rev4 = tree.commit('4') |
366 | + tree.branch.repository.pack(hint=[rev3, rev4]) |
367 | |
368 | === modified file 'bzrlib/tests/per_repository/test_repository.py' |
369 | --- bzrlib/tests/per_repository/test_repository.py 2009-06-17 21:33:03 +0000 |
370 | +++ bzrlib/tests/per_repository/test_repository.py 2009-06-19 04:19:22 +0000 |
371 | @@ -66,29 +66,29 @@ |
372 | |
373 | class TestRepository(TestCaseWithRepository): |
374 | |
375 | + def assertFormatAttribute(self, attribute, allowed_values): |
376 | + """Assert that the format has an attribute 'attribute'.""" |
377 | + repo = self.make_repository('repo') |
378 | + self.assertSubset([getattr(repo._format, attribute)], allowed_values) |
379 | + |
380 | def test_attribute__fetch_order(self): |
381 | """Test the the _fetch_order attribute.""" |
382 | - tree = self.make_branch_and_tree('tree') |
383 | - repo = tree.branch.repository |
384 | - self.assertTrue(repo._format._fetch_order in ('topological', 'unordered')) |
385 | + self.assertFormatAttribute('_fetch_order', ('topological', 'unordered')) |
386 | |
387 | def test_attribute__fetch_uses_deltas(self): |
388 | """Test the the _fetch_uses_deltas attribute.""" |
389 | - tree = self.make_branch_and_tree('tree') |
390 | - repo = tree.branch.repository |
391 | - self.assertTrue(repo._format._fetch_uses_deltas in (True, False)) |
392 | + self.assertFormatAttribute('_fetch_uses_deltas', (True, False)) |
393 | |
394 | def test_attribute_fast_deltas(self): |
395 | """Test the format.fast_deltas attribute.""" |
396 | - tree = self.make_branch_and_tree('tree') |
397 | - repo = tree.branch.repository |
398 | - self.assertTrue(repo._format.fast_deltas in (True, False)) |
399 | + self.assertFormatAttribute('fast_deltas', (True, False)) |
400 | |
401 | def test_attribute__fetch_reconcile(self): |
402 | """Test the the _fetch_reconcile attribute.""" |
403 | - tree = self.make_branch_and_tree('tree') |
404 | - repo = tree.branch.repository |
405 | - self.assertTrue(repo._format._fetch_reconcile in (True, False)) |
406 | + self.assertFormatAttribute('_fetch_reconcile', (True, False)) |
407 | + |
408 | + def test_attribute_format_pack_compresses(self): |
409 | + self.assertFormatAttribute('pack_compresses', (True, False)) |
410 | |
411 | def test_attribute_inventories_store(self): |
412 | """Test the existence of the inventories attribute.""" |
413 | |
414 | === modified file 'bzrlib/tests/per_repository/test_write_group.py' |
415 | --- bzrlib/tests/per_repository/test_write_group.py 2009-06-10 03:56:49 +0000 |
416 | +++ bzrlib/tests/per_repository/test_write_group.py 2009-06-22 02:25:09 +0000 |
417 | @@ -68,11 +68,14 @@ |
418 | repo.commit_write_group() |
419 | repo.unlock() |
420 | |
421 | - def test_commit_write_group_gets_None(self): |
422 | + def test_commit_write_group_does_not_error(self): |
423 | repo = self.make_repository('.') |
424 | repo.lock_write() |
425 | repo.start_write_group() |
426 | - self.assertEqual(None, repo.commit_write_group()) |
427 | + # commit_write_group can either return None (for repositories without |
428 | + # isolated transactions) or a hint for pack(). So we only check it |
429 | + # works in this interface test, because all repositories are exercised. |
430 | + repo.commit_write_group() |
431 | repo.unlock() |
432 | |
433 | def test_unlock_in_write_group(self): |
434 | |
435 | === modified file 'bzrlib/tests/test_pack_repository.py' |
436 | --- bzrlib/tests/test_pack_repository.py 2009-06-17 17:57:15 +0000 |
437 | +++ bzrlib/tests/test_pack_repository.py 2009-06-22 02:25:09 +0000 |
438 | @@ -238,6 +238,35 @@ |
439 | pack_names = [node[1][0] for node in index.iter_all_entries()] |
440 | self.assertTrue(large_pack_name in pack_names) |
441 | |
442 | + def test_commit_write_group_returns_new_pack_names(self): |
443 | + format = self.get_format() |
444 | + tree = self.make_branch_and_tree('foo', format=format) |
445 | + tree.commit('first post') |
446 | + repo = tree.branch.repository |
447 | + repo.lock_write() |
448 | + try: |
449 | + repo.start_write_group() |
450 | + try: |
451 | + inv = inventory.Inventory(revision_id="A") |
452 | + inv.root.revision = "A" |
453 | + repo.texts.add_lines((inv.root.file_id, "A"), [], []) |
454 | + rev = _mod_revision.Revision(timestamp=0, timezone=None, |
455 | + committer="Foo Bar <foo@example.com>", message="Message", |
456 | + revision_id="A") |
457 | + rev.parent_ids = () |
458 | + repo.add_revision("A", rev, inv=inv) |
459 | + except: |
460 | + repo.abort_write_group() |
461 | + raise |
462 | + else: |
463 | + old_names = repo._pack_collection._names.keys() |
464 | + result = repo.commit_write_group() |
465 | + cur_names = repo._pack_collection._names.keys() |
466 | + new_names = list(set(cur_names) - set(old_names)) |
467 | + self.assertEqual(new_names, result) |
468 | + finally: |
469 | + repo.unlock() |
470 | + |
471 | def test_fail_obsolete_deletion(self): |
472 | # failing to delete obsolete packs is not fatal |
473 | format = self.get_format() |
474 | |
475 | === modified file 'bzrlib/tests/test_repository.py' |
476 | --- bzrlib/tests/test_repository.py 2009-06-18 18:00:01 +0000 |
477 | +++ bzrlib/tests/test_repository.py 2009-06-22 06:15:41 +0000 |
478 | @@ -673,10 +673,14 @@ |
479 | self.assertFalse(repo._format.supports_external_lookups) |
480 | |
481 | |
482 | -class TestDevelopment6(TestCaseWithTransport): |
483 | +class Test2a(TestCaseWithTransport): |
484 | + |
485 | + def test_format_pack_compresses_True(self): |
486 | + repo = self.make_repository('repo', format='2a') |
487 | + self.assertTrue(repo._format.pack_compresses) |
488 | |
489 | def test_inventories_use_chk_map_with_parent_base_dict(self): |
490 | - tree = self.make_branch_and_tree('repo', format="development6-rich-root") |
491 | + tree = self.make_branch_and_tree('repo', format="2a") |
492 | revid = tree.commit("foo") |
493 | tree.lock_read() |
494 | self.addCleanup(tree.unlock) |
495 | @@ -688,14 +692,33 @@ |
496 | self.assertEqual(65536, |
497 | inv.parent_id_basename_to_file_id._root_node.maximum_size) |
498 | |
499 | + def test_pack_with_hint(self): |
500 | + tree = self.make_branch_and_tree('tree', format='2a') |
501 | + # 1 commit to leave untouched |
502 | + tree.commit('1') |
503 | + to_keep = tree.branch.repository._pack_collection.names() |
504 | + # 2 to combine |
505 | + tree.commit('2') |
506 | + tree.commit('3') |
507 | + all = tree.branch.repository._pack_collection.names() |
508 | + combine = list(set(all) - set(to_keep)) |
509 | + self.assertLength(3, all) |
510 | + self.assertLength(2, combine) |
511 | + tree.branch.repository.pack(hint=combine) |
512 | + final = tree.branch.repository._pack_collection.names() |
513 | + self.assertLength(2, final) |
514 | + self.assertFalse(combine[0] in final) |
515 | + self.assertFalse(combine[1] in final) |
516 | + self.assertSubset(to_keep, final) |
517 | + |
518 | def test_stream_source_to_gc(self): |
519 | - source = self.make_repository('source', format='development6-rich-root') |
520 | - target = self.make_repository('target', format='development6-rich-root') |
521 | + source = self.make_repository('source', format='2a') |
522 | + target = self.make_repository('target', format='2a') |
523 | stream = source._get_source(target._format) |
524 | self.assertIsInstance(stream, groupcompress_repo.GroupCHKStreamSource) |
525 | |
526 | def test_stream_source_to_non_gc(self): |
527 | - source = self.make_repository('source', format='development6-rich-root') |
528 | + source = self.make_repository('source', format='2a') |
529 | target = self.make_repository('target', format='rich-root-pack') |
530 | stream = source._get_source(target._format) |
531 | # We don't want the child GroupCHKStreamSource |
532 | @@ -703,7 +726,7 @@ |
533 | |
534 | def test_get_stream_for_missing_keys_includes_all_chk_refs(self): |
535 | source_builder = self.make_branch_builder('source', |
536 | - format='development6-rich-root') |
537 | + format='2a') |
538 | # We have to build a fairly large tree, so that we are sure the chk |
539 | # pages will have split into multiple pages. |
540 | entries = [('add', ('', 'a-root-id', 'directory', None))] |
541 | @@ -726,7 +749,7 @@ |
542 | source_branch = source_builder.get_branch() |
543 | source_branch.lock_read() |
544 | self.addCleanup(source_branch.unlock) |
545 | - target = self.make_repository('target', format='development6-rich-root') |
546 | + target = self.make_repository('target', format='2a') |
547 | source = source_branch.repository._get_source(target._format) |
548 | self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource) |
549 | |
550 | @@ -1354,3 +1377,83 @@ |
551 | self.assertTrue(new_pack.inventory_index._optimize_for_size) |
552 | self.assertTrue(new_pack.text_index._optimize_for_size) |
553 | self.assertTrue(new_pack.signature_index._optimize_for_size) |
554 | + |
555 | + |
556 | +class TestCrossFormatPacks(TestCaseWithTransport): |
557 | + |
558 | + def log_pack(self, hint=None): |
559 | + self.calls.append(('pack', hint)) |
560 | + self.orig_pack(hint=hint) |
561 | + if self.expect_hint: |
562 | + self.assertTrue(hint) |
563 | + |
564 | + def run_stream(self, src_fmt, target_fmt, expect_pack_called): |
565 | + self.expect_hint = expect_pack_called |
566 | + self.calls = [] |
567 | + source_tree = self.make_branch_and_tree('src', format=src_fmt) |
568 | + source_tree.lock_write() |
569 | + self.addCleanup(source_tree.unlock) |
570 | + tip = source_tree.commit('foo') |
571 | + target = self.make_repository('target', format=target_fmt) |
572 | + target.lock_write() |
573 | + self.addCleanup(target.unlock) |
574 | + source = source_tree.branch.repository._get_source(target._format) |
575 | + self.orig_pack = target.pack |
576 | + target.pack = self.log_pack |
577 | + search = target.search_missing_revision_ids( |
578 | + source_tree.branch.repository, tip) |
579 | + stream = source.get_stream(search) |
580 | + from_format = source_tree.branch.repository._format |
581 | + sink = target._get_sink() |
582 | + sink.insert_stream(stream, from_format, []) |
583 | + if expect_pack_called: |
584 | + self.assertLength(1, self.calls) |
585 | + else: |
586 | + self.assertLength(0, self.calls) |
587 | + |
588 | + def run_fetch(self, src_fmt, target_fmt, expect_pack_called): |
589 | + self.expect_hint = expect_pack_called |
590 | + self.calls = [] |
591 | + source_tree = self.make_branch_and_tree('src', format=src_fmt) |
592 | + source_tree.lock_write() |
593 | + self.addCleanup(source_tree.unlock) |
594 | + tip = source_tree.commit('foo') |
595 | + target = self.make_repository('target', format=target_fmt) |
596 | + target.lock_write() |
597 | + self.addCleanup(target.unlock) |
598 | + source = source_tree.branch.repository |
599 | + self.orig_pack = target.pack |
600 | + target.pack = self.log_pack |
601 | + target.fetch(source) |
602 | + if expect_pack_called: |
603 | + self.assertLength(1, self.calls) |
604 | + else: |
605 | + self.assertLength(0, self.calls) |
606 | + |
607 | + def test_sink_format_hint_no(self): |
608 | + # When the target format says packing makes no difference, pack is not |
609 | + # called. |
610 | + self.run_stream('1.9', 'rich-root-pack', False) |
611 | + |
612 | + def test_sink_format_hint_yes(self): |
613 | + # When the target format says packing makes a difference, pack is |
614 | + # called. |
615 | + self.run_stream('1.9', '2a', True) |
616 | + |
617 | + def test_sink_format_same_no(self): |
618 | + # When the formats are the same, pack is not called. |
619 | + self.run_stream('2a', '2a', False) |
620 | + |
621 | + def test_IDS_format_hint_no(self): |
622 | + # When the target format says packing makes no difference, pack is not |
623 | + # called. |
624 | + self.run_fetch('1.9', 'rich-root-pack', False) |
625 | + |
626 | + def test_IDS_format_hint_yes(self): |
627 | + # When the target format says packing makes a difference, pack is |
628 | + # called. |
629 | + self.run_fetch('1.9', '2a', True) |
630 | + |
631 | + def test_IDS_format_same_no(self): |
632 | + # When the formats are the same, pack is not called. |
633 | + self.run_fetch('2a', '2a', False) |
634 |