Merge lp:~jameinel/bzr/1.15-pack-source into lp:~bzr/bzr/trunk-old
Proposed by: John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.15-pack-source
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 824 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.15-pack-source
Related bugs: none
Reviewer: Martin Pool (Approve)
Review via email: mp+6985@code.launchpad.net
Commit message
Description of the change
John A Meinel (jameinel) wrote:
Martin Pool (mbp) wrote:
This looks ok to me, though you might want to run the concept past Robert.
review: Approve
Robert Collins (lifeless) wrote:
On Tue, 2009-06-16 at 05:33 +0000, Martin Pool wrote:
> Review: Approve
> This looks ok to me, though you might want to run the concept past Robert.
Conceptually fine. Using Packer was a hack when we had no interface able
to be efficient back in the days of single VersionedFile and Knits.
-Rob
Preview Diff
1 | === modified file 'bzrlib/fetch.py' | |||
2 | --- bzrlib/fetch.py 2009-06-10 03:56:49 +0000 | |||
3 | +++ bzrlib/fetch.py 2009-06-16 02:36:36 +0000 | |||
4 | @@ -51,9 +51,6 @@ | |||
5 | 51 | :param last_revision: If set, try to limit to the data this revision | 51 | :param last_revision: If set, try to limit to the data this revision |
6 | 52 | references. | 52 | references. |
7 | 53 | :param find_ghosts: If True search the entire history for ghosts. | 53 | :param find_ghosts: If True search the entire history for ghosts. |
8 | 54 | :param _write_group_acquired_callable: Don't use; this parameter only | ||
9 | 55 | exists to facilitate a hack done in InterPackRepo.fetch. We would | ||
10 | 56 | like to remove this parameter. | ||
11 | 57 | :param pb: ProgressBar object to use; deprecated and ignored. | 54 | :param pb: ProgressBar object to use; deprecated and ignored. |
12 | 58 | This method will just create one on top of the stack. | 55 | This method will just create one on top of the stack. |
13 | 59 | """ | 56 | """ |
14 | 60 | 57 | ||
15 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' | |||
16 | --- bzrlib/repofmt/groupcompress_repo.py 2009-06-12 01:11:00 +0000 | |||
17 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-06-16 02:36:36 +0000 | |||
18 | @@ -48,6 +48,7 @@ | |||
19 | 48 | Pack, | 48 | Pack, |
20 | 49 | NewPack, | 49 | NewPack, |
21 | 50 | KnitPackRepository, | 50 | KnitPackRepository, |
22 | 51 | KnitPackStreamSource, | ||
23 | 51 | PackRootCommitBuilder, | 52 | PackRootCommitBuilder, |
24 | 52 | RepositoryPackCollection, | 53 | RepositoryPackCollection, |
25 | 53 | RepositoryFormatPack, | 54 | RepositoryFormatPack, |
26 | @@ -736,21 +737,10 @@ | |||
27 | 736 | # make it raise to trap naughty direct users. | 737 | # make it raise to trap naughty direct users. |
28 | 737 | raise NotImplementedError(self._iter_inventory_xmls) | 738 | raise NotImplementedError(self._iter_inventory_xmls) |
29 | 738 | 739 | ||
45 | 739 | def _find_parent_ids_of_revisions(self, revision_ids): | 740 | def _find_present_inventory_keys(self, revision_keys): |
46 | 740 | # TODO: we probably want to make this a helper that other code can get | 741 | parent_map = self.inventories.get_parent_map(revision_keys) |
47 | 741 | # at | 742 | present_inventory_keys = set(k for k in parent_map) |
48 | 742 | parent_map = self.get_parent_map(revision_ids) | 743 | return present_inventory_keys |
34 | 743 | parents = set() | ||
35 | 744 | map(parents.update, parent_map.itervalues()) | ||
36 | 745 | parents.difference_update(revision_ids) | ||
37 | 746 | parents.discard(_mod_revision.NULL_REVISION) | ||
38 | 747 | return parents | ||
39 | 748 | |||
40 | 749 | def _find_present_inventory_ids(self, revision_ids): | ||
41 | 750 | keys = [(r,) for r in revision_ids] | ||
42 | 751 | parent_map = self.inventories.get_parent_map(keys) | ||
43 | 752 | present_inventory_ids = set(k[-1] for k in parent_map) | ||
44 | 753 | return present_inventory_ids | ||
49 | 754 | 744 | ||
50 | 755 | def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None): | 745 | def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None): |
51 | 756 | """Find the file ids and versions affected by revisions. | 746 | """Find the file ids and versions affected by revisions. |
52 | @@ -767,12 +757,20 @@ | |||
53 | 767 | file_id_revisions = {} | 757 | file_id_revisions = {} |
54 | 768 | pb = ui.ui_factory.nested_progress_bar() | 758 | pb = ui.ui_factory.nested_progress_bar() |
55 | 769 | try: | 759 | try: |
58 | 770 | parent_ids = self._find_parent_ids_of_revisions(revision_ids) | 760 | revision_keys = [(r,) for r in revision_ids] |
59 | 771 | present_parent_inv_ids = self._find_present_inventory_ids(parent_ids) | 761 | parent_keys = self._find_parent_keys_of_revisions(revision_keys) |
60 | 762 | # TODO: instead of using _find_present_inventory_keys, change the | ||
61 | 763 | # code paths to allow missing inventories to be tolerated. | ||
62 | 764 | # However, we only want to tolerate missing parent | ||
63 | 765 | # inventories, not missing inventories for revision_ids | ||
64 | 766 | present_parent_inv_keys = self._find_present_inventory_keys( | ||
65 | 767 | parent_keys) | ||
66 | 768 | present_parent_inv_ids = set( | ||
67 | 769 | [k[-1] for k in present_parent_inv_keys]) | ||
68 | 772 | uninteresting_root_keys = set() | 770 | uninteresting_root_keys = set() |
69 | 773 | interesting_root_keys = set() | 771 | interesting_root_keys = set() |
72 | 774 | inventories_to_read = set(present_parent_inv_ids) | 772 | inventories_to_read = set(revision_ids) |
73 | 775 | inventories_to_read.update(revision_ids) | 773 | inventories_to_read.update(present_parent_inv_ids) |
74 | 776 | for inv in self.iter_inventories(inventories_to_read): | 774 | for inv in self.iter_inventories(inventories_to_read): |
75 | 777 | entry_chk_root_key = inv.id_to_entry.key() | 775 | entry_chk_root_key = inv.id_to_entry.key() |
76 | 778 | if inv.revision_id in present_parent_inv_ids: | 776 | if inv.revision_id in present_parent_inv_ids: |
77 | @@ -846,7 +844,7 @@ | |||
78 | 846 | return super(CHKInventoryRepository, self)._get_source(to_format) | 844 | return super(CHKInventoryRepository, self)._get_source(to_format) |
79 | 847 | 845 | ||
80 | 848 | 846 | ||
82 | 849 | class GroupCHKStreamSource(repository.StreamSource): | 847 | class GroupCHKStreamSource(KnitPackStreamSource): |
83 | 850 | """Used when both the source and target repo are GroupCHK repos.""" | 848 | """Used when both the source and target repo are GroupCHK repos.""" |
84 | 851 | 849 | ||
85 | 852 | def __init__(self, from_repository, to_format): | 850 | def __init__(self, from_repository, to_format): |
86 | @@ -854,6 +852,7 @@ | |||
87 | 854 | super(GroupCHKStreamSource, self).__init__(from_repository, to_format) | 852 | super(GroupCHKStreamSource, self).__init__(from_repository, to_format) |
88 | 855 | self._revision_keys = None | 853 | self._revision_keys = None |
89 | 856 | self._text_keys = None | 854 | self._text_keys = None |
90 | 855 | self._text_fetch_order = 'groupcompress' | ||
91 | 857 | self._chk_id_roots = None | 856 | self._chk_id_roots = None |
92 | 858 | self._chk_p_id_roots = None | 857 | self._chk_p_id_roots = None |
93 | 859 | 858 | ||
94 | @@ -898,16 +897,10 @@ | |||
95 | 898 | p_id_roots_set.clear() | 897 | p_id_roots_set.clear() |
96 | 899 | return ('inventories', _filtered_inv_stream()) | 898 | return ('inventories', _filtered_inv_stream()) |
97 | 900 | 899 | ||
105 | 901 | def _find_present_inventories(self, revision_ids): | 900 | def _get_filtered_chk_streams(self, excluded_revision_keys): |
99 | 902 | revision_keys = [(r,) for r in revision_ids] | ||
100 | 903 | inventories = self.from_repository.inventories | ||
101 | 904 | present_inventories = inventories.get_parent_map(revision_keys) | ||
102 | 905 | return [p[-1] for p in present_inventories] | ||
103 | 906 | |||
104 | 907 | def _get_filtered_chk_streams(self, excluded_revision_ids): | ||
106 | 908 | self._text_keys = set() | 901 | self._text_keys = set() |
109 | 909 | excluded_revision_ids.discard(_mod_revision.NULL_REVISION) | 902 | excluded_revision_keys.discard(_mod_revision.NULL_REVISION) |
110 | 910 | if not excluded_revision_ids: | 903 | if not excluded_revision_keys: |
111 | 911 | uninteresting_root_keys = set() | 904 | uninteresting_root_keys = set() |
112 | 912 | uninteresting_pid_root_keys = set() | 905 | uninteresting_pid_root_keys = set() |
113 | 913 | else: | 906 | else: |
114 | @@ -915,9 +908,9 @@ | |||
115 | 915 | # actually present | 908 | # actually present |
116 | 916 | # TODO: Update Repository.iter_inventories() to add | 909 | # TODO: Update Repository.iter_inventories() to add |
117 | 917 | # ignore_missing=True | 910 | # ignore_missing=True |
121 | 918 | present_ids = self.from_repository._find_present_inventory_ids( | 911 | present_keys = self.from_repository._find_present_inventory_keys( |
122 | 919 | excluded_revision_ids) | 912 | excluded_revision_keys) |
123 | 920 | present_ids = self._find_present_inventories(excluded_revision_ids) | 913 | present_ids = [k[-1] for k in present_keys] |
124 | 921 | uninteresting_root_keys = set() | 914 | uninteresting_root_keys = set() |
125 | 922 | uninteresting_pid_root_keys = set() | 915 | uninteresting_pid_root_keys = set() |
126 | 923 | for inv in self.from_repository.iter_inventories(present_ids): | 916 | for inv in self.from_repository.iter_inventories(present_ids): |
127 | @@ -948,14 +941,6 @@ | |||
128 | 948 | self._chk_p_id_roots = None | 941 | self._chk_p_id_roots = None |
129 | 949 | yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages() | 942 | yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages() |
130 | 950 | 943 | ||
131 | 951 | def _get_text_stream(self): | ||
132 | 952 | # Note: We know we don't have to handle adding root keys, because both | ||
133 | 953 | # the source and target are GCCHK, and those always support rich-roots | ||
134 | 954 | # We may want to request as 'unordered', in case the source has done a | ||
135 | 955 | # 'split' packing | ||
136 | 956 | return ('texts', self.from_repository.texts.get_record_stream( | ||
137 | 957 | self._text_keys, 'groupcompress', False)) | ||
138 | 958 | |||
139 | 959 | def get_stream(self, search): | 944 | def get_stream(self, search): |
140 | 960 | revision_ids = search.get_keys() | 945 | revision_ids = search.get_keys() |
141 | 961 | for stream_info in self._fetch_revision_texts(revision_ids): | 946 | for stream_info in self._fetch_revision_texts(revision_ids): |
142 | @@ -966,8 +951,9 @@ | |||
143 | 966 | # For now, exclude all parents that are at the edge of ancestry, for | 951 | # For now, exclude all parents that are at the edge of ancestry, for |
144 | 967 | # which we have inventories | 952 | # which we have inventories |
145 | 968 | from_repo = self.from_repository | 953 | from_repo = self.from_repository |
148 | 969 | parent_ids = from_repo._find_parent_ids_of_revisions(revision_ids) | 954 | parent_keys = from_repo._find_parent_keys_of_revisions( |
149 | 970 | for stream_info in self._get_filtered_chk_streams(parent_ids): | 955 | self._revision_keys) |
150 | 956 | for stream_info in self._get_filtered_chk_streams(parent_keys): | ||
151 | 971 | yield stream_info | 957 | yield stream_info |
152 | 972 | yield self._get_text_stream() | 958 | yield self._get_text_stream() |
153 | 973 | 959 | ||
154 | @@ -991,8 +977,8 @@ | |||
155 | 991 | # no unavailable texts when the ghost inventories are not filled in. | 977 | # no unavailable texts when the ghost inventories are not filled in. |
156 | 992 | yield self._get_inventory_stream(missing_inventory_keys, | 978 | yield self._get_inventory_stream(missing_inventory_keys, |
157 | 993 | allow_absent=True) | 979 | allow_absent=True) |
160 | 994 | # We use the empty set for excluded_revision_ids, to make it clear that | 980 | # We use the empty set for excluded_revision_keys, to make it clear |
161 | 995 | # we want to transmit all referenced chk pages. | 981 | # that we want to transmit all referenced chk pages. |
162 | 996 | for stream_info in self._get_filtered_chk_streams(set()): | 982 | for stream_info in self._get_filtered_chk_streams(set()): |
163 | 997 | yield stream_info | 983 | yield stream_info |
164 | 998 | 984 | ||
165 | 999 | 985 | ||
166 | === modified file 'bzrlib/repofmt/pack_repo.py' | |||
167 | --- bzrlib/repofmt/pack_repo.py 2009-06-10 03:56:49 +0000 | |||
168 | +++ bzrlib/repofmt/pack_repo.py 2009-06-16 02:36:36 +0000 | |||
169 | @@ -73,6 +73,7 @@ | |||
170 | 73 | MetaDirRepositoryFormat, | 73 | MetaDirRepositoryFormat, |
171 | 74 | RepositoryFormat, | 74 | RepositoryFormat, |
172 | 75 | RootCommitBuilder, | 75 | RootCommitBuilder, |
173 | 76 | StreamSource, | ||
174 | 76 | ) | 77 | ) |
175 | 77 | import bzrlib.revision as _mod_revision | 78 | import bzrlib.revision as _mod_revision |
176 | 78 | from bzrlib.trace import ( | 79 | from bzrlib.trace import ( |
177 | @@ -2265,6 +2266,11 @@ | |||
178 | 2265 | pb.finished() | 2266 | pb.finished() |
179 | 2266 | return result | 2267 | return result |
180 | 2267 | 2268 | ||
181 | 2269 | def _get_source(self, to_format): | ||
182 | 2270 | if to_format.network_name() == self._format.network_name(): | ||
183 | 2271 | return KnitPackStreamSource(self, to_format) | ||
184 | 2272 | return super(KnitPackRepository, self)._get_source(to_format) | ||
185 | 2273 | |||
186 | 2268 | def _make_parents_provider(self): | 2274 | def _make_parents_provider(self): |
187 | 2269 | return graph.CachingParentsProvider(self) | 2275 | return graph.CachingParentsProvider(self) |
188 | 2270 | 2276 | ||
189 | @@ -2384,6 +2390,79 @@ | |||
190 | 2384 | repo.unlock() | 2390 | repo.unlock() |
191 | 2385 | 2391 | ||
192 | 2386 | 2392 | ||
193 | 2393 | class KnitPackStreamSource(StreamSource): | ||
194 | 2394 | """A StreamSource used to transfer data between same-format KnitPack repos. | ||
195 | 2395 | |||
196 | 2396 | This source assumes: | ||
197 | 2397 | 1) Same serialization format for all objects | ||
198 | 2398 | 2) Same root information | ||
199 | 2399 | 3) XML format inventories | ||
200 | 2400 | 4) Atomic inserts (so we can stream inventory texts before text | ||
201 | 2401 | content) | ||
202 | 2402 | 5) No chk_bytes | ||
203 | 2403 | """ | ||
204 | 2404 | |||
205 | 2405 | def __init__(self, from_repository, to_format): | ||
206 | 2406 | super(KnitPackStreamSource, self).__init__(from_repository, to_format) | ||
207 | 2407 | self._text_keys = None | ||
208 | 2408 | self._text_fetch_order = 'unordered' | ||
209 | 2409 | |||
210 | 2410 | def _get_filtered_inv_stream(self, revision_ids): | ||
211 | 2411 | from_repo = self.from_repository | ||
212 | 2412 | parent_ids = from_repo._find_parent_ids_of_revisions(revision_ids) | ||
213 | 2413 | parent_keys = [(p,) for p in parent_ids] | ||
214 | 2414 | find_text_keys = from_repo._find_text_key_references_from_xml_inventory_lines | ||
215 | 2415 | parent_text_keys = set(find_text_keys( | ||
216 | 2416 | from_repo._inventory_xml_lines_for_keys(parent_keys))) | ||
217 | 2417 | content_text_keys = set() | ||
218 | 2418 | knit = KnitVersionedFiles(None, None) | ||
219 | 2419 | factory = KnitPlainFactory() | ||
220 | 2420 | def find_text_keys_from_content(record): | ||
221 | 2421 | if record.storage_kind not in ('knit-delta-gz', 'knit-ft-gz'): | ||
222 | 2422 | raise ValueError("Unknown content storage kind for" | ||
223 | 2423 | " inventory text: %s" % (record.storage_kind,)) | ||
224 | 2424 | # It's a knit record, it has a _raw_record field (even if it was | ||
225 | 2425 | # reconstituted from a network stream). | ||
226 | 2426 | raw_data = record._raw_record | ||
227 | 2427 | # read the entire thing | ||
228 | 2428 | revision_id = record.key[-1] | ||
229 | 2429 | content, _ = knit._parse_record(revision_id, raw_data) | ||
230 | 2430 | if record.storage_kind == 'knit-delta-gz': | ||
231 | 2431 | line_iterator = factory.get_linedelta_content(content) | ||
232 | 2432 | elif record.storage_kind == 'knit-ft-gz': | ||
233 | 2433 | line_iterator = factory.get_fulltext_content(content) | ||
234 | 2434 | content_text_keys.update(find_text_keys( | ||
235 | 2435 | [(line, revision_id) for line in line_iterator])) | ||
236 | 2436 | revision_keys = [(r,) for r in revision_ids] | ||
237 | 2437 | def _filtered_inv_stream(): | ||
238 | 2438 | source_vf = from_repo.inventories | ||
239 | 2439 | stream = source_vf.get_record_stream(revision_keys, | ||
240 | 2440 | 'unordered', False) | ||
241 | 2441 | for record in stream: | ||
242 | 2442 | if record.storage_kind == 'absent': | ||
243 | 2443 | raise errors.NoSuchRevision(from_repo, record.key) | ||
244 | 2444 | find_text_keys_from_content(record) | ||
245 | 2445 | yield record | ||
246 | 2446 | self._text_keys = content_text_keys - parent_text_keys | ||
247 | 2447 | return ('inventories', _filtered_inv_stream()) | ||
248 | 2448 | |||
249 | 2449 | def _get_text_stream(self): | ||
250 | 2450 | # Note: We know we don't have to handle adding root keys, because both | ||
251 | 2451 | # the source and target are the identical network name. | ||
252 | 2452 | text_stream = self.from_repository.texts.get_record_stream( | ||
253 | 2453 | self._text_keys, self._text_fetch_order, False) | ||
254 | 2454 | return ('texts', text_stream) | ||
255 | 2455 | |||
256 | 2456 | def get_stream(self, search): | ||
257 | 2457 | revision_ids = search.get_keys() | ||
258 | 2458 | for stream_info in self._fetch_revision_texts(revision_ids): | ||
259 | 2459 | yield stream_info | ||
260 | 2460 | self._revision_keys = [(rev_id,) for rev_id in revision_ids] | ||
261 | 2461 | yield self._get_filtered_inv_stream(revision_ids) | ||
262 | 2462 | yield self._get_text_stream() | ||
263 | 2463 | |||
264 | 2464 | |||
265 | 2465 | |||
266 | 2387 | class RepositoryFormatPack(MetaDirRepositoryFormat): | 2466 | class RepositoryFormatPack(MetaDirRepositoryFormat): |
267 | 2388 | """Format logic for pack structured repositories. | 2467 | """Format logic for pack structured repositories. |
268 | 2389 | 2468 | ||
269 | 2390 | 2469 | ||
270 | === modified file 'bzrlib/repository.py' | |||
271 | --- bzrlib/repository.py 2009-06-12 01:11:00 +0000 | |||
272 | +++ bzrlib/repository.py 2009-06-16 02:36:36 +0000 | |||
273 | @@ -1919,29 +1919,25 @@ | |||
274 | 1919 | yield line, revid | 1919 | yield line, revid |
275 | 1920 | 1920 | ||
276 | 1921 | def _find_file_ids_from_xml_inventory_lines(self, line_iterator, | 1921 | def _find_file_ids_from_xml_inventory_lines(self, line_iterator, |
278 | 1922 | revision_ids): | 1922 | revision_keys): |
279 | 1923 | """Helper routine for fileids_altered_by_revision_ids. | 1923 | """Helper routine for fileids_altered_by_revision_ids. |
280 | 1924 | 1924 | ||
281 | 1925 | This performs the translation of xml lines to revision ids. | 1925 | This performs the translation of xml lines to revision ids. |
282 | 1926 | 1926 | ||
283 | 1927 | :param line_iterator: An iterator of lines, origin_version_id | 1927 | :param line_iterator: An iterator of lines, origin_version_id |
285 | 1928 | :param revision_ids: The revision ids to filter for. This should be a | 1928 | :param revision_keys: The revision ids to filter for. This should be a |
286 | 1929 | set or other type which supports efficient __contains__ lookups, as | 1929 | set or other type which supports efficient __contains__ lookups, as |
289 | 1930 | the revision id from each parsed line will be looked up in the | 1930 | the revision key from each parsed line will be looked up in the |
290 | 1931 | revision_ids filter. | 1931 | revision_keys filter. |
291 | 1932 | :return: a dictionary mapping altered file-ids to an iterable of | 1932 | :return: a dictionary mapping altered file-ids to an iterable of |
292 | 1933 | revision_ids. Each altered file-ids has the exact revision_ids that | 1933 | revision_ids. Each altered file-ids has the exact revision_ids that |
293 | 1934 | altered it listed explicitly. | 1934 | altered it listed explicitly. |
294 | 1935 | """ | 1935 | """ |
295 | 1936 | seen = set(self._find_text_key_references_from_xml_inventory_lines( | 1936 | seen = set(self._find_text_key_references_from_xml_inventory_lines( |
296 | 1937 | line_iterator).iterkeys()) | 1937 | line_iterator).iterkeys()) |
302 | 1938 | # Note that revision_ids are revision keys. | 1938 | parent_keys = self._find_parent_keys_of_revisions(revision_keys) |
298 | 1939 | parent_maps = self.revisions.get_parent_map(revision_ids) | ||
299 | 1940 | parents = set() | ||
300 | 1941 | map(parents.update, parent_maps.itervalues()) | ||
301 | 1942 | parents.difference_update(revision_ids) | ||
303 | 1943 | parent_seen = set(self._find_text_key_references_from_xml_inventory_lines( | 1939 | parent_seen = set(self._find_text_key_references_from_xml_inventory_lines( |
305 | 1944 | self._inventory_xml_lines_for_keys(parents))) | 1940 | self._inventory_xml_lines_for_keys(parent_keys))) |
306 | 1945 | new_keys = seen - parent_seen | 1941 | new_keys = seen - parent_seen |
307 | 1946 | result = {} | 1942 | result = {} |
308 | 1947 | setdefault = result.setdefault | 1943 | setdefault = result.setdefault |
309 | @@ -1949,6 +1945,33 @@ | |||
310 | 1949 | setdefault(key[0], set()).add(key[-1]) | 1945 | setdefault(key[0], set()).add(key[-1]) |
311 | 1950 | return result | 1946 | return result |
312 | 1951 | 1947 | ||
313 | 1948 | def _find_parent_ids_of_revisions(self, revision_ids): | ||
314 | 1949 | """Find all parent ids that are mentioned in the revision graph. | ||
315 | 1950 | |||
316 | 1951 | :return: set of revisions that are parents of revision_ids which are | ||
317 | 1952 | not part of revision_ids themselves | ||
318 | 1953 | """ | ||
319 | 1954 | parent_map = self.get_parent_map(revision_ids) | ||
320 | 1955 | parent_ids = set() | ||
321 | 1956 | map(parent_ids.update, parent_map.itervalues()) | ||
322 | 1957 | parent_ids.difference_update(revision_ids) | ||
323 | 1958 | parent_ids.discard(_mod_revision.NULL_REVISION) | ||
324 | 1959 | return parent_ids | ||
325 | 1960 | |||
326 | 1961 | def _find_parent_keys_of_revisions(self, revision_keys): | ||
327 | 1962 | """Similar to _find_parent_ids_of_revisions, but used with keys. | ||
328 | 1963 | |||
329 | 1964 | :param revision_keys: An iterable of revision_keys. | ||
330 | 1965 | :return: The parents of all revision_keys that are not already in | ||
331 | 1966 | revision_keys | ||
332 | 1967 | """ | ||
333 | 1968 | parent_map = self.revisions.get_parent_map(revision_keys) | ||
334 | 1969 | parent_keys = set() | ||
335 | 1970 | map(parent_keys.update, parent_map.itervalues()) | ||
336 | 1971 | parent_keys.difference_update(revision_keys) | ||
337 | 1972 | parent_keys.discard(_mod_revision.NULL_REVISION) | ||
338 | 1973 | return parent_keys | ||
339 | 1974 | |||
340 | 1952 | def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None): | 1975 | def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None): |
341 | 1953 | """Find the file ids and versions affected by revisions. | 1976 | """Find the file ids and versions affected by revisions. |
342 | 1954 | 1977 | ||
343 | @@ -3418,144 +3441,6 @@ | |||
344 | 3418 | return self.source.revision_ids_to_search_result(result_set) | 3441 | return self.source.revision_ids_to_search_result(result_set) |
345 | 3419 | 3442 | ||
346 | 3420 | 3443 | ||
347 | 3421 | class InterPackRepo(InterSameDataRepository): | ||
348 | 3422 | """Optimised code paths between Pack based repositories.""" | ||
349 | 3423 | |||
350 | 3424 | @classmethod | ||
351 | 3425 | def _get_repo_format_to_test(self): | ||
352 | 3426 | from bzrlib.repofmt import pack_repo | ||
353 | 3427 | return pack_repo.RepositoryFormatKnitPack6RichRoot() | ||
354 | 3428 | |||
355 | 3429 | @staticmethod | ||
356 | 3430 | def is_compatible(source, target): | ||
357 | 3431 | """Be compatible with known Pack formats. | ||
358 | 3432 | |||
359 | 3433 | We don't test for the stores being of specific types because that | ||
360 | 3434 | could lead to confusing results, and there is no need to be | ||
361 | 3435 | overly general. | ||
362 | 3436 | |||
363 | 3437 | InterPackRepo does not support CHK based repositories. | ||
364 | 3438 | """ | ||
365 | 3439 | from bzrlib.repofmt.pack_repo import RepositoryFormatPack | ||
366 | 3440 | from bzrlib.repofmt.groupcompress_repo import RepositoryFormatCHK1 | ||
367 | 3441 | try: | ||
368 | 3442 | are_packs = (isinstance(source._format, RepositoryFormatPack) and | ||
369 | 3443 | isinstance(target._format, RepositoryFormatPack)) | ||
370 | 3444 | not_packs = (isinstance(source._format, RepositoryFormatCHK1) or | ||
371 | 3445 | isinstance(target._format, RepositoryFormatCHK1)) | ||
372 | 3446 | except AttributeError: | ||
373 | 3447 | return False | ||
374 | 3448 | if not_packs or not are_packs: | ||
375 | 3449 | return False | ||
376 | 3450 | return InterRepository._same_model(source, target) | ||
377 | 3451 | |||
378 | 3452 | @needs_write_lock | ||
379 | 3453 | def fetch(self, revision_id=None, pb=None, find_ghosts=False, | ||
380 | 3454 | fetch_spec=None): | ||
381 | 3455 | """See InterRepository.fetch().""" | ||
382 | 3456 | if (len(self.source._fallback_repositories) > 0 or | ||
383 | 3457 | len(self.target._fallback_repositories) > 0): | ||
384 | 3458 | # The pack layer is not aware of fallback repositories, so when | ||
385 | 3459 | # fetching from a stacked repository or into a stacked repository | ||
386 | 3460 | # we use the generic fetch logic which uses the VersionedFiles | ||
387 | 3461 | # attributes on repository. | ||
388 | 3462 | from bzrlib.fetch import RepoFetcher | ||
389 | 3463 | fetcher = RepoFetcher(self.target, self.source, revision_id, | ||
390 | 3464 | pb, find_ghosts, fetch_spec=fetch_spec) | ||
391 | 3465 | if fetch_spec is not None: | ||
392 | 3466 | if len(list(fetch_spec.heads)) != 1: | ||
393 | 3467 | raise AssertionError( | ||
394 | 3468 | "InterPackRepo.fetch doesn't support " | ||
395 | 3469 | "fetching multiple heads yet.") | ||
396 | 3470 | revision_id = list(fetch_spec.heads)[0] | ||
397 | 3471 | fetch_spec = None | ||
398 | 3472 | if revision_id is None: | ||
399 | 3473 | # TODO: | ||
400 | 3474 | # everything to do - use pack logic | ||
401 | 3475 | # to fetch from all packs to one without | ||
402 | 3476 | # inventory parsing etc, IFF nothing to be copied is in the target. | ||
403 | 3477 | # till then: | ||
404 | 3478 | source_revision_ids = frozenset(self.source.all_revision_ids()) | ||
405 | 3479 | revision_ids = source_revision_ids - \ | ||
406 | 3480 | frozenset(self.target.get_parent_map(source_revision_ids)) | ||
407 | 3481 | revision_keys = [(revid,) for revid in revision_ids] | ||
408 | 3482 | index = self.target._pack_collection.revision_index.combined_index | ||
409 | 3483 | present_revision_ids = set(item[1][0] for item in | ||
410 | 3484 | index.iter_entries(revision_keys)) | ||
411 | 3485 | revision_ids = set(revision_ids) - present_revision_ids | ||
412 | 3486 | # implementing the TODO will involve: | ||
413 | 3487 | # - detecting when all of a pack is selected | ||
414 | 3488 | # - avoiding as much as possible pre-selection, so the | ||
415 | 3489 | # more-core routines such as create_pack_from_packs can filter in | ||
416 | 3490 | # a just-in-time fashion. (though having a HEADS list on a | ||
417 | 3491 | # repository might make this a lot easier, because we could | ||
418 | 3492 | # sensibly detect 'new revisions' without doing a full index scan. | ||
419 | 3493 | elif _mod_revision.is_null(revision_id): | ||
420 | 3494 | # nothing to do: | ||
421 | 3495 | return (0, []) | ||
422 | 3496 | else: | ||
423 | 3497 | revision_ids = self.search_missing_revision_ids(revision_id, | ||
424 | 3498 | find_ghosts=find_ghosts).get_keys() | ||
425 | 3499 | if len(revision_ids) == 0: | ||
426 | 3500 | return (0, []) | ||
427 | 3501 | return self._pack(self.source, self.target, revision_ids) | ||
428 | 3502 | |||
429 | 3503 | def _pack(self, source, target, revision_ids): | ||
430 | 3504 | from bzrlib.repofmt.pack_repo import Packer | ||
431 | 3505 | packs = source._pack_collection.all_packs() | ||
432 | 3506 | pack = Packer(self.target._pack_collection, packs, '.fetch', | ||
433 | 3507 | revision_ids).pack() | ||
434 | 3508 | if pack is not None: | ||
435 | 3509 | self.target._pack_collection._save_pack_names() | ||
436 | 3510 | copied_revs = pack.get_revision_count() | ||
437 | 3511 | # Trigger an autopack. This may duplicate effort as we've just done | ||
438 | 3512 | # a pack creation, but for now it is simpler to think about as | ||
439 | 3513 | # 'upload data, then repack if needed'. | ||
440 | 3514 | self.target._pack_collection.autopack() | ||
441 | 3515 | return (copied_revs, []) | ||
442 | 3516 | else: | ||
443 | 3517 | return (0, []) | ||
444 | 3518 | |||
445 | 3519 | @needs_read_lock | ||
446 | 3520 | def search_missing_revision_ids(self, revision_id=None, find_ghosts=True): | ||
447 | 3521 | """See InterRepository.missing_revision_ids(). | ||
448 | 3522 | |||
449 | 3523 | :param find_ghosts: Find ghosts throughout the ancestry of | ||
450 | 3524 | revision_id. | ||
451 | 3525 | """ | ||
452 | 3526 | if not find_ghosts and revision_id is not None: | ||
453 | 3527 | return self._walk_to_common_revisions([revision_id]) | ||
454 | 3528 | elif revision_id is not None: | ||
455 | 3529 | # Find ghosts: search for revisions pointing from one repository to | ||
456 | 3530 | # the other, and vice versa, anywhere in the history of revision_id. | ||
457 | 3531 | graph = self.target.get_graph(other_repository=self.source) | ||
458 | 3532 | searcher = graph._make_breadth_first_searcher([revision_id]) | ||
459 | 3533 | found_ids = set() | ||
460 | 3534 | while True: | ||
461 | 3535 | try: | ||
462 | 3536 | next_revs, ghosts = searcher.next_with_ghosts() | ||
463 | 3537 | except StopIteration: | ||
464 | 3538 | break | ||
465 | 3539 | if revision_id in ghosts: | ||
466 | 3540 | raise errors.NoSuchRevision(self.source, revision_id) | ||
467 | 3541 | found_ids.update(next_revs) | ||
468 | 3542 | found_ids.update(ghosts) | ||
469 | 3543 | found_ids = frozenset(found_ids) | ||
470 | 3544 | # Double query here: should be able to avoid this by changing the | ||
471 | 3545 | # graph api further. | ||
472 | 3546 | result_set = found_ids - frozenset( | ||
473 | 3547 | self.target.get_parent_map(found_ids)) | ||
474 | 3548 | else: | ||
475 | 3549 | source_ids = self.source.all_revision_ids() | ||
476 | 3550 | # source_ids is the worst possible case we may need to pull. | ||
477 | 3551 | # now we want to filter source_ids against what we actually | ||
478 | 3552 | # have in target, but don't try to check for existence where we know | ||
479 | 3553 | # we do not have a revision as that would be pointless. | ||
480 | 3554 | target_ids = set(self.target.all_revision_ids()) | ||
481 | 3555 | result_set = set(source_ids).difference(target_ids) | ||
482 | 3556 | return self.source.revision_ids_to_search_result(result_set) | ||
483 | 3557 | |||
484 | 3558 | |||
485 | 3559 | class InterDifferingSerializer(InterRepository): | 3444 | class InterDifferingSerializer(InterRepository): |
486 | 3560 | 3445 | ||
487 | 3561 | @classmethod | 3446 | @classmethod |
488 | @@ -3836,7 +3721,6 @@ | |||
489 | 3836 | InterRepository.register_optimiser(InterSameDataRepository) | 3721 | InterRepository.register_optimiser(InterSameDataRepository) |
490 | 3837 | InterRepository.register_optimiser(InterWeaveRepo) | 3722 | InterRepository.register_optimiser(InterWeaveRepo) |
491 | 3838 | InterRepository.register_optimiser(InterKnitRepo) | 3723 | InterRepository.register_optimiser(InterKnitRepo) |
492 | 3839 | InterRepository.register_optimiser(InterPackRepo) | ||
493 | 3840 | 3724 | ||
494 | 3841 | 3725 | ||
495 | 3842 | class CopyConverter(object): | 3726 | class CopyConverter(object): |
496 | 3843 | 3727 | ||
497 | === modified file 'bzrlib/tests/test_pack_repository.py' | |||
498 | --- bzrlib/tests/test_pack_repository.py 2009-06-10 03:56:49 +0000 | |||
499 | +++ bzrlib/tests/test_pack_repository.py 2009-06-16 02:36:36 +0000 | |||
500 | @@ -38,6 +38,10 @@ | |||
501 | 38 | upgrade, | 38 | upgrade, |
502 | 39 | workingtree, | 39 | workingtree, |
503 | 40 | ) | 40 | ) |
504 | 41 | from bzrlib.repofmt import ( | ||
505 | 42 | pack_repo, | ||
506 | 43 | groupcompress_repo, | ||
507 | 44 | ) | ||
508 | 41 | from bzrlib.repofmt.groupcompress_repo import RepositoryFormatCHK1 | 45 | from bzrlib.repofmt.groupcompress_repo import RepositoryFormatCHK1 |
509 | 42 | from bzrlib.smart import ( | 46 | from bzrlib.smart import ( |
510 | 43 | client, | 47 | client, |
511 | @@ -556,58 +560,43 @@ | |||
512 | 556 | missing_ghost.get_inventory, 'ghost') | 560 | missing_ghost.get_inventory, 'ghost') |
513 | 557 | 561 | ||
514 | 558 | def make_write_ready_repo(self): | 562 | def make_write_ready_repo(self): |
516 | 559 | repo = self.make_repository('.', format=self.get_format()) | 563 | format = self.get_format() |
517 | 564 | if isinstance(format.repository_format, RepositoryFormatCHK1): | ||
518 | 565 | raise TestNotApplicable("No missing compression parents") | ||
519 | 566 | repo = self.make_repository('.', format=format) | ||
520 | 560 | repo.lock_write() | 567 | repo.lock_write() |
521 | 568 | self.addCleanup(repo.unlock) | ||
522 | 561 | repo.start_write_group() | 569 | repo.start_write_group() |
523 | 570 | self.addCleanup(repo.abort_write_group) | ||
524 | 562 | return repo | 571 | return repo |
525 | 563 | 572 | ||
526 | 564 | def test_missing_inventories_compression_parent_prevents_commit(self): | 573 | def test_missing_inventories_compression_parent_prevents_commit(self): |
527 | 565 | repo = self.make_write_ready_repo() | 574 | repo = self.make_write_ready_repo() |
528 | 566 | key = ('junk',) | 575 | key = ('junk',) |
529 | 567 | if not getattr(repo.inventories._index, '_missing_compression_parents', | ||
530 | 568 | None): | ||
531 | 569 | raise TestSkipped("No missing compression parents") | ||
532 | 570 | repo.inventories._index._missing_compression_parents.add(key) | 576 | repo.inventories._index._missing_compression_parents.add(key) |
533 | 571 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 577 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
534 | 572 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 578 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
535 | 573 | repo.abort_write_group() | ||
536 | 574 | repo.unlock() | ||
537 | 575 | 579 | ||
538 | 576 | def test_missing_revisions_compression_parent_prevents_commit(self): | 580 | def test_missing_revisions_compression_parent_prevents_commit(self): |
539 | 577 | repo = self.make_write_ready_repo() | 581 | repo = self.make_write_ready_repo() |
540 | 578 | key = ('junk',) | 582 | key = ('junk',) |
541 | 579 | if not getattr(repo.inventories._index, '_missing_compression_parents', | ||
542 | 580 | None): | ||
543 | 581 | raise TestSkipped("No missing compression parents") | ||
544 | 582 | repo.revisions._index._missing_compression_parents.add(key) | 583 | repo.revisions._index._missing_compression_parents.add(key) |
545 | 583 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 584 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
546 | 584 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 585 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
547 | 585 | repo.abort_write_group() | ||
548 | 586 | repo.unlock() | ||
549 | 587 | 586 | ||
550 | 588 | def test_missing_signatures_compression_parent_prevents_commit(self): | 587 | def test_missing_signatures_compression_parent_prevents_commit(self): |
551 | 589 | repo = self.make_write_ready_repo() | 588 | repo = self.make_write_ready_repo() |
552 | 590 | key = ('junk',) | 589 | key = ('junk',) |
553 | 591 | if not getattr(repo.inventories._index, '_missing_compression_parents', | ||
554 | 592 | None): | ||
555 | 593 | raise TestSkipped("No missing compression parents") | ||
556 | 594 | repo.signatures._index._missing_compression_parents.add(key) | 590 | repo.signatures._index._missing_compression_parents.add(key) |
557 | 595 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 591 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
558 | 596 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 592 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
559 | 597 | repo.abort_write_group() | ||
560 | 598 | repo.unlock() | ||
561 | 599 | 593 | ||
562 | 600 | def test_missing_text_compression_parent_prevents_commit(self): | 594 | def test_missing_text_compression_parent_prevents_commit(self): |
563 | 601 | repo = self.make_write_ready_repo() | 595 | repo = self.make_write_ready_repo() |
564 | 602 | key = ('some', 'junk') | 596 | key = ('some', 'junk') |
565 | 603 | if not getattr(repo.inventories._index, '_missing_compression_parents', | ||
566 | 604 | None): | ||
567 | 605 | raise TestSkipped("No missing compression parents") | ||
568 | 606 | repo.texts._index._missing_compression_parents.add(key) | 597 | repo.texts._index._missing_compression_parents.add(key) |
569 | 607 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 598 | self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
570 | 608 | e = self.assertRaises(errors.BzrCheckError, repo.commit_write_group) | 599 | e = self.assertRaises(errors.BzrCheckError, repo.commit_write_group) |
571 | 609 | repo.abort_write_group() | ||
572 | 610 | repo.unlock() | ||
573 | 611 | 600 | ||
574 | 612 | def test_supports_external_lookups(self): | 601 | def test_supports_external_lookups(self): |
575 | 613 | repo = self.make_repository('.', format=self.get_format()) | 602 | repo = self.make_repository('.', format=self.get_format()) |
576 | 614 | 603 | ||
577 | === modified file 'bzrlib/tests/test_repository.py' | |||
578 | --- bzrlib/tests/test_repository.py 2009-06-10 03:56:49 +0000 | |||
579 | +++ bzrlib/tests/test_repository.py 2009-06-16 02:36:36 +0000 | |||
580 | @@ -31,7 +31,10 @@ | |||
581 | 31 | UnknownFormatError, | 31 | UnknownFormatError, |
582 | 32 | UnsupportedFormatError, | 32 | UnsupportedFormatError, |
583 | 33 | ) | 33 | ) |
585 | 34 | from bzrlib import graph | 34 | from bzrlib import ( |
586 | 35 | graph, | ||
587 | 36 | tests, | ||
588 | 37 | ) | ||
589 | 35 | from bzrlib.branchbuilder import BranchBuilder | 38 | from bzrlib.branchbuilder import BranchBuilder |
590 | 36 | from bzrlib.btree_index import BTreeBuilder, BTreeGraphIndex | 39 | from bzrlib.btree_index import BTreeBuilder, BTreeGraphIndex |
591 | 37 | from bzrlib.index import GraphIndex, InMemoryGraphIndex | 40 | from bzrlib.index import GraphIndex, InMemoryGraphIndex |
592 | @@ -685,6 +688,147 @@ | |||
593 | 685 | self.assertEqual(65536, | 688 | self.assertEqual(65536, |
594 | 686 | inv.parent_id_basename_to_file_id._root_node.maximum_size) | 689 | inv.parent_id_basename_to_file_id._root_node.maximum_size) |
595 | 687 | 690 | ||
596 | 691 | def test_stream_source_to_gc(self): | ||
597 | 692 | source = self.make_repository('source', format='development6-rich-root') | ||
598 | 693 | target = self.make_repository('target', format='development6-rich-root') | ||
599 | 694 | stream = source._get_source(target._format) | ||
600 | 695 | self.assertIsInstance(stream, groupcompress_repo.GroupCHKStreamSource) | ||
601 | 696 | |||
602 | 697 | def test_stream_source_to_non_gc(self): | ||
603 | 698 | source = self.make_repository('source', format='development6-rich-root') | ||
604 | 699 | target = self.make_repository('target', format='rich-root-pack') | ||
605 | 700 | stream = source._get_source(target._format) | ||
606 | 701 | # We don't want the child GroupCHKStreamSource | ||
607 | 702 | self.assertIs(type(stream), repository.StreamSource) | ||
608 | 703 | |||
609 | 704 | def test_get_stream_for_missing_keys_includes_all_chk_refs(self): | ||
610 | 705 | source_builder = self.make_branch_builder('source', | ||
611 | 706 | format='development6-rich-root') | ||
612 | 707 | # We have to build a fairly large tree, so that we are sure the chk | ||
613 | 708 | # pages will have split into multiple pages. | ||
614 | 709 | entries = [('add', ('', 'a-root-id', 'directory', None))] | ||
615 | 710 | for i in 'abcdefghijklmnopqrstuvwxyz123456789': | ||
616 | 711 | for j in 'abcdefghijklmnopqrstuvwxyz123456789': | ||
617 | 712 | fname = i + j | ||
618 | 713 | fid = fname + '-id' | ||
619 | 714 | content = 'content for %s\n' % (fname,) | ||
620 | 715 | entries.append(('add', (fname, fid, 'file', content))) | ||
621 | 716 | source_builder.start_series() | ||
622 | 717 | source_builder.build_snapshot('rev-1', None, entries) | ||
623 | 718 | # Now change a few of them, so we get a few new pages for the second | ||
624 | 719 | # revision | ||
625 | 720 | source_builder.build_snapshot('rev-2', ['rev-1'], [ | ||
626 | 721 | ('modify', ('aa-id', 'new content for aa-id\n')), | ||
627 | 722 | ('modify', ('cc-id', 'new content for cc-id\n')), | ||
628 | 723 | ('modify', ('zz-id', 'new content for zz-id\n')), | ||
629 | 724 | ]) | ||
630 | 725 | source_builder.finish_series() | ||
631 | 726 | source_branch = source_builder.get_branch() | ||
632 | 727 | source_branch.lock_read() | ||
633 | 728 | self.addCleanup(source_branch.unlock) | ||
634 | 729 | target = self.make_repository('target', format='development6-rich-root') | ||
635 | 730 | source = source_branch.repository._get_source(target._format) | ||
636 | 731 | self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource) | ||
637 | 732 | |||
638 | 733 | # On a regular pass, getting the inventories and chk pages for rev-2 | ||
639 | 734 | # would only get the newly created chk pages | ||
640 | 735 | search = graph.SearchResult(set(['rev-2']), set(['rev-1']), 1, | ||
641 | 736 | set(['rev-2'])) | ||
642 | 737 | simple_chk_records = [] | ||
643 | 738 | for vf_name, substream in source.get_stream(search): | ||
644 | 739 | if vf_name == 'chk_bytes': | ||
645 | 740 | for record in substream: | ||
646 | 741 | simple_chk_records.append(record.key) | ||
647 | 742 | else: | ||
648 | 743 | for _ in substream: | ||
649 | 744 | continue | ||
650 | 745 | # 3 pages, the root (InternalNode), + 2 pages which actually changed | ||
651 | 746 | self.assertEqual([('sha1:91481f539e802c76542ea5e4c83ad416bf219f73',), | ||
652 | 747 | ('sha1:4ff91971043668583985aec83f4f0ab10a907d3f',), | ||
653 | 748 | ('sha1:81e7324507c5ca132eedaf2d8414ee4bb2226187',), | ||
654 | 749 | ('sha1:b101b7da280596c71a4540e9a1eeba8045985ee0',)], | ||
655 | 750 | simple_chk_records) | ||
656 | 751 | # Now, when we do a similar call using 'get_stream_for_missing_keys' | ||
657 | 752 | # we should get a much larger set of pages. | ||
658 | 753 | missing = [('inventories', 'rev-2')] | ||
659 | 754 | full_chk_records = [] | ||
660 | 755 | for vf_name, substream in source.get_stream_for_missing_keys(missing): | ||
661 | 756 | if vf_name == 'inventories': | ||
662 | 757 | for record in substream: | ||
663 | 758 | self.assertEqual(('rev-2',), record.key) | ||
664 | 759 | elif vf_name == 'chk_bytes': | ||
665 | 760 | for record in substream: | ||
666 | 761 | full_chk_records.append(record.key) | ||
667 | 762 | else: | ||
668 | 763 | self.fail('Should not be getting a stream of %s' % (vf_name,)) | ||
669 | 764 | # We have 257 records now. This is because we have 1 root page, and 256 | ||
670 | 765 | # leaf pages in a complete listing. | ||
671 | 766 | self.assertEqual(257, len(full_chk_records)) | ||
672 | 767 | self.assertSubset(simple_chk_records, full_chk_records) | ||
673 | 768 | |||
674 | 769 | |||
675 | 770 | class TestKnitPackStreamSource(tests.TestCaseWithMemoryTransport): | ||
676 | 771 | |||
677 | 772 | def test_source_to_exact_pack_092(self): | ||
678 | 773 | source = self.make_repository('source', format='pack-0.92') | ||
679 | 774 | target = self.make_repository('target', format='pack-0.92') | ||
680 | 775 | stream_source = source._get_source(target._format) | ||
681 | 776 | self.assertIsInstance(stream_source, pack_repo.KnitPackStreamSource) | ||
682 | 777 | |||
683 | 778 | def test_source_to_exact_pack_rich_root_pack(self): | ||
684 | 779 | source = self.make_repository('source', format='rich-root-pack') | ||
685 | 780 | target = self.make_repository('target', format='rich-root-pack') | ||
686 | 781 | stream_source = source._get_source(target._format) | ||
687 | 782 | self.assertIsInstance(stream_source, pack_repo.KnitPackStreamSource) | ||
688 | 783 | |||
689 | 784 | def test_source_to_exact_pack_19(self): | ||
690 | 785 | source = self.make_repository('source', format='1.9') | ||
691 | 786 | target = self.make_repository('target', format='1.9') | ||
692 | 787 | stream_source = source._get_source(target._format) | ||
693 | 788 | self.assertIsInstance(stream_source, pack_repo.KnitPackStreamSource) | ||
694 | 789 | |||
695 | 790 | def test_source_to_exact_pack_19_rich_root(self): | ||
696 | 791 | source = self.make_repository('source', format='1.9-rich-root') | ||
697 | 792 | target = self.make_repository('target', format='1.9-rich-root') | ||
698 | 793 | stream_source = source._get_source(target._format) | ||
699 | 794 | self.assertIsInstance(stream_source, pack_repo.KnitPackStreamSource) | ||
700 | 795 | |||
701 | 796 | def test_source_to_remote_exact_pack_19(self): | ||
702 | 797 | trans = self.make_smart_server('target') | ||
703 | 798 | trans.ensure_base() | ||
704 | 799 | source = self.make_repository('source', format='1.9') | ||
705 | 800 | target = self.make_repository('target', format='1.9') | ||
706 | 801 | target = repository.Repository.open(trans.base) | ||
707 | 802 | stream_source = source._get_source(target._format) | ||
708 | 803 | self.assertIsInstance(stream_source, pack_repo.KnitPackStreamSource) | ||
709 | 804 | |||
710 | 805 | def test_stream_source_to_non_exact(self): | ||
711 | 806 | source = self.make_repository('source', format='pack-0.92') | ||
712 | 807 | target = self.make_repository('target', format='1.9') | ||
713 | 808 | stream = source._get_source(target._format) | ||
714 | 809 | self.assertIs(type(stream), repository.StreamSource) | ||
715 | 810 | |||
716 | 811 | def test_stream_source_to_non_exact_rich_root(self): | ||
717 | 812 | source = self.make_repository('source', format='1.9') | ||
718 | 813 | target = self.make_repository('target', format='1.9-rich-root') | ||
719 | 814 | stream = source._get_source(target._format) | ||
720 | 815 | self.assertIs(type(stream), repository.StreamSource) | ||
721 | 816 | |||
722 | 817 | def test_source_to_remote_non_exact_pack_19(self): | ||
723 | 818 | trans = self.make_smart_server('target') | ||
724 | 819 | trans.ensure_base() | ||
725 | 820 | source = self.make_repository('source', format='1.9') | ||
726 | 821 | target = self.make_repository('target', format='1.6') | ||
727 | 822 | target = repository.Repository.open(trans.base) | ||
728 | 823 | stream_source = source._get_source(target._format) | ||
729 | 824 | self.assertIs(type(stream_source), repository.StreamSource) | ||
730 | 825 | |||
731 | 826 | def test_stream_source_to_knit(self): | ||
732 | 827 | source = self.make_repository('source', format='pack-0.92') | ||
733 | 828 | target = self.make_repository('target', format='dirstate') | ||
734 | 829 | stream = source._get_source(target._format) | ||
735 | 830 | self.assertIs(type(stream), repository.StreamSource) | ||
736 | 831 | |||
737 | 688 | 832 | ||
738 | 689 | class TestDevelopment6FindParentIdsOfRevisions(TestCaseWithTransport): | 833 | class TestDevelopment6FindParentIdsOfRevisions(TestCaseWithTransport): |
739 | 690 | """Tests for _find_parent_ids_of_revisions.""" | 834 | """Tests for _find_parent_ids_of_revisions.""" |
740 | @@ -1204,84 +1348,3 @@ | |||
741 | 1204 | self.assertTrue(new_pack.inventory_index._optimize_for_size) | 1348 | self.assertTrue(new_pack.inventory_index._optimize_for_size) |
742 | 1205 | self.assertTrue(new_pack.text_index._optimize_for_size) | 1349 | self.assertTrue(new_pack.text_index._optimize_for_size) |
743 | 1206 | self.assertTrue(new_pack.signature_index._optimize_for_size) | 1350 | self.assertTrue(new_pack.signature_index._optimize_for_size) |
744 | 1207 | |||
745 | 1208 | |||
746 | 1209 | class TestGCCHKPackCollection(TestCaseWithTransport): | ||
747 | 1210 | |||
748 | 1211 | def test_stream_source_to_gc(self): | ||
749 | 1212 | source = self.make_repository('source', format='development6-rich-root') | ||
750 | 1213 | target = self.make_repository('target', format='development6-rich-root') | ||
751 | 1214 | stream = source._get_source(target._format) | ||
752 | 1215 | self.assertIsInstance(stream, groupcompress_repo.GroupCHKStreamSource) | ||
753 | 1216 | |||
754 | 1217 | def test_stream_source_to_non_gc(self): | ||
755 | 1218 | source = self.make_repository('source', format='development6-rich-root') | ||
756 | 1219 | target = self.make_repository('target', format='rich-root-pack') | ||
757 | 1220 | stream = source._get_source(target._format) | ||
758 | 1221 | # We don't want the child GroupCHKStreamSource | ||
759 | 1222 | self.assertIs(type(stream), repository.StreamSource) | ||
760 | 1223 | |||
761 | 1224 | def test_get_stream_for_missing_keys_includes_all_chk_refs(self): | ||
762 | 1225 | source_builder = self.make_branch_builder('source', | ||
763 | 1226 | format='development6-rich-root') | ||
764 | 1227 | # We have to build a fairly large tree, so that we are sure the chk | ||
765 | 1228 | # pages will have split into multiple pages. | ||
766 | 1229 | entries = [('add', ('', 'a-root-id', 'directory', None))] | ||
767 | 1230 | for i in 'abcdefghijklmnopqrstuvwxyz123456789': | ||
768 | 1231 | for j in 'abcdefghijklmnopqrstuvwxyz123456789': | ||
769 | 1232 | fname = i + j | ||
770 | 1233 | fid = fname + '-id' | ||
771 | 1234 | content = 'content for %s\n' % (fname,) | ||
772 | 1235 | entries.append(('add', (fname, fid, 'file', content))) | ||
773 | 1236 | source_builder.start_series() | ||
774 | 1237 | source_builder.build_snapshot('rev-1', None, entries) | ||
775 | 1238 | # Now change a few of them, so we get a few new pages for the second | ||
776 | 1239 | # revision | ||
777 | 1240 | source_builder.build_snapshot('rev-2', ['rev-1'], [ | ||
778 | 1241 | ('modify', ('aa-id', 'new content for aa-id\n')), | ||
779 | 1242 | ('modify', ('cc-id', 'new content for cc-id\n')), | ||
780 | 1243 | ('modify', ('zz-id', 'new content for zz-id\n')), | ||
781 | 1244 | ]) | ||
782 | 1245 | source_builder.finish_series() | ||
783 | 1246 | source_branch = source_builder.get_branch() | ||
784 | 1247 | source_branch.lock_read() | ||
785 | 1248 | self.addCleanup(source_branch.unlock) | ||
786 | 1249 | target = self.make_repository('target', format='development6-rich-root') | ||
787 | 1250 | source = source_branch.repository._get_source(target._format) | ||
788 | 1251 | self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource) | ||
789 | 1252 | |||
790 | 1253 | # On a regular pass, getting the inventories and chk pages for rev-2 | ||
791 | 1254 | # would only get the newly created chk pages | ||
792 | 1255 | search = graph.SearchResult(set(['rev-2']), set(['rev-1']), 1, | ||
793 | 1256 | set(['rev-2'])) | ||
794 | 1257 | simple_chk_records = [] | ||
795 | 1258 | for vf_name, substream in source.get_stream(search): | ||
796 | 1259 | if vf_name == 'chk_bytes': | ||
797 | 1260 | for record in substream: | ||
798 | 1261 | simple_chk_records.append(record.key) | ||
799 | 1262 | else: | ||
800 | 1263 | for _ in substream: | ||
801 | 1264 | continue | ||
802 | 1265 | # 3 pages, the root (InternalNode), + 2 pages which actually changed | ||
803 | 1266 | self.assertEqual([('sha1:91481f539e802c76542ea5e4c83ad416bf219f73',), | ||
804 | 1267 | ('sha1:4ff91971043668583985aec83f4f0ab10a907d3f',), | ||
805 | 1268 | ('sha1:81e7324507c5ca132eedaf2d8414ee4bb2226187',), | ||
806 | 1269 | ('sha1:b101b7da280596c71a4540e9a1eeba8045985ee0',)], | ||
807 | 1270 | simple_chk_records) | ||
808 | 1271 | # Now, when we do a similar call using 'get_stream_for_missing_keys' | ||
809 | 1272 | # we should get a much larger set of pages. | ||
810 | 1273 | missing = [('inventories', 'rev-2')] | ||
811 | 1274 | full_chk_records = [] | ||
812 | 1275 | for vf_name, substream in source.get_stream_for_missing_keys(missing): | ||
813 | 1276 | if vf_name == 'inventories': | ||
814 | 1277 | for record in substream: | ||
815 | 1278 | self.assertEqual(('rev-2',), record.key) | ||
816 | 1279 | elif vf_name == 'chk_bytes': | ||
817 | 1280 | for record in substream: | ||
818 | 1281 | full_chk_records.append(record.key) | ||
819 | 1282 | else: | ||
820 | 1283 | self.fail('Should not be getting a stream of %s' % (vf_name,)) | ||
821 | 1284 | # We have 257 records now. This is because we have 1 root page, and 256 | ||
822 | 1285 | # leaf pages in a complete listing. | ||
823 | 1286 | self.assertEqual(257, len(full_chk_records)) | ||
824 | 1287 | self.assertSubset(simple_chk_records, full_chk_records) |
This proposal changes how pack <=> pack fetching is triggered.
It removes the InterPackRepo optimizer (which uses Packer internally) in favor of a new KnitPackStreamSource.
The new source is a very streamlined version of StreamSource which doesn't attempt to handle all the different cross-format issues; it only supports exact-format fetching, and does so efficiently.
Specifically, it sends data as (signatures, revisions, inventories, texts) since it knows we have atomic insertion.
It walks the inventory pages a single time and extracts the text keys as the fetch progresses, rather than doing so in a separate pre-fetch read. This is a moderate win for dumb-transport fetching (versus StreamSource, but not InterPackRepo) because it avoids reading the inventory pages twice.
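As a rough, self-contained sketch of that single-pass idea (plain Python, not the bzrlib API; the record shape and function name here are invented stand-ins for KnitPackStreamSource._get_filtered_inv_stream in the diff above): the inventory stream is a generator that accumulates the text keys each inventory references while it is being consumed, so no second read of the inventories is needed to build the text stream.

```python
# Illustration only: stream "inventory records" while collecting the text
# keys they reference, so the inventories are read exactly once.
def filtered_inv_stream(inventory_records, text_keys_out):
    """Yield each record, accumulating its referenced text keys as a side effect.

    :param inventory_records: iterable of (revision_id, referenced_text_keys)
    :param text_keys_out: a set filled in while the stream is consumed
    """
    for revision_id, referenced_text_keys in inventory_records:
        text_keys_out.update(referenced_text_keys)
        yield (revision_id, referenced_text_keys)


records = [
    ('rev-1', [('file-a', 'rev-1'), ('file-b', 'rev-1')]),
    ('rev-2', [('file-a', 'rev-2')]),
]
text_keys = set()
for record in filtered_inv_stream(records, text_keys):
    pass  # the real code yields these records into the network stream
# Only after the inventory stream is exhausted is the set complete, which is
# why the text stream is requested last and insertion must be atomic.
print(sorted(text_keys))
```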
It also fixes a bug in the current InterPackRepo code. Namely, the Packer code was recently changed to make sure that all referenced file keys are fetched, rather than only the ones mentioned in the specific revisions being fetched. This was done at roughly the same time as the updates to fileids_altered_by_revision_ids. However, in that update, Packer was not changed to read the parent inventories and remove their text keys.
This meant that if you got a fulltext inventory, you would end up copying the data for all texts in that revision, whether they were modified or not. For bzr.dev, this meant that it often downloaded ~3MB of extra data for a small change. I considered fixing Packer to handle this, but I figured we wanted to move to StreamSource as the one-and-only method for fetching anyway.
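In set terms the fix amounts to subtracting the text keys already referenced by the parent inventories before building the text stream. A tiny worked example, with made-up (file_id, revision_id) keys:

```python
# Made-up keys, purely to illustrate the subtraction.
content_text_keys = {('file-a', 'rev-2'), ('file-b', 'rev-1'), ('file-c', 'rev-1')}
parent_text_keys = {('file-b', 'rev-1'), ('file-c', 'rev-1')}

# Only texts not already referenced by a parent inventory need to be fetched,
# even when rev-2's inventory was transmitted as a fulltext.
print(content_text_keys - parent_text_keys)   # {('file-a', 'rev-2')}
```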
I also made some small changes to make it clearer when a set of something was *keys* (tuples) and when it was *ids* (strings).
I also moved some of the helpers that were added as part of the gc-stacking patch into the base Repository class, so that I could simply re-use them.
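To illustrate the ids/keys split and the two parallel helpers now on the base Repository class, here is a standalone sketch. The real methods call self.get_parent_map() and self.revisions.get_parent_map() respectively; these stand-ins take a plain parent-map dict so the example runs on its own.

```python
NULL_REVISION = 'null:'

def find_parent_ids_of_revisions(parent_map, revision_ids):
    """ids are plain strings; parent_map maps id -> tuple of parent ids."""
    parent_ids = set()
    for parents in parent_map.values():
        parent_ids.update(parents)
    parent_ids.difference_update(revision_ids)
    parent_ids.discard(NULL_REVISION)
    return parent_ids

def find_parent_keys_of_revisions(parent_map, revision_keys):
    """keys are 1-tuples like ('rev-2',); parent_map maps key -> tuple of parent keys."""
    parent_keys = set()
    for parents in parent_map.values():
        parent_keys.update(parents)
    parent_keys.difference_update(revision_keys)
    parent_keys.discard((NULL_REVISION,))
    return parent_keys

print(find_parent_ids_of_revisions({'rev-2': ('rev-1',)}, {'rev-2'}))
# -> {'rev-1'}
print(find_parent_keys_of_revisions({('rev-2',): (('rev-1',),)}, {('rev-2',)}))
# -> {('rev-1',)}
```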