Merge lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg
- hacking
- Merge into trunk
Status: Merged
Merged at revision: 417
Proposed branch: lp:~leonidborisenko/bzr-hg/hacking
Merge into: lp:bzr-hg
Diff against target: 240 lines (+177/-8), 3 files modified: fetch.py (+16/-3), overlay.py (+1/-1), tests/test_fetch.py (+160/-4)
To merge this branch: bzr merge lp:~leonidborisenko/bzr-hg/hacking
Related bugs: #692901 Crash with KeyError in incremental mirroring
Reviewer: Jelmer Vernooij (community), status: Needs Fixing
Review via email: mp+44319@code.launchpad.net
Commit message
Description of the change
Leonid Borisenko (leonidborisenko) wrote:
Hello Jelmer,
On 21.12.2010 13:49, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
>> differences between files attachment (review-diff.txt)
>> === modified file 'fetch.py'
>> --- fetch.py 2010-12-18 18:28:46 +0000
>> +++ fetch.py 2010-12-21 09:17:08 +0000
>> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
>> + pb):
>> i = 0
>> # Texts
>> while 1:
>> @@ -361,7 +370,11 @@
>> fileid = mapping.generate_file_id(f)
>> chunkiter = mercurial.changegroup.chunkiter(cg)
>> def get_text(node):
>> - key = iter(filetext_map[fileid][node]).next()
>> + if node in filetext_map[fileid]:
>> + key = iter(filetext_map[fileid][node]).next()
>> + else:
>> + key = self._get_target_fulltext_key_from_revision_ancestry(
>> + fileid, first_imported_revid)
> This assumes there is some significance in the first revision that's
> imported. This happens to be the case in your test case, but is not
> necessarily true.
>
> Perhaps we can pass csid to get_text and then use that to look up the
> matching revision id and use that to find the proper full text?
I don't think this improves the solution.
Any case of getting a full file text that can be satisfied by knowledge of
the current revision id, or of the revision ids of its parents, within the
boundaries of the current fetching session is already handled correctly.
There is a variable fulltext_cache in the function
<email address hidden>, and (in the phase of importing
texts) this variable maps a file parent nodeid to the file text.
All texts within the boundaries of a fetching session are imported strictly
sequentially in the 'for chunk in chunk_iter: ...' loop of the
unpack_chunk_iter function. After a full file text is imported into the
target repository, it is remembered in fulltext_cache.
So if the revision with the required file parent nodeid was fetched in the
current session, the text will be taken directly from fulltext_cache.
So the only cases handled by my patch are the ones that need to look into
revisions that are already in the target repository but were imported in
previous fetching sessions. These revisions are exactly the ancestry of the
first revision imported in the current fetching session.
Am I missing some important point here? I really can't imagine a case in
which a fulltext_cache miss means that:
* the full file text can be restored with knowledge of csid (assuming that
csid is equivalent to the id of the second or third or ... imported
revision)
* AND the process of restoring the full text never reaches the id of the
first imported revision.
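In code terms, the lookup being argued for amounts to something like the following sketch. This is not the actual bzr-hg implementation; the names are illustrative stand-ins, and `lookup_in_ancestry` represents the proposed `_get_target_fulltext_key_from_revision_ancestry` helper:

```python
# Illustrative sketch of the proposed get_text fallback, with plain
# dicts standing in for bzr-hg's real structures.
def get_text_key(node, fileid, filetext_map, first_imported_revid,
                 lookup_in_ancestry):
    per_file = filetext_map[fileid]
    if node in per_file:
        # The text was imported during the current fetching session,
        # so its key is already recorded in filetext_map.
        return next(iter(per_file[node]))
    # Cache miss: the text must live in a revision imported during a
    # previous fetching session, i.e. (per the argument above) in the
    # ancestry of the first revision imported in this session.
    return lookup_in_ancestry(fileid, first_imported_revid)
```

The question under review is whether that second branch is always reachable from the first imported revision alone.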
>> +
>> + # Pull commited changeset to Bazaar branch.
>> + #
>> + # Prefer named function instead lambda to slightly more informative
>> + # fail message.
>> + def pull_to_bzr_repo_with_existing_text_metadata():
>> + bzrtree.pull(hgbranch)
>> +
>> + # Not expected KeyError looked like:
>> + #
>> + # <...full traceback skipped...>
>> + # File "/tmp/hg/fetch.py", line 208, in manifest...
Jelmer Vernooij (jelmer) wrote:
Hi Leonid,
On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:
> On 21.12.2010 13:49, Jelmer Vernooij wrote:
> > On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> >> differences between files attachment (review-diff.txt)
> >> === modified file 'fetch.py'
> >> --- fetch.py 2010-12-18 18:28:46 +0000
> >> +++ fetch.py 2010-12-21 09:17:08 +0000
> >> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> >> + pb):
> >> i = 0
> >> # Texts
> >> while 1:
> >> @@ -361,7 +370,11 @@
> >> fileid = mapping.generate_file_id(f)
> >> chunkiter = mercurial.changegroup.chunkiter(cg)
> >> def get_text(node):
> >> - key = iter(filetext_map[fileid][node]).next()
> >> + if node in filetext_map[fileid]:
> >> + key = iter(filetext_map[fileid][node]).next()
> >> + else:
> >> + key = self._get_target_fulltext_key_from_revision_ancestry(
> >> + fileid, first_imported_revid)
> > This assumes there is some significance in the first revision that's
> > imported. This happens to be the case in your test case, but is not
> > necessarily true.
> >
> > Perhaps we can pass csid to get_text and then use that to look up the
> > matching revision id and use that to find the proper full text?
> I don't think this improves the solution.
> Any case of getting a full file text that can be satisfied by knowledge of
> the current revision id, or of the revision ids of its parents, within the
> boundaries of the current fetching session is already handled correctly.
> There is a variable fulltext_cache in the function
> <email address hidden>, and (in the phase of importing
> texts) this variable maps a file parent nodeid to the file text.
>
> All texts within the boundaries of a fetching session are imported strictly
> sequentially in the 'for chunk in chunk_iter: ...' loop of the
> unpack_chunk_iter function. After a full file text is imported into the
> target repository, it is remembered in fulltext_cache.
> So if the revision with the required file parent nodeid was fetched in the
> current session, the text will be taken directly from fulltext_cache.
> So the only cases handled by my patch are the ones that need to look into
> revisions that are already in the target repository but were imported in
> previous fetching sessions. These revisions are exactly the ancestry of the
> first revision imported in the current fetching session.
That's not necessarily true. Assume the following graph:
A
| \
B C
| |
D F
|/
E
If the first fetch pulls in up to C and D, and the second fetch pulls in
E then the first imported revision will probably be D but the parent of
F will be C - which is not the first imported revision, nor will it be
in its ancestry.
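Jelmer's counter-example can be checked mechanically. Here is a toy sketch (single-letter ids standing in for real changeset ids, a plain dict standing in for the revision graph):

```python
# Toy version of the graph above, mapping each revision to its parents.
GRAPH = {
    "A": [],
    "B": ["A"], "C": ["A"],
    "D": ["B"], "F": ["C"],
    "E": ["D", "F"],
}

def ancestry(graph, rev):
    """Return the set of ancestors of `rev`, including `rev` itself."""
    seen, todo = set(), [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(graph[r])
    return seen

# F's parent C is neither D nor anywhere in D's ancestry, so a lookup
# keyed only on the first imported revision (D) cannot find it.
assert "C" not in ancestry(GRAPH, "D")
assert ancestry(GRAPH, "D") == {"D", "B", "A"}
```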
> Am I missing some important point here? I really can't imagine a
> case in which a fulltext_cache miss means that:
> * the full file text can be restored with knowledge of csid (assuming
> that csid is equivalent to the id of the second or third or ... imported
> revision)
I specifically meant csid of one of the texts that was /not/ fetched
during this run but a...
- 331. By Leonid Borisenko
-
Remove testtools import from test.
- 332. By Leonid Borisenko
-
Cosmetic fix of tests.
Make use of uniformly quoted strings.
- 333. By Leonid Borisenko
-
Remove unnecessary creating of directory in tests.
The mercurial.localrepo.localrepository class creates a nonexistent
directory by itself in its constructor.
- 334. By Leonid Borisenko
-
Add tests for incremental fetching of non-linear history.
Leonid Borisenko (leonidborisenko) wrote:
Hello Jelmer,
On 22.12.2010 20:41, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:
> [skipped]
>> So the only cases, that are handled by my patch, are ones that needs
>> looking into revisions that are in target repository, but was imported
>> in any previous fetching sessions. These revisions are exactly ancestry
>> of first revision imported in current fetching session.
> That's not necessarily true. Assume the following graph:
>
> A
> | \
> B C
> | |
> D F
> |/
> E
>
> If the first fetch pulls in up to C and D, and the second fetch pulls in
> E then the first imported revision will probably be D but the parent of
> F will be C - which is not the first imported revision, nor will it be
> in its ancestry.
Thanks for the example.
I've tried to reproduce this graph in test cases (look at my updated
branch), and one of the two cases revealed a new problem. It crashes while
importing manifests (i.e. before importing texts), but the root of the bug
seems to be the same (it concerns the filetext_map variable).
So my solution should be rethought and reworked.
> [skipped]
>>>> + # Not expected KeyError looked like:
>>>> + #
>>>> + # <...full traceback skipped...>
>>>> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
>>>> + # (fileid, ie.revision))
>>>> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
>>>> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
>>>> + Not(raises(KeyError)))
>>> I think we should simply just call bzrtree.pull(hgbranch) here. We don't
>>> have a pattern of using testtools anywhere else in bzr-hg, and the error
>>> message it prints in this case is very unhelpful (it eats the
>>> backtrace).
>>
>> Ok, I'll get rid of the testtools import and the use of imported matchers.
>>
>> But I think that just calling 'bzrtree.
>> test case too generous (so I've intentionally set assertThat expression
>> to express strict expectation of error and narrow the test case).
>>
>> Does that make any sense? Or is just leaving the comment about the
>> unexpected KeyError near the 'bzrtree.pull' call sufficient?
> I don't see how not explicitly checking for that exception is too
> "generous". If that function does raise an exception then the test
> runner will report it. What does explicitly checking for that exception
> add?
>
> IMHO just leaving the comment is sufficient.
OK, I've implemented this.
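For readers following the thread, the ancestry-walk fallback that the branch adds (the _get_target_fulltext_key_from_revision_ancestry helper in the preview diff) behaves roughly like this sketch. Plain dicts stand in for bzr's Graph.iter_ancestry and Repository.fileids_altered_by_revision_ids; the names and the breadth-first visiting order are illustrative only, not the real API:

```python
# Hedged sketch of an ancestry-walk lookup: starting from the parents
# of `revid`, find the first visited ancestor that altered `fileid`
# and return the (fileid, revid) key for its fulltext.
def find_fulltext_key(parents, fileids_altered, fileid, revid):
    todo = list(parents[revid])
    seen = set()
    while todo:
        ancestor = todo.pop(0)
        if ancestor in seen:
            continue
        seen.add(ancestor)
        if fileid in fileids_altered.get(ancestor, ()):
            return fileid, ancestor
        todo.extend(parents[ancestor])
    # No ancestor touched the file: nothing to return (mirrors the
    # helper in the diff, which falls off the end and returns None).
    return None
```

As Jelmer's counter-example shows, the weak point is not the walk itself but choosing which revision to start it from.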
Preview Diff
=== modified file 'fetch.py'
--- fetch.py 2010-12-18 18:28:46 +0000
+++ fetch.py 2010-12-23 20:21:49 +0000
@@ -349,7 +349,16 @@
return self._symlink_targets[key]
return self._target_overlay.get_file_fulltext(key)

- def _unpack_texts(self, cg, mapping, filetext_map, pb):
+ def _get_target_fulltext_key_from_revision_ancestry(self, fileid, revid):
+ graph = self.target.get_graph()
+ revision_parents_ids = self._revisions[revid].parent_ids
+ for ancestor_revid, _ in graph.iter_ancestry(revision_parents_ids):
+ ids = self.target.fileids_altered_by_revision_ids([ancestor_revid])
+ if fileid in ids:
+ return fileid, ancestor_revid
+
+ def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
+ pb):
i = 0
# Texts
while 1:
@@ -361,7 +370,11 @@
fileid = mapping.generate_file_id(f)
chunkiter = mercurial.changegroup.chunkiter(cg)
def get_text(node):
- key = iter(filetext_map[fileid][node]).next()
+ if node in filetext_map[fileid]:
+ key = iter(filetext_map[fileid][node]).next()
+ else:
+ key = self._get_target_fulltext_key_from_revision_ancestry(
+ fileid, first_imported_revid)
return self._get_target_fulltext(key)
for fulltext, hgkey, hgparents, csid in unpack_chunk_iter(chunkiter, get_text):
for revision, (kind, parents) in filetext_map[fileid][hgkey].iteritems():
@@ -529,7 +542,7 @@
pb = ui.ui_factory.nested_progress_bar()
try:
self.target.texts.insert_record_stream(
- self._unpack_texts(cg, mapping, filetext_map, pb))
+ self._unpack_texts(cg, mapping, filetext_map, todo[0], pb))
finally:
pb.finished()
# Adding actual data

=== modified file 'overlay.py'
--- overlay.py 2009-11-26 21:33:32 +0000
+++ overlay.py 2010-12-23 20:21:49 +0000
@@ -199,7 +199,7 @@
revid = self._lookup_revision_by_manifest_id(manifest_id)
return self.get_manifest_text_by_revid(revid)

- def _get_file_fulltext(self, key):
+ def get_file_fulltext(self, key):
ret = "".join(self.repo.iter_files_bytes([key + (None,)]).next()[1])
if ret == "": # could be a symlink
ie = self.repo.get_inventory(key[1])[key[0]]

=== modified file 'tests/test_fetch.py'
--- tests/test_fetch.py 2010-12-19 23:00:50 +0000
+++ tests/test_fetch.py 2010-12-23 20:21:49 +0000
@@ -20,6 +20,7 @@
from bzrlib.plugins.hg.ui import ui as hgui
from bzrlib.tests import TestCaseWithTransport

+from mercurial import hg
import mercurial.localrepo

class TestFetching(TestCaseWithTransport):
@@ -42,12 +43,167 @@
hgrepo.commit("Remove file f2, so parent directories d2, d1 are empty")

# Import history from Mercurial repository into Bazaar repository.
- bzrtree = self.make_branch_and_tree('bzr')
- hgdir = HgControlDirFormat().open(self.get_transport('hg'))
+ bzrtree = self.make_branch_and_tree("bzr")
+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
bzrtree.pull(hgdir.open_branch())

# As file f2 was deleted, directories d1 and d2 should not exists.
- self.failIfExists('bzr/d1')
+ self.failIfExists("bzr/d1")

# Self-assurance check that history was really imported.
- self.failUnlessExists('bzr/f1')
+ self.failUnlessExists("bzr/f1")
+
+ def test_getting_existing_text_metadata(self):
+ # Create Mercurial repository and Bazaar branch to import into.
+ hgrepo = mercurial.localrepo.localrepository(hgui(), "hg", create=True)
+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
+ hgbranch = hgdir.open_branch()
+ bzrtree = self.make_branch_and_tree("bzr")
+
+ # Create file 'f1' in Mercurial repository, commit it
+ # and pull commited changeset to Bazaar branch.
+ self.build_tree_contents([("hg/f1", "Initial content")])
+ hgrepo[None].add(["f1"])
+ hgrepo.commit("Initial commit")
+ bzrtree.pull(hgbranch)
+
+ # Change content of file 'f1' in Mercurial repository and commit
+ # change.
+ self.build_tree_contents([("hg/f1", "Changed content")])
+ hgrepo.commit("Change content of file f1")
+
+ # Pull commited changeset to Bazaar branch.
+ #
+ # Should not raise KeyError which looked like:
+ #
+ # <...full traceback skipped...>
+ # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
+ # (fileid, ie.revision))
+ # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
+ bzrtree.pull(hgbranch)
+
+ # Self-assurance check that changesets was really pulled in.
+ self.assertFileEqual("Changed content", "bzr/f1")
+
+ def test_incremental_fetching_of_repository_with_non_conflict_merge(self):
+ # Create Mercurial configuration and override possible definition
+ # of external interactive merge tools.
+ ui = hgui()
+ ui.setconfig("ui", "merge", "internal:merge")
+
+ # Create Mercurial repository and Bazaar branch to import into.
+ hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
+ hgbranch = hgdir.open_branch()
+ bzrtree = self.make_branch_and_tree("bzr")
+
+ # Create history graph in Mercurial repository
+ # (history flows from left to right):
+ #
+ # A--B--D
+ # |
+ # \--C
+ self.build_tree_contents([("hg/f1", "f1")])
+ hgrepo[None].add(["f1"])
+ hgrepo.commit("A (initial commit; first commit to main branch)")
+ self.build_tree_contents([("hg/f2", "f2")])
+ hgrepo[None].add(["f2"])
+ hgrepo.commit("B (second commit to main branch)")
+ hg.update(hgrepo, 0)
+ self.build_tree_contents([("hg/f3", "f3")])
+ hgrepo[None].add(["f3"])
+ hgrepo.commit("C (first commit to secondary branch)")
+ hg.update(hgrepo, 1)
+ self.build_tree_contents([("hg/f4", "f4")])
+ hgrepo[None].add(["f4"])
+ hgrepo.commit("D (third commit to main branch)")
+
+ # Pull commited changesets to Bazaar branch.
+ bzrtree.pull(hgbranch)
+
+ # Continue history graph in Mercurial repository
+ # (history flows from up to down):
+ #
+ # a--b--d--E
+ # | |
+ # \--c--F-/
+ hg.update(hgrepo, 2)
+ self.build_tree_contents([("hg/f5", "f5")])
+ hgrepo[None].add(["f5"])
+ hgrepo.commit("F (second commit to secondary branch)")
+ hg.update(hgrepo, 3)
+ hg.merge(hgrepo, 4)
+ hgrepo.commit("E (commit merge of main branch with secondary branch)")
+
+ # Pull commited changesets to Bazaar branch.
+ bzrtree.pull(hgbranch)
+
+ # Self-assurance check that all changesets was really pulled in.
+ for i in range(1, 6):
+ file_content = "f%d" % i
+ file_path = "bzr/%s" % file_content
+ self.assertFileEqual(file_content, file_path)
+
+ def test_incremental_fetching_of_repository_with_conflict_merge(self):
+ # Create Mercurial configuration and override possible definition
+ # of external interactive merge tools.
+ ui = hgui()
+ ui.setconfig("ui", "merge", "internal:local")
+
+ # Create Mercurial repository and Bazaar branch to import into.
+ hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
+ hgbranch = hgdir.open_branch()
+ bzrtree = self.make_branch_and_tree("bzr")
+
+ # Create history graph with conflict in Mercurial repository
+ # (history flows from left to right, conflict made at commits B and C):
+ #
+ # A--B--D
+ # |
+ # \--C
+ self.build_tree_contents([("hg/f1", "f1")])
+ hgrepo[None].add(["f1"])
+ hgrepo.commit("A (initial commit; first commit to main branch)")
+ self.build_tree_contents([("hg/conflict_file", "Main branch")])
+ hgrepo[None].add(["conflict_file"])
+ hgrepo.commit("B (second commit to main branch)")
+ hg.update(hgrepo, 0)
+ self.build_tree_contents([("hg/conflict_file", "Secondary branch")])
+ hgrepo[None].add(["conflict_file"])
+ hgrepo.commit("C (first commit to secondary branch)")
+ hg.update(hgrepo, 1)
+ self.build_tree_contents([("hg/f2", "f2")])
+ hgrepo[None].add(["f2"])
+ hgrepo.commit("D (third commit to main branch)")
+
+ # Pull commited changesets to Bazaar branch.
+ bzrtree.pull(hgbranch)
+
+ # Continue history graph in Mercurial repository
+ # (history flows from up to down):
+ #
+ # a--b--d--E
+ # | |
+ # \--c--F-/
+ hg.update(hgrepo, 2)
+ self.build_tree_contents([("hg/f3", "f3")])
+ hgrepo[None].add(["f3"])
+ hgrepo.commit("F (second commit to secondary branch)")
+ hg.update(hgrepo, 3)
+ hg.merge(hgrepo, 4)
+ self.build_tree_contents([("hg/conflict_file",
+ "Main branch\nSecondary branch")])
+ hgrepo.commit("E (commit merge of main branch with secondary branch)")
+
+ # Pull commited changesets to Bazaar branch.
+ bzrtree.pull(hgbranch)
+
+ # Self-assurance check that all changesets was really pulled in.
+ for i in range(1, 4):
+ file_content = "f%d" % i
+ file_path = "bzr/%s" % file_content
+ self.assertFileEqual(file_content, file_path)
+
+ self.assertFileEqual("bzr/conflict_file",
+ "Main branch\nSecondary branch")
Hi Leonid,
Thanks for the MP, and for adding a test along with your changes. I
don't think the fix is quite right yet, though:
review needsfixing
On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> Leonid Borisenko has proposed merging lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg.
>
> Requested reviews:
> bzr-hg developers (bzr-hg)
> Related bugs:
> #692901 Crash with KeyError in incremental mirroring
> https://bugs.launchpad.net/bugs/692901
>
> differences between files attachment (review-diff.txt)
> === modified file 'fetch.py'
> --- fetch.py 2010-12-18 18:28:46 +0000
> +++ fetch.py 2010-12-21 09:17:08 +0000
> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> + pb):
> i = 0
> # Texts
> while 1:
> @@ -361,7 +370,11 @@
> fileid = mapping.generate_file_id(f)
> chunkiter = mercurial.changegroup.chunkiter(cg)
> def get_text(node):
> - key = iter(filetext_map[fileid][node]).next()
> + if node in filetext_map[fileid]:
> + key = iter(filetext_map[fileid][node]).next()
> + else:
> + key = self._get_target_fulltext_key_from_revision_ancestry(
> + fileid, first_imported_revid)
This assumes there is some significance in the first revision that's
imported. This happens to be the case in your test case, but is not
necessarily true.
Perhaps we can pass csid to get_text and then use that to look up the
matching revision id and use that to find the proper full text?
> +
> + # Pull commited changeset to Bazaar branch.
> + #
> + # Prefer named function instead lambda to slightly more informative
> + # fail message.
> + def pull_to_bzr_repo_with_existing_text_metadata():
> + bzrtree.pull(hgbranch)
> +
> + # Not expected KeyError looked like:
> + #
> + # <...full traceback skipped...>
> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
> + # (fileid, ie.revision))
> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
> + Not(raises(KeyError)))
I think we should simply just call bzrtree.pull(hgbranch) here. We don't
have a pattern of using testtools anywhere else in bzr-hg, and the error
message it prints in this case is very unhelpful (it eats the
backtrace).
Cheers,
Jelmer