Merge lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg

Proposed by Leonid Borisenko
Status: Merged
Merged at revision: 417
Proposed branch: lp:~leonidborisenko/bzr-hg/hacking
Merge into: lp:bzr-hg
Diff against target: 240 lines (+177/-8)
3 files modified
fetch.py (+16/-3)
overlay.py (+1/-1)
tests/test_fetch.py (+160/-4)
To merge this branch: bzr merge lp:~leonidborisenko/bzr-hg/hacking
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) Needs Fixing
Review via email: mp+44319@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Leonid,

Thanks for the MP, and for adding a test along with your changes. I
don't the fix is quite right yet though:

  review needsfixing

On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> Leonid Borisenko has proposed merging lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg.
>
> Requested reviews:
> bzr-hg developers (bzr-hg)
> Related bugs:
> #692901 Crash with KeyError in incremental mirroring
> https://bugs.launchpad.net/bugs/692901
>
> differences between files attachment (review-diff.txt)
> === modified file 'fetch.py'
> --- fetch.py 2010-12-18 18:28:46 +0000
> +++ fetch.py 2010-12-21 09:17:08 +0000
> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> + pb):
> i = 0
> # Texts
> while 1:
> @@ -361,7 +370,11 @@
> fileid = mapping.generate_file_id(f)
> chunkiter = mercurial.changegroup.chunkiter(cg)
> def get_text(node):
> - key = iter(filetext_map[fileid][node]).next()
> + if node in filetext_map[fileid]:
> + key = iter(filetext_map[fileid][node]).next()
> + else:
> + key = self._get_target_fulltext_key_from_revision_ancestry(
> + fileid, first_imported_revid)
This assumes there is some significance in the first revision that's
imported. This happens to be the case in your test case, but is not
necessarily true.

Perhaps we can pass csid to get_text and then use that to look up the
matching revision id and use that to find the proper full text?

> +
> + # Pull commited changeset to Bazaar branch.
> + #
> + # Prefer named function instead lambda to slightly more informative
> + # fail message.
> + def pull_to_bzr_repo_with_existing_text_metadata():
> + bzrtree.pull(hgbranch)
> +
> + # Not expected KeyError looked like:
> + #
> + # <...full traceback skipped...>
> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
> + # (fileid, ie.revision))
> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
> + Not(raises(KeyError)))
I think we should simply just call bzrtree.pull(hgbranch) here. We don't
have a pattern of using testtools anywhere else in bzr-hg, and the error
message it prints in this case is very unhelpful (it eats the
backtrace).

Cheers,

Jelmer

review: Needs Fixing
Revision history for this message
Leonid Borisenko (leonidborisenko) wrote :
Download full text (4.0 KiB)

Hello Jelmer,

On 21.12.2010 13:49, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
>> differences between files attachment (review-diff.txt)
>> === modified file 'fetch.py'
>> --- fetch.py 2010-12-18 18:28:46 +0000
>> +++ fetch.py 2010-12-21 09:17:08 +0000
>> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
>> + pb):
>> i = 0
>> # Texts
>> while 1:
>> @@ -361,7 +370,11 @@
>> fileid = mapping.generate_file_id(f)
>> chunkiter = mercurial.changegroup.chunkiter(cg)
>> def get_text(node):
>> - key = iter(filetext_map[fileid][node]).next()
>> + if node in filetext_map[fileid]:
>> + key = iter(filetext_map[fileid][node]).next()
>> + else:
>> + key = self._get_target_fulltext_key_from_revision_ancestry(
>> + fileid, first_imported_revid)
> This assumes there is some significance in the first revision that's
> imported. This happens to be the case in your test case, but is not
> necessarily true.
>
> Perhaps we can pass csid to get_text and then use that to look up the
> matching revision id and use that to find the proper full text?

  I think this is don't improve the solution.

  Any cases of getting full file text that are satsified by knowledge of
current revision id or revision ids of it's parents in boundaries of
current fetching session are already handled right.

  There is variable fulltext_cache in function
<email address hidden> and this variable (in phase of importing
texts) is the mapping from file parent nodeid to file text.

  All texts in boundaries of fetching session are imported strictly
sequencely in 'for chunk in chunk_iter: ...' cycle of unpack_chunk_iter
function. After importing in target repository full file text is
remembered in fulltext_cache.

  So if revision with required file parent nodeid was fetched in current
session, then text will be taken directly from fulltext_cache.

  So the only cases, that are handled by my patch, are ones that needs
looking into revisions that are in target repository, but was imported
in any previous fetching sessions. These revisions are exactly ancestry
of first revision imported in current fetching session.

  Am I missing some important point here? I really can't imagine the
case in which fulltext_cache miss means that:
    * full file text can be restored with knowledge of csid (assuming
that csid is the equivalent of the id of second or third or ... imported
revision)
    * AND in process of restoring full text it doesn't getting to id of
first imported revision.

>> +
>> + # Pull commited changeset to Bazaar branch.
>> + #
>> + # Prefer named function instead lambda to slightly more informative
>> + # fail message.
>> + def pull_to_bzr_repo_with_existing_text_metadata():
>> + bzrtree.pull(hgbranch)
>> +
>> + # Not expected KeyError looked like:
>> + #
>> + # <...full traceback skipped...>
>> + # File "/tmp/hg/fetch.py", line 208, in manifest...

Read more...

Revision history for this message
Jelmer Vernooij (jelmer) wrote :
Download full text (5.0 KiB)

Hi Leonid,

On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:

> On 21.12.2010 13:49, Jelmer Vernooij wrote:
> > On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> >> differences between files attachment (review-diff.txt)
> >> === modified file 'fetch.py'
> >> --- fetch.py 2010-12-18 18:28:46 +0000
> >> +++ fetch.py 2010-12-21 09:17:08 +0000
> >> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> >> + pb):
> >> i = 0
> >> # Texts
> >> while 1:
> >> @@ -361,7 +370,11 @@
> >> fileid = mapping.generate_file_id(f)
> >> chunkiter = mercurial.changegroup.chunkiter(cg)
> >> def get_text(node):
> >> - key = iter(filetext_map[fileid][node]).next()
> >> + if node in filetext_map[fileid]:
> >> + key = iter(filetext_map[fileid][node]).next()
> >> + else:
> >> + key = self._get_target_fulltext_key_from_revision_ancestry(
> >> + fileid, first_imported_revid)
> > This assumes there is some significance in the first revision that's
> > imported. This happens to be the case in your test case, but is not
> > necessarily true.
> >
> > Perhaps we can pass csid to get_text and then use that to look up the
> > matching revision id and use that to find the proper full text?
> I think this is don't improve the solution.

> Any cases of getting full file text that are satsified by knowledge of
> current revision id or revision ids of it's parents in boundaries of
> current fetching session are already handled right.

> There is variable fulltext_cache in function
> <email address hidden> and this variable (in phase of importing
> texts) is the mapping from file parent nodeid to file text.
>
> All texts in boundaries of fetching session are imported strictly
> sequencely in 'for chunk in chunk_iter: ...' cycle of unpack_chunk_iter
> function. After importing in target repository full file text is
> remembered in fulltext_cache.

> So if revision with required file parent nodeid was fetched in current
> session, then text will be taken directly from fulltext_cache.

> So the only cases, that are handled by my patch, are ones that needs
> looking into revisions that are in target repository, but was imported
> in any previous fetching sessions. These revisions are exactly ancestry
> of first revision imported in current fetching session.
That's not necessarily true. Assume the following graph:

A
| \
B C
| |
D F
|/
E

If the first fetch pulls in up to C and D, and the second fetch pulls in
E then the first imported revision will probably be D but the parent of
F will be C - which is not the first imported revision, nor will it be
in its ancestry.

> Am I missing some important point here? I really can't imagine the
> case in which fulltext_cache miss means that:
> * full file text can be restored with knowledge of csid (assuming
> that csid is the equivalent of the id of second or third or ... imported
> revision)
I specifically meant csid of one of the texts that was /not/ fetched
during this run but a...

Read more...

lp:~leonidborisenko/bzr-hg/hacking updated
331. By Leonid Borisenko

Remove testtools import from test.

332. By Leonid Borisenko

Cosmetic fix of tests.

Make use of uniformly quoted strings.

333. By Leonid Borisenko

Remove unnecessary creating of directory in tests.

mercurial.localrepo.localrepository class creates unexisted directory by
itself in constructor.

334. By Leonid Borisenko

Add tests for incremental fetching of non-linear history.

Revision history for this message
Leonid Borisenko (leonidborisenko) wrote :

Hello Jelmer,

On 22.12.2010 20:41, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:
> [skipped]
>> So the only cases, that are handled by my patch, are ones that needs
>> looking into revisions that are in target repository, but was imported
>> in any previous fetching sessions. These revisions are exactly ancestry
>> of first revision imported in current fetching session.
> That's not necessarily true. Assume the following graph:
>
> A
> | \
> B C
> | |
> D F
> |/
> E
>
> If the first fetch pulls in up to C and D, and the second fetch pulls in
> E then the first imported revision will probably be D but the parent of
> F will be C - which is not the first imported revision, nor will it be
> in its ancestry.

  Thanks for example.
  I've tried to reproduce this graph in test cases (look at my updated
branch) and one of two cases revealed new problem. It crashes at
importing manifests (i.e. before importing texts), but root of bug seems
the same (it concerns with filetext_map variable).
  So my solution should be rethinked and reworked.

> [skipped]
>>>> + # Not expected KeyError looked like:
>>>> + #
>>>> + # <...full traceback skipped...>
>>>> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
>>>> + # (fileid, ie.revision))
>>>> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
>>>> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
>>>> + Not(raises(KeyError)))
>>> I think we should simply just call bzrtree.pull(hgbranch) here. We don't
>>> have a pattern of using testtools anywhere else in bzr-hg, and the error
>>> message it prints in this case is very unhelpful (it eats the
>>> backtrace).
>>
>> Ok, I'll get rid of testtools import and using of imported matchers.
>>
>> But I think that just calling 'bzrtree.pull(hgbranch)' making this
>> test case too generous (so I've intentionally set assertThat expression
>> to express strict expectation of error and narrow the test case).
>>
>> Does it make any sense? Or just leaving the comment about not expected
>> KeyError near calling 'bzrtree.pull' is sufficient?
> I don't see how not explicitly checking for that exception is too
> "generous". If that function does raise an exception then the test
> runner will report it. What does explicitly checking for that exception
> add?
>
> IMHO just leaving the comment is sufficient.

  OK, I've implemented this.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'fetch.py'
2--- fetch.py 2010-12-18 18:28:46 +0000
3+++ fetch.py 2010-12-23 20:21:49 +0000
4@@ -349,7 +349,16 @@
5 return self._symlink_targets[key]
6 return self._target_overlay.get_file_fulltext(key)
7
8- def _unpack_texts(self, cg, mapping, filetext_map, pb):
9+ def _get_target_fulltext_key_from_revision_ancestry(self, fileid, revid):
10+ graph = self.target.get_graph()
11+ revision_parents_ids = self._revisions[revid].parent_ids
12+ for ancestor_revid, _ in graph.iter_ancestry(revision_parents_ids):
13+ ids = self.target.fileids_altered_by_revision_ids([ancestor_revid])
14+ if fileid in ids:
15+ return fileid, ancestor_revid
16+
17+ def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
18+ pb):
19 i = 0
20 # Texts
21 while 1:
22@@ -361,7 +370,11 @@
23 fileid = mapping.generate_file_id(f)
24 chunkiter = mercurial.changegroup.chunkiter(cg)
25 def get_text(node):
26- key = iter(filetext_map[fileid][node]).next()
27+ if node in filetext_map[fileid]:
28+ key = iter(filetext_map[fileid][node]).next()
29+ else:
30+ key = self._get_target_fulltext_key_from_revision_ancestry(
31+ fileid, first_imported_revid)
32 return self._get_target_fulltext(key)
33 for fulltext, hgkey, hgparents, csid in unpack_chunk_iter(chunkiter, get_text):
34 for revision, (kind, parents) in filetext_map[fileid][hgkey].iteritems():
35@@ -529,7 +542,7 @@
36 pb = ui.ui_factory.nested_progress_bar()
37 try:
38 self.target.texts.insert_record_stream(
39- self._unpack_texts(cg, mapping, filetext_map, pb))
40+ self._unpack_texts(cg, mapping, filetext_map, todo[0], pb))
41 finally:
42 pb.finished()
43 # Adding actual data
44
45=== modified file 'overlay.py'
46--- overlay.py 2009-11-26 21:33:32 +0000
47+++ overlay.py 2010-12-23 20:21:49 +0000
48@@ -199,7 +199,7 @@
49 revid = self._lookup_revision_by_manifest_id(manifest_id)
50 return self.get_manifest_text_by_revid(revid)
51
52- def _get_file_fulltext(self, key):
53+ def get_file_fulltext(self, key):
54 ret = "".join(self.repo.iter_files_bytes([key + (None,)]).next()[1])
55 if ret == "": # could be a symlink
56 ie = self.repo.get_inventory(key[1])[key[0]]
57
58=== modified file 'tests/test_fetch.py'
59--- tests/test_fetch.py 2010-12-19 23:00:50 +0000
60+++ tests/test_fetch.py 2010-12-23 20:21:49 +0000
61@@ -20,6 +20,7 @@
62 from bzrlib.plugins.hg.ui import ui as hgui
63 from bzrlib.tests import TestCaseWithTransport
64
65+from mercurial import hg
66 import mercurial.localrepo
67
68 class TestFetching(TestCaseWithTransport):
69@@ -42,12 +43,167 @@
70 hgrepo.commit("Remove file f2, so parent directories d2, d1 are empty")
71
72 # Import history from Mercurial repository into Bazaar repository.
73- bzrtree = self.make_branch_and_tree('bzr')
74- hgdir = HgControlDirFormat().open(self.get_transport('hg'))
75+ bzrtree = self.make_branch_and_tree("bzr")
76+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
77 bzrtree.pull(hgdir.open_branch())
78
79 # As file f2 was deleted, directories d1 and d2 should not exists.
80- self.failIfExists('bzr/d1')
81+ self.failIfExists("bzr/d1")
82
83 # Self-assurance check that history was really imported.
84- self.failUnlessExists('bzr/f1')
85+ self.failUnlessExists("bzr/f1")
86+
87+ def test_getting_existing_text_metadata(self):
88+ # Create Mercurial repository and Bazaar branch to import into.
89+ hgrepo = mercurial.localrepo.localrepository(hgui(), "hg", create=True)
90+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
91+ hgbranch = hgdir.open_branch()
92+ bzrtree = self.make_branch_and_tree("bzr")
93+
94+ # Create file 'f1' in Mercurial repository, commit it
95+ # and pull commited changeset to Bazaar branch.
96+ self.build_tree_contents([("hg/f1", "Initial content")])
97+ hgrepo[None].add(["f1"])
98+ hgrepo.commit("Initial commit")
99+ bzrtree.pull(hgbranch)
100+
101+ # Change content of file 'f1' in Mercurial repository and commit
102+ # change.
103+ self.build_tree_contents([("hg/f1", "Changed content")])
104+ hgrepo.commit("Change content of file f1")
105+
106+ # Pull commited changeset to Bazaar branch.
107+ #
108+ # Should not raise KeyError which looked like:
109+ #
110+ # <...full traceback skipped...>
111+ # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
112+ # (fileid, ie.revision))
113+ # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
114+ bzrtree.pull(hgbranch)
115+
116+ # Self-assurance check that changesets was really pulled in.
117+ self.assertFileEqual("Changed content", "bzr/f1")
118+
119+ def test_incremental_fetching_of_repository_with_non_conflict_merge(self):
120+ # Create Mercurial configuration and override possible definition
121+ # of external interactive merge tools.
122+ ui = hgui()
123+ ui.setconfig("ui", "merge", "internal:merge")
124+
125+ # Create Mercurial repository and Bazaar branch to import into.
126+ hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
127+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
128+ hgbranch = hgdir.open_branch()
129+ bzrtree = self.make_branch_and_tree("bzr")
130+
131+ # Create history graph in Mercurial repository
132+ # (history flows from left to right):
133+ #
134+ # A--B--D
135+ # |
136+ # \--C
137+ self.build_tree_contents([("hg/f1", "f1")])
138+ hgrepo[None].add(["f1"])
139+ hgrepo.commit("A (initial commit; first commit to main branch)")
140+ self.build_tree_contents([("hg/f2", "f2")])
141+ hgrepo[None].add(["f2"])
142+ hgrepo.commit("B (second commit to main branch)")
143+ hg.update(hgrepo, 0)
144+ self.build_tree_contents([("hg/f3", "f3")])
145+ hgrepo[None].add(["f3"])
146+ hgrepo.commit("C (first commit to secondary branch)")
147+ hg.update(hgrepo, 1)
148+ self.build_tree_contents([("hg/f4", "f4")])
149+ hgrepo[None].add(["f4"])
150+ hgrepo.commit("D (third commit to main branch)")
151+
152+ # Pull commited changesets to Bazaar branch.
153+ bzrtree.pull(hgbranch)
154+
155+ # Continue history graph in Mercurial repository
156+ # (history flows from up to down):
157+ #
158+ # a--b--d--E
159+ # | |
160+ # \--c--F-/
161+ hg.update(hgrepo, 2)
162+ self.build_tree_contents([("hg/f5", "f5")])
163+ hgrepo[None].add(["f5"])
164+ hgrepo.commit("F (second commit to secondary branch)")
165+ hg.update(hgrepo, 3)
166+ hg.merge(hgrepo, 4)
167+ hgrepo.commit("E (commit merge of main branch with secondary branch)")
168+
169+ # Pull commited changesets to Bazaar branch.
170+ bzrtree.pull(hgbranch)
171+
172+ # Self-assurance check that all changesets was really pulled in.
173+ for i in range(1, 6):
174+ file_content = "f%d" % i
175+ file_path = "bzr/%s" % file_content
176+ self.assertFileEqual(file_content, file_path)
177+
178+ def test_incremental_fetching_of_repository_with_conflict_merge(self):
179+ # Create Mercurial configuration and override possible definition
180+ # of external interactive merge tools.
181+ ui = hgui()
182+ ui.setconfig("ui", "merge", "internal:local")
183+
184+ # Create Mercurial repository and Bazaar branch to import into.
185+ hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
186+ hgdir = HgControlDirFormat().open(self.get_transport("hg"))
187+ hgbranch = hgdir.open_branch()
188+ bzrtree = self.make_branch_and_tree("bzr")
189+
190+ # Create history graph with conflict in Mercurial repository
191+ # (history flows from left to right, conflict made at commits B and C):
192+ #
193+ # A--B--D
194+ # |
195+ # \--C
196+ self.build_tree_contents([("hg/f1", "f1")])
197+ hgrepo[None].add(["f1"])
198+ hgrepo.commit("A (initial commit; first commit to main branch)")
199+ self.build_tree_contents([("hg/conflict_file", "Main branch")])
200+ hgrepo[None].add(["conflict_file"])
201+ hgrepo.commit("B (second commit to main branch)")
202+ hg.update(hgrepo, 0)
203+ self.build_tree_contents([("hg/conflict_file", "Secondary branch")])
204+ hgrepo[None].add(["conflict_file"])
205+ hgrepo.commit("C (first commit to secondary branch)")
206+ hg.update(hgrepo, 1)
207+ self.build_tree_contents([("hg/f2", "f2")])
208+ hgrepo[None].add(["f2"])
209+ hgrepo.commit("D (third commit to main branch)")
210+
211+ # Pull commited changesets to Bazaar branch.
212+ bzrtree.pull(hgbranch)
213+
214+ # Continue history graph in Mercurial repository
215+ # (history flows from up to down):
216+ #
217+ # a--b--d--E
218+ # | |
219+ # \--c--F-/
220+ hg.update(hgrepo, 2)
221+ self.build_tree_contents([("hg/f3", "f3")])
222+ hgrepo[None].add(["f3"])
223+ hgrepo.commit("F (second commit to secondary branch)")
224+ hg.update(hgrepo, 3)
225+ hg.merge(hgrepo, 4)
226+ self.build_tree_contents([("hg/conflict_file",
227+ "Main branch\nSecondary branch")])
228+ hgrepo.commit("E (commit merge of main branch with secondary branch)")
229+
230+ # Pull commited changesets to Bazaar branch.
231+ bzrtree.pull(hgbranch)
232+
233+ # Self-assurance check that all changesets was really pulled in.
234+ for i in range(1, 4):
235+ file_content = "f%d" % i
236+ file_path = "bzr/%s" % file_content
237+ self.assertFileEqual(file_content, file_path)
238+
239+ self.assertFileEqual("bzr/conflict_file",
240+ "Main branch\nSecondary branch")

Subscribers

People subscribed via source and target branches