Merge lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg

Status: Merged
Merged at revision: 417
Proposed branch: lp:~leonidborisenko/bzr-hg/hacking
Merge into: lp:bzr-hg
Diff against target: 240 lines (+177/-8), 3 files modified
  fetch.py (+16/-3)
  overlay.py (+1/-1)
  tests/test_fetch.py (+160/-4)
To merge this branch: bzr merge lp:~leonidborisenko/bzr-hg/hacking

High

Fix Released

Reviewer	Review Type	Date Requested	Status
Jelmer Vernooij (community)		2010-12-21	Needs Fixing on 2010-12-21
Review via email: mp+44319@code.launchpad.net

Jelmer Vernooij (jelmer) wrote on 2010-12-21:

Hi Leonid,

Thanks for the MP, and for adding a test along with your changes. I
don't think the fix is quite right yet, though:

review needsfixing

On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> Leonid Borisenko has proposed merging lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg.
> 
> Requested reviews:
>   bzr-hg developers (bzr-hg)
> Related bugs:
>   #692901 Crash with KeyError in incremental mirroring
>   https://bugs.launchpad.net/bugs/692901
> 
> differences between files attachment (review-diff.txt)
> === modified file 'fetch.py'
> --- fetch.py	2010-12-18 18:28:46 +0000
> +++ fetch.py	2010-12-21 09:17:08 +0000
> +    def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> +                      pb):
>          i = 0
>          # Texts
>          while 1:
> @@ -361,7 +370,11 @@
>              fileid = mapping.generate_file_id(f)
>              chunkiter = mercurial.changegroup.chunkiter(cg)
>              def get_text(node):
> -                key = iter(filetext_map[fileid][node]).next()
> +                if node in filetext_map[fileid]:
> +                    key = iter(filetext_map[fileid][node]).next()
> +                else:
> +                    key = self._get_target_fulltext_key_from_revision_ancestry(
> +                               fileid, first_imported_revid)
This assumes there is some significance in the first revision that's
imported. This happens to be the case in your test case, but is not
necessarily true.

Perhaps we can pass csid to get_text and then use that to look up the
matching revision id and use that to find the proper full text?

> +
> +        # Pull commited changeset to Bazaar branch.
> +        #
> +        # Prefer named function instead lambda to slightly more informative
> +        # fail message.
> +        def pull_to_bzr_repo_with_existing_text_metadata():
> +            bzrtree.pull(hgbranch)
> +
> +        # Not expected KeyError looked like:
> +        #
> +        #  <...full traceback skipped...>
> +        #  File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
> +        #      (fileid, ie.revision))
> +        # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
> +        self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
> +                        Not(raises(KeyError)))
I think we should simply just call bzrtree.pull(hgbranch) here. We don't
have a pattern of using testtools anywhere else in bzr-hg, and the error
message it prints in this case is very unhelpful (it eats the
backtrace).

Cheers,

Jelmer

review: Needs Fixing

Leonid Borisenko (leonidborisenko) wrote on 2010-12-21:

Hello Jelmer,

On 21.12.2010 13:49, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
>> differences between files attachment (review-diff.txt)
>> === modified file 'fetch.py'
>> --- fetch.py	2010-12-18 18:28:46 +0000
>> +++ fetch.py	2010-12-21 09:17:08 +0000
>> +    def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
>> +                      pb):
>>          i = 0
>>          # Texts
>>          while 1:
>> @@ -361,7 +370,11 @@
>>              fileid = mapping.generate_file_id(f)
>>              chunkiter = mercurial.changegroup.chunkiter(cg)
>>              def get_text(node):
>> -                key = iter(filetext_map[fileid][node]).next()
>> +                if node in filetext_map[fileid]:
>> +                    key = iter(filetext_map[fileid][node]).next()
>> +                else:
>> +                    key = self._get_target_fulltext_key_from_revision_ancestry(
>> +                               fileid, first_imported_revid)
> This assumes there is some significance in the first revision that's
> imported. This happens to be the case in your test case, but is not
> necessarily true.
> 
> Perhaps we can pass csid to get_text and then use that to look up the
> matching revision id and use that to find the proper full text?

I don't think this improves the solution.

Any case where the full file text can be obtained from the current
revision id, or from the revision ids of its parents within the
current fetching session, is already handled correctly.

There is a variable fulltext_cache in the function unpack_chunk_iter
(in parsers.py), and during the text import phase this variable is the
mapping from file parent nodeid to file text.

All texts within a fetching session are imported strictly sequentially
in the 'for chunk in chunk_iter: ...' loop of the unpack_chunk_iter
function. After being imported into the target repository, the full file
text is remembered in fulltext_cache.

So if the revision with the required file parent nodeid was fetched in
the current session, the text will be taken directly from fulltext_cache.

So the only cases handled by my patch are those that need to look
into revisions that are in the target repository but were imported
in a previous fetching session. These revisions are exactly the ancestry
of the first revision imported in the current fetching session.
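
A minimal sketch of the caching behaviour described above, assuming a much simplified chunk format. The names `unpack_texts`, `apply_delta`, `stored`, and the node ids are illustrative only and are not taken from bzr-hg; the real unpack_chunk_iter consumes Mercurial changegroup chunks rather than Python callables.

```python
def unpack_texts(chunks, get_text):
    """Yield (node, fulltext) pairs, resolving delta bases through a cache.

    chunks   -- iterable of (node, parent_node, apply_delta) tuples; a
                simplified, hypothetical stand-in for changegroup chunks
    get_text -- fallback called only when the parent text was not produced
                earlier in this same session
    """
    fulltext_cache = {}  # file node id -> full text seen in this session
    for node, parent, apply_delta in chunks:
        if parent is None:
            base = ""  # first version of the file, nothing to build on
        elif parent in fulltext_cache:
            base = fulltext_cache[parent]  # parent fetched in this session
        else:
            base = get_text(parent)  # parent came from an earlier session
        fulltext = apply_delta(base)
        fulltext_cache[node] = fulltext
        yield node, fulltext


# Text already present in the target repository from a previous session.
stored = {"n1": "Initial content"}
fallback_calls = []


def get_text(node):
    fallback_calls.append(node)
    return stored[node]


chunks = [
    ("n2", "n1", lambda base: base.replace("Initial", "Changed")),
    ("n3", "n2", lambda base: base + "!"),
]
result = list(unpack_texts(chunks, get_text))
print(result)
print(fallback_calls)
```

In this sketch only `n1` reaches the fallback, because `n2` is already in the cache by the time `n3` is unpacked. A cache miss of this kind is the situation the patch is concerned with.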

Am I missing some important point here? I really can't imagine a
case in which a fulltext_cache miss means that:
    * the full file text can be restored with knowledge of csid (assuming
that csid is the equivalent of the id of the second or third or ... imported
revision)
    * AND the process of restoring the full text doesn't reach the id of
the first imported revision.

>> +
>> +        # Pull commited changeset to Bazaar branch.
>> +        #
>> +        # Prefer named function instead lambda to slightly more informative
>> +        # fail message.
>> +        def pull_to_bzr_repo_with_existing_text_metadata():
>> +            bzrtree.pull(hgbranch)
>> +
>> +        # Not expected KeyError looked like:
>> +        #
>> +        #  <...full traceback skipped...>
>> +        #  File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
>> +        #      (fileid, ie.revision))
>> +        # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
>> +        self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
>> +                        Not(raises(KeyError)))
> I think we should simply just call bzrtree.pull(hgbranch) here. We don't
> have a pattern of using testtools anywhere else in bzr-hg, and the error
> message it prints in this case is very unhelpful (it eats the
> backtrace).

OK, I'll get rid of the testtools import and the use of the imported matchers.

But I think that just calling 'bzrtree.pull(hgbranch)' makes this
test case too generous (so I intentionally used the assertThat expression
to express a strict expectation about the error and narrow the test case).

Does that make any sense? Or is just leaving the comment about the
unexpected KeyError near the call to 'bzrtree.pull' sufficient?

Jelmer Vernooij (jelmer) wrote on 2010-12-22:

Hi Leonid,

On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:

> On 21.12.2010 13:49, Jelmer Vernooij wrote:
> > On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> >> differences between files attachment (review-diff.txt)
> >> === modified file 'fetch.py'
> >> --- fetch.py	2010-12-18 18:28:46 +0000
> >> +++ fetch.py	2010-12-21 09:17:08 +0000
> >> +    def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> >> +                      pb):
> >>          i = 0
> >>          # Texts
> >>          while 1:
> >> @@ -361,7 +370,11 @@
> >>              fileid = mapping.generate_file_id(f)
> >>              chunkiter = mercurial.changegroup.chunkiter(cg)
> >>              def get_text(node):
> >> -                key = iter(filetext_map[fileid][node]).next()
> >> +                if node in filetext_map[fileid]:
> >> +                    key = iter(filetext_map[fileid][node]).next()
> >> +                else:
> >> +                    key = self._get_target_fulltext_key_from_revision_ancestry(
> >> +                               fileid, first_imported_revid)
> > This assumes there is some significance in the first revision that's
> > imported. This happens to be the case in your test case, but is not
> > necessarily true.
> > 
> > Perhaps we can pass csid to get_text and then use that to look up the
> > matching revision id and use that to find the proper full text?
> I don't think this improves the solution.
>
> Any case where the full file text can be obtained from the current
> revision id, or from the revision ids of its parents within the
> current fetching session, is already handled correctly.
>
> There is a variable fulltext_cache in the function unpack_chunk_iter
> (in parsers.py), and during the text import phase this variable is the
> mapping from file parent nodeid to file text.
>
> All texts within a fetching session are imported strictly sequentially
> in the 'for chunk in chunk_iter: ...' loop of the unpack_chunk_iter
> function. After being imported into the target repository, the full file
> text is remembered in fulltext_cache.
>
> So if the revision with the required file parent nodeid was fetched in
> the current session, the text will be taken directly from fulltext_cache.
>
> So the only cases handled by my patch are those that need to look
> into revisions that are in the target repository but were imported
> in a previous fetching session. These revisions are exactly the ancestry
> of the first revision imported in the current fetching session.
That's not necessarily true. Assume the following graph:

A
| \
B C
| |
D F
|/
E

If the first fetch pulls in up to C and D, and the second fetch pulls in
E then the first imported revision will probably be D but the parent of
F will be C - which is not the first imported revision, nor will it be
in its ancestry.
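
The ancestry claim for this graph can be checked with a small, self-contained sketch. The `parents` mapping and the `ancestry` helper are illustrative names, not part of bzr-hg or bzrlib; the parent edges are simply read off the diagram above, with E merging D and F.

```python
# Parent edges read off the diagram: E merges D and F.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "F": ["C"],
    "E": ["D", "F"],
}


def ancestry(revid, parents):
    """Return the set of all ancestors of revid, excluding revid itself."""
    seen = set()
    stack = list(parents[revid])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# The first fetch imported everything up to C and D.
already_imported = ancestry("C", parents) | ancestry("D", parents) | {"C", "D"}

# The second fetch asks for E; whatever is not yet present must be imported.
todo = (ancestry("E", parents) | {"E"}) - already_imported

print(sorted(already_imported))
print(sorted(todo))
print("C" in ancestry("D", parents))
```

Under these assumptions the second fetch has to import E and F, while F's parent C is already in the repository and is not an ancestor of D.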

> Am I missing some important point here? I really can't imagine a
> case in which a fulltext_cache miss means that:
>     * the full file text can be restored with knowledge of csid (assuming
> that csid is the equivalent of the id of the second or third or ... imported
> revision)
I specifically meant csid of one of the texts that was /not/ fetched
during this run but already present in the local repository.

>     * AND the process of restoring the full text doesn't reach the id of
> the first imported revision.
See above for a situation in which that breaks.

> >> +
> >> +        # Pull commited changeset to Bazaar branch.
> >> +        #
> >> +        # Prefer named function instead lambda to slightly more informative
> >> +        # fail message.
> >> +        def pull_to_bzr_repo_with_existing_text_metadata():
> >> +            bzrtree.pull(hgbranch)
> >> +
> >> +        # Not expected KeyError looked like:
> >> +        #
> >> +        #  <...full traceback skipped...>
> >> +        #  File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
> >> +        #      (fileid, ie.revision))
> >> +        # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
> >> +        self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
> >> +                        Not(raises(KeyError)))
> > I think we should simply just call bzrtree.pull(hgbranch) here. We don't
> > have a pattern of using testtools anywhere else in bzr-hg, and the error
> > message it prints in this case is very unhelpful (it eats the
> > backtrace).
> 
> OK, I'll get rid of the testtools import and the use of the imported matchers.
>
> But I think that just calling 'bzrtree.pull(hgbranch)' makes this
> test case too generous (so I intentionally used the assertThat expression
> to express a strict expectation about the error and narrow the test case).
>
> Does that make any sense? Or is just leaving the comment about the
> unexpected KeyError near the call to 'bzrtree.pull' sufficient?
I don't see how not explicitly checking for that exception is too
"generous". If that function does raise an exception then the test
runner will report it. What does explicitly checking for that exception
add?

IMHO just leaving the comment is sufficient.

Cheers,

Jelmer

lp:~leonidborisenko/bzr-hg/hacking updated on 2010-12-23

331. By Leonid Borisenko on 2010-12-23

Remove testtools import from test.

332. By Leonid Borisenko on 2010-12-23

Cosmetic fix of tests.

Make use of uniformly quoted strings.

333. By Leonid Borisenko on 2010-12-23

Remove unnecessary creation of directory in tests.

The mercurial.localrepo.localrepository class creates a nonexistent
directory by itself in its constructor.

334. By Leonid Borisenko on 2010-12-23

Add tests for incremental fetching of non-linear history.

Leonid Borisenko (leonidborisenko) wrote on 2010-12-23:

Hello Jelmer,

On 22.12.2010 20:41, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:
> [skipped]
>> So the only cases handled by my patch are those that need to look
>> into revisions that are in the target repository but were imported
>> in a previous fetching session. These revisions are exactly the ancestry
>> of the first revision imported in the current fetching session.
> That's not necessarily true. Assume the following graph:
> 
> A
> | \
> B C
> | |
> D F
> |/
> E
> 
> If the first fetch pulls in up to C and D, and the second fetch pulls in
> E then the first imported revision will probably be D but the parent of
> F will be C - which is not the first imported revision, nor will it be
> in its ancestry.

Thanks for the example.
  I've tried to reproduce this graph in test cases (see my updated
branch), and one of the two cases revealed a new problem. It crashes while
importing manifests (i.e. before importing texts), but the root of the bug
seems to be the same (it concerns the filetext_map variable).
  So my solution needs to be rethought and reworked.

> [skipped]
>>>> +        # Not expected KeyError looked like:
>>>> +        #
>>>> +        #  <...full traceback skipped...>
>>>> +        #  File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
>>>> +        #      (fileid, ie.revision))
>>>> +        # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
>>>> +        self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
>>>> +                        Not(raises(KeyError)))
>>> I think we should simply just call bzrtree.pull(hgbranch) here. We don't
>>> have a pattern of using testtools anywhere else in bzr-hg, and the error
>>> message it prints in this case is very unhelpful (it eats the
>>> backtrace).
>>
>> OK, I'll get rid of the testtools import and the use of the imported matchers.
>>
>> But I think that just calling 'bzrtree.pull(hgbranch)' makes this
>> test case too generous (so I intentionally used the assertThat expression
>> to express a strict expectation about the error and narrow the test case).
>>
>> Does that make any sense? Or is just leaving the comment about the
>> unexpected KeyError near the call to 'bzrtree.pull' sufficient?
> I don't see how not explicitly checking for that exception is too
> "generous". If that function does raise an exception then the test
> runner will report it. What does explicitly checking for that exception
> add?
> 
> IMHO just leaving the comment is sufficient.

OK, I've implemented this.

Preview Diff

 === modified file 'fetch.py'
 --- fetch.py	2010-12-18 18:28:46 +0000
 +++ fetch.py	2010-12-23 20:21:49 +0000
@@ -349,7 +349,16 @@
              return self._symlink_targets[key]
          return self._target_overlay.get_file_fulltext(key)
--    def _unpack_texts(self, cg, mapping, filetext_map, pb):
++    def _get_target_fulltext_key_from_revision_ancestry(self, fileid, revid):
++        graph = self.target.get_graph()
++        revision_parents_ids = self._revisions[revid].parent_ids
++        for ancestor_revid, _ in graph.iter_ancestry(revision_parents_ids):
++            ids = self.target.fileids_altered_by_revision_ids([ancestor_revid])
++            if fileid in ids:
++                return fileid, ancestor_revid
++
++    def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
++                      pb):
          i = 0
          # Texts
          while 1:
@@ -361,7 +370,11 @@
              fileid = mapping.generate_file_id(f)
              chunkiter = mercurial.changegroup.chunkiter(cg)
              def get_text(node):
--                key = iter(filetext_map[fileid][node]).next()
++                if node in filetext_map[fileid]:
++                    key = iter(filetext_map[fileid][node]).next()
++                else:
++                    key = self._get_target_fulltext_key_from_revision_ancestry(
++                               fileid, first_imported_revid)
                  return self._get_target_fulltext(key)
              for fulltext, hgkey, hgparents, csid in unpack_chunk_iter(chunkiter, get_text):
                  for revision, (kind, parents) in filetext_map[fileid][hgkey].iteritems():
@@ -529,7 +542,7 @@
          pb = ui.ui_factory.nested_progress_bar()
          try:
              self.target.texts.insert_record_stream(
--                self._unpack_texts(cg, mapping, filetext_map, pb))
++                self._unpack_texts(cg, mapping, filetext_map, todo[0], pb))
          finally:
              pb.finished()
          # Adding actual data
 === modified file 'overlay.py'
 --- overlay.py	2009-11-26 21:33:32 +0000
 +++ overlay.py	2010-12-23 20:21:49 +0000
@@ -199,7 +199,7 @@
          revid = self._lookup_revision_by_manifest_id(manifest_id)
          return self.get_manifest_text_by_revid(revid)
--    def _get_file_fulltext(self, key):
++    def get_file_fulltext(self, key):
          ret = "".join(self.repo.iter_files_bytes([key + (None,)]).next()[1])
          if ret == "": # could be a symlink
              ie = self.repo.get_inventory(key[1])[key[0]]
 === modified file 'tests/test_fetch.py'
 --- tests/test_fetch.py	2010-12-19 23:00:50 +0000
 +++ tests/test_fetch.py	2010-12-23 20:21:49 +0000
@@ -20,6 +20,7 @@
  from bzrlib.plugins.hg.ui import ui as hgui
  from bzrlib.tests import TestCaseWithTransport
++from mercurial import hg
  import mercurial.localrepo
  class TestFetching(TestCaseWithTransport):
@@ -42,12 +43,167 @@
          hgrepo.commit("Remove file f2, so parent directories d2, d1 are empty")
          # Import history from Mercurial repository into Bazaar repository.
--        bzrtree = self.make_branch_and_tree('bzr')
--        hgdir = HgControlDirFormat().open(self.get_transport('hg'))
++        bzrtree = self.make_branch_and_tree("bzr")
++        hgdir = HgControlDirFormat().open(self.get_transport("hg"))
          bzrtree.pull(hgdir.open_branch())
          # As file f2 was deleted, directories d1 and d2 should not exists.
--        self.failIfExists('bzr/d1')
++        self.failIfExists("bzr/d1")
          # Self-assurance check that history was really imported.
--        self.failUnlessExists('bzr/f1')
++        self.failUnlessExists("bzr/f1")
++
++    def test_getting_existing_text_metadata(self):
++        # Create Mercurial repository and Bazaar branch to import into.
++        hgrepo = mercurial.localrepo.localrepository(hgui(), "hg", create=True)
++        hgdir = HgControlDirFormat().open(self.get_transport("hg"))
++        hgbranch = hgdir.open_branch()
++        bzrtree = self.make_branch_and_tree("bzr")
++
++        # Create file 'f1' in Mercurial repository, commit it
++        # and pull commited changeset to Bazaar branch.
++        self.build_tree_contents([("hg/f1", "Initial content")])
++        hgrepo[None].add(["f1"])
++        hgrepo.commit("Initial commit")
++        bzrtree.pull(hgbranch)
++
++        # Change content of file 'f1' in Mercurial repository and commit
++        # change.
++        self.build_tree_contents([("hg/f1", "Changed content")])
++        hgrepo.commit("Change content of file f1")
++
++        # Pull commited changeset to Bazaar branch.
++        #
++        # Should not raise KeyError which looked like:
++        #
++        #  <...full traceback skipped...>
++        #  File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
++        #      (fileid, ie.revision))
++        # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
++        bzrtree.pull(hgbranch)
++
++        # Self-assurance check that changesets was really pulled in.
++        self.assertFileEqual("Changed content", "bzr/f1")
++
++    def test_incremental_fetching_of_repository_with_non_conflict_merge(self):
++        # Create Mercurial configuration and override possible definition
++        # of external interactive merge tools.
++        ui = hgui()
++        ui.setconfig("ui", "merge", "internal:merge")
++
++        # Create Mercurial repository and Bazaar branch to import into.
++        hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
++        hgdir = HgControlDirFormat().open(self.get_transport("hg"))
++        hgbranch = hgdir.open_branch()
++        bzrtree = self.make_branch_and_tree("bzr")
++
++        # Create history graph in Mercurial repository
++        # (history flows from left to right):
++        #
++        # A--B--D
++        # |
++        # \--C
++        self.build_tree_contents([("hg/f1", "f1")])
++        hgrepo[None].add(["f1"])
++        hgrepo.commit("A (initial commit; first commit to main branch)")
++        self.build_tree_contents([("hg/f2", "f2")])
++        hgrepo[None].add(["f2"])
++        hgrepo.commit("B (second commit to main branch)")
++        hg.update(hgrepo, 0)
++        self.build_tree_contents([("hg/f3", "f3")])
++        hgrepo[None].add(["f3"])
++        hgrepo.commit("C (first commit to secondary branch)")
++        hg.update(hgrepo, 1)
++        self.build_tree_contents([("hg/f4", "f4")])
++        hgrepo[None].add(["f4"])
++        hgrepo.commit("D (third commit to main branch)")
++
++        # Pull committed changesets to Bazaar branch.
++        bzrtree.pull(hgbranch)
++
++        # Continue history graph in Mercurial repository
++        # (history flows from left to right):
++        #
++        # a--b--d--E
++        # |       |
++        # \--c--F-/
++        hg.update(hgrepo, 2)
++        self.build_tree_contents([("hg/f5", "f5")])
++        hgrepo[None].add(["f5"])
++        hgrepo.commit("F (second commit to secondary branch)")
++        hg.update(hgrepo, 3)
++        hg.merge(hgrepo, 4)
++        hgrepo.commit("E (commit merge of main branch with secondary branch)")
++
++        # Pull committed changesets to Bazaar branch.
++        bzrtree.pull(hgbranch)
++
++        # Self-assurance check that all changesets were really pulled in.
++        for i in range(1, 6):
++            file_content = "f%d" % i
++            file_path = "bzr/%s" % file_content
++            self.assertFileEqual(file_content, file_path)
++
++    def test_incremental_fetching_of_repository_with_conflict_merge(self):
++        # Create Mercurial configuration and override possible definition
++        # of external interactive merge tools.
++        ui = hgui()
++        ui.setconfig("ui", "merge", "internal:local")
++
++        # Create Mercurial repository and Bazaar branch to import into.
++        hgrepo = mercurial.localrepo.localrepository(ui, "hg", create=True)
++        hgdir = HgControlDirFormat().open(self.get_transport("hg"))
++        hgbranch = hgdir.open_branch()
++        bzrtree = self.make_branch_and_tree("bzr")
++
++        # Create history graph with conflict in Mercurial repository
++        # (history flows from left to right, conflict made at commits B and C):
++        #
++        # A--B--D
++        # |
++        # \--C
++        self.build_tree_contents([("hg/f1", "f1")])
++        hgrepo[None].add(["f1"])
++        hgrepo.commit("A (initial commit; first commit to main branch)")
++        self.build_tree_contents([("hg/conflict_file", "Main branch")])
++        hgrepo[None].add(["conflict_file"])
++        hgrepo.commit("B (second commit to main branch)")
++        hg.update(hgrepo, 0)
++        self.build_tree_contents([("hg/conflict_file", "Secondary branch")])
++        hgrepo[None].add(["conflict_file"])
++        hgrepo.commit("C (first commit to secondary branch)")
++        hg.update(hgrepo, 1)
++        self.build_tree_contents([("hg/f2", "f2")])
++        hgrepo[None].add(["f2"])
++        hgrepo.commit("D (third commit to main branch)")
++
++        # Pull committed changesets to Bazaar branch.
++        bzrtree.pull(hgbranch)
++
++        # Continue history graph in Mercurial repository
++        # (history flows from left to right):
++        #
++        # a--b--d--E
++        # |       |
++        # \--c--F-/
++        hg.update(hgrepo, 2)
++        self.build_tree_contents([("hg/f3", "f3")])
++        hgrepo[None].add(["f3"])
++        hgrepo.commit("F (second commit to secondary branch)")
++        hg.update(hgrepo, 3)
++        hg.merge(hgrepo, 4)
++        self.build_tree_contents([("hg/conflict_file",
++                                   "Main branch\nSecondary branch")])
++        hgrepo.commit("E (commit merge of main branch with secondary branch)")
++
++        # Pull committed changesets to Bazaar branch.
++        bzrtree.pull(hgbranch)
++
++        # Self-assurance check that all changesets were really pulled in.
++        for i in range(1, 4):
++            file_content = "f%d" % i
++            file_path = "bzr/%s" % file_content
++            self.assertFileEqual(file_content, file_path)
++
++        self.assertFileEqual("Main branch\nSecondary branch",
++                             "bzr/conflict_file")