Merge lp:~zyga/bzr/find_ancestors into lp:bzr

Proposed by Zygmunt Krynicki
Status: Needs review
Proposed branch: lp:~zyga/bzr/find_ancestors
Merge into: lp:bzr
Diff against target: 41 lines (+13/-0)
2 files modified
bzrlib/btree_index.py (+3/-0)
bzrlib/index.py (+10/-0)
To merge this branch: bzr merge lp:~zyga/bzr/find_ancestors
Reviewer Review Type Date Requested Status
Zygmunt Krynicki (community) Needs Information
Martin Packman (community) Needs Fixing
Review via email: mp+127610@code.launchpad.net

Description of the change

Add a workaround for bzr-fastimport crashing, see bug https://bugs.launchpad.net/bzr/+bug/541626

As stated, I don't know how correct this solution is in practice. From my limited testing it _fixes_ the crash and
seems to produce proper repository data. I want to get feedback from developers that know the core bzr better than I do.

To post a comment you must log in.
Revision history for this message
Martin Packman (gz) wrote :

Thanks for having a go at fixing this.

To land, this branch really needs:
* Tests, to demonstrate the problem and evaluate any attempted fix against
* Logic, rather than than just raising NotImplementedError and skipping that case

Did you have a look at Wouter's branch, which implements the method? Does that also work for you?

<lp:~larstiq/bzr/bug541626>

If so, picking up that code and writing a test case that exercises it would be a good way forwards.

review: Needs Fixing
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

> Thanks for having a go at fixing this.

Thanks for the feedback!

> To land, this branch really needs:
> * Tests, to demonstrate the problem and evaluate any attempted fix against
> * Logic, rather than than just raising NotImplementedError and skipping that
> case

Sure, at the moment it's just an early thing I managed to do (having no understanding of bzr internals) that allowed me to use bzr.

> Did you have a look at Wouter's branch, which implements the method? Does that
> also work for you?

I did see that branch but I recall reading that the solution is incorrect. I am unable to judge that myself. Could you please suggest if skipping BTreeBuilder instances makes sense? If not, should the (can they) implement the same _find_ancestors() method?

Also, if you could point me out to any docs that describe how bzr internals work would help me a lot.

Thanks
ZK

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/10/2012 9:23 PM, Zygmunt Krynicki wrote:
> Review: Needs Information
>
>> Thanks for having a go at fixing this.
>
> Thanks for the feedback!
>
>> To land, this branch really needs: * Tests, to demonstrate the
>> problem and evaluate any attempted fix against * Logic, rather
>> than than just raising NotImplementedError and skipping that
>> case
>
> Sure, at the moment it's just an early thing I managed to do
> (having no understanding of bzr internals) that allowed me to use
> bzr.
>
>> Did you have a look at Wouter's branch, which implements the
>> method? Does that also work for you?
>
> I did see that branch but I recall reading that the solution is
> incorrect. I am unable to judge that myself. Could you please
> suggest if skipping BTreeBuilder instances makes sense? If not,
> should the (can they) implement the same _find_ancestors() method?
>
> Also, if you could point me out to any docs that describe how bzr
> internals work would help me a lot.
>
> Thanks ZK
>

They can implement _find_ancestors() and they should be queried if
available.

_find_ancestors was meant as a way to dig more quickly into ancestors
when the data storage isn't great at random lookups (rather than
proceeding step-by-step into ancestors, grab whatever ancestors you
can get cheaply, and return).

Anyway, the implementation for BTreeBuilder can be trivial (return 1
generation of ancestors). It is a performance optimization to have a
tighter inner loop. If you don't, it will just call the function again.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB1uYUACgkQJdeBCYSNAANZGQCfSUM+Upaw1/lHs9BJH0X1KNf7
pxcAoMDpJjoE71nselXAWa7gYaPRmQA0
=jhNq
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

> I did see that branch but I recall reading that the solution is incorrect. I
> am unable to judge that myself. Could you please suggest if skipping
> BTreeBuilder instances makes sense? If not, should the (can they) implement
> the same _find_ancestors() method?

The bug is not terribly clear, but as I read it:

* The first branch just copied the method over, which is not correct
* No one commented on Wouter's branch, which implements _find_ancestors differently

It's not clear to me if it really makes sense for the builder to have a _find_ancestors method, but without a test that demonstrates the situation in which one is present in the graph, it's hard to understand.

> Also, if you could point me out to any docs that describe how bzr internals
> work would help me a lot.

Refer to the docs/developers directory, also available online:

<http://doc.bazaar.canonical.com/bzr.dev/developers/>

See particularly:

<http://doc.bazaar.canonical.com/bzr.dev/developers/overview.html>
<http://doc.bazaar.canonical.com/bzr.dev/developers/indices.html>
<http://people.canonical.com/~mwh/bzrlibapi/bzrlib.btree_index.html>

Also the relevant parts of:

<http://doc.bazaar.canonical.com/bzr.dev/developers/HACKING.html>
<http://doc.bazaar.canonical.com/bzr.dev/developers/testing.html>

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

> > I did see that branch but I recall reading that the solution is incorrect. I
> > am unable to judge that myself. Could you please suggest if skipping
> > BTreeBuilder instances makes sense? If not, should the (can they) implement
> > the same _find_ancestors() method?
>
> The bug is not terribly clear, but as I read it:

This bug can be triggered in a very simple way using bzr-fastimport. If this helps I can prepare reproducible instructions. This prevents one from using bzr-fastimport to consume a patch.

> * The first branch just copied the method over, which is not correct
> * No one commented on Wouter's branch, which implements _find_ancestors
> differently
>
> It's not clear to me if it really makes sense for the builder to have a
> _find_ancestors method, but without a test that demonstrates the situation in
> which one is present in the graph, it's hard to understand.

I read that as a request for reproducible instructions. I'm on it!

> > Also, if you could point me out to any docs that describe how bzr internals
> > work would help me a lot.
>
> Refer to the docs/developers directory, also available online:

Thanks a lot, I'll get right to it!

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/10/2012 10:33 PM, Zygmunt Krynicki wrote:
>>> I did see that branch but I recall reading that the solution is
>>> incorrect. I am unable to judge that myself. Could you please
>>> suggest if skipping BTreeBuilder instances makes sense? If not,
>>> should the (can they) implement the same _find_ancestors()
>>> method?
>>
>> The bug is not terribly clear, but as I read it:
>
> This bug can be triggered in a very simple way using
> bzr-fastimport. If this helps I can prepare reproducible
> instructions. This prevents one from using bzr-fastimport to
> consume a patch.

Right, bzr-fastimport is the only code that uses find_ancestry before
the BTreeBuilder content is actually committed to the repository, and
it becomes a normal BTreeIndex. Which is why it only fails for
bzr-fastimport, and we don't have any tests inside bzr that fail (bzr
itself never needs to search the ancestry that it is now adding).

I believe the reason for searching the ancestry in bzr-fastimport is
genuine, and the find_ancestry call is reasonable to implement. I
thought I looked at Wouter's branch and made a comment on it, but it
has been quite a while since I looked at that.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB2R3EACgkQJdeBCYSNAAOwkgCcDh63D0ttLH7esE0/e/fXF/0b
i3gAniNjj07MLTiJgK3L62ro31eBdJlU
=YITc
-----END PGP SIGNATURE-----

Unmerged revisions

6569. By Zygmunt Krynicki

Have CombinedIndex.find_ancestry() skip BTreeBuilder instances

This apparantly works around bug https://bugs.launchpad.net/bzr/+bug/541626
As the comment states I don't really know how correct that is but it seems to
produce proper results for the things that affected me and it no longer crashes
bzr in that case.

I could not move the import statement to the top as that made it fail. It is
probably related to some internal bzr import hook.

6568. By Zygmunt Krynicki

Add stub for BTreeBuilder._find_ancestors()

This replaces confusing AttributeErrors with something that is easier to
understand. This is related to bug https://bugs.launchpad.net/bzr/+bug/541626

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/btree_index.py'
2--- bzrlib/btree_index.py 2012-09-05 18:16:08 +0000
3+++ bzrlib/btree_index.py 2012-10-02 23:15:25 +0000
4@@ -614,6 +614,9 @@
5 def validate(self):
6 """In memory index's have no known corruption at the moment."""
7
8+ def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys):
9+ raise NotImplementedError("BTreeBuilder._find_ancestors() is not implemented")
10+
11
12 class _LeafNode(dict):
13 """A leaf node for a serialised B+Tree index."""
14
15=== modified file 'bzrlib/index.py'
16--- bzrlib/index.py 2011-12-19 13:23:58 +0000
17+++ bzrlib/index.py 2012-10-02 23:15:25 +0000
18@@ -1500,6 +1500,7 @@
19 parent_map = {}
20 keys_to_lookup = set(keys)
21 generation = 0
22+ from bzrlib.btree_index import BTreeBuilder
23 while keys_to_lookup:
24 # keys that *all* indexes claim are missing, stop searching them
25 generation += 1
26@@ -1509,6 +1510,15 @@
27 # len(parent_map),
28 # len(missing_keys))
29 for index_idx, index in enumerate(self._indices):
30+ # Skip BTreeBuilder instances as ... they don't really
31+ # implement _find_ancestors(). This is not a real fix as I
32+ # don't understand bzr data model but at least it makes bzr
33+ # fast-import work for me.
34+ if isinstance(index, BTreeBuilder):
35+ trace.mutter("Skipping incompatible index %s", index)
36+ continue
37+ else:
38+ trace.mutter("Looking in %s", index)
39 # TODO: we should probably be doing something with
40 # 'missing_keys' since we've already determined that
41 # those revisions have not been found anywhere