Merge lp:~zyga/bzr/find_ancestors into lp:bzr

Proposed by Zygmunt Krynicki on 2012-10-02
Status: Needs review
Proposed branch: lp:~zyga/bzr/find_ancestors
Merge into: lp:bzr
Diff against target: 41 lines (+13/-0)
2 files modified
bzrlib/btree_index.py (+3/-0)
bzrlib/index.py (+10/-0)
To merge this branch: bzr merge lp:~zyga/bzr/find_ancestors
Reviewer Review Type Date Requested Status
Zygmunt Krynicki (community) Needs Information on 2012-10-10
Martin Packman (community) 2012-10-02 Needs Fixing on 2012-10-10
Review via email: mp+127610@code.launchpad.net

Description of the Change

Add a workaround for bzr-fastimport crashing, see bug https://bugs.launchpad.net/bzr/+bug/541626

As stated, I don't know how correct this solution is in practice. From my limited testing it _fixes_ the crash and
seems to produce proper repository data. I want to get feedback from developers that know the core bzr better than I do.

To post a comment you must log in.
Martin Packman (gz) wrote :

Thanks for having a go at fixing this.

To land, this branch really needs:
* Tests, to demonstrate the problem and evaluate any attempted fix against
* Logic, rather than than just raising NotImplementedError and skipping that case

Did you have a look at Wouter's branch, which implements the method? Does that also work for you?

<lp:~larstiq/bzr/bug541626>

If so, picking up that code and writing a test case that exercises it would be a good way forwards.

review: Needs Fixing
Zygmunt Krynicki (zyga) wrote :

> Thanks for having a go at fixing this.

Thanks for the feedback!

> To land, this branch really needs:
> * Tests, to demonstrate the problem and evaluate any attempted fix against
> * Logic, rather than than just raising NotImplementedError and skipping that
> case

Sure, at the moment it's just an early thing I managed to do (having no understanding of bzr internals) that allowed me to use bzr.

> Did you have a look at Wouter's branch, which implements the method? Does that
> also work for you?

I did see that branch but I recall reading that the solution is incorrect. I am unable to judge that myself. Could you please suggest if skipping BTreeBuilder instances makes sense? If not, should the (can they) implement the same _find_ancestors() method?

Also, if you could point me out to any docs that describe how bzr internals work would help me a lot.

Thanks
ZK

review: Needs Information
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/10/2012 9:23 PM, Zygmunt Krynicki wrote:
> Review: Needs Information
>
>> Thanks for having a go at fixing this.
>
> Thanks for the feedback!
>
>> To land, this branch really needs: * Tests, to demonstrate the
>> problem and evaluate any attempted fix against * Logic, rather
>> than than just raising NotImplementedError and skipping that
>> case
>
> Sure, at the moment it's just an early thing I managed to do
> (having no understanding of bzr internals) that allowed me to use
> bzr.
>
>> Did you have a look at Wouter's branch, which implements the
>> method? Does that also work for you?
>
> I did see that branch but I recall reading that the solution is
> incorrect. I am unable to judge that myself. Could you please
> suggest if skipping BTreeBuilder instances makes sense? If not,
> should the (can they) implement the same _find_ancestors() method?
>
> Also, if you could point me out to any docs that describe how bzr
> internals work would help me a lot.
>
> Thanks ZK
>

They can implement _find_ancestors() and they should be queried if
available.

_find_ancestors was meant as a way to dig more quickly into ancestors
when the data storage isn't great at random lookups (rather than
proceeding step-by-step into ancestors, grab whatever ancestors you
can get cheaply, and return).

Anyway, the implementation for BTreeBuilder can be trivial (return 1
generation of ancestors). It is a performance optimization to have a
tighter inner loop. If you don't, it will just call the function again.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB1uYUACgkQJdeBCYSNAANZGQCfSUM+Upaw1/lHs9BJH0X1KNf7
pxcAoMDpJjoE71nselXAWa7gYaPRmQA0
=jhNq
-----END PGP SIGNATURE-----

Martin Packman (gz) wrote :

> I did see that branch but I recall reading that the solution is incorrect. I
> am unable to judge that myself. Could you please suggest if skipping
> BTreeBuilder instances makes sense? If not, should the (can they) implement
> the same _find_ancestors() method?

The bug is not terribly clear, but as I read it:

* The first branch just copied the method over, which is not correct
* No one commented on Wouter's branch, which implements _find_ancestors differently

It's not clear to me if it really makes sense for the builder to have a _find_ancestors method, but without a test that demonstrates the situation in which one is present in the graph, it's hard to understand.

> Also, if you could point me out to any docs that describe how bzr internals
> work would help me a lot.

Refer to the docs/developers directory, also available online:

<http://doc.bazaar.canonical.com/bzr.dev/developers/>

See particularly:

<http://doc.bazaar.canonical.com/bzr.dev/developers/overview.html>
<http://doc.bazaar.canonical.com/bzr.dev/developers/indices.html>
<http://people.canonical.com/~mwh/bzrlibapi/bzrlib.btree_index.html>

Also the relevant parts of:

<http://doc.bazaar.canonical.com/bzr.dev/developers/HACKING.html>
<http://doc.bazaar.canonical.com/bzr.dev/developers/testing.html>

Zygmunt Krynicki (zyga) wrote :

> > I did see that branch but I recall reading that the solution is incorrect. I
> > am unable to judge that myself. Could you please suggest if skipping
> > BTreeBuilder instances makes sense? If not, should the (can they) implement
> > the same _find_ancestors() method?
>
> The bug is not terribly clear, but as I read it:

This bug can be triggered in a very simple way using bzr-fastimport. If this helps I can prepare reproducible instructions. This prevents one from using bzr-fastimport to consume a patch.

> * The first branch just copied the method over, which is not correct
> * No one commented on Wouter's branch, which implements _find_ancestors
> differently
>
> It's not clear to me if it really makes sense for the builder to have a
> _find_ancestors method, but without a test that demonstrates the situation in
> which one is present in the graph, it's hard to understand.

I read that as a request for reproducible instructions. I'm on it!

> > Also, if you could point me out to any docs that describe how bzr internals
> > work would help me a lot.
>
> Refer to the docs/developers directory, also available online:

Thanks a lot, I'll get right to it!

John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/10/2012 10:33 PM, Zygmunt Krynicki wrote:
>>> I did see that branch but I recall reading that the solution is
>>> incorrect. I am unable to judge that myself. Could you please
>>> suggest if skipping BTreeBuilder instances makes sense? If not,
>>> should the (can they) implement the same _find_ancestors()
>>> method?
>>
>> The bug is not terribly clear, but as I read it:
>
> This bug can be triggered in a very simple way using
> bzr-fastimport. If this helps I can prepare reproducible
> instructions. This prevents one from using bzr-fastimport to
> consume a patch.

Right, bzr-fastimport is the only code that uses find_ancestry before
the BTreeBuilder content is actually committed to the repository, and
it becomes a normal BTreeIndex. Which is why it only fails for
bzr-fastimport, and we don't have any tests inside bzr that fail (bzr
itself never needs to search the ancestry that it is now adding).

I believe the reason for searching the ancestry in bzr-fastimport is
genuine, and the find_ancestry call is reasonable to implement. I
thought I looked at Wouter's branch and made a comment on it, but it
has been quite a while since I looked at that.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB2R3EACgkQJdeBCYSNAAOwkgCcDh63D0ttLH7esE0/e/fXF/0b
i3gAniNjj07MLTiJgK3L62ro31eBdJlU
=YITc
-----END PGP SIGNATURE-----

Unmerged revisions

6569. By Zygmunt Krynicki on 2012-10-02

Have CombinedIndex.find_ancestry() skip BTreeBuilder instances

This apparantly works around bug https://bugs.launchpad.net/bzr/+bug/541626
As the comment states I don't really know how correct that is but it seems to
produce proper results for the things that affected me and it no longer crashes
bzr in that case.

I could not move the import statement to the top as that made it fail. It is
probably related to some internal bzr import hook.

6568. By Zygmunt Krynicki on 2012-10-02

Add stub for BTreeBuilder._find_ancestors()

This replaces confusing AttributeErrors with something that is easier to
understand. This is related to bug https://bugs.launchpad.net/bzr/+bug/541626

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/btree_index.py'
2--- bzrlib/btree_index.py 2012-09-05 18:16:08 +0000
3+++ bzrlib/btree_index.py 2012-10-02 23:15:25 +0000
4@@ -614,6 +614,9 @@
5 def validate(self):
6 """In memory index's have no known corruption at the moment."""
7
8+ def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys):
9+ raise NotImplementedError("BTreeBuilder._find_ancestors() is not implemented")
10+
11
12 class _LeafNode(dict):
13 """A leaf node for a serialised B+Tree index."""
14
15=== modified file 'bzrlib/index.py'
16--- bzrlib/index.py 2011-12-19 13:23:58 +0000
17+++ bzrlib/index.py 2012-10-02 23:15:25 +0000
18@@ -1500,6 +1500,7 @@
19 parent_map = {}
20 keys_to_lookup = set(keys)
21 generation = 0
22+ from bzrlib.btree_index import BTreeBuilder
23 while keys_to_lookup:
24 # keys that *all* indexes claim are missing, stop searching them
25 generation += 1
26@@ -1509,6 +1510,15 @@
27 # len(parent_map),
28 # len(missing_keys))
29 for index_idx, index in enumerate(self._indices):
30+ # Skip BTreeBuilder instances as ... they don't really
31+ # implement _find_ancestors(). This is not a real fix as I
32+ # don't understand bzr data model but at least it makes bzr
33+ # fast-import work for me.
34+ if isinstance(index, BTreeBuilder):
35+ trace.mutter("Skipping incompatible index %s", index)
36+ continue
37+ else:
38+ trace.mutter("Looking in %s", index)
39 # TODO: we should probably be doing something with
40 # 'missing_keys' since we've already determined that
41 # those revisions have not been found anywhere