Merge lp:~jameinel/bzr/2.3-btree-chk-leaf into lp:bzr
| Status: | Merged |
|---|---|
| Approved by: | Vincent Ladeuil on 2010-08-20 |
| Approved revision: | 5388 |
| Merged at revision: | 5390 |
| Proposed branch: | lp:~jameinel/bzr/2.3-btree-chk-leaf |
| Merge into: | lp:bzr |
| Diff against target: |
1448 lines (+962/-60) 17 files modified
NEWS (+7/-0) bzrlib/_btree_serializer_pyx.pyx (+545/-2) bzrlib/_static_tuple_c.pxd (+1/-0) bzrlib/btree_index.py (+33/-16) bzrlib/clean_tree.py (+1/-1) bzrlib/generate_ids.py (+1/-1) bzrlib/python-compat.h (+3/-0) bzrlib/repofmt/pack_repo.py (+22/-20) bzrlib/tests/__init__.py (+1/-0) bzrlib/tests/test__btree_serializer.py (+305/-0) bzrlib/tests/test_btree_index.py (+15/-15) bzrlib/tests/test_clean_tree.py (+1/-1) bzrlib/tests/test_inv.py (+1/-1) bzrlib/tests/test_inventory_delta.py (+1/-1) bzrlib/tests/test_repository.py (+16/-0) bzrlib/tests/test_source.py (+5/-2) doc/en/whats-new/whats-new-in-2.3.txt (+4/-0) |
| To merge this branch: | bzr merge lp:~jameinel/bzr/2.3-btree-chk-leaf |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Vincent Ladeuil | 2010-08-05 | Needs Fixing on 2010-08-20 | |
|
Review via email:
|
|||
Commit Message
Use a custom class definition for CHK BTree leaf nodes.
Description of the Change
This changes the BTreeGraphIndex class for CHK indexes.
It replaces them with a custom structure that I described on the list. The net effect is to change us from using something like 200+ bytes / index record (400+ on 64-bit) down to very close to 42bytes/record (32 and 64 bit))
It also speeds up creating these objects, though my overall timing shows that isn't really a dominant factor.
It creates the appropriate tuples as-requested, which is where most of the memory savings gain is. (40-bytes is going to be 1 string, much less all the extra tuples and such that go along with it.)
This is valid to do, because often our chk indexes are underutilized. Every time we read 1 we have to read ~100, and because they are hashes, they are distributed and we rarely read more than 1 from a single index.
The main thing I'm not sure about, is how I expose this to the repository. We've done a 'unlimited_
If someone has a good idea here, I'm happy to change that code.
This also changes the BTreeIndex code, so that it treats the LeafNodes themselves as dicts, rather than using an attribute on them. This is potentially a small slowdown (according to timeit). (x in o._dict), is slightly faster than (x in subclass_dict), which really surprised me. Though a custom __getitem__ that thunks over to self._dict is much slower.
11.3us dict[x]
15.6us class._dict[x]
28.2us subclass[x]
56us __getitem__(return self._dict[x])
I think it is okay, though it is a shame to regress the normal case.
- 5388. By John A Meinel on 2010-08-06
-
A __sizeof__ check that ensure we are getting what we are looking for.
It also checks that it grows appropriately for how many new entries are added.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 8/20/2010 7:19 AM, Vincent Ladeuil wrote:
> Review: Needs Fixing
> Wow, I clicked Status -> Approved while my comment was being processed and it get lost :-/
>
> Here it is again:
>
> Nice catch !
>
> You had test code (_test_* functions) in production code which has a bad smell,
> with this tweaked, please land.
>
They aren't actually test functions. They are Python wrappers to expose
the pure C functions to the test infrastructure.
How would you like me to tweak it? I intentionally called them "test_*"
so that it would be obvious not to call them from production code...
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkx
/S8AoL7RVhlsMx0
=oeNU
-----END PGP SIGNATURE-----
- 5389. By John A Meinel on 2010-08-23
-
Change the _test names to _get and _py so that it doesn't provoke bad smells :)
- 5390. By John A Meinel on 2010-08-23
-
Merge bzr.dev 5387 in prep for NEWS
- 5391. By John A Meinel on 2010-08-23
-
Update What's New and NEWS
| John A Meinel (jameinel) wrote : | # |
sent to pqm by email
- 5392. By John A Meinel on 2010-08-23
-
switch 'x += 1' to 'x = x + 1' to deal with brain-damaged old versions of pyrex.
| John A Meinel (jameinel) wrote : | # |
sent to pqm by email
- 5393. By John A Meinel on 2010-08-23
-
Lots of compatibility changes for python2.4 and mingw32 compilers.
gcc has strtoll available, and does the wrong thing if you use _strtoi64 (it implicitly
defines it to return an 'int' which isn't a 64-bit integer.)Further, python2.4 doesn't support the %lu syntax, only having %ld and %d. So we go
back to casting everything into real python objects and then stringifying them, but
only for python 2.4.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 8/23/2010 2:38 PM, John A Meinel wrote:
> sent to pqm by email
>
Sorry for the extra merge noise. It turns out things were a bit uglier
than I thought for older versions of python.
Specifically, python 2.4 PyString_
doesn't support '%lu' or even '%u'. This was changed again in python 2.7
to support '%llu', but the docs only mention that last change, not the
python2.5 change to add '%lu' support.
You can see the difference in here:
http://
versus:
http://
versus:
http://
I also ran into pyrex 0.9.6 vs 0.9.8 vs Cython incompatibilities, etc.
However, I've now ran the test suite on python 2.4.4 w/ pyrex 0.9.6.4
(matching hardy, I think), and python 2.5 pyrex 0.9.8.5 and python 2.6
Cython 0.11.2
Which I think should cover everything.
Oh, and it also covers using 'gcc' vs 'cl'. Since 'gcc' supports strtoll
directly, while cl needs to use _strtoi64. (If you use _strtoi64 w/ gcc,
but don't define it, then it auto-defines it to return a simple 'int'
which is only 32 bits.)
So I'm pretty sure this will handle what we need. Babune can tell me if
I missed anything, I guess. :)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkx
MQYAnjuxPkiBcKI
=pU3B
-----END PGP SIGNATURE-----
| John A Meinel (jameinel) wrote : | # |
sent to pqm by email
- 5394. By John A Meinel on 2010-08-24
-
Handle test_source and extensions. Also define an 'extern' protocol, to allow
the test suite to recognize that returning an object of that type is a Python object.

Wow, I clicked Status -> Approved while my comment was being processed and it get lost :-/
Here it is again:
Nice catch !
You had test code (_test_* functions) in production code which has a bad smell,
with this tweaked, please land.