Merge lp:~jameinel/bzr/2.3-btree-chk-leaf into lp:bzr

Proposed by John A Meinel on 2010-08-05
Status: Merged
Approved by: Vincent Ladeuil on 2010-08-20
Approved revision: 5388
Merged at revision: 5390
Proposed branch: lp:~jameinel/bzr/2.3-btree-chk-leaf
Merge into: lp:bzr
Diff against target: 1448 lines (+962/-60)
17 files modified
NEWS (+7/-0)
bzrlib/_btree_serializer_pyx.pyx (+545/-2)
bzrlib/_static_tuple_c.pxd (+1/-0)
bzrlib/btree_index.py (+33/-16)
bzrlib/clean_tree.py (+1/-1)
bzrlib/generate_ids.py (+1/-1)
bzrlib/python-compat.h (+3/-0)
bzrlib/repofmt/pack_repo.py (+22/-20)
bzrlib/tests/__init__.py (+1/-0)
bzrlib/tests/test__btree_serializer.py (+305/-0)
bzrlib/tests/test_btree_index.py (+15/-15)
bzrlib/tests/test_clean_tree.py (+1/-1)
bzrlib/tests/test_inv.py (+1/-1)
bzrlib/tests/test_inventory_delta.py (+1/-1)
bzrlib/tests/test_repository.py (+16/-0)
bzrlib/tests/test_source.py (+5/-2)
doc/en/whats-new/whats-new-in-2.3.txt (+4/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.3-btree-chk-leaf
Reviewer Review Type Date Requested Status
Vincent Ladeuil 2010-08-05 Needs Fixing on 2010-08-20
Review via email: mp+31904@code.launchpad.net

Commit Message

Use a custom class definition for CHK BTree leaf nodes.

Description of the Change

This changes how the BTreeGraphIndex class handles CHK indexes.
It replaces the generic leaf nodes with a custom structure that I described on the list. The net effect is to drop us from something like 200+ bytes per index record (400+ on 64-bit) down to very close to 42 bytes per record (on both 32-bit and 64-bit).

It also speeds up creating these objects, though my overall timing shows that isn't really a dominant factor.

It creates the appropriate tuples as-requested, which is where most of the memory savings come from. (40 bytes is barely enough for one string object, let alone all the extra tuples that would go along with it.)

This is a valid trade-off because our chk indexes are often underutilized. Every time we want one key we have to read a page of ~100 records, and because the keys are hashes they are evenly distributed, so we rarely read more than one from a single index.
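
To illustrate the shape of the win, here is a minimal pure-Python sketch of the idea (not the Pyrex implementation in the diff below; the class name and layout are made up for illustration). The leaf keeps only packed 20-byte sha1s plus four integers per record, and builds the full ('sha1:...',) key tuple only when a caller actually asks for it:

    import binascii

    class PackedLeafSketch(object):
        """Hold records as packed binary sha1s; materialize keys lazily."""

        def __init__(self, records):
            # records: sorted list of (hex_sha1, (offset, length, start, end))
            self._sha1s = [binascii.unhexlify(hex_sha1)
                           for hex_sha1, _ in records]
            self._values = [value for _, value in records]

        def __contains__(self, key):
            # key looks like ('sha1:<40 hex chars>',); compare in binary form
            return binascii.unhexlify(key[0][5:]) in self._sha1s

        def __getitem__(self, key):
            # Only here do we pay for a full Python string and tuple
            idx = self._sha1s.index(binascii.unhexlify(key[0][5:]))
            return ('%d %d %d %d' % self._values[idx], ())

The real extension goes further, keeping the sha1s in a single C array and bisecting into it rather than scanning a list.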

The main thing I'm not sure about is how I expose this to the repository. We added an 'unlimited_cache=True' flag for the last round of chk-specific changes. I didn't really want to add yet another flag, but manually setting a private member isn't much better.

If someone has a good idea here, I'm happy to change that code.
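
For reference, what this proposal currently does (see the pack_repo.py hunks in the diff below) is the private-member route; condensed, it looks like:

    index = self._index_class(transport, index_name, index_size,
                              unlimited_cache=is_chk)
    if is_chk and self._index_class is btree_index.BTreeGraphIndex:
        index._leaf_factory = btree_index._gcchk_factory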

This also changes the BTreeIndex code so that the LeafNodes are themselves dicts, rather than carrying a dict as an attribute. This is potentially a small slowdown (according to timeit): (x in o._dict) is slightly faster than (x in dict_subclass), which really surprised me, and a custom __getitem__ that thunks over to self._dict is much slower:

 11.3us  dict[x]
 15.6us  class._dict[x]
 28.2us  subclass[x]
 56.0us  __getitem__ (return self._dict[x])

I think it is okay, though it is a shame to regress the normal case.
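
For the curious, a microbenchmark along these lines reproduces the comparison (a sketch only: absolute numbers vary by machine, and the real keys are tuples rather than ints):

    import textwrap
    import timeit

    setup = textwrap.dedent("""
        class AttrNode(object):
            def __init__(self):
                self._dict = dict.fromkeys(range(1000))

        class SubclassNode(dict):
            pass

        attr_node = AttrNode()
        subclass_node = SubclassNode.fromkeys(range(1000))
    """)
    for stmt in ('500 in attr_node._dict', '500 in subclass_node'):
        t = timeit.Timer(stmt, setup)
        per_call = min(t.repeat(3, 1000000)) / 1000000  # seconds per lookup
        print '%-28s %.2fus' % (stmt, per_call * 1e6)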

lp:~jameinel/bzr/2.3-btree-chk-leaf updated on 2010-08-06
5388. By John A Meinel on 2010-08-06

A __sizeof__ check that ensures we are getting what we are looking for.

It also checks that the reported size grows appropriately with the number of entries added.

Vincent Ladeuil (vila) wrote :

Wow, I clicked Status -> Approved while my comment was being processed and it got lost :-/

Here it is again:

Nice catch!

You had test code (_test_* functions) in production code, which has a bad smell;
with this tweaked, please land.

review: Needs Fixing
John A Meinel (jameinel) wrote :


On 8/20/2010 7:19 AM, Vincent Ladeuil wrote:
> Review: Needs Fixing
> Wow, I clicked Status -> Approved while my comment was being processed and it got lost :-/
>
> Here it is again:
>
> Nice catch!
>
> You had test code (_test_* functions) in production code, which has a bad smell;
> with this tweaked, please land.
>

They aren't actually test functions. They are Python wrappers to expose
the pure C functions to the test infrastructure.

How would you like me to tweak it? I intentionally called them "test_*"
so that it would be obvious not to call them from production code...

John
=:->

lp:~jameinel/bzr/2.3-btree-chk-leaf updated on 2010-08-23
5389. By John A Meinel on 2010-08-23

Change the _test names to _get and _py so that they don't provoke bad smells :)

5390. By John A Meinel on 2010-08-23

Merge bzr.dev 5387 in prep for NEWS

5391. By John A Meinel on 2010-08-23

Update What's New and NEWS

John A Meinel (jameinel) wrote :

sent to pqm by email

lp:~jameinel/bzr/2.3-btree-chk-leaf updated on 2010-08-23
5392. By John A Meinel on 2010-08-23

switch 'x += 1' to 'x = x + 1' to deal with brain-damaged old versions of pyrex.

John A Meinel (jameinel) wrote :

sent to pqm by email

lp:~jameinel/bzr/2.3-btree-chk-leaf updated on 2010-08-23
5393. By John A Meinel on 2010-08-23

Lots of compatibility changes for python2.4 and mingw32 compilers.

gcc has strtoll available, and does the wrong thing if you use _strtoi64 (it implicitly
defines it to return an 'int' which isn't a 64-bit integer.)

Further, python2.4 doesn't support the %lu syntax, only having %ld and %d. So we go
back to casting everything into real python objects and then stringifying them, but
only for python 2.4.
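
The shape of that py2.4 fallback (this mirrors _format_record_py24 from the Pyrex file in the diff below):

    def _format_record_py24(record):
        # Python 2.4's PyString_FromFormat lacks '%lu'/'%llu', so convert
        # the C integers to Python objects and let '%s' stringify them.
        return "%s %s %s %s" % (record.block_offset, record.block_length,
                                record.record_start, record.record_end)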

John A Meinel (jameinel) wrote :


On 8/23/2010 2:38 PM, John A Meinel wrote:
> sent to pqm by email
>

Sorry for the extra merge noise. It turns out things were a bit uglier
than I thought for older versions of python.

Specifically, python 2.4 PyString_FromFormat() only supports '%d'. It
doesn't support '%lu' or even '%u'. This was changed again in python 2.7
to support '%llu', but the docs only mention that last change, not the
python 2.5 change that added '%lu' support.

You can see the difference in here:
  http://docs.python.org/release/2.4.4/api/stringObjects.html

versus:
  http://docs.python.org/release/2.5.4/api/stringObjects.html

versus:
  http://docs.python.org/c-api/string.html#PyString_FromFormat

I also ran into pyrex 0.9.6 vs 0.9.8 vs Cython incompatibilities, etc.

However, I've now run the test suite on python 2.4.4 w/ pyrex 0.9.6.4
(matching hardy, I think), python 2.5 w/ pyrex 0.9.8.5, and python 2.6
w/ Cython 0.11.2, which I think should cover everything.

Oh, and it also covers using 'gcc' vs 'cl', since 'gcc' supports strtoll
directly, while cl needs to use _strtoi64. (If you use _strtoi64 w/ gcc
but don't declare it, gcc implicitly declares it to return a plain 'int',
which is only 32 bits.)

So I'm pretty sure this will handle what we need. Babune can tell me if
I missed anything, I guess. :)

John
=:->

John A Meinel (jameinel) wrote :

sent to pqm by email

lp:~jameinel/bzr/2.3-btree-chk-leaf updated on 2010-08-24
5394. By John A Meinel on 2010-08-24

Handle test_source and extensions. Also define an 'extern' protocol, to allow
the test suite to recognize that returning an object of that type is a Python object.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2010-08-21 17:09:54 +0000
3+++ NEWS 2010-08-24 19:24:49 +0000
4@@ -104,6 +104,13 @@
5 will be backed up (adding an extention of the form .~#~).
6 (Marius Kruger, #400554)
7
8+* Decrease memory consumption when many chk index pages are loaded. (Such
9+ as during ``bzr co`` or ``bzr ls -R`` of a large tree.) Often we need to
10+ read many chk pages because the individual chk map nodes will be spread
11+ randomly. Peak memory for 'bzr ls -R' on a large tree dropped from 396MB
12+ down to 247MB; we expect even more significant savings on 64-bit platforms.
13+ (John Arbash Meinel)
14+
15 * Inventory entries now consume less memory (on 32-bit Ubuntu file entries
16 have dropped from 68 bytes to 40, and directory entries from 120 bytes
17 to 48). (Andrew Bennetts)
18
19=== modified file 'bzrlib/_btree_serializer_pyx.pyx'
20--- bzrlib/_btree_serializer_pyx.pyx 2010-02-17 17:11:16 +0000
21+++ bzrlib/_btree_serializer_pyx.pyx 2010-08-24 19:24:49 +0000
22@@ -33,6 +33,7 @@
23 char *PyString_AsString(object p) except NULL
24 object PyString_FromStringAndSize(char *, Py_ssize_t)
25 PyObject *PyString_FromStringAndSize_ptr "PyString_FromStringAndSize" (char *, Py_ssize_t)
26+ object PyString_FromFormat(char *, ...)
27 int PyString_CheckExact(object s)
28 int PyString_CheckExact_ptr "PyString_CheckExact" (PyObject *)
29 Py_ssize_t PyString_Size(object p)
30@@ -49,19 +50,31 @@
31 PyObject *PyTuple_GET_ITEM_ptr_object "PyTuple_GET_ITEM" (object tpl, int index)
32 void Py_INCREF(object)
33 void Py_DECREF_ptr "Py_DECREF" (PyObject *)
34+ void *PyMem_Malloc(size_t nbytes)
35+ void PyMem_Free(void *)
36+ void memset(void *, int, size_t)
37
38 cdef extern from "string.h":
39 void *memcpy(void *dest, void *src, size_t n)
40 void *memchr(void *s, int c, size_t n)
41+ int memcmp(void *s1, void *s2, size_t n)
42 # GNU extension
43 # void *memrchr(void *s, int c, size_t n)
44 int strncmp(char *s1, char *s2, size_t n)
45+ unsigned long strtoul(char *s1, char **out, int base)
46+ long long strtoll(char *s1, char **out, int base)
47+
48
49 # It seems we need to import the definitions so that the pyrex compiler has
50 # local names to access them.
51 from _static_tuple_c cimport StaticTuple, \
52 import_static_tuple_c, StaticTuple_New, \
53- StaticTuple_Intern, StaticTuple_SET_ITEM, StaticTuple_CheckExact
54+ StaticTuple_Intern, StaticTuple_SET_ITEM, StaticTuple_CheckExact, \
55+ StaticTuple_GET_SIZE, StaticTuple_GET_ITEM
56+# This tells the test infrastructure that StaticTuple is a class, so we don't
57+# have to worry about exception checking.
58+## extern cdef class StaticTuple
59+import sys
60
61
62 # TODO: Find some way to import this from _dirstate_helpers
63@@ -103,7 +116,6 @@
64 Py_DECREF_ptr(py_str)
65 return result
66
67-from bzrlib import _static_tuple_c
68 # This sets up the StaticTuple C_API functionality
69 import_static_tuple_c()
70
71@@ -315,6 +327,537 @@
72 return parser.parse()
73
74
75+# TODO: We can go from 8 byte offset + 4 byte length to a simple lookup,
76+# because the block_offset + length is likely to be repeated. However,
77+# the big win there is to cache across pages, and not just one page
78+# Though if we did cache in a page, we could certainly use a short int.
79+# And this goes from 40 bytes to 30 bytes.
80+# One slightly ugly option would be to cache block offsets in a global.
81+# However, that leads to thread-safety issues, etc.
82+ctypedef struct gc_chk_sha1_record:
83+ long long block_offset
84+ unsigned int block_length
85+ unsigned int record_start
86+ unsigned int record_end
87+ char sha1[20]
88+
89+
90+cdef int _unhexbuf[256]
91+cdef char *_hexbuf
92+_hexbuf = '0123456789abcdef'
93+
94+cdef _populate_unhexbuf():
95+ cdef int i
96+ for i from 0 <= i < 256:
97+ _unhexbuf[i] = -1
98+ for i from 0 <= i < 10: # 0123456789 => map to the raw number
99+ _unhexbuf[(i + c'0')] = i
100+ for i from 10 <= i < 16: # abcdef => 10, 11, 12, 13, 14, 15
101+ _unhexbuf[(i - 10 + c'a')] = i
102+ for i from 10 <= i < 16: # ABCDEF => 10, 11, 12, 13, 14, 15
103+ _unhexbuf[(i - 10 + c'A')] = i
104+_populate_unhexbuf()
105+
106+
107+cdef int _unhexlify_sha1(char *as_hex, char *as_bin): # cannot_raise
108+ """Take the hex sha1 in as_hex and make it binary in as_bin
109+
110+ Same as binascii.unhexlify, but working on C strings, not Python objects.
111+ """
112+ cdef int top
113+ cdef int bot
114+ cdef int i, j
115+ cdef char *cur
116+
117+ # binascii does this using isupper() and tolower() and ?: syntax. I'm
118+ # guessing a simple lookup array should be faster.
119+ j = 0
120+ for i from 0 <= i < 20:
121+ top = _unhexbuf[<unsigned char>(as_hex[j])]
122+ j = j + 1
123+ bot = _unhexbuf[<unsigned char>(as_hex[j])]
124+ j = j + 1
125+ if top == -1 or bot == -1:
126+ return 0
127+ as_bin[i] = <unsigned char>((top << 4) + bot);
128+ return 1
129+
130+
131+def _py_unhexlify(as_hex):
132+ """For the test infrastructure, just thunks to _unhexlify_sha1"""
133+ if len(as_hex) != 40 or not PyString_CheckExact(as_hex):
134+ raise ValueError('not a 40-byte hex digest')
135+ as_bin = PyString_FromStringAndSize(NULL, 20)
136+ if _unhexlify_sha1(PyString_AS_STRING(as_hex), PyString_AS_STRING(as_bin)):
137+ return as_bin
138+ return None
139+
140+
141+cdef void _hexlify_sha1(char *as_bin, char *as_hex): # cannot_raise
142+ cdef int i, j
143+ cdef char c
144+
145+ j = 0
146+ for i from 0 <= i < 20:
147+ c = as_bin[i]
148+ as_hex[j] = _hexbuf[(c>>4)&0xf]
149+ j = j + 1
150+ as_hex[j] = _hexbuf[(c)&0xf]
151+ j = j + 1
152+
153+
154+def _py_hexlify(as_bin):
155+ """For test infrastructure, thunk to _hexlify_sha1"""
156+ if len(as_bin) != 20 or not PyString_CheckExact(as_bin):
157+ raise ValueError('not a 20-byte binary digest')
158+ as_hex = PyString_FromStringAndSize(NULL, 40)
159+ _hexlify_sha1(PyString_AS_STRING(as_bin), PyString_AS_STRING(as_hex))
160+ return as_hex
161+
162+
163+cdef int _key_to_sha1(key, char *sha1): # cannot_raise
164+ """Map a key into its sha1 content.
165+
166+ :param key: A tuple of style ('sha1:abcd...',)
167+ :param sha1: A char buffer of 20 bytes
168+ :return: 1 if this could be converted, 0 otherwise
169+ """
170+ cdef char *c_val
171+ cdef PyObject *p_val
172+
173+ if StaticTuple_CheckExact(key) and StaticTuple_GET_SIZE(key) == 1:
174+ p_val = <PyObject *>StaticTuple_GET_ITEM(key, 0)
175+ elif (PyTuple_CheckExact(key) and PyTuple_GET_SIZE(key) == 1):
176+ p_val = PyTuple_GET_ITEM_ptr_object(key, 0)
177+ else:
178+ # Not a tuple or a StaticTuple
179+ return 0
180+ if (PyString_CheckExact_ptr(p_val) and PyString_GET_SIZE_ptr(p_val) == 45):
181+ c_val = PyString_AS_STRING_ptr(p_val)
182+ else:
183+ return 0
184+ if strncmp(c_val, 'sha1:', 5) != 0:
185+ return 0
186+ if not _unhexlify_sha1(c_val + 5, sha1):
187+ return 0
188+ return 1
189+
190+
191+def _py_key_to_sha1(key):
192+ """Map a key to a simple sha1 string.
193+
194+ This is a testing thunk to the C function.
195+ """
196+ as_bin_sha = PyString_FromStringAndSize(NULL, 20)
197+ if _key_to_sha1(key, PyString_AS_STRING(as_bin_sha)):
198+ return as_bin_sha
199+ return None
200+
201+
202+cdef StaticTuple _sha1_to_key(char *sha1):
203+ """Compute a ('sha1:abcd',) key for a given sha1."""
204+ cdef StaticTuple key
205+ cdef object hexxed
206+ cdef char *c_buf
207+ hexxed = PyString_FromStringAndSize(NULL, 45)
208+ c_buf = PyString_AS_STRING(hexxed)
209+ memcpy(c_buf, 'sha1:', 5)
210+ _hexlify_sha1(sha1, c_buf+5)
211+ key = StaticTuple_New(1)
212+ Py_INCREF(hexxed)
213+ StaticTuple_SET_ITEM(key, 0, hexxed)
214+ # This is a bit expensive. To parse 120 keys takes 48us, to return them all
215+ # can be done in 66.6us (so 18.6us to build them all).
216+ # Adding simple hash() here brings it to 76.6us (so computing the hash
217+ # value of 120 keys is 10us), Intern is 86.9us (another 10us to look and add
218+ # them to the intern structure.)
219+ # However, since we only intern keys that are in active use, it is probably
220+ # a win. Since they would have been read from elsewhere anyway.
221+ # We *could* hang the PyObject form off of the gc_chk_sha1_record for ones
222+ # that we have deserialized. Something to think about, at least.
223+ key = StaticTuple_Intern(key)
224+ return key
225+
226+
227+def _py_sha1_to_key(sha1_bin):
228+ """Test thunk to check the sha1 mapping."""
229+ if not PyString_CheckExact(sha1_bin) or PyString_GET_SIZE(sha1_bin) != 20:
230+ raise ValueError('sha1_bin must be a str of exactly 20 bytes')
231+ return _sha1_to_key(PyString_AS_STRING(sha1_bin))
232+
233+
234+cdef unsigned int _sha1_to_uint(char *sha1): # cannot_raise
235+ cdef unsigned int val
236+ # Must be in MSB, because that is how the content is sorted
237+ val = (((<unsigned int>(sha1[0]) & 0xff) << 24)
238+ | ((<unsigned int>(sha1[1]) & 0xff) << 16)
239+ | ((<unsigned int>(sha1[2]) & 0xff) << 8)
240+ | ((<unsigned int>(sha1[3]) & 0xff) << 0))
241+ return val
242+
243+
244+cdef _format_record_py24(gc_chk_sha1_record *record):
245+ """Python2.4 PyString_FromFormat doesn't have %u.
246+
247+ It only has %d and %ld. We would really like to have %llu, which
248+ is only in python2.7. So we fall back to casting to regular objects.
249+ """
250+ return "%s %s %s %s" % (record.block_offset, record.block_length,
251+ record.record_start, record.record_end)
252+
253+
254+cdef _format_record(gc_chk_sha1_record *record):
255+ # This is inefficient to go from a logical state back to a
256+ # string, but it makes things work a bit better internally for now.
257+ if record.block_offset >= 0xFFFFFFFF:
258+ # %llu is what we really want, but unfortunately it was only added
259+ # in python 2.7... :(
260+ block_offset_str = str(record.block_offset)
261+ value = PyString_FromFormat('%s %lu %lu %lu',
262+ PyString_AS_STRING(block_offset_str),
263+ record.block_length,
264+ record.record_start, record.record_end)
265+ else:
266+ value = PyString_FromFormat('%lu %lu %lu %lu',
267+ <unsigned long>record.block_offset,
268+ record.block_length,
269+ record.record_start, record.record_end)
270+ return value
271+
272+ctypedef object (*formatproc)(gc_chk_sha1_record *)
273+cdef formatproc _record_formatter
274+_record_formatter = _format_record
275+if sys.version_info[:2] == (2, 4):
276+ _record_formatter = _format_record_py24
277+
278+
279+cdef class GCCHKSHA1LeafNode:
280+ """Track all the entries for a given leaf node."""
281+
282+ cdef gc_chk_sha1_record *records
283+ cdef public object last_key
284+ cdef gc_chk_sha1_record *last_record
285+ cdef public int num_records
286+ # This is the number of bits to shift to get to the interesting byte. A
287+ # value of 24 means that the very first byte changes across all keys.
288+ # Anything else means that there is a common prefix of bits that we can
289+ # ignore. 0 means that at least the first 3 bytes are identical, though
290+ # that is going to be very rare
291+ cdef public unsigned char common_shift
292+ # This maps an interesting byte to the first record that matches.
293+ # Equivalent to bisect.bisect_left(self.records, sha1), though only taking
294+ # into account that one byte.
295+ cdef unsigned char offsets[257]
296+
297+ def __sizeof__(self):
298+ # :( Why doesn't Pyrex let me do a simple sizeof(GCCHKSHA1LeafNode)
299+ # like Cython? Explicitly enumerating everything here seems to leave my
300+ # size off by 2 (286 bytes vs 288 bytes actual). I'm guessing it is an
301+ # alignment/padding issue. Oh well- at least we scale properly with
302+ # num_records and are very close to correct, which is what I care
303+ # about.
304+ # If we ever decide to require cython:
305+ # return (sizeof(GCCHKSHA1LeafNode)
306+ # + sizeof(gc_chk_sha1_record)*self.num_records)
307+ return (sizeof(PyObject) + sizeof(void*) + sizeof(int)
308+ + sizeof(gc_chk_sha1_record*) + sizeof(PyObject *)
309+ + sizeof(gc_chk_sha1_record*) + sizeof(char)
310+ + sizeof(unsigned char)*257
311+ + sizeof(gc_chk_sha1_record)*self.num_records)
312+
313+ def __dealloc__(self):
314+ if self.records != NULL:
315+ PyMem_Free(self.records)
316+ self.records = NULL
317+
318+ def __init__(self, bytes):
319+ self._parse_bytes(bytes)
320+ self.last_key = None
321+ self.last_record = NULL
322+
323+ property min_key:
324+ def __get__(self):
325+ if self.num_records > 0:
326+ return _sha1_to_key(self.records[0].sha1)
327+ return None
328+
329+ property max_key:
330+ def __get__(self):
331+ if self.num_records > 0:
332+ return _sha1_to_key(self.records[self.num_records-1].sha1)
333+ return None
334+
335+ cdef StaticTuple _record_to_value_and_refs(self,
336+ gc_chk_sha1_record *record):
337+ """Extract the refs and value part of this record."""
338+ cdef StaticTuple value_and_refs
339+ cdef StaticTuple empty
340+ value_and_refs = StaticTuple_New(2)
341+ value = _record_formatter(record)
342+ Py_INCREF(value)
343+ StaticTuple_SET_ITEM(value_and_refs, 0, value)
344+ # Always empty refs
345+ empty = StaticTuple_New(0)
346+ Py_INCREF(empty)
347+ StaticTuple_SET_ITEM(value_and_refs, 1, empty)
348+ return value_and_refs
349+
350+ cdef StaticTuple _record_to_item(self, gc_chk_sha1_record *record):
351+ """Turn a given record back into a fully fledged item.
352+ """
353+ cdef StaticTuple item
354+ cdef StaticTuple key
355+ cdef StaticTuple value_and_refs
356+ cdef object value
357+ key = _sha1_to_key(record.sha1)
358+ item = StaticTuple_New(2)
359+ Py_INCREF(key)
360+ StaticTuple_SET_ITEM(item, 0, key)
361+ value_and_refs = self._record_to_value_and_refs(record)
362+ Py_INCREF(value_and_refs)
363+ StaticTuple_SET_ITEM(item, 1, value_and_refs)
364+ return item
365+
366+ cdef gc_chk_sha1_record* _lookup_record(self, char *sha1) except? NULL:
367+ """Find a gc_chk_sha1_record that matches the sha1 supplied."""
368+ cdef int lo, hi, mid, the_cmp
369+ cdef int offset
370+
371+ # TODO: We can speed up misses by comparing this sha1 to the common
372+ # bits, and seeing if the common prefix matches, if not, we don't
373+ # need to search for anything because it cannot match
374+ # Use the offset array to find the closest fit for this entry
375+ # follow that up with bisecting, since multiple keys can be in one
376+ # spot
377+ # Bisecting dropped us from 7000 comparisons to 582 (4.8/key), using
378+ # the offset array dropped us from 23us to 20us and 156 comparisons
379+ # (1.3/key)
380+ offset = self._offset_for_sha1(sha1)
381+ lo = self.offsets[offset]
382+ hi = self.offsets[offset+1]
383+ if hi == 255:
384+ # if hi == 255 that means we potentially ran off the end of the
385+ # list, so push it up to num_records
386+ # note that if 'lo' == 255, that is ok, because we can start
387+ # searching from that part of the list.
388+ hi = self.num_records
389+ local_n_cmp = 0
390+ while lo < hi:
391+ mid = (lo + hi) / 2
392+ the_cmp = memcmp(self.records[mid].sha1, sha1, 20)
393+ if the_cmp == 0:
394+ return &self.records[mid]
395+ elif the_cmp < 0:
396+ lo = mid + 1
397+ else:
398+ hi = mid
399+ return NULL
400+
401+ def __contains__(self, key):
402+ cdef char sha1[20]
403+ cdef gc_chk_sha1_record *record
404+ if _key_to_sha1(key, sha1):
405+ # If it isn't a sha1 key, then it won't be in this leaf node
406+ record = self._lookup_record(sha1)
407+ if record != NULL:
408+ self.last_key = key
409+ self.last_record = record
410+ return True
411+ return False
412+
413+ def __getitem__(self, key):
414+ cdef char sha1[20]
415+ cdef gc_chk_sha1_record *record
416+ record = NULL
417+ if self.last_record != NULL and key is self.last_key:
418+ record = self.last_record
419+ elif _key_to_sha1(key, sha1):
420+ record = self._lookup_record(sha1)
421+ if record == NULL:
422+ raise KeyError('key %r is not present' % (key,))
423+ return self._record_to_value_and_refs(record)
424+
425+ def __len__(self):
426+ return self.num_records
427+
428+ def all_keys(self):
429+ cdef int i
430+ result = []
431+ for i from 0 <= i < self.num_records:
432+ PyList_Append(result, _sha1_to_key(self.records[i].sha1))
433+ return result
434+
435+ def all_items(self):
436+ cdef int i
437+ result = []
438+ for i from 0 <= i < self.num_records:
439+ item = self._record_to_item(&self.records[i])
440+ PyList_Append(result, item)
441+ return result
442+
443+ cdef int _count_records(self, char *c_content, char *c_end): # cannot_raise
444+ """Count how many records are in this section."""
445+ cdef char *c_cur
446+ cdef int num_records
447+
448+ c_cur = c_content
449+ num_records = 0
450+ while c_cur != NULL and c_cur < c_end:
451+ c_cur = <char *>memchr(c_cur, c'\n', c_end - c_cur);
452+ if c_cur == NULL:
453+ break
454+ c_cur = c_cur + 1
455+ num_records = num_records + 1
456+ return num_records
457+
458+ cdef _parse_bytes(self, bytes):
459+ """Parse the string 'bytes' into content."""
460+ cdef char *c_bytes
461+ cdef char *c_cur
462+ cdef char *c_end
463+ cdef Py_ssize_t n_bytes
464+ cdef int num_records
465+ cdef int entry
466+ cdef gc_chk_sha1_record *cur_record
467+
468+ if not PyString_CheckExact(bytes):
469+ raise TypeError('We only support parsing plain 8-bit strings.')
470+ # Pass 1, count how many records there will be
471+ n_bytes = PyString_GET_SIZE(bytes)
472+ c_bytes = PyString_AS_STRING(bytes)
473+ c_end = c_bytes + n_bytes
474+ if strncmp(c_bytes, 'type=leaf\n', 10):
475+ raise ValueError("bytes did not start with 'type=leaf\\n': %r"
476+ % (bytes[:10],))
477+ c_cur = c_bytes + 10
478+ num_records = self._count_records(c_cur, c_end)
479+ # Now allocate the memory for these items, and go to town
480+ self.records = <gc_chk_sha1_record*>PyMem_Malloc(num_records *
481+ (sizeof(unsigned short) + sizeof(gc_chk_sha1_record)))
482+ self.num_records = num_records
483+ cur_record = self.records
484+ entry = 0
485+ while c_cur != NULL and c_cur < c_end and entry < num_records:
486+ c_cur = self._parse_one_entry(c_cur, c_end, cur_record)
487+ cur_record = cur_record + 1
488+ entry = entry + 1
489+ if (entry != self.num_records
490+ or c_cur != c_end
491+ or cur_record != self.records + self.num_records):
492+ raise ValueError('Something went wrong while parsing.')
493+ # Pass 3: build the offset map
494+ self._compute_common()
495+
496+ cdef char *_parse_one_entry(self, char *c_cur, char *c_end,
497+ gc_chk_sha1_record *cur_record) except NULL:
498+ """Read a single sha record from the bytes.
499+ :param c_cur: The pointer to the start of bytes
500+ :param cur_record:
501+ """
502+ cdef char *c_next
503+ if strncmp(c_cur, 'sha1:', 5):
504+ raise ValueError('line did not start with sha1: %r'
505+ % (safe_string_from_size(c_cur, 10),))
506+ c_cur = c_cur + 5
507+ c_next = <char *>memchr(c_cur, c'\0', c_end - c_cur)
508+ if c_next == NULL or (c_next - c_cur != 40):
509+ raise ValueError('Line did not contain 40 hex bytes')
510+ if not _unhexlify_sha1(c_cur, cur_record.sha1):
511+ raise ValueError('We failed to unhexlify')
512+ c_cur = c_next + 1
513+ if c_cur[0] != c'\0':
514+ raise ValueError('only 1 null, not 2 as expected')
515+ c_cur = c_cur + 1
516+ cur_record.block_offset = strtoll(c_cur, &c_next, 10)
517+ if c_cur == c_next or c_next[0] != c' ':
518+ raise ValueError('Failed to parse block offset')
519+ c_cur = c_next + 1
520+ cur_record.block_length = strtoul(c_cur, &c_next, 10)
521+ if c_cur == c_next or c_next[0] != c' ':
522+ raise ValueError('Failed to parse block length')
523+ c_cur = c_next + 1
524+ cur_record.record_start = strtoul(c_cur, &c_next, 10)
525+ if c_cur == c_next or c_next[0] != c' ':
526+ raise ValueError('Failed to parse record start')
527+ c_cur = c_next + 1
528+ cur_record.record_end = strtoul(c_cur, &c_next, 10)
529+ if c_cur == c_next or c_next[0] != c'\n':
530+ raise ValueError('Failed to parse record end')
531+ c_cur = c_next + 1
532+ return c_cur
533+
534+ cdef int _offset_for_sha1(self, char *sha1) except -1:
535+ """Find the first interesting 8-bits of this sha1."""
536+ cdef int this_offset
537+ cdef unsigned int as_uint
538+ as_uint = _sha1_to_uint(sha1)
539+ this_offset = (as_uint >> self.common_shift) & 0xFF
540+ return this_offset
541+
542+ def _get_offset_for_sha1(self, sha1):
543+ return self._offset_for_sha1(PyString_AS_STRING(sha1))
544+
545+ cdef _compute_common(self):
546+ cdef unsigned int first
547+ cdef unsigned int this
548+ cdef unsigned int common_mask
549+ cdef unsigned char common_shift
550+ cdef int i
551+ cdef int offset, this_offset
552+ cdef int max_offset
553+ # The idea with the offset map is that we should be able to quickly
554+ # jump to the key that matches a given sha1. We know that the keys are
555+ # in sorted order, and we know that a lot of the prefix is going to be
556+ # the same across them.
557+ # By XORing the records together, we can determine what bits are set in
558+ # all of them
559+ if self.num_records < 2:
560+ # Everything is in common if you have 0 or 1 records
561+ # So we'll always just shift to the first byte
562+ self.common_shift = 24
563+ else:
564+ common_mask = 0xFFFFFFFF
565+ first = _sha1_to_uint(self.records[0].sha1)
566+ for i from 0 < i < self.num_records:
567+ this = _sha1_to_uint(self.records[i].sha1)
568+ common_mask = (~(first ^ this)) & common_mask
569+ common_shift = 24
570+ while common_mask & 0x80000000 and common_shift > 0:
571+ common_mask = common_mask << 1
572+ common_shift = common_shift - 1
573+ self.common_shift = common_shift
574+ offset = 0
575+ max_offset = self.num_records
576+ # We cap this loop at 254 records. All the other offsets just get
577+ # filled with 0xff as the singleton saying 'too many'.
578+ # It means that if we have >255 records we have to bisect the second
579+ # half of the list, but this is going to be very rare in practice.
580+ if max_offset > 255:
581+ max_offset = 255
582+ for i from 0 <= i < max_offset:
583+ this_offset = self._offset_for_sha1(self.records[i].sha1)
584+ while offset <= this_offset:
585+ self.offsets[offset] = i
586+ offset = offset + 1
587+ while offset < 257:
588+ self.offsets[offset] = max_offset
589+ offset = offset + 1
590+
591+ def _get_offsets(self):
592+ cdef int i
593+ result = []
594+ for i from 0 <= i < 257:
595+ PyList_Append(result, self.offsets[i])
596+ return result
597+
598+
599+def _parse_into_chk(bytes, key_length, ref_list_length):
600+ """Parse into a format optimized for chk records."""
601+ assert key_length == 1
602+ assert ref_list_length == 0
603+ return GCCHKSHA1LeafNode(bytes)
604+
605+
606 def _flatten_node(node, reference_lists):
607 """Convert a node into the serialized form.
608
609
610=== modified file 'bzrlib/_static_tuple_c.pxd'
611--- bzrlib/_static_tuple_c.pxd 2010-05-11 14:13:31 +0000
612+++ bzrlib/_static_tuple_c.pxd 2010-08-24 19:24:49 +0000
613@@ -33,6 +33,7 @@
614 int import_static_tuple_c() except -1
615 StaticTuple StaticTuple_New(Py_ssize_t)
616 StaticTuple StaticTuple_Intern(StaticTuple)
617+ StaticTuple StaticTuple_FromSequence(object)
618
619 # Steals a reference and val must be a valid type, no checking is done
620 void StaticTuple_SET_ITEM(StaticTuple key, Py_ssize_t offset, object val)
621
622=== modified file 'bzrlib/btree_index.py'
623--- bzrlib/btree_index.py 2010-06-20 11:18:38 +0000
624+++ bzrlib/btree_index.py 2010-08-24 19:24:49 +0000
625@@ -602,10 +602,10 @@
626 """In memory index's have no known corruption at the moment."""
627
628
629-class _LeafNode(object):
630+class _LeafNode(dict):
631 """A leaf node for a serialised B+Tree index."""
632
633- __slots__ = ('keys', 'min_key', 'max_key')
634+ __slots__ = ('min_key', 'max_key', '_keys')
635
636 def __init__(self, bytes, key_length, ref_list_length):
637 """Parse bytes to create a leaf node object."""
638@@ -617,7 +617,20 @@
639 self.max_key = key_list[-1][0]
640 else:
641 self.min_key = self.max_key = None
642- self.keys = dict(key_list)
643+ super(_LeafNode, self).__init__(key_list)
644+ self._keys = dict(self)
645+
646+ def all_items(self):
647+ """Return a sorted list of (key, (value, refs)) items"""
648+ items = self.items()
649+ items.sort()
650+ return items
651+
652+ def all_keys(self):
653+ """Return a sorted list of all keys."""
654+ keys = self.keys()
655+ keys.sort()
656+ return keys
657
658
659 class _InternalNode(object):
660@@ -672,6 +685,7 @@
661 self._recommended_pages = self._compute_recommended_pages()
662 self._root_node = None
663 self._base_offset = offset
664+ self._leaf_factory = _LeafNode
665 # Default max size is 100,000 leave values
666 self._leaf_value_cache = None # lru_cache.LRUCache(100*1000)
667 if unlimited_cache:
668@@ -950,7 +964,7 @@
669 """Cache directly from key => value, skipping the btree."""
670 if self._leaf_value_cache is not None:
671 for node in nodes.itervalues():
672- for key, value in node.keys.iteritems():
673+ for key, value in node.all_items():
674 if key in self._leaf_value_cache:
675 # Don't add the rest of the keys, we've seen this node
676 # before.
677@@ -980,10 +994,10 @@
678 if self._row_offsets[-1] == 1:
679 # There is only the root node, and we read that via key_count()
680 if self.node_ref_lists:
681- for key, (value, refs) in sorted(self._root_node.keys.items()):
682+ for key, (value, refs) in self._root_node.all_items():
683 yield (self, key, value, refs)
684 else:
685- for key, (value, refs) in sorted(self._root_node.keys.items()):
686+ for key, (value, refs) in self._root_node.all_items():
687 yield (self, key, value)
688 return
689 start_of_leaves = self._row_offsets[-2]
690@@ -999,11 +1013,11 @@
691 # for spilling index builds to disk.
692 if self.node_ref_lists:
693 for _, node in nodes:
694- for key, (value, refs) in sorted(node.keys.items()):
695+ for key, (value, refs) in node.all_items():
696 yield (self, key, value, refs)
697 else:
698 for _, node in nodes:
699- for key, (value, refs) in sorted(node.keys.items()):
700+ for key, (value, refs) in node.all_items():
701 yield (self, key, value)
702
703 @staticmethod
704@@ -1170,8 +1184,8 @@
705 continue
706 node = nodes[node_index]
707 for next_sub_key in sub_keys:
708- if next_sub_key in node.keys:
709- value, refs = node.keys[next_sub_key]
710+ if next_sub_key in node:
711+ value, refs = node[next_sub_key]
712 if self.node_ref_lists:
713 yield (self, next_sub_key, value, refs)
714 else:
715@@ -1245,14 +1259,13 @@
716 # sub_keys is all of the keys we are looking for that should exist
717 # on this page, if they aren't here, then they won't be found
718 node = nodes[node_index]
719- node_keys = node.keys
720 parents_to_check = set()
721 for next_sub_key in sub_keys:
722- if next_sub_key not in node_keys:
723+ if next_sub_key not in node:
724 # This one is just not present in the index at all
725 missing_keys.add(next_sub_key)
726 else:
727- value, refs = node_keys[next_sub_key]
728+ value, refs = node[next_sub_key]
729 parent_keys = refs[ref_list_num]
730 parent_map[next_sub_key] = parent_keys
731 parents_to_check.update(parent_keys)
732@@ -1265,8 +1278,8 @@
733 while parents_to_check:
734 next_parents_to_check = set()
735 for key in parents_to_check:
736- if key in node_keys:
737- value, refs = node_keys[key]
738+ if key in node:
739+ value, refs = node[key]
740 parent_keys = refs[ref_list_num]
741 parent_map[key] = parent_keys
742 next_parents_to_check.update(parent_keys)
743@@ -1546,7 +1559,8 @@
744 continue
745 bytes = zlib.decompress(data)
746 if bytes.startswith(_LEAF_FLAG):
747- node = _LeafNode(bytes, self._key_length, self.node_ref_lists)
748+ node = self._leaf_factory(bytes, self._key_length,
749+ self.node_ref_lists)
750 elif bytes.startswith(_INTERNAL_FLAG):
751 node = _InternalNode(bytes)
752 else:
753@@ -1571,8 +1585,11 @@
754 pass
755
756
757+_gcchk_factory = _LeafNode
758+
759 try:
760 from bzrlib import _btree_serializer_pyx as _btree_serializer
761+ _gcchk_factory = _btree_serializer._parse_into_chk
762 except ImportError, e:
763 osutils.failed_to_load_extension(e)
764 from bzrlib import _btree_serializer_py as _btree_serializer
765
766=== modified file 'bzrlib/clean_tree.py'
767--- bzrlib/clean_tree.py 2010-08-21 03:42:26 +0000
768+++ bzrlib/clean_tree.py 2010-08-24 19:24:49 +0000
769@@ -1,4 +1,4 @@
770-# Copyright (C) 2005 Canonical Ltd
771+# Copyright (C) 2009, 2010 Canonical Ltd
772 #
773 # This program is free software; you can redistribute it and/or modify
774 # it under the terms of the GNU General Public License as published by
775
776=== modified file 'bzrlib/generate_ids.py'
777--- bzrlib/generate_ids.py 2010-08-10 19:17:49 +0000
778+++ bzrlib/generate_ids.py 2010-08-24 19:24:49 +0000
779@@ -1,4 +1,4 @@
780-# Copyright (C) 2006 Canonical Ltd
781+# Copyright (C) 2006, 2007, 2009, 2010 Canonical Ltd
782 #
783 # This program is free software; you can redistribute it and/or modify
784 # it under the terms of the GNU General Public License as published by
785
786=== modified file 'bzrlib/python-compat.h'
787--- bzrlib/python-compat.h 2009-10-21 20:53:21 +0000
788+++ bzrlib/python-compat.h 2010-08-24 19:24:49 +0000
789@@ -74,6 +74,9 @@
790
791 #ifdef _MSC_VER
792 #define snprintf _snprintf
793+/* gcc (mingw32) has strtoll, while the MSVC compiler uses _strtoi64 */
794+#define strtoll _strtoi64
795+#define strtoull _strtoui64
796 #endif
797
798 /* Introduced in Python 2.6 */
799
800=== modified file 'bzrlib/repofmt/pack_repo.py'
801--- bzrlib/repofmt/pack_repo.py 2010-05-11 08:36:16 +0000
802+++ bzrlib/repofmt/pack_repo.py 2010-08-24 19:24:49 +0000
803@@ -49,6 +49,7 @@
804 """)
805 from bzrlib import (
806 bzrdir,
807+ btree_index,
808 errors,
809 lockable_files,
810 lockdir,
811@@ -56,10 +57,6 @@
812 )
813
814 from bzrlib.decorators import needs_write_lock, only_raises
815-from bzrlib.btree_index import (
816- BTreeGraphIndex,
817- BTreeBuilder,
818- )
819 from bzrlib.index import (
820 GraphIndex,
821 InMemoryGraphIndex,
822@@ -231,11 +228,13 @@
823 unlimited_cache = False
824 if index_type == 'chk':
825 unlimited_cache = True
826- setattr(self, index_type + '_index',
827- self.index_class(self.index_transport,
828- self.index_name(index_type, self.name),
829- self.index_sizes[self.index_offset(index_type)],
830- unlimited_cache=unlimited_cache))
831+ index = self.index_class(self.index_transport,
832+ self.index_name(index_type, self.name),
833+ self.index_sizes[self.index_offset(index_type)],
834+ unlimited_cache=unlimited_cache)
835+ if index_type == 'chk':
836+ index._leaf_factory = btree_index._gcchk_factory
837+ setattr(self, index_type + '_index', index)
838
839
840 class ExistingPack(Pack):
841@@ -1680,7 +1679,7 @@
842 txt_index = self._make_index(name, '.tix')
843 sig_index = self._make_index(name, '.six')
844 if self.chk_index is not None:
845- chk_index = self._make_index(name, '.cix', unlimited_cache=True)
846+ chk_index = self._make_index(name, '.cix', is_chk=True)
847 else:
848 chk_index = None
849 result = ExistingPack(self._pack_transport, name, rev_index,
850@@ -1706,7 +1705,7 @@
851 sig_index = self._make_index(name, '.six', resume=True)
852 if self.chk_index is not None:
853 chk_index = self._make_index(name, '.cix', resume=True,
854- unlimited_cache=True)
855+ is_chk=True)
856 else:
857 chk_index = None
858 result = self.resumed_pack_factory(name, rev_index, inv_index,
859@@ -1742,7 +1741,7 @@
860 return self._index_class(self.transport, 'pack-names', None
861 ).iter_all_entries()
862
863- def _make_index(self, name, suffix, resume=False, unlimited_cache=False):
864+ def _make_index(self, name, suffix, resume=False, is_chk=False):
865 size_offset = self._suffix_offsets[suffix]
866 index_name = name + suffix
867 if resume:
868@@ -1751,8 +1750,11 @@
869 else:
870 transport = self._index_transport
871 index_size = self._names[name][size_offset]
872- return self._index_class(transport, index_name, index_size,
873- unlimited_cache=unlimited_cache)
874+ index = self._index_class(transport, index_name, index_size,
875+ unlimited_cache=is_chk)
876+ if is_chk and self._index_class is btree_index.BTreeGraphIndex:
877+ index._leaf_factory = btree_index._gcchk_factory
878+ return index
879
880 def _max_pack_count(self, total_revisions):
881 """Return the maximum number of packs to use for total revisions.
882@@ -2829,8 +2831,8 @@
883 _commit_builder_class = PackCommitBuilder
884 supports_external_lookups = True
885 # What index classes to use
886- index_builder_class = BTreeBuilder
887- index_class = BTreeGraphIndex
888+ index_builder_class = btree_index.BTreeBuilder
889+ index_class = btree_index.BTreeGraphIndex
890
891 @property
892 def _serializer(self):
893@@ -2865,8 +2867,8 @@
894 supports_tree_reference = False # no subtrees
895 supports_external_lookups = True
896 # What index classes to use
897- index_builder_class = BTreeBuilder
898- index_class = BTreeGraphIndex
899+ index_builder_class = btree_index.BTreeBuilder
900+ index_class = btree_index.BTreeGraphIndex
901
902 @property
903 def _serializer(self):
904@@ -2907,8 +2909,8 @@
905 supports_tree_reference = True
906 supports_external_lookups = True
907 # What index classes to use
908- index_builder_class = BTreeBuilder
909- index_class = BTreeGraphIndex
910+ index_builder_class = btree_index.BTreeBuilder
911+ index_class = btree_index.BTreeGraphIndex
912
913 @property
914 def _serializer(self):
915
916=== modified file 'bzrlib/tests/__init__.py'
917--- bzrlib/tests/__init__.py 2010-08-10 18:07:54 +0000
918+++ bzrlib/tests/__init__.py 2010-08-24 19:24:49 +0000
919@@ -3669,6 +3669,7 @@
920 'bzrlib.tests.per_workingtree',
921 'bzrlib.tests.test__annotator',
922 'bzrlib.tests.test__bencode',
923+ 'bzrlib.tests.test__btree_serializer',
924 'bzrlib.tests.test__chk_map',
925 'bzrlib.tests.test__dirstate_helpers',
926 'bzrlib.tests.test__groupcompress',
927
928=== added file 'bzrlib/tests/test__btree_serializer.py'
929--- bzrlib/tests/test__btree_serializer.py 1970-01-01 00:00:00 +0000
930+++ bzrlib/tests/test__btree_serializer.py 2010-08-24 19:24:49 +0000
931@@ -0,0 +1,305 @@
932+# Copyright (C) 2010 Canonical Ltd
933+#
934+# This program is free software; you can redistribute it and/or modify
935+# it under the terms of the GNU General Public License as published by
936+# the Free Software Foundation; either version 2 of the License, or
937+# (at your option) any later version.
938+#
939+# This program is distributed in the hope that it will be useful,
940+# but WITHOUT ANY WARRANTY; without even the implied warranty of
941+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
942+# GNU General Public License for more details.
943+#
944+# You should have received a copy of the GNU General Public License
945+# along with this program; if not, write to the Free Software
946+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
947+#
948+
949+"""Direct tests of the btree serializer extension"""
950+
951+import binascii
952+import bisect
953+
954+from bzrlib import tests
955+
956+from bzrlib.tests.test_btree_index import compiled_btreeparser_feature
957+
958+
959+class TestBtreeSerializer(tests.TestCase):
960+
961+ _test_needs_features = [compiled_btreeparser_feature]
962+
963+ def setUp(self):
964+ super(TestBtreeSerializer, self).setUp()
965+ self.module = compiled_btreeparser_feature.module
966+
967+
968+class TestHexAndUnhex(TestBtreeSerializer):
969+
970+ def assertHexlify(self, as_binary):
971+ self.assertEqual(binascii.hexlify(as_binary),
972+ self.module._py_hexlify(as_binary))
973+
974+ def assertUnhexlify(self, as_hex):
975+ ba_unhex = binascii.unhexlify(as_hex)
976+ mod_unhex = self.module._py_unhexlify(as_hex)
977+ if ba_unhex != mod_unhex:
978+ if mod_unhex is None:
979+ mod_hex = '<None>'
980+ else:
981+ mod_hex = binascii.hexlify(mod_unhex)
982+ self.fail('_py_unhexlify returned a different answer'
983+ ' from binascii:\n %s\n != %s'
984+ % (binascii.hexlify(ba_unhex), mod_hex))
985+
986+ def assertFailUnhexlify(self, as_hex):
987+ # Invalid hex content
988+ self.assertIs(None, self.module._py_unhexlify(as_hex))
989+
990+ def test_to_hex(self):
991+ raw_bytes = ''.join(map(chr, range(256)))
992+ for i in range(0, 240, 20):
993+ self.assertHexlify(raw_bytes[i:i+20])
994+ self.assertHexlify(raw_bytes[240:]+raw_bytes[0:4])
995+
996+ def test_from_hex(self):
997+ self.assertUnhexlify('0123456789abcdef0123456789abcdef01234567')
998+ self.assertUnhexlify('123456789abcdef0123456789abcdef012345678')
999+ self.assertUnhexlify('0123456789ABCDEF0123456789ABCDEF01234567')
1000+ self.assertUnhexlify('123456789ABCDEF0123456789ABCDEF012345678')
1001+ hex_chars = binascii.hexlify(''.join(map(chr, range(256))))
1002+ for i in range(0, 480, 40):
1003+ self.assertUnhexlify(hex_chars[i:i+40])
1004+ self.assertUnhexlify(hex_chars[480:]+hex_chars[0:8])
1005+
1006+ def test_from_invalid_hex(self):
1007+ self.assertFailUnhexlify('123456789012345678901234567890123456789X')
1008+ self.assertFailUnhexlify('12345678901234567890123456789012345678X9')
1009+
1010+
1011+_hex_form = '123456789012345678901234567890abcdefabcd'
1012+
1013+class Test_KeyToSha1(TestBtreeSerializer):
1014+
1015+ def assertKeyToSha1(self, expected, key):
1016+ if expected is None:
1017+ expected_bin = None
1018+ else:
1019+ expected_bin = binascii.unhexlify(expected)
1020+ actual_sha1 = self.module._py_key_to_sha1(key)
1021+ if expected_bin != actual_sha1:
1022+ actual_hex_sha1 = None
1023+ if actual_sha1 is not None:
1024+ actual_hex_sha1 = binascii.hexlify(actual_sha1)
1025+ self.fail('_key_to_sha1 returned:\n %s\n != %s'
1026+ % (actual_hex_sha1, expected))
1027+
1028+ def test_simple(self):
1029+ self.assertKeyToSha1(_hex_form, ('sha1:' + _hex_form,))
1030+
1031+ def test_invalid_not_tuple(self):
1032+ self.assertKeyToSha1(None, _hex_form)
1033+ self.assertKeyToSha1(None, 'sha1:' + _hex_form)
1034+
1035+ def test_invalid_empty(self):
1036+ self.assertKeyToSha1(None, ())
1037+
1038+ def test_invalid_not_string(self):
1039+ self.assertKeyToSha1(None, (None,))
1040+ self.assertKeyToSha1(None, (list(_hex_form),))
1041+
1042+ def test_invalid_not_sha1(self):
1043+ self.assertKeyToSha1(None, (_hex_form,))
1044+ self.assertKeyToSha1(None, ('sha2:' + _hex_form,))
1045+
1046+ def test_invalid_not_hex(self):
1047+ self.assertKeyToSha1(None,
1048+ ('sha1:abcdefghijklmnopqrstuvwxyz12345678901234',))
1049+
1050+
1051+class Test_Sha1ToKey(TestBtreeSerializer):
1052+
1053+ def assertSha1ToKey(self, hex_sha1):
1054+ bin_sha1 = binascii.unhexlify(hex_sha1)
1055+ key = self.module._py_sha1_to_key(bin_sha1)
1056+ self.assertEqual(('sha1:' + hex_sha1,), key)
1057+
1058+ def test_simple(self):
1059+ self.assertSha1ToKey(_hex_form)
1060+
1061+
1062+_one_key_content = """type=leaf
1063+sha1:123456789012345678901234567890abcdefabcd\x00\x001 2 3 4
1064+"""
1065+
1066+_large_offsets = """type=leaf
1067+sha1:123456789012345678901234567890abcdefabcd\x00\x0012345678901 1234567890 0 1
1068+sha1:abcd123456789012345678901234567890abcdef\x00\x002147483648 2147483647 0 1
1069+sha1:abcdefabcd123456789012345678901234567890\x00\x004294967296 4294967295 4294967294 1
1070+"""
1071+
1072+_multi_key_content = """type=leaf
1073+sha1:c80c881d4a26984ddce795f6f71817c9cf4480e7\x00\x000 0 0 0
1074+sha1:c86f7e437faa5a7fce15d1ddcb9eaeaea377667b\x00\x001 1 1 1
1075+sha1:c8e240de74fb1ed08fa08d38063f6a6a91462a81\x00\x002 2 2 2
1076+sha1:cda39a3ee5e6b4b0d3255bfef95601890afd8070\x00\x003 3 3 3
1077+sha1:cdf51e37c269aa94d38f93e537bf6e2020b21406\x00\x004 4 4 4
1078+sha1:ce0c9035898dd52fc65c41454cec9c4d2611bfb3\x00\x005 5 5 5
1079+sha1:ce93b4e3c464ffd51732fbd6ded717e9efda28aa\x00\x006 6 6 6
1080+sha1:cf7a9e24777ec23212c54d7a350bc5bea5477fdb\x00\x007 7 7 7
1081+"""
1082+
1083+_multi_key_same_offset = """type=leaf
1084+sha1:080c881d4a26984ddce795f6f71817c9cf4480e7\x00\x000 0 0 0
1085+sha1:c86f7e437faa5a7fce15d1ddcb9eaeaea377667b\x00\x001 1 1 1
1086+sha1:cd0c9035898dd52fc65c41454cec9c4d2611bfb3\x00\x002 2 2 2
1087+sha1:cda39a3ee5e6b4b0d3255bfef95601890afd8070\x00\x003 3 3 3
1088+sha1:cde240de74fb1ed08fa08d38063f6a6a91462a81\x00\x004 4 4 4
1089+sha1:cdf51e37c269aa94d38f93e537bf6e2020b21406\x00\x005 5 5 5
1090+sha1:ce7a9e24777ec23212c54d7a350bc5bea5477fdb\x00\x006 6 6 6
1091+sha1:ce93b4e3c464ffd51732fbd6ded717e9efda28aa\x00\x007 7 7 7
1092+"""
1093+
1094+_common_32_bits = """type=leaf
1095+sha1:123456784a26984ddce795f6f71817c9cf4480e7\x00\x000 0 0 0
1096+sha1:1234567874fb1ed08fa08d38063f6a6a91462a81\x00\x001 1 1 1
1097+sha1:12345678777ec23212c54d7a350bc5bea5477fdb\x00\x002 2 2 2
1098+sha1:123456787faa5a7fce15d1ddcb9eaeaea377667b\x00\x003 3 3 3
1099+sha1:12345678898dd52fc65c41454cec9c4d2611bfb3\x00\x004 4 4 4
1100+sha1:12345678c269aa94d38f93e537bf6e2020b21406\x00\x005 5 5 5
1101+sha1:12345678c464ffd51732fbd6ded717e9efda28aa\x00\x006 6 6 6
1102+sha1:12345678e5e6b4b0d3255bfef95601890afd8070\x00\x007 7 7 7
1103+"""
1104+
1105+
1106+class TestGCCKHSHA1LeafNode(TestBtreeSerializer):
1107+
1108+ def assertInvalid(self, bytes):
1109+ """Ensure that we get a proper error when trying to parse invalid bytes.
1110+
1111+ (mostly this is testing that bad input doesn't cause us to segfault)
1112+ """
1113+ self.assertRaises((ValueError, TypeError),
1114+ self.module._parse_into_chk, bytes, 1, 0)
1115+
1116+ def test_non_str(self):
1117+ self.assertInvalid(u'type=leaf\n')
1118+
1119+ def test_not_leaf(self):
1120+ self.assertInvalid('type=internal\n')
1121+
1122+ def test_empty_leaf(self):
1123+ leaf = self.module._parse_into_chk('type=leaf\n', 1, 0)
1124+ self.assertEqual(0, len(leaf))
1125+ self.assertEqual([], leaf.all_items())
1126+ self.assertEqual([], leaf.all_keys())
1127+ # It should allow any key to be queried
1128+ self.assertFalse(('key',) in leaf)
1129+
1130+ def test_one_key_leaf(self):
1131+ leaf = self.module._parse_into_chk(_one_key_content, 1, 0)
1132+ self.assertEqual(1, len(leaf))
1133+ sha_key = ('sha1:' + _hex_form,)
1134+ self.assertEqual([sha_key], leaf.all_keys())
1135+ self.assertEqual([(sha_key, ('1 2 3 4', ()))], leaf.all_items())
1136+ self.assertTrue(sha_key in leaf)
1137+
1138+ def test_large_offsets(self):
1139+ leaf = self.module._parse_into_chk(_large_offsets, 1, 0)
1140+ self.assertEqual(['12345678901 1234567890 0 1',
1141+ '2147483648 2147483647 0 1',
1142+ '4294967296 4294967295 4294967294 1',
1143+ ], [x[1][0] for x in leaf.all_items()])
1144+
1145+ def test_many_key_leaf(self):
1146+ leaf = self.module._parse_into_chk(_multi_key_content, 1, 0)
1147+ self.assertEqual(8, len(leaf))
1148+ all_keys = leaf.all_keys()
1149+ self.assertEqual(8, len(leaf.all_keys()))
1150+ for idx, key in enumerate(all_keys):
1151+ self.assertEqual(str(idx), leaf[key][0].split()[0])
1152+
1153+ def test_common_shift(self):
1154+ # The keys were deliberately chosen so that the first 5 bits all
1155+ # overlapped; it also happens that a later bit overlaps
1156+ # Note that by 'overlap' we mean that given bit is either on in all
1157+ # keys, or off in all keys
1158+ leaf = self.module._parse_into_chk(_multi_key_content, 1, 0)
1159+ self.assertEqual(19, leaf.common_shift)
1160+ # The interesting byte for each key is
1161+ # (defined as the 8-bits that come after the common prefix)
1162+ lst = [1, 13, 28, 180, 190, 193, 210, 239]
1163+ offsets = leaf._get_offsets()
1164+ self.assertEqual([bisect.bisect_left(lst, x) for x in range(0, 257)],
1165+ offsets)
1166+ for idx, val in enumerate(lst):
1167+ self.assertEqual(idx, offsets[val])
1168+ for idx, key in enumerate(leaf.all_keys()):
1169+ self.assertEqual(str(idx), leaf[key][0].split()[0])
1170+
1171+ def test_multi_key_same_offset(self):
1172+ # there is no common prefix, though there are some common bits
1173+ leaf = self.module._parse_into_chk(_multi_key_same_offset, 1, 0)
1174+ self.assertEqual(24, leaf.common_shift)
1175+ offsets = leaf._get_offsets()
1176+ # The interesting byte is just the first 8-bits of the key
1177+ lst = [8, 200, 205, 205, 205, 205, 206, 206]
1178+ self.assertEqual([bisect.bisect_left(lst, x) for x in range(0, 257)],
1179+ offsets)
1180+ for val in lst:
1181+ self.assertEqual(lst.index(val), offsets[val])
1182+ for idx, key in enumerate(leaf.all_keys()):
1183+ self.assertEqual(str(idx), leaf[key][0].split()[0])
1184+
1185+ def test_all_common_prefix(self):
1186+ # The first 32 bits of all hashes are the same. This is going to be
1187+ # pretty much impossible, but I don't want to fail because of this
1188+ leaf = self.module._parse_into_chk(_common_32_bits, 1, 0)
1189+ self.assertEqual(0, leaf.common_shift)
1190+ lst = [0x78] * 8
1191+ offsets = leaf._get_offsets()
1192+ self.assertEqual([bisect.bisect_left(lst, x) for x in range(0, 257)],
1193+ offsets)
1194+ for val in lst:
1195+ self.assertEqual(lst.index(val), offsets[val])
1196+ for idx, key in enumerate(leaf.all_keys()):
1197+ self.assertEqual(str(idx), leaf[key][0].split()[0])
1198+
1199+ def test_many_entries(self):
1200+ # Again, this is almost impossible, but we should still work
1201+ # It would be hard to fit more than 120 entries in a 4k page, much less
1202+ # more than 256 of them. but hey, weird stuff happens sometimes
1203+ lines = ['type=leaf\n']
1204+ for i in range(500):
1205+ key_str = 'sha1:%04x%s' % (i, _hex_form[:36])
1206+ key = (key_str,)
1207+ lines.append('%s\0\0%d %d %d %d\n' % (key_str, i, i, i, i))
1208+ bytes = ''.join(lines)
1209+ leaf = self.module._parse_into_chk(bytes, 1, 0)
1210+ self.assertEqual(24-7, leaf.common_shift)
1211+ offsets = leaf._get_offsets()
1212+ # This is the interesting bits for each entry
1213+ lst = [x // 2 for x in range(500)]
1214+ expected_offsets = [x * 2 for x in range(128)] + [255]*129
1215+ self.assertEqual(expected_offsets, offsets)
1216+ # We truncate because offsets is an unsigned char. So the bisection
1217+ # will just say 'greater than the last one' for all the rest
1218+ lst = lst[:255]
1219+ self.assertEqual([bisect.bisect_left(lst, x) for x in range(0, 257)],
1220+ offsets)
1221+ for val in lst:
1222+ self.assertEqual(lst.index(val), offsets[val])
1223+ for idx, key in enumerate(leaf.all_keys()):
1224+ self.assertEqual(str(idx), leaf[key][0].split()[0])
1225+
1226+ def test__sizeof__(self):
1227+ # We can't use the exact numbers because of platform variations, etc.
1228+ # But what we really care about is that it does get bigger with more
1229+ # content.
1230+ leaf0 = self.module._parse_into_chk('type=leaf\n', 1, 0)
1231+ leaf1 = self.module._parse_into_chk(_one_key_content, 1, 0)
1232+ leafN = self.module._parse_into_chk(_multi_key_content, 1, 0)
1233+ sizeof_1 = leaf1.__sizeof__() - leaf0.__sizeof__()
1234+ self.assertTrue(sizeof_1 > 0)
1235+ sizeof_N = leafN.__sizeof__() - leaf0.__sizeof__()
1236+ self.assertEqual(sizeof_1 * len(leafN), sizeof_N)
1237
1238=== modified file 'bzrlib/tests/test_btree_index.py'
1239--- bzrlib/tests/test_btree_index.py 2010-06-20 11:18:38 +0000
1240+++ bzrlib/tests/test_btree_index.py 2010-08-24 19:24:49 +0000
1241@@ -216,12 +216,12 @@
1242 leaf1_bytes = zlib.decompress(leaf1)
1243 sorted_node_keys = sorted(node[0] for node in nodes)
1244 node = btree_index._LeafNode(leaf1_bytes, 1, 0)
1245- self.assertEqual(231, len(node.keys))
1246- self.assertEqual(sorted_node_keys[:231], sorted(node.keys))
1247+ self.assertEqual(231, len(node))
1248+ self.assertEqual(sorted_node_keys[:231], node.all_keys())
1249 leaf2_bytes = zlib.decompress(leaf2)
1250 node = btree_index._LeafNode(leaf2_bytes, 1, 0)
1251- self.assertEqual(400 - 231, len(node.keys))
1252- self.assertEqual(sorted_node_keys[231:], sorted(node.keys))
1253+ self.assertEqual(400 - 231, len(node))
1254+ self.assertEqual(sorted_node_keys[231:], node.all_keys())
1255
1256 def test_last_page_rounded_1_layer(self):
1257 builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
1258@@ -241,9 +241,9 @@
1259 leaf2 = content[74:]
1260 leaf2_bytes = zlib.decompress(leaf2)
1261 node = btree_index._LeafNode(leaf2_bytes, 1, 0)
1262- self.assertEqual(10, len(node.keys))
1263+ self.assertEqual(10, len(node))
1264 sorted_node_keys = sorted(node[0] for node in nodes)
1265- self.assertEqual(sorted_node_keys, sorted(node.keys))
1266+ self.assertEqual(sorted_node_keys, node.all_keys())
1267
1268 def test_last_page_not_rounded_2_layer(self):
1269 builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
1270@@ -263,9 +263,9 @@
1271 leaf2 = content[8192:]
1272 leaf2_bytes = zlib.decompress(leaf2)
1273 node = btree_index._LeafNode(leaf2_bytes, 1, 0)
1274- self.assertEqual(400 - 231, len(node.keys))
1275+ self.assertEqual(400 - 231, len(node))
1276 sorted_node_keys = sorted(node[0] for node in nodes)
1277- self.assertEqual(sorted_node_keys[231:], sorted(node.keys))
1278+ self.assertEqual(sorted_node_keys[231:], node.all_keys())
1279
1280 def test_three_level_tree_details(self):
1281 # The left most pointer in the second internal node in a row should
1282@@ -302,7 +302,7 @@
1283 # in the second node it points at
1284 pos = index._row_offsets[2] + internal_node2.offset + 1
1285 leaf = index._get_leaf_nodes([pos])[pos]
1286- self.assertTrue(internal_node2.keys[0] in leaf.keys)
1287+ self.assertTrue(internal_node2.keys[0] in leaf)
1288
1289 def test_2_leaves_2_2(self):
1290 builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
1291@@ -728,7 +728,7 @@
1292 nodes = dict(index._read_nodes([0]))
1293 self.assertEqual([0], nodes.keys())
1294 node = nodes[0]
1295- self.assertEqual([('key',)], node.keys.keys())
1296+ self.assertEqual([('key',)], node.all_keys())
1297 self.assertEqual([('get', 'index')], trans._activity)
1298
1299 def test__read_nodes_no_size_multiple_pages(self):
1300@@ -1112,7 +1112,7 @@
1301 min_l2_key = l2.min_key
1302 max_l1_key = l1.max_key
1303 self.assertTrue(max_l1_key < min_l2_key)
1304- parents_min_l2_key = l2.keys[min_l2_key][1][0]
1305+ parents_min_l2_key = l2[min_l2_key][1][0]
1306 self.assertEqual((l1.max_key,), parents_min_l2_key)
1307 # Now, whatever key we select that would fall on the second page,
1308 # should give us all the parents until the page break
1309@@ -1131,7 +1131,7 @@
1310 parent_map = {}
1311 search_keys = index._find_ancestors([max_l1_key], 0, parent_map,
1312 missing_keys)
1313- self.assertEqual(sorted(l1.keys), sorted(parent_map))
1314+ self.assertEqual(l1.all_keys(), sorted(parent_map))
1315 self.assertEqual(set(), missing_keys)
1316 self.assertEqual(set(), search_keys)
1317
1318@@ -1205,7 +1205,7 @@
1319 ("2222222222222222222222222222222222222222",): ("value:2", ()),
1320 ("3333333333333333333333333333333333333333",): ("value:3", ()),
1321 ("4444444444444444444444444444444444444444",): ("value:4", ()),
1322- }, node.keys)
1323+ }, dict(node.all_items()))
1324
1325 def test_LeafNode_2_2(self):
1326 node_bytes = ("type=leaf\n"
1327@@ -1225,7 +1225,7 @@
1328 ('11', '33'): ('value:3',
1329 ((('11', 'ref22'),), (('11', 'ref22'), ('11', 'ref22')))),
1330 ('11', '44'): ('value:4', ((), (('11', 'ref00'),)))
1331- }, node.keys)
1332+ }, dict(node.all_items()))
1333
1334 def test_InternalNode_1(self):
1335 node_bytes = ("type=internal\n"
1336@@ -1266,7 +1266,7 @@
1337 ('11', '33'): ('value:3',
1338 ((('11', 'ref22'),), (('11', 'ref22'), ('11', 'ref22')))),
1339 ('11', '44'): ('value:4', ((), (('11', 'ref00'),)))
1340- }, node.keys)
1341+ }, dict(node.all_items()))
1342
1343 def assertFlattened(self, expected, key, value, refs):
1344 flat_key, flat_line = self.parse_btree._flatten_node(
1345
1346=== modified file 'bzrlib/tests/test_clean_tree.py'
1347--- bzrlib/tests/test_clean_tree.py 2010-08-21 03:42:26 +0000
1348+++ bzrlib/tests/test_clean_tree.py 2010-08-24 19:24:49 +0000
1349@@ -1,4 +1,4 @@
1350-# Copyright (C) 2005, 2009 Canonical Ltd
1351+# Copyright (C) 2009, 2010 Canonical Ltd
1352 #
1353 # This program is free software; you can redistribute it and/or modify
1354 # it under the terms of the GNU General Public License as published by
1355
1356=== modified file 'bzrlib/tests/test_inv.py'
1357--- bzrlib/tests/test_inv.py 2010-08-03 13:31:06 +0000
1358+++ bzrlib/tests/test_inv.py 2010-08-24 19:24:49 +0000
1359@@ -1,4 +1,4 @@
1360-# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd
1361+# Copyright (C) 2005-2010 Canonical Ltd
1362 #
1363 # This program is free software; you can redistribute it and/or modify
1364 # it under the terms of the GNU General Public License as published by
1365
1366=== modified file 'bzrlib/tests/test_inventory_delta.py'
1367--- bzrlib/tests/test_inventory_delta.py 2010-08-03 13:31:06 +0000
1368+++ bzrlib/tests/test_inventory_delta.py 2010-08-24 19:24:49 +0000
1369@@ -1,4 +1,4 @@
1370-# Copyright (C) 2008, 2009 Canonical Ltd
1371+# Copyright (C) 2009, 2010 Canonical Ltd
1372 #
1373 # This program is free software; you can redistribute it and/or modify
1374 # it under the terms of the GNU General Public License as published by
1375
1376=== modified file 'bzrlib/tests/test_repository.py'
1377--- bzrlib/tests/test_repository.py 2010-03-31 16:29:25 +0000
1378+++ bzrlib/tests/test_repository.py 2010-08-24 19:24:49 +0000
1379@@ -31,6 +31,7 @@
1380 UnsupportedFormatError,
1381 )
1382 from bzrlib import (
1383+ btree_index,
1384 graph,
1385 tests,
1386 )
1387@@ -681,6 +682,21 @@
1388
1389 class Test2a(tests.TestCaseWithMemoryTransport):
1390
1391+ def test_chk_bytes_uses_custom_btree_parser(self):
1392+ mt = self.make_branch_and_memory_tree('test', format='2a')
1393+ mt.lock_write()
1394+ self.addCleanup(mt.unlock)
1395+ mt.add([''], ['root-id'])
1396+ mt.commit('first')
1397+ index = mt.branch.repository.chk_bytes._index._graph_index._indices[0]
1398+ self.assertEqual(btree_index._gcchk_factory, index._leaf_factory)
1399+ # It should also work if we re-open the repo
1400+ repo = mt.branch.repository.bzrdir.open_repository()
1401+ repo.lock_read()
1402+ self.addCleanup(repo.unlock)
1403+ index = repo.chk_bytes._index._graph_index._indices[0]
1404+ self.assertEqual(btree_index._gcchk_factory, index._leaf_factory)
1405+
1406 def test_fetch_combines_groups(self):
1407 builder = self.make_branch_builder('source', format='2a')
1408 builder.start_series()
1409
1410=== modified file 'bzrlib/tests/test_source.py'
1411--- bzrlib/tests/test_source.py 2010-07-05 16:01:33 +0000
1412+++ bzrlib/tests/test_source.py 2010-08-24 19:24:49 +0000
1413@@ -388,8 +388,10 @@
1414 """
1415 both_exc_and_no_exc = []
1416 missing_except = []
1417- class_re = re.compile(r'^(cdef\s+)?(public\s+)?(api\s+)?class (\w+).*:',
1418- re.MULTILINE)
1419+ class_re = re.compile(r'^(cdef\s+)?(public\s+)?'
1420+ r'(api\s+)?class (\w+).*:', re.MULTILINE)
1421+ extern_class_re = re.compile(r'## extern cdef class (\w+)',
1422+ re.MULTILINE)
1423 except_re = re.compile(r'cdef\s+' # start with cdef
1424 r'([\w *]*?)\s*' # this is the return signature
1425 r'(\w+)\s*\(' # the function name
1426@@ -400,6 +402,7 @@
1427 for fname, text in self.get_source_file_contents(
1428 extensions=('.pyx',)):
1429 known_classes = set([m[-1] for m in class_re.findall(text)])
1430+ known_classes.update(extern_class_re.findall(text))
1431 cdefs = except_re.findall(text)
1432 for sig, func, exc_clause, no_exc_comment in cdefs:
1433 if sig.startswith('api '):
1434
1435=== modified file 'doc/en/whats-new/whats-new-in-2.3.txt'
1436--- doc/en/whats-new/whats-new-in-2.3.txt 2010-08-18 07:25:22 +0000
1437+++ doc/en/whats-new/whats-new-in-2.3.txt 2010-08-24 19:24:49 +0000
1438@@ -37,6 +37,10 @@
1439 tree may substantially improve the speed of operations like ``bzr
1440 commit``. (Andrew Bennetts)
1441
1442+* Lower memory consumption when reading many chk index pages. Helpful for
1443+ things like ``bzr co`` or ``bzr ls -R`` on large trees.
1444+ (John Arbash Meinel)
1445+
1446 * When building new working trees, default to reading from the repository
1447 rather than the source tree unless explicitly requested. (via
1448 ``--files-from`` and ``--hardlink`` for ``bzr branch`` and