Merge lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern into lp:bzr

Proposed by John A Meinel
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern
Merge into: lp:bzr
Diff against target: 553 lines
12 files modified
NEWS (+13/-7)
bzrlib/_bencode_pyx.pyx (+9/-1)
bzrlib/_btree_serializer_pyx.pyx (+80/-38)
bzrlib/_static_tuple_c.c (+8/-2)
bzrlib/btree_index.py (+2/-0)
bzrlib/builtins.py (+4/-1)
bzrlib/index.py (+4/-1)
bzrlib/repository.py (+7/-0)
bzrlib/static_tuple.py (+25/-0)
bzrlib/tests/test__static_tuple.py (+21/-0)
bzrlib/util/_bencode_py.py (+7/-0)
setup.py (+2/-1)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern
Reviewer Review Type Date Requested Status
Andrew Bennetts Approve
bzr-core Pending
Review via email: mp+13296@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

This is the same as an earlier patch for using StaticTuple as part of the btree code. It has a couple small additions.

1) Small fix for 'bzr dump-btree' that casts the objects back to tuples for nicer formatting.
2) Add 'StaticTuple' as a type that 'bencode' knows how to deal with (just treats it as another
   Tuple/List object.)
   Arguably we probably want to end up with 'decode_as_tuples=True' to return StaticTuple
   instances. For now, though, this was all that was necessary to get the test suite to pass on my
   machine. (Though a lot of the tests that were failing on PQM weren't failing here anyway...)

Revision history for this message
Andrew Bennetts (spiv) wrote :

384 + refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list])
385 + for ref_list in node[3]])

I wonder if it would be worth adding a convenience method, perhaps StaticTuple.as_tuples(), that recursively does this conversion. That would make this ugliness unnecessary.

431 + # I don't believe we can define a method by which
432 + # (prefix,) + StaticTuple will work, though we could

In plain Python you could define an __radd__ for this, so surely there's a way to do this in C?

class T(object):
    def __radd__(self, other):
        return 'haha!'
t = T()

print ('tuple',) + t # prints 'haha!'

You may need to do something odd like provide the nb_add slot, even though this isn't really a numeric type, but I think that's ok. (All pure python classes would have that I think, even the non-numeric ones, so presumably having tp_as_number filled doesn't automatically make Python do dumb things.)

I think we can live without this, but it would be nice.

488 + k1 = stuple(stuple('<email address hidden>',),
489 + stuple('<email address hidden>',))
490 + k2 = stuple(stuple(stuple('<email address hidden>',),
491 + stuple('<email address hidden>',)),
492 + stuple('<email address hidden>',))

This test data is needlessly complex and hard to read. Why not e.g.:

    k1 = stuple(stuple('a',), stuple('b',))
    k2 = stuple(stuple(stuple('c',), stuple('d',)), stuple('a',))

Which is structurally the same and much easier to follow.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Review: Approve
> 384 + refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list])
> 385 + for ref_list in node[3]])
>
> I wonder if it would be worth adding a convenience method, perhaps StaticTuple.as_tuples(), that recursively does this conversion. That would make this ugliness unnecessary.

As for "as_tuples()" I would be fine just extending ".as_tuple()" to do
exactly that. The main restriction is that we may not always have tuples
at this point.

At least so far, tuple is interchangeable w/ StaticTuple.

>
> 431 + # I don't believe we can define a method by which
> 432 + # (prefix,) + StaticTuple will work, though we could
>
> In plain Python you could define an __radd__ for this, so surely there's a way to do this in C?
>
> class T(object):
> def __radd__(self, other):
> return 'haha!'
> t = T()
>
> print ('tuple',) + t # prints 'haha!'
>

Tuple uses "tp_as_sequence.sq_concat" to handle ('tuple',) + t, which I
didn't think worked the other way. But thanks for pointing me to this,
I'll look into it.

> You may need to do something odd like provide the nb_add slot, even though this isn't really a numeric type, but I think that's ok. (All pure python classes would have that I think, even the non-numeric ones, so presumably having tp_as_number filled doesn't automatically make Python do dumb things.)
>
> I think we can live without this, but it would be nice.

Actually, the main reason I added the comment is because I expect things
to fail at that point, but I haven't gotten a test case to trigger it,
and it also won't trigger with --2a formats... (They don't have missing
compression parents.)

>
> 488 + k1 = stuple(stuple('<email address hidden>',),
> 489 + stuple('<email address hidden>',))
> 490 + k2 = stuple(stuple(stuple('<email address hidden>',),
> 491 + stuple('<email address hidden>',)),
> 492 + stuple('<email address hidden>',))
>
> This test data is needlessly complex and hard to read. Why not e.g.:
>
> k1 = stuple(stuple('a',), stuple('b',))
> k2 = stuple(stuple(stuple('c',), stuple('d',)), stuple('a',))
>
> Which is structurally the same and much easier to follow.

Sure. I did the above because that was the actual data I was getting. Of
course, I've since narrowed it down to a bug in interning....

Anyway, I'm happy to simplify it, and should have done so before submitting.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrVNP8ACgkQJdeBCYSNAAN61QCbBceibTUybQ2cwzsABrC2rcPc
RwcAni/o9YUyAE/7ShfvcoeHZFGUMwDw
=ZhaX
-----END PGP SIGNATURE-----

Revision history for this message
Andrew Bennetts (spiv) wrote :

[...]
> > In plain Python you could define an __radd__ for this, so surely there's a
> > way to do this in C?
[...]
> > You may need to do something odd like provide the nb_add slot, even though
> > this isn't really a numeric type, but I think that's ok. (All pure python
> > classes would have that I think, even the non-numeric ones, so presumably
> > having tp_as_number filled doesn't automatically make Python do dumb
> > things.)

By the way, because lack of support for tuple + StaticTuple caused all the
babune builders to go red, I took a look at this. (specifically,
bzrlib.tests.per_repository.test_write_group.TestResumeableWriteGroup.test_commit_resumed_write_group_with_missing_parents
was failing)

You actually need nb_coerce as well as nb_add. Here's a rough patch that does
this. The error it gives when you try to add a plain tuple with incompatible
elements (e.g. ints) is probably not ideal, but it works.

-Andrew.

=== modified file 'bzrlib/_static_tuple_c.c'
--- bzrlib/_static_tuple_c.c 2009-10-15 18:18:44 +0000
+++ bzrlib/_static_tuple_c.c 2009-10-16 08:24:36 +0000
@@ -513,6 +513,78 @@
513 "Check to see if this tuple has been interned.\n";513 "Check to see if this tuple has been interned.\n";
514514
515515
516int
517StaticTuple_coerce(PyObject **v, PyObject **w)
518{
519 StaticTuple *st;
520 if (PyTuple_Check(*v)) {
521 st = (StaticTuple*) StaticTuple_new_constructor(
522 &StaticTuple_Type, *v, NULL);
523 if (!st)
524 return -1;
525 Py_INCREF(st);
526 *v = (PyObject*)st;
527 } else if (StaticTuple_CheckExact(*v))
528 Py_INCREF(*v);
529 else
530 return 1;
531
532 if (PyTuple_Check(*w)) {
533 st = (StaticTuple*) StaticTuple_new_constructor(
534 &StaticTuple_Type, *w, NULL);
535 if (!st)
536 return -1;
537 Py_INCREF(st);
538 *w = (PyObject*)st;
539 } else if (StaticTuple_CheckExact(*w))
540 Py_INCREF(*w);
541 else
542 return 1;
543 return 0;
544}
545
546static PyObject *
547StaticTuple_add(PyObject *v, PyObject *w)
548{
549 PyObject *v_t = NULL, *w_t = NULL;
550 PyObject *tmp_tuple, *result;
551 /* StaticTuples and plain tuples may be added (concatenated) to
552 * StaticTuples.
553 */
554 if (StaticTuple_CheckExact(v)) {
555 v_t = StaticTuple_as_tuple((StaticTuple*)v);
556 if (!v_t)
557 goto fail;
558 } else if (PyTuple_Check(v))
559 v_t = v;
560 else
561 goto not_imp;
562
563 if (StaticTuple_CheckExact(w)) {
564 w_t = StaticTuple_as_tuple((StaticTuple*)w);
565 if (!w_t)
566 goto fail;
567 } else if (PyTuple_Check(w))
568 w_t = w;
569 else
570 goto not_imp;
571
572 tmp_tuple = PySequence_Concat(v_t, w_t);
573 result = StaticTuple_new_constructor(&StaticTuple_Type, tmp_tuple, NULL);
574 Py_DECREF(tmp_tuple);
575 Py_INCREF(result);
576 return result;
577
578not_imp:
579 Py_XDECREF(v_t);
580 Py_XDECREF(w_t);
581 return Py_NotImplemented;
582fail:
583 Py_XDECREF(v_t);
584 Py_XDECREF(w_t);
585 return NULL;
586}
587
516static PyObject *588static PyObject *
517StaticTuple_item(StaticTuple *self, Py_ssize_t offset)589StaticTuple_item(StaticTuple *self, Py_ssize_t offset)
518{590{
@@ -574,6 +646,29 @@
574 {NULL, NULL} /* sentinel */646 {NULL, NULL} /* sentinel */
575};647};
576648
649
650static PyNumberMethods StaticTuple_as_number = {
651 (binaryfunc) StaticTuple_add, /* nb_add */
652 0, /* nb_subtract */
653 0, /* nb_multiply */
654 0, /* nb_divide */
655 0, /* nb_remainder */
656 0, /* nb_divmod */
657 0, /* nb_power */
658 0, /* nb_negative */
659 0, /* nb_positive */
660 0, /* nb_absolute */
661 0, /* nb_nonzero */
662 0, /* nb_invert */
663 0, /* nb_lshift */
664 0, /* nb_rshift */
665 0, /* nb_and */
666 0, /* nb_xor */
667 0, /* nb_or */
668 StaticTuple_coerce, /* nb_coerce */
669};
670
671
577static PySequenceMethods StaticTuple_as_sequence = {672static PySequenceMethods StaticTuple_as_sequence = {
578 (lenfunc)StaticTuple_length, /* sq_length */673 (lenfunc)StaticTuple_length, /* sq_length */
579 0, /* sq_concat */674 0, /* sq_concat */
@@ -604,7 +699,7 @@
604 0, /* tp_setattr */699 0, /* tp_setattr */
605 0, /* tp_compare */700 0, /* tp_compare */
606 (reprfunc)StaticTuple_repr, /* tp_repr */701 (reprfunc)StaticTuple_repr, /* tp_repr */
607 0, /* tp_as_number */702 &StaticTuple_as_number, /* tp_as_number */
608 &StaticTuple_as_sequence, /* tp_as_sequence */703 &StaticTuple_as_sequence, /* tp_as_sequence */
609 0, /* tp_as_mapping */704 0, /* tp_as_mapping */
610 (hashfunc)StaticTuple_hash, /* tp_hash */705 (hashfunc)StaticTuple_hash, /* tp_hash */

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-10-15 04:06:32 +0000
+++ NEWS 2009-10-15 18:31:17 +0000
@@ -25,6 +25,11 @@
25Improvements25Improvements
26************26************
2727
28* When reading index files, we now use a ``StaticTuple`` rather than a
29 plain ``tuple`` object. This generally gives a 20% decrease in peak
30 memory, and can give a performance boost up to 40% on large projects.
31 (John Arbash Meinel)
32
28Documentation33Documentation
29*************34*************
3035
@@ -45,13 +50,14 @@
45 used as the interning structure for StaticTuple objects.50 used as the interning structure for StaticTuple objects.
46 (John Arbash Meinel)51 (John Arbash Meinel)
4752
48* ``bzrlib._static_tuple_pyx.StaticTuple`` is now available. This class53* ``bzrlib._static_tuple_pyx.StaticTuple`` is now available and used by
49 functions similarly to ``tuple`` objects. However, it can only point at54 the btree index parser. This class functions similarly to ``tuple``
50 other ``StaticTuple`` instances or strings. This allows us to remove it55 objects. However, it can only point at other ``StaticTuple`` instances
51 from the garbage collector (it cannot be in a cycle), it also allows us56 or strings. This allows us to remove it from the garbage collector (it
52 to intern the objects. In testing, this can reduce peak memory by57 cannot be in a cycle), it also allows us to intern the objects. In
53 20-40%, and significantly improve performance by removing objects from58 testing, this can reduce peak memory by 20-40%, and significantly
54 being inspected by the garbage collector. (John Arbash Meinel)59 improve performance by removing objects from being inspected by the
60 garbage collector. (John Arbash Meinel)
5561
56Testing62Testing
57*******63*******
5864
=== modified file 'bzrlib/_bencode_pyx.pyx'
--- bzrlib/_bencode_pyx.pyx 2009-06-05 01:48:32 +0000
+++ bzrlib/_bencode_pyx.pyx 2009-10-15 18:31:17 +0000
@@ -58,6 +58,13 @@
58 void D_UPDATE_TAIL(Decoder, int n)58 void D_UPDATE_TAIL(Decoder, int n)
59 void E_UPDATE_TAIL(Encoder, int n)59 void E_UPDATE_TAIL(Encoder, int n)
6060
61# To maintain compatibility with older versions of pyrex, we have to use the
62# relative import here, rather than 'bzrlib._static_tuple_c'
63from _static_tuple_c cimport StaticTuple, StaticTuple_CheckExact, \
64 import_static_tuple_c
65
66import_static_tuple_c()
67
6168
62cdef class Decoder:69cdef class Decoder:
63 """Bencode decoder"""70 """Bencode decoder"""
@@ -371,7 +378,8 @@
371 self._encode_int(x)378 self._encode_int(x)
372 elif PyLong_CheckExact(x):379 elif PyLong_CheckExact(x):
373 self._encode_long(x)380 self._encode_long(x)
374 elif PyList_CheckExact(x) or PyTuple_CheckExact(x):381 elif (PyList_CheckExact(x) or PyTuple_CheckExact(x)
382 or StaticTuple_CheckExact(x)):
375 self._encode_list(x)383 self._encode_list(x)
376 elif PyDict_CheckExact(x):384 elif PyDict_CheckExact(x):
377 self._encode_dict(x)385 self._encode_dict(x)
378386
=== modified file 'bzrlib/_btree_serializer_pyx.pyx'
--- bzrlib/_btree_serializer_pyx.pyx 2009-10-08 05:12:01 +0000
+++ bzrlib/_btree_serializer_pyx.pyx 2009-10-15 18:31:17 +0000
@@ -38,6 +38,8 @@
38 Py_ssize_t PyString_Size(object p)38 Py_ssize_t PyString_Size(object p)
39 Py_ssize_t PyString_GET_SIZE_ptr "PyString_GET_SIZE" (PyObject *)39 Py_ssize_t PyString_GET_SIZE_ptr "PyString_GET_SIZE" (PyObject *)
40 char * PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *)40 char * PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *)
41 char * PyString_AS_STRING(object)
42 Py_ssize_t PyString_GET_SIZE(object)
41 int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len)43 int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len)
42 void PyString_InternInPlace(PyObject **)44 void PyString_InternInPlace(PyObject **)
43 int PyTuple_CheckExact(object t)45 int PyTuple_CheckExact(object t)
@@ -55,6 +57,12 @@
55 # void *memrchr(void *s, int c, size_t n)57 # void *memrchr(void *s, int c, size_t n)
56 int strncmp(char *s1, char *s2, size_t n)58 int strncmp(char *s1, char *s2, size_t n)
5759
60# It seems we need to import the definitions so that the pyrex compiler has
61# local names to access them.
62from _static_tuple_c cimport StaticTuple, \
63 import_static_tuple_c, StaticTuple_New, \
64 StaticTuple_Intern, StaticTuple_SET_ITEM, StaticTuple_CheckExact
65
5866
59# TODO: Find some way to import this from _dirstate_helpers67# TODO: Find some way to import this from _dirstate_helpers
60cdef void* _my_memrchr(void *s, int c, size_t n):68cdef void* _my_memrchr(void *s, int c, size_t n):
@@ -71,6 +79,7 @@
71 pos = pos - 179 pos = pos - 1
72 return NULL80 return NULL
7381
82
74# TODO: Import this from _dirstate_helpers when it is merged83# TODO: Import this from _dirstate_helpers when it is merged
75cdef object safe_string_from_size(char *s, Py_ssize_t size):84cdef object safe_string_from_size(char *s, Py_ssize_t size):
76 if size < 0:85 if size < 0:
@@ -94,6 +103,10 @@
94 Py_DECREF_ptr(py_str)103 Py_DECREF_ptr(py_str)
95 return result104 return result
96105
106from bzrlib import _static_tuple_c
107# This sets up the StaticTuple C_API functionality
108import_static_tuple_c()
109
97110
98cdef class BTreeLeafParser:111cdef class BTreeLeafParser:
99 """Parse the leaf nodes of a BTree index.112 """Parse the leaf nodes of a BTree index.
@@ -133,6 +146,7 @@
133 self._cur_str = NULL146 self._cur_str = NULL
134 self._end_str = NULL147 self._end_str = NULL
135 self._header_found = 0148 self._header_found = 0
149 # keys are tuples
136150
137 cdef extract_key(self, char * last):151 cdef extract_key(self, char * last):
138 """Extract a key.152 """Extract a key.
@@ -142,8 +156,9 @@
142 """156 """
143 cdef char *temp_ptr157 cdef char *temp_ptr
144 cdef int loop_counter158 cdef int loop_counter
145 # keys are tuples159 cdef StaticTuple key
146 key = PyTuple_New(self.key_length)160
161 key = StaticTuple_New(self.key_length)
147 for loop_counter from 0 <= loop_counter < self.key_length:162 for loop_counter from 0 <= loop_counter < self.key_length:
148 # grab a key segment163 # grab a key segment
149 temp_ptr = <char*>memchr(self._start, c'\0', last - self._start)164 temp_ptr = <char*>memchr(self._start, c'\0', last - self._start)
@@ -158,15 +173,19 @@
158 last - self._start)))173 last - self._start)))
159 raise AssertionError(failure_string)174 raise AssertionError(failure_string)
160 # capture the key string175 # capture the key string
161 # TODO: Consider using PyIntern_FromString, the only caveat is that176 if (self.key_length == 1
162 # it assumes a NULL-terminated string, so we have to check if177 and (temp_ptr - self._start) == 45
163 # temp_ptr[0] == c'\0' or some other char.178 and strncmp(self._start, 'sha1:', 5) == 0):
164 key_element = safe_interned_string_from_size(self._start,179 key_element = safe_string_from_size(self._start,
180 temp_ptr - self._start)
181 else:
182 key_element = safe_interned_string_from_size(self._start,
165 temp_ptr - self._start)183 temp_ptr - self._start)
166 # advance our pointer184 # advance our pointer
167 self._start = temp_ptr + 1185 self._start = temp_ptr + 1
168 Py_INCREF(key_element)186 Py_INCREF(key_element)
169 PyTuple_SET_ITEM(key, loop_counter, key_element)187 StaticTuple_SET_ITEM(key, loop_counter, key_element)
188 key = StaticTuple_Intern(key)
170 return key189 return key
171190
172 cdef int process_line(self) except -1:191 cdef int process_line(self) except -1:
@@ -176,6 +195,7 @@
176 cdef char *ref_ptr195 cdef char *ref_ptr
177 cdef char *next_start196 cdef char *next_start
178 cdef int loop_counter197 cdef int loop_counter
198 cdef Py_ssize_t str_len
179199
180 self._start = self._cur_str200 self._start = self._cur_str
181 # Find the next newline201 # Find the next newline
@@ -211,12 +231,25 @@
211 # Invalid line231 # Invalid line
212 raise AssertionError("Failed to find the value area")232 raise AssertionError("Failed to find the value area")
213 else:233 else:
214 # capture the value string234 # Because of how conversions were done, we ended up with *lots* of
215 value = safe_string_from_size(temp_ptr + 1, last - temp_ptr - 1)235 # values that are identical. These are all of the 0-length nodes
236 # that are referred to by the TREE_ROOT (and likely some other
237 # directory nodes.) For example, bzr has 25k references to
238 # something like '12607215 328306 0 0', which ends up consuming 1MB
239 # of memory, just for those strings.
240 str_len = last - temp_ptr - 1
241 if (str_len > 4
242 and strncmp(" 0 0", last - 4, 4) == 0):
243 # This drops peak mem for bzr.dev from 87.4MB => 86.2MB
244 # For Launchpad 236MB => 232MB
245 value = safe_interned_string_from_size(temp_ptr + 1, str_len)
246 else:
247 value = safe_string_from_size(temp_ptr + 1, str_len)
216 # shrink the references end point248 # shrink the references end point
217 last = temp_ptr249 last = temp_ptr
250
218 if self.ref_list_length:251 if self.ref_list_length:
219 ref_lists = []252 ref_lists = StaticTuple_New(self.ref_list_length)
220 loop_counter = 0253 loop_counter = 0
221 while loop_counter < self.ref_list_length:254 while loop_counter < self.ref_list_length:
222 ref_list = []255 ref_list = []
@@ -248,18 +281,20 @@
248 if temp_ptr == NULL:281 if temp_ptr == NULL:
249 # key runs to the end282 # key runs to the end
250 temp_ptr = ref_ptr283 temp_ptr = ref_ptr
284
251 PyList_Append(ref_list, self.extract_key(temp_ptr))285 PyList_Append(ref_list, self.extract_key(temp_ptr))
252 PyList_Append(ref_lists, tuple(ref_list))286 ref_list = StaticTuple_Intern(StaticTuple(*ref_list))
287 Py_INCREF(ref_list)
288 StaticTuple_SET_ITEM(ref_lists, loop_counter - 1, ref_list)
253 # prepare for the next reference list289 # prepare for the next reference list
254 self._start = next_start290 self._start = next_start
255 ref_lists = tuple(ref_lists)291 node_value = StaticTuple(value, ref_lists)
256 node_value = (value, ref_lists)
257 else:292 else:
258 if last != self._start:293 if last != self._start:
259 # unexpected reference data present294 # unexpected reference data present
260 raise AssertionError("unexpected reference data present")295 raise AssertionError("unexpected reference data present")
261 node_value = (value, ())296 node_value = StaticTuple(value, StaticTuple())
262 PyList_Append(self.keys, (key, node_value))297 PyList_Append(self.keys, StaticTuple(key, node_value))
263 return 0298 return 0
264299
265 def parse(self):300 def parse(self):
@@ -294,7 +329,6 @@
294 cdef Py_ssize_t flat_len329 cdef Py_ssize_t flat_len
295 cdef Py_ssize_t key_len330 cdef Py_ssize_t key_len
296 cdef Py_ssize_t node_len331 cdef Py_ssize_t node_len
297 cdef PyObject * val
298 cdef char * value332 cdef char * value
299 cdef Py_ssize_t value_len333 cdef Py_ssize_t value_len
300 cdef char * out334 cdef char * out
@@ -303,13 +337,12 @@
303 cdef int first_ref_list337 cdef int first_ref_list
304 cdef int first_reference338 cdef int first_reference
305 cdef int i339 cdef int i
306 cdef PyObject *ref_bit
307 cdef Py_ssize_t ref_bit_len340 cdef Py_ssize_t ref_bit_len
308341
309 if not PyTuple_CheckExact(node):342 if not PyTuple_CheckExact(node) and not StaticTuple_CheckExact(node):
310 raise TypeError('We expected a tuple() for node not: %s'343 raise TypeError('We expected a tuple() or StaticTuple() for node not: %s'
311 % type(node))344 % type(node))
312 node_len = PyTuple_GET_SIZE(node)345 node_len = len(node)
313 have_reference_lists = reference_lists346 have_reference_lists = reference_lists
314 if have_reference_lists:347 if have_reference_lists:
315 if node_len != 4:348 if node_len != 4:
@@ -318,8 +351,17 @@
318 elif node_len < 3:351 elif node_len < 3:
319 raise ValueError('Without ref_lists, we need at least 3 entries not: %s'352 raise ValueError('Without ref_lists, we need at least 3 entries not: %s'
320 % len(node))353 % len(node))
321 # I don't expect that we can do faster than string.join()354 # TODO: We can probably do better than string.join(), namely
322 string_key = '\0'.join(<object>PyTuple_GET_ITEM_ptr_object(node, 1))355 # when key has only 1 item, we can just grab that string
356 # And when there are 2 items, we could do a single malloc + len() + 1
357 # also, doing .join() requires a PyObject_GetAttrString call, which
358 # we could also avoid.
359 # TODO: Note that pyrex 0.9.6 generates fairly crummy code here, using the
360 # python object interface, versus 0.9.8+ which uses a helper that
361 # checks if this supports the sequence interface.
362 # We *could* do more work on our own, and grab the actual items
363 # lists. For now, just ask people to use a better compiler. :)
364 string_key = '\0'.join(node[1])
323365
324 # TODO: instead of using string joins, precompute the final string length,366 # TODO: instead of using string joins, precompute the final string length,
325 # and then malloc a single string and copy everything in.367 # and then malloc a single string and copy everything in.
@@ -336,7 +378,7 @@
336 refs_len = 0378 refs_len = 0
337 if have_reference_lists:379 if have_reference_lists:
338 # Figure out how many bytes it will take to store the references380 # Figure out how many bytes it will take to store the references
339 ref_lists = <object>PyTuple_GET_ITEM_ptr_object(node, 3)381 ref_lists = node[3]
340 next_len = len(ref_lists) # TODO: use a Py function382 next_len = len(ref_lists) # TODO: use a Py function
341 if next_len > 0:383 if next_len > 0:
342 # If there are no nodes, we don't need to do any work384 # If there are no nodes, we don't need to do any work
@@ -350,31 +392,31 @@
350 # references392 # references
351 refs_len = refs_len + (next_len - 1)393 refs_len = refs_len + (next_len - 1)
352 for reference in ref_list:394 for reference in ref_list:
353 if not PyTuple_CheckExact(reference):395 if (not PyTuple_CheckExact(reference)
396 and not StaticTuple_CheckExact(reference)):
354 raise TypeError(397 raise TypeError(
355 'We expect references to be tuples not: %s'398 'We expect references to be tuples not: %s'
356 % type(reference))399 % type(reference))
357 next_len = PyTuple_GET_SIZE(reference)400 next_len = len(reference)
358 if next_len > 0:401 if next_len > 0:
359 # We will need (len - 1) '\x00' characters to402 # We will need (len - 1) '\x00' characters to
360 # separate the reference key403 # separate the reference key
361 refs_len = refs_len + (next_len - 1)404 refs_len = refs_len + (next_len - 1)
362 for i from 0 <= i < next_len:405 for ref_bit in reference:
363 ref_bit = PyTuple_GET_ITEM_ptr_object(reference, i)406 if not PyString_CheckExact(ref_bit):
364 if not PyString_CheckExact_ptr(ref_bit):
365 raise TypeError('We expect reference bits'407 raise TypeError('We expect reference bits'
366 ' to be strings not: %s'408 ' to be strings not: %s'
367 % type(<object>ref_bit))409 % type(<object>ref_bit))
368 refs_len = refs_len + PyString_GET_SIZE_ptr(ref_bit)410 refs_len = refs_len + PyString_GET_SIZE(ref_bit)
369411
370 # So we have the (key NULL refs NULL value LF)412 # So we have the (key NULL refs NULL value LF)
371 key_len = PyString_Size(string_key)413 key_len = PyString_Size(string_key)
372 val = PyTuple_GET_ITEM_ptr_object(node, 2)414 val = node[2]
373 if not PyString_CheckExact_ptr(val):415 if not PyString_CheckExact(val):
374 raise TypeError('Expected a plain str for value not: %s'416 raise TypeError('Expected a plain str for value not: %s'
375 % type(<object>val))417 % type(val))
376 value = PyString_AS_STRING_ptr(val)418 value = PyString_AS_STRING(val)
377 value_len = PyString_GET_SIZE_ptr(val)419 value_len = PyString_GET_SIZE(val)
378 flat_len = (key_len + 1 + refs_len + 1 + value_len + 1)420 flat_len = (key_len + 1 + refs_len + 1 + value_len + 1)
379 line = PyString_FromStringAndSize(NULL, flat_len)421 line = PyString_FromStringAndSize(NULL, flat_len)
380 # Get a pointer to the new buffer422 # Get a pointer to the new buffer
@@ -396,14 +438,14 @@
396 out[0] = c'\r'438 out[0] = c'\r'
397 out = out + 1439 out = out + 1
398 first_reference = 0440 first_reference = 0
399 next_len = PyTuple_GET_SIZE(reference)441 next_len = len(reference)
400 for i from 0 <= i < next_len:442 for i from 0 <= i < next_len:
401 if i != 0:443 if i != 0:
402 out[0] = c'\x00'444 out[0] = c'\x00'
403 out = out + 1445 out = out + 1
404 ref_bit = PyTuple_GET_ITEM_ptr_object(reference, i)446 ref_bit = reference[i]
405 ref_bit_len = PyString_GET_SIZE_ptr(ref_bit)447 ref_bit_len = PyString_GET_SIZE(ref_bit)
406 memcpy(out, PyString_AS_STRING_ptr(ref_bit), ref_bit_len)448 memcpy(out, PyString_AS_STRING(ref_bit), ref_bit_len)
407 out = out + ref_bit_len449 out = out + ref_bit_len
408 out[0] = c'\0'450 out[0] = c'\0'
409 out = out + 1451 out = out + 1
410452
=== modified file 'bzrlib/_static_tuple_c.c'
--- bzrlib/_static_tuple_c.c 2009-10-15 16:18:47 +0000
+++ bzrlib/_static_tuple_c.c 2009-10-15 18:31:17 +0000
@@ -418,9 +418,15 @@
418 return NULL; /* There seems to be an error */418 return NULL; /* There seems to be an error */
419 }419 }
420 if (result == Py_NotImplemented) {420 if (result == Py_NotImplemented) {
421 PyErr_BadInternalCall();
422 Py_DECREF(result);421 Py_DECREF(result);
423 return NULL;422 /* One side must have had a string and the other a StaticTuple.
423 * This clearly means that they are not equal.
424 */
425 if (op == Py_EQ) {
426 Py_INCREF(Py_False);
427 return Py_False;
428 }
429 result = PyObject_RichCompare(v_obj, w_obj, Py_EQ);
424 }430 }
425 if (result == Py_False) {431 if (result == Py_False) {
426 /* This entry is not identical432 /* This entry is not identical
427433
=== modified file 'bzrlib/btree_index.py'
--- bzrlib/btree_index.py 2009-10-15 04:01:26 +0000
+++ bzrlib/btree_index.py 2009-10-15 18:31:17 +0000
@@ -163,6 +163,7 @@
163 node_refs, _ = self._check_key_ref_value(key, references, value)163 node_refs, _ = self._check_key_ref_value(key, references, value)
164 if key in self._nodes:164 if key in self._nodes:
165 raise errors.BadIndexDuplicateKey(key, self)165 raise errors.BadIndexDuplicateKey(key, self)
166 # TODO: StaticTuple
166 self._nodes[key] = (node_refs, value)167 self._nodes[key] = (node_refs, value)
167 self._keys.add(key)168 self._keys.add(key)
168 if self._nodes_by_key is not None and self._key_length > 1:169 if self._nodes_by_key is not None and self._key_length > 1:
@@ -625,6 +626,7 @@
625 for line in lines[2:]:626 for line in lines[2:]:
626 if line == '':627 if line == '':
627 break628 break
629 # TODO: Switch to StaticTuple here.
628 nodes.append(tuple(map(intern, line.split('\0'))))630 nodes.append(tuple(map(intern, line.split('\0'))))
629 return nodes631 return nodes
630632
631633
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2009-10-08 16:32:43 +0000
+++ bzrlib/builtins.py 2009-10-15 18:31:17 +0000
@@ -431,7 +431,10 @@
431 for node in bt.iter_all_entries():431 for node in bt.iter_all_entries():
432 # Node is made up of:432 # Node is made up of:
433 # (index, key, value, [references])433 # (index, key, value, [references])
434 self.outf.write('%s\n' % (node[1:],))434 refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list])
435 for ref_list in node[3]])
436 as_tuple = (tuple(node[1]), node[2], refs_as_tuples)
437 self.outf.write('%s\n' % (as_tuple,))
435438
436439
437class cmd_remove_tree(Command):440class cmd_remove_tree(Command):
438441
=== modified file 'bzrlib/index.py'
--- bzrlib/index.py 2009-10-13 05:20:50 +0000
+++ bzrlib/index.py 2009-10-15 18:31:17 +0000
@@ -40,6 +40,7 @@
40 debug,40 debug,
41 errors,41 errors,
42 )42 )
43from bzrlib.static_tuple import StaticTuple
4344
44_HEADER_READV = (0, 200)45_HEADER_READV = (0, 200)
45_OPTION_KEY_ELEMENTS = "key_elements="46_OPTION_KEY_ELEMENTS = "key_elements="
@@ -102,7 +103,7 @@
102103
103 def _check_key(self, key):104 def _check_key(self, key):
104 """Raise BadIndexKey if key is not a valid key for this index."""105 """Raise BadIndexKey if key is not a valid key for this index."""
105 if type(key) != tuple:106 if type(key) not in (tuple, StaticTuple):
106 raise errors.BadIndexKey(key)107 raise errors.BadIndexKey(key)
107 if self._key_length != len(key):108 if self._key_length != len(key):
108 raise errors.BadIndexKey(key)109 raise errors.BadIndexKey(key)
@@ -202,7 +203,9 @@
202 if reference not in self._nodes:203 if reference not in self._nodes:
203 self._check_key(reference)204 self._check_key(reference)
204 absent_references.append(reference)205 absent_references.append(reference)
206 # TODO: StaticTuple
205 node_refs.append(tuple(reference_list))207 node_refs.append(tuple(reference_list))
208 # TODO: StaticTuple
206 return tuple(node_refs), absent_references209 return tuple(node_refs), absent_references
207210
208 def add_node(self, key, value, references=()):211 def add_node(self, key, value, references=()):
209212
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-10-08 22:53:13 +0000
+++ bzrlib/repository.py 2009-10-15 18:31:17 +0000
@@ -4319,6 +4319,13 @@
4319 ):4319 ):
4320 if versioned_file is None:4320 if versioned_file is None:
4321 continue4321 continue
4322 # TODO: key is often going to be a StaticTuple object
4323 # I don't believe we can define a method by which
4324 # (prefix,) + StaticTuple will work, though we could
4325 # define a StaticTuple.sq_concat that would allow you to
4326 # pass in either a tuple or a StaticTuple as the second
4327 # object, so instead we could have:
4328 # StaticTuple(prefix) + key here...
4322 missing_keys.update((prefix,) + key for key in4329 missing_keys.update((prefix,) + key for key in
4323 versioned_file.get_missing_compression_parent_keys())4330 versioned_file.get_missing_compression_parent_keys())
4324 except NotImplementedError:4331 except NotImplementedError:
43254332
=== added file 'bzrlib/static_tuple.py'
--- bzrlib/static_tuple.py 1970-01-01 00:00:00 +0000
+++ bzrlib/static_tuple.py 2009-10-15 18:31:17 +0000
@@ -0,0 +1,25 @@
1# Copyright (C) 2009 Canonical Ltd
2#
3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by
5# the Free Software Foundation; either version 2 of the License, or
6# (at your option) any later version.
7#
8# This program is distributed in the hope that it will be useful,
9# but WITHOUT ANY WARRANTY; without even the implied warranty of
10# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11# GNU General Public License for more details.
12#
13# You should have received a copy of the GNU General Public License
14# along with this program; if not, write to the Free Software
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
16
17"""Interface thunk for a StaticTuple implementation."""
18
19try:
20 from bzrlib._static_tuple_c import StaticTuple
21except ImportError, e:
22 from bzrlib import osutils
23 osutils.failed_to_load_extension(e)
24 from bzrlib._static_tuple_py import StaticTuple
25
026
=== modified file 'bzrlib/tests/test__static_tuple.py'
--- bzrlib/tests/test__static_tuple.py 2009-10-12 18:10:24 +0000
+++ bzrlib/tests/test__static_tuple.py 2009-10-15 18:31:17 +0000
@@ -23,6 +23,7 @@
23 _static_tuple_py,23 _static_tuple_py,
24 errors,24 errors,
25 osutils,25 osutils,
26 static_tuple,
26 tests,27 tests,
27 )28 )
2829
@@ -278,6 +279,16 @@
278 self.assertCompareEqual(k3, (k1, ('foo', 'bar')))279 self.assertCompareEqual(k3, (k1, ('foo', 'bar')))
279 self.assertCompareEqual((k1, ('foo', 'bar')), k3)280 self.assertCompareEqual((k1, ('foo', 'bar')), k3)
280281
282 def test_compare_mixed_depths(self):
283 stuple = self.module.StaticTuple
284 k1 = stuple(stuple('a',), stuple('b',))
285 k2 = stuple(stuple(stuple('c',), stuple('d',)),
286 stuple('b',))
287 # This requires comparing a StaticTuple to a 'string', and then
288 # interpreting that value in the next higher StaticTuple. This used to
289 # generate a PyErr_BadIternalCall. We now fall back to *something*.
290 self.assertCompareNoRelation(k1, k2)
291
281 def test_hash(self):292 def test_hash(self):
282 k = self.module.StaticTuple('foo')293 k = self.module.StaticTuple('foo')
283 self.assertEqual(hash(k), hash(('foo',)))294 self.assertEqual(hash(k), hash(('foo',)))
@@ -416,3 +427,13 @@
416 if self.module is _static_tuple_py:427 if self.module is _static_tuple_py:
417 return428 return
418 self.assertIsNot(None, self.module._C_API)429 self.assertIsNot(None, self.module._C_API)
430
431 def test_static_tuple_thunk(self):
432 # Make sure the right implementation is available from
433 # bzrlib.static_tuple.StaticTuple.
434 if self.module is _static_tuple_py:
435 if CompiledStaticTuple.available():
436 # We will be using the C version
437 return
438 self.assertIs(static_tuple.StaticTuple,
439 self.module.StaticTuple)
419440
=== modified file 'bzrlib/util/_bencode_py.py'
--- bzrlib/util/_bencode_py.py 2009-06-10 03:56:49 +0000
+++ bzrlib/util/_bencode_py.py 2009-10-15 18:31:17 +0000
@@ -154,6 +154,13 @@
154 encode_int(int(x), r)154 encode_int(int(x), r)
155 encode_func[BooleanType] = encode_bool155 encode_func[BooleanType] = encode_bool
156156
157try:
158 from bzrlib._static_tuple_c import StaticTuple
159except ImportError:
160 pass
161else:
162 encode_func[StaticTuple] = encode_list
163
157164
158def bencode(x):165def bencode(x):
159 r = []166 r = []
160167
=== modified file 'setup.py'
--- setup.py 2009-10-12 17:03:40 +0000
+++ setup.py 2009-10-15 18:31:17 +0000
@@ -270,7 +270,6 @@
270270
271add_pyrex_extension('bzrlib._annotator_pyx')271add_pyrex_extension('bzrlib._annotator_pyx')
272add_pyrex_extension('bzrlib._bencode_pyx')272add_pyrex_extension('bzrlib._bencode_pyx')
273add_pyrex_extension('bzrlib._btree_serializer_pyx')
274add_pyrex_extension('bzrlib._chunks_to_lines_pyx')273add_pyrex_extension('bzrlib._chunks_to_lines_pyx')
275add_pyrex_extension('bzrlib._groupcompress_pyx',274add_pyrex_extension('bzrlib._groupcompress_pyx',
276 extra_source=['bzrlib/diff-delta.c'])275 extra_source=['bzrlib/diff-delta.c'])
@@ -303,6 +302,8 @@
303add_pyrex_extension('bzrlib._simple_set_pyx')302add_pyrex_extension('bzrlib._simple_set_pyx')
304ext_modules.append(Extension('bzrlib._static_tuple_c',303ext_modules.append(Extension('bzrlib._static_tuple_c',
305 ['bzrlib/_static_tuple_c.c']))304 ['bzrlib/_static_tuple_c.c']))
305add_pyrex_extension('bzrlib._btree_serializer_pyx')
306
306307
307if unavailable_files:308if unavailable_files:
308 print 'C extension(s) not found:'309 print 'C extension(s) not found:'