Merge lp:~gz/bzr/remove_zlib_double_dependency_pyapi into lp:bzr

Proposed by Martin Packman on 2010-04-19
Status: Merged
Merged at revision: 5215
Proposed branch: lp:~gz/bzr/remove_zlib_double_dependency_pyapi
Merge into: lp:bzr
Diff against target: 92 lines (+8/-21)
2 files modified
bzrlib/_chk_map_pyx.pyx (+7/-18)
setup.py (+1/-3)
To merge this branch: bzr merge lp:~gz/bzr/remove_zlib_double_dependency_pyapi
Reviewer Review Type Date Requested Status
Vincent Ladeuil 2010-04-19 Approve on 2010-04-22
John A Meinel 2010-04-22 Pending
Review via email: mp+23711@code.launchpad.net

Description of the Change

This is a competing proposal against <lp:~gz/bzr/remove_zlib_double_dependency_static>

Avoid the need for zlib.h and linking against the library by using a python-level function call to crc32 (in binascii, but zlib would be fine too).

The downside here is that this way is actually more expensive, the extra boxing/unboxing and refcounting costs something like 0.5μs per call. Also, that cost is likely to be similar on nix, which never had any issues with the old way.

I'm not certain I've done the pyrex code in the best possible manner here for recount minimisation, I believe it is correct though.

To post a comment you must log in.
Alexander Belchenko (bialix) wrote :

Martin [gz] пишет:
> Martin [gz] has proposed merging lp:~gz/bzr/remove_zlib_double_dependency_pyapi into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #566923 Remove zlib double-dependency
> https://bugs.launchpad.net/bugs/566923
>
>
> This is a competing proposal against <lp:~gz/bzr/remove_zlib_double_dependency_static>
>
> Avoid the need for zlib.h and linking against the library by using a python-level function call to crc32 (in binascii, but zlib would be fine too).

Does zlib.h is really the problem?

--

All the dude wanted was his rug back

Martin Packman (gz) wrote :

It's a pain, yes Alexander, see the bug for more.

Alexander Belchenko (bialix) wrote :

Martin [gz] пишет:
> It's a pain, yes Alexander, see the bug for more.

I'm abstain to vote on this.

But as C-programmer and given in mind that CRC32 is standard function which should have classic
implementation and unlikely will have security problems of hidden bugs, I'd better have the copy of
that function as C-extension for *Windows* *only*. I don't like the idea of make bzr a bit slower on
Windows: it's already slower here than on Linux.

--
All the dude wanted was his rug back

Martin Packman (gz) wrote :

Well, I'm not certain this does make anything slower. Yeah, I can measure a difference in a microbenchmark, but I don't know what, if any, workloads these functions are a significant part of. For instance, a profile of a problem I ran the other day totalled ~366 seconds, of which 0.0178 seconds were spent in 5 _search_key_255 calls. This branch will make no measurable difference to that operation.

If these functions aren't hot, there's no reason to worry about μs overhead.

John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin [gz] wrote:
> Well, I'm not certain this does make anything slower. Yeah, I can measure a difference in a microbenchmark, but I don't know what, if any, workloads these functions are a significant part of. For instance, a profile of a problem I ran the other day totalled ~366 seconds, of which 0.0178 seconds were spent in 5 _search_key_255 calls. This branch will make no measurable difference to that operation.
>
> If these functions aren't hot, there's no reason to worry about μs overhead.

bzr log -v

is potentially one.

5 calls is certainly nothing worth optimizing.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvPHhoACgkQJdeBCYSNAAMAVgCgkCGWuNx7KQEmHWnUBDkoG2C7
mrkAni/SIdppsfMVheNxtF0yVO1al0In
=Yk61
-----END PGP SIGNATURE-----

Martin Packman (gz) wrote :

Thanks John, that was just what I was after. I've put some profile numbers in the bug, summary: On a 50 second operation, this branch causes about a 0.05 second slowdown across 75311 calls, 0.01% overall. The difference between runs has a wider variation than that.

Alexander Belchenko (bialix) wrote :

Martin [gz] пишет:
> Thanks John, that was just what I was after. I've put some profile numbers in the bug, summary: On a 50 second operation, this branch causes about a 0.05 second slowdown across 75311 calls, 0.01% overall. The difference between runs has a wider variation than that.

OK for me.

Vincent Ladeuil (vila) wrote :

Sounds fine, I'll ask John for a second vote.

review: Approve
Vincent Ladeuil (vila) wrote :

@John: ping

Alexander Belchenko (bialix) wrote :

I've looked at the changes, it seems correct, but there is very obscure interaction with pyrex generator to get the desired behavior. I dunno, maybe that code would look simpler in plain C. Or maybe not.

I'm second on waiting final word from John Meinel.

John A Meinel (jameinel) wrote :
Download full text (3.6 KiB)

So it seems microbenchmarking shows a rather significant difference. Specifically, loading all CHK pages and grabbing their keys, and then running:

$ TIMEIT -s "import cPickle; keys = [i[0] for i in cPickle.load(open(',,all_items.pickle', 'rb'))];\
from bzrlib.chk_map import _search_key_255 as sk" \
"[sk(k) for k in keys]"

 624ms bzr.dev
1150ms this version
1170ms a version that doesn't optimize for being a StaticTuple
        (use key[i] rather than <object>StaticTuple_GET_ITEM_ptr(key, i), and take out the type
        check, etc.)
9420ms _chk_map_py._search_key_255 (pure python version)

Now, this is doing it something like 2M times in the loop. So while the proposed version is about 2x slower than the fully optimized one, it is still probably beneath the threshold. And certainly it is nice if it doesn't require having the extra lib around.

There are a couple bits I would suggest, though:

1) unsigned long PyInt_AsUnsignedLongMask(object) except? -1

PyInt_AsUnsignedLongMask can throw an exception, so we should define it as such.

2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
        crc_val = PyInt_AsUnsignedLongMask(crc32(key[i]))

3) If we do that, then we may as well get rid of the StaticTuple_CHECK_EXACT check at the beginning. After all, if we are optimizing for clarity, we should do so.

4) We actually have to be pretty careful about 64-bit platforms, and I don't have one here to test. Specifically, this comment in zlibmodule.c:
    int len, signed_val;
    ...
    /* In Python 2.x we return a signed integer regardless of native platform
     * long size (the 32bit unsigned long is treated as 32-bit signed and sign
     * extended into a 64-bit long inside the integer object). 3.0 does the
     * right thing and returns unsigned. http://bugs.python.org/issue1202 */
    signed_val = crc32(crc32val, buf, len);
    return PyInt_FromLong(signed_val);

I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure that someone actually runs the tests on a 64-bit platform to make sure we don't break anything.

I pushed my version up as:

  lp:~jameinel/bzr/remove_zlib_dependency

As for some of the other commentary:

a) On linux, requiring people install zlib-devel to build doesn't seem particularly onerous. Certainly it is just an apt-get away, and if they used "apt-get build-deps bzr" then they should get it along with all the other gcc, pyrex, etc.

b) Cython/Pyrex vs C. It depends if you actually want to write robust C code that handles exceptions, or if you just write a quick and dirty version. For example, in Pyrex you can write:

cdef extern from "Python.h":
      unsigned long PyInt_AsUnsignedLongMask(object) except? -1

def to_mask(obj):
  cdef unsigned long myval

  myval = PyInt_AsUnsignedLongMask(obj)
...

In C, you can also write something like that. Except it really turns into:

unsigned long to_mask(PyObject *obj)
{
  unsigned long myval;

  myval = PyInt_AsUnsignedLong(obj);
  if (myval == -1 && PyErr_Occurred()) {
    /* Do something with the exception */
  }
}

And even further, in Pyrex you can write:

  key[i]

In C, it is *much* uglier, especially if you want...

Read more...

Martin Packman (gz) wrote :

Thanks for the review!

> Now, this is doing it something like 2M times in the loop. So while the
> proposed version is about 2x slower than the fully optimized one, it is still
> probably beneath the threshold. And certainly it is nice if it doesn't require
> having the extra lib around.

Hm, that's a little worse than I measured, but I didn't use real keys. Does changing the crc32 function from the one in binascii to the one in zlib make any measurable difference for you?

> PyInt_AsUnsignedLongMask can throw an exception, so we should define it as
> such.

Made the assumption that if crc32 returned successfully, it gives a valid python integer, so it can be converted without exception. Perhaps a leap too far, or might be clearer using PyInt_AS_LONG and casting to unsigned if we want the shortcut.

> 2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
> crc_val = PyInt_AsUnsignedLongMask(crc32(key[i]))

Okay, that seems reasonable.

> 3) If we do that, then we may as well get rid of the StaticTuple_CHECK_EXACT
> check at the beginning. After all, if we are optimizing for clarity, we should
> do so.

Isn't throwing an error if it's given something that's not a StaticTuple good for correctness?

> 4) We actually have to be pretty careful about 64-bit platforms, and I don't
> have one here to test. Specifically, this comment in zlibmodule.c:
> int len, signed_val;
> ...
> /* In Python 2.x we return a signed integer regardless of native platform
> * long size (the 32bit unsigned long is treated as 32-bit signed and sign
> * extended into a 64-bit long inside the integer object). 3.0 does the
> * right thing and returns unsigned. http://bugs.python.org/issue1202 */
> signed_val = crc32(crc32val, buf, len);
> return PyInt_FromLong(signed_val);
>
> I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure
> that someone actually runs the tests on a 64-bit platform to make sure we
> don't break anything.

Okay, I *was* reasonably sure 64-bit was covered, but having looked at it again am now worried. I think we need an extra step, checking it would be good.

John A Meinel (jameinel) wrote :
Download full text (3.8 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin [gz] wrote:
> Thanks for the review!
>
>> Now, this is doing it something like 2M times in the loop. So while the
>> proposed version is about 2x slower than the fully optimized one, it is still
>> probably beneath the threshold. And certainly it is nice if it doesn't require
>> having the extra lib around.
>
> Hm, that's a little worse than I measured, but I didn't use real keys. Does changing the crc32 function from the one in binascii to the one in zlib make any measurable difference for you?

1.18s binascii.crc32
1.08s zlib.crc32

I'm a bit surprised to see that much of a difference, but certainly
worth doing.

Also, I'm testing it on Windows, and there could be platform specific
differences.

>
>> PyInt_AsUnsignedLongMask can throw an exception, so we should define it as
>> such.
>
> Made the assumption that if crc32 returned successfully, it gives a valid python integer, so it can be converted without exception. Perhaps a leap too far, or might be clearer using PyInt_AS_LONG and casting to unsigned if we want the shortcut.
>

I haven't profiled, but I'm guessing the overhead is in the
boxing/unboxing and other crc32 overhead. And very little is in
UnsignedLongMask.

>> 2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
>> crc_val = PyInt_AsUnsignedLongMask(crc32(key[i]))
>
> Okay, that seems reasonable.
>
>> 3) If we do that, then we may as well get rid of the StaticTuple_CHECK_EXACT
>> check at the beginning. After all, if we are optimizing for clarity, we should
>> do so.
>
> Isn't throwing an error if it's given something that's not a StaticTuple good for correctness?
>

The reason the check was there was because we used
StaticTuple_GET_ITEM_ptr(), so we had to be sure. It actually caused a
fair amount of fallout that could have been prevented if we used the
"simpler" form.

I generally prefer to just go with duck-typing as much as possible,
rather than scattering essentially 'assertions' around the codebase. If
there are performance issues, we should have tests for it, and ensure it
that way.

>> 4) We actually have to be pretty careful about 64-bit platforms, and I don't
>> have one here to test. Specifically, this comment in zlibmodule.c:
>> int len, signed_val;
>> ...
>> /* In Python 2.x we return a signed integer regardless of native platform
>> * long size (the 32bit unsigned long is treated as 32-bit signed and sign
>> * extended into a 64-bit long inside the integer object). 3.0 does the
>> * right thing and returns unsigned. http://bugs.python.org/issue1202 */
>> signed_val = crc32(crc32val, buf, len);
>> return PyInt_FromLong(signed_val);
>>
>> I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure
>> that someone actually runs the tests on a 64-bit platform to make sure we
>> don't break anything.
>
> Okay, I *was* reasonably sure 64-bit was covered, but having looked at it again am now worried. I think we need an extra step, checking it would be good.

Yeah, I'm not sure what is going to happen. The "sign extended into a
64-bit long" is a bit worrying. Off-hand I'm ...

Read more...

Vincent Ladeuil (vila) wrote :

Marking as work in progress.

As an experiment, I've setup a job on babune (http://babune.ladeuil.net:24842) called remove_zlib_double_dependency_py that should trigger on every push on the branch associated with this mp.

It will run on hardy (32bits) and karmic (64bits) (I can't use windows here as the test suite
is still too broken for that).

It will run the full test suite so if you have only a subset you're interested in,
let me know, this would make the feedback loop shorter as well.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/_chk_map_pyx.pyx'
2--- bzrlib/_chk_map_pyx.pyx 2010-02-17 17:11:16 +0000
3+++ bzrlib/_chk_map_pyx.pyx 2010-04-19 21:30:29 +0000
4@@ -36,6 +36,7 @@
5 int PyString_CheckExact(object)
6 char *PyString_AS_STRING(object s)
7 Py_ssize_t PyString_GET_SIZE(object)
8+ unsigned long PyInt_AsUnsignedLongMask(object)
9
10 int PyDict_SetItem(object d, object k, object v) except -1
11
12@@ -63,12 +64,8 @@
13 PyObject * StaticTuple_GET_ITEM_ptr "StaticTuple_GET_ITEM" (StaticTuple,
14 Py_ssize_t)
15
16-cdef extern from "zlib.h":
17- ctypedef unsigned long uLong
18- ctypedef unsigned int uInt
19- ctypedef unsigned char Bytef
20-
21- uLong crc32(uLong crc, Bytef *buf, uInt len)
22+cdef object crc32
23+from binascii import crc32
24
25
26 # Set up the StaticTuple C_API functionality
27@@ -101,9 +98,7 @@
28 cdef Py_ssize_t num_bits
29 cdef Py_ssize_t i, j
30 cdef Py_ssize_t num_out_bytes
31- cdef Bytef *c_bit
32- cdef uLong c_len
33- cdef uInt crc_val
34+ cdef unsigned long crc_val
35 cdef Py_ssize_t out_off
36 cdef char *c_out
37 cdef PyObject *bit
38@@ -125,9 +120,7 @@
39 bit = StaticTuple_GET_ITEM_ptr(key, i)
40 if not PyString_CheckExact_ptr(bit):
41 raise TypeError('Bit %d of %r is not a string' % (i, key))
42- c_bit = <Bytef *>PyString_AS_STRING_ptr(bit)
43- c_len = PyString_GET_SIZE_ptr(bit)
44- crc_val = crc32(0, c_bit, c_len)
45+ crc_val = PyInt_AsUnsignedLongMask(crc32(<object>bit))
46 # Hex(val) order
47 sprintf(c_out, '%08X', crc_val)
48 c_out = c_out + 8
49@@ -139,9 +132,7 @@
50 cdef Py_ssize_t num_bits
51 cdef Py_ssize_t i, j
52 cdef Py_ssize_t num_out_bytes
53- cdef Bytef *c_bit
54- cdef uLong c_len
55- cdef uInt crc_val
56+ cdef unsigned long crc_val
57 cdef Py_ssize_t out_off
58 cdef char *c_out
59 cdef PyObject *bit
60@@ -161,9 +152,7 @@
61 if not PyString_CheckExact_ptr(bit):
62 raise TypeError('Bit %d of %r is not a string: %r'
63 % (i, key, <object>bit))
64- c_bit = <Bytef *>PyString_AS_STRING_ptr(bit)
65- c_len = PyString_GET_SIZE_ptr(bit)
66- crc_val = crc32(0, c_bit, c_len)
67+ crc_val = PyInt_AsUnsignedLongMask(crc32(<object>bit))
68 # MSB order
69 c_out[0] = (crc_val >> 24) & 0xFF
70 c_out[1] = (crc_val >> 16) & 0xFF
71
72=== modified file 'setup.py'
73--- setup.py 2010-04-08 04:34:03 +0000
74+++ setup.py 2010-04-19 21:30:29 +0000
75@@ -281,7 +281,6 @@
76 add_pyrex_extension('bzrlib._dirstate_helpers_pyx',
77 libraries=['Ws2_32'])
78 add_pyrex_extension('bzrlib._walkdirs_win32')
79- z_lib = 'zdll'
80 else:
81 if have_pyrex and pyrex_version_info[:3] == (0,9,4):
82 # Pyrex 0.9.4.1 fails to compile this extension correctly
83@@ -298,8 +297,7 @@
84 else:
85 add_pyrex_extension('bzrlib._dirstate_helpers_pyx')
86 add_pyrex_extension('bzrlib._readdir_pyx')
87- z_lib = 'z'
88-add_pyrex_extension('bzrlib._chk_map_pyx', libraries=[z_lib])
89+add_pyrex_extension('bzrlib._chk_map_pyx')
90 ext_modules.append(Extension('bzrlib._patiencediff_c',
91 ['bzrlib/_patiencediff_c.c']))
92 if have_pyrex and pyrex_version_info < (0, 9, 6, 3):