Merge lp:~gz/bzr/remove_zlib_double_dependency_pyapi into lp:bzr
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Merged at revision: | 5215 | ||||
| Proposed branch: | lp:~gz/bzr/remove_zlib_double_dependency_pyapi | ||||
| Merge into: | lp:bzr | ||||
| Diff against target: |
92 lines (+8/-21) 2 files modified
bzrlib/_chk_map_pyx.pyx (+7/-18) setup.py (+1/-3) |
||||
| To merge this branch: | bzr merge lp:~gz/bzr/remove_zlib_double_dependency_pyapi | ||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Vincent Ladeuil | 2010-04-19 | Approve on 2010-04-22 | |
| John A Meinel | 2010-04-22 | Pending | |
|
Review via email:
|
|||
Description of the Change
This is a competing proposal against <lp:~gz/bzr/remove_zlib_double_dependency_static>
Avoid the need for zlib.h and linking against the library by using a python-level function call to crc32 (in binascii, but zlib would be fine too).
The downside here is that this way is actually more expensive, the extra boxing/unboxing and refcounting costs something like 0.5μs per call. Also, that cost is likely to be similar on nix, which never had any issues with the old way.
I'm not certain I've done the pyrex code in the best possible manner here for recount minimisation, I believe it is correct though.
| Alexander Belchenko (bialix) wrote : | # |
| Martin Packman (gz) wrote : | # |
It's a pain, yes Alexander, see the bug for more.
| Alexander Belchenko (bialix) wrote : | # |
Martin [gz] пишет:
> It's a pain, yes Alexander, see the bug for more.
I'm abstain to vote on this.
But as C-programmer and given in mind that CRC32 is standard function which should have classic
implementation and unlikely will have security problems of hidden bugs, I'd better have the copy of
that function as C-extension for *Windows* *only*. I don't like the idea of make bzr a bit slower on
Windows: it's already slower here than on Linux.
--
All the dude wanted was his rug back
| Martin Packman (gz) wrote : | # |
Well, I'm not certain this does make anything slower. Yeah, I can measure a difference in a microbenchmark, but I don't know what, if any, workloads these functions are a significant part of. For instance, a profile of a problem I ran the other day totalled ~366 seconds, of which 0.0178 seconds were spent in 5 _search_key_255 calls. This branch will make no measurable difference to that operation.
If these functions aren't hot, there's no reason to worry about μs overhead.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin [gz] wrote:
> Well, I'm not certain this does make anything slower. Yeah, I can measure a difference in a microbenchmark, but I don't know what, if any, workloads these functions are a significant part of. For instance, a profile of a problem I ran the other day totalled ~366 seconds, of which 0.0178 seconds were spent in 5 _search_key_255 calls. This branch will make no measurable difference to that operation.
>
> If these functions aren't hot, there's no reason to worry about μs overhead.
bzr log -v
is potentially one.
5 calls is certainly nothing worth optimizing.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
mrkAni/
=Yk61
-----END PGP SIGNATURE-----
| Martin Packman (gz) wrote : | # |
Thanks John, that was just what I was after. I've put some profile numbers in the bug, summary: On a 50 second operation, this branch causes about a 0.05 second slowdown across 75311 calls, 0.01% overall. The difference between runs has a wider variation than that.
| Alexander Belchenko (bialix) wrote : | # |
Martin [gz] пишет:
> Thanks John, that was just what I was after. I've put some profile numbers in the bug, summary: On a 50 second operation, this branch causes about a 0.05 second slowdown across 75311 calls, 0.01% overall. The difference between runs has a wider variation than that.
OK for me.
| Vincent Ladeuil (vila) wrote : | # |
@John: ping
| Alexander Belchenko (bialix) wrote : | # |
I've looked at the changes, it seems correct, but there is very obscure interaction with pyrex generator to get the desired behavior. I dunno, maybe that code would look simpler in plain C. Or maybe not.
I'm second on waiting final word from John Meinel.
| John A Meinel (jameinel) wrote : | # |
So it seems microbenchmarking shows a rather significant difference. Specifically, loading all CHK pages and grabbing their keys, and then running:
$ TIMEIT -s "import cPickle; keys = [i[0] for i in cPickle.
from bzrlib.chk_map import _search_key_255 as sk" \
"[sk(k) for k in keys]"
624ms bzr.dev
1150ms this version
1170ms a version that doesn't optimize for being a StaticTuple
(use key[i] rather than <object>
check, etc.)
9420ms _chk_map_
Now, this is doing it something like 2M times in the loop. So while the proposed version is about 2x slower than the fully optimized one, it is still probably beneath the threshold. And certainly it is nice if it doesn't require having the extra lib around.
There are a couple bits I would suggest, though:
1) unsigned long PyInt_AsUnsigne
PyInt_AsUnsigne
2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
crc_val = PyInt_AsUnsigne
3) If we do that, then we may as well get rid of the StaticTuple_
4) We actually have to be pretty careful about 64-bit platforms, and I don't have one here to test. Specifically, this comment in zlibmodule.c:
int len, signed_val;
...
/* In Python 2.x we return a signed integer regardless of native platform
* long size (the 32bit unsigned long is treated as 32-bit signed and sign
* extended into a 64-bit long inside the integer object). 3.0 does the
* right thing and returns unsigned. http://
signed_val = crc32(crc32val, buf, len);
return PyInt_FromLong(
I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure that someone actually runs the tests on a 64-bit platform to make sure we don't break anything.
I pushed my version up as:
lp:~jameinel/bzr/remove_zlib_dependency
As for some of the other commentary:
a) On linux, requiring people install zlib-devel to build doesn't seem particularly onerous. Certainly it is just an apt-get away, and if they used "apt-get build-deps bzr" then they should get it along with all the other gcc, pyrex, etc.
b) Cython/Pyrex vs C. It depends if you actually want to write robust C code that handles exceptions, or if you just write a quick and dirty version. For example, in Pyrex you can write:
cdef extern from "Python.h":
unsigned long PyInt_AsUnsigne
def to_mask(obj):
cdef unsigned long myval
myval = PyInt_AsUnsigne
...
In C, you can also write something like that. Except it really turns into:
unsigned long to_mask(PyObject *obj)
{
unsigned long myval;
myval = PyInt_AsUnsigne
if (myval == -1 && PyErr_Occurred()) {
/* Do something with the exception */
}
}
And even further, in Pyrex you can write:
key[i]
In C, it is *much* uglier, especially if you want...
| Martin Packman (gz) wrote : | # |
Thanks for the review!
> Now, this is doing it something like 2M times in the loop. So while the
> proposed version is about 2x slower than the fully optimized one, it is still
> probably beneath the threshold. And certainly it is nice if it doesn't require
> having the extra lib around.
Hm, that's a little worse than I measured, but I didn't use real keys. Does changing the crc32 function from the one in binascii to the one in zlib make any measurable difference for you?
> PyInt_AsUnsigne
> such.
Made the assumption that if crc32 returned successfully, it gives a valid python integer, so it can be converted without exception. Perhaps a leap too far, or might be clearer using PyInt_AS_LONG and casting to unsigned if we want the shortcut.
> 2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
> crc_val = PyInt_AsUnsigne
Okay, that seems reasonable.
> 3) If we do that, then we may as well get rid of the StaticTuple_
> check at the beginning. After all, if we are optimizing for clarity, we should
> do so.
Isn't throwing an error if it's given something that's not a StaticTuple good for correctness?
> 4) We actually have to be pretty careful about 64-bit platforms, and I don't
> have one here to test. Specifically, this comment in zlibmodule.c:
> int len, signed_val;
> ...
> /* In Python 2.x we return a signed integer regardless of native platform
> * long size (the 32bit unsigned long is treated as 32-bit signed and sign
> * extended into a 64-bit long inside the integer object). 3.0 does the
> * right thing and returns unsigned. http://
> signed_val = crc32(crc32val, buf, len);
> return PyInt_FromLong(
>
> I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure
> that someone actually runs the tests on a 64-bit platform to make sure we
> don't break anything.
Okay, I *was* reasonably sure 64-bit was covered, but having looked at it again am now worried. I think we need an extra step, checking it would be good.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin [gz] wrote:
> Thanks for the review!
>
>> Now, this is doing it something like 2M times in the loop. So while the
>> proposed version is about 2x slower than the fully optimized one, it is still
>> probably beneath the threshold. And certainly it is nice if it doesn't require
>> having the extra lib around.
>
> Hm, that's a little worse than I measured, but I didn't use real keys. Does changing the crc32 function from the one in binascii to the one in zlib make any measurable difference for you?
1.18s binascii.crc32
1.08s zlib.crc32
I'm a bit surprised to see that much of a difference, but certainly
worth doing.
Also, I'm testing it on Windows, and there could be platform specific
differences.
>
>> PyInt_AsUnsigne
>> such.
>
> Made the assumption that if crc32 returned successfully, it gives a valid python integer, so it can be converted without exception. Perhaps a leap too far, or might be clearer using PyInt_AS_LONG and casting to unsigned if we want the shortcut.
>
I haven't profiled, but I'm guessing the overhead is in the
boxing/unboxing and other crc32 overhead. And very little is in
UnsignedLongMask.
>> 2) We can get rid of a lot of the bit/<object> ugliness by just switching to:
>> crc_val = PyInt_AsUnsigne
>
> Okay, that seems reasonable.
>
>> 3) If we do that, then we may as well get rid of the StaticTuple_
>> check at the beginning. After all, if we are optimizing for clarity, we should
>> do so.
>
> Isn't throwing an error if it's given something that's not a StaticTuple good for correctness?
>
The reason the check was there was because we used
StaticTuple_
fair amount of fallout that could have been prevented if we used the
"simpler" form.
I generally prefer to just go with duck-typing as much as possible,
rather than scattering essentially 'assertions' around the codebase. If
there are performance issues, we should have tests for it, and ensure it
that way.
>> 4) We actually have to be pretty careful about 64-bit platforms, and I don't
>> have one here to test. Specifically, this comment in zlibmodule.c:
>> int len, signed_val;
>> ...
>> /* In Python 2.x we return a signed integer regardless of native platform
>> * long size (the 32bit unsigned long is treated as 32-bit signed and sign
>> * extended into a 64-bit long inside the integer object). 3.0 does the
>> * right thing and returns unsigned. http://
>> signed_val = crc32(crc32val, buf, len);
>> return PyInt_FromLong(
>>
>> I know you noted stuff like 0xFFFFFFFF and such, I'd just like to make sure
>> that someone actually runs the tests on a 64-bit platform to make sure we
>> don't break anything.
>
> Okay, I *was* reasonably sure 64-bit was covered, but having looked at it again am now worried. I think we need an extra step, checking it would be good.
Yeah, I'm not sure what is going to happen. The "sign extended into a
64-bit long" is a bit worrying. Off-hand I'm ...
| Vincent Ladeuil (vila) wrote : | # |
Marking as work in progress.
As an experiment, I've setup a job on babune (http://
It will run on hardy (32bits) and karmic (64bits) (I can't use windows here as the test suite
is still too broken for that).
It will run the full test suite so if you have only a subset you're interested in,
let me know, this would make the feedback loop shorter as well.

Martin [gz] пишет: /bugs.launchpad .net/bugs/ 566923
> Martin [gz] has proposed merging lp:~gz/bzr/remove_zlib_double_dependency_pyapi into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #566923 Remove zlib double-dependency
> https:/
>
>
> This is a competing proposal against <lp:~gz/bzr/remove_zlib_double_dependency_static>
>
> Avoid the need for zlib.h and linking against the library by using a python-level function call to crc32 (in binascii, but zlib would be fine too).
Does zlib.h is really the problem?
--
All the dude wanted was his rug back