Merge lp:~martin-lp/hipl/n900-build-fix into lp:hipl
- n900-build-fix
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | René Hummen |
Approved revision: | 6083 |
Merged at revision: | 6084 |
Proposed branch: | lp:~martin-lp/hipl/n900-build-fix |
Merge into: | lp:hipl |
Diff against target: |
14 lines (+4/-0) 1 file modified
test/lib/core/hostid.c (+4/-0) |
To merge this branch: | bzr merge lp:~martin-lp/hipl/n900-build-fix |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
René Hummen | Approve | ||
Diego Biurrun | Approve | ||
Christof Mroz | Approve | ||
Review via email: mp+78551@code.launchpad.net |
This proposal supersedes a proposal from 2011-09-22.
Commit message
Description of the change
Our check framework has a default timeout of 4 seconds for tests
which is not sufficient for key generation in the scratchbox
environment or on the N900. As a result the unit test and package
build fails there.
This branch sets the timeout to 120 seconds which seems to be
sufficient for current tests.
Failures while building the N900 package or running the unit tests
on the respective environment look like this:
debug(test/
debug(lib/
debug(lib/
Killed
With the changes in this branch it should not happen anymore.
Purpose of the review:
All good now? Everybody happy with how it's solved?
It's cleaner than a #define for sure. :)
Miika Komu (miika-iki) wrote : Posted in a previous version of this proposal | # |
Stefan Götz (stefan.goetz-deactivatedaccount) wrote : Posted in a previous version of this proposal | # |
Maybe https:/
René Hummen (rene-hummen) wrote : Posted in a previous version of this proposal | # |
I also get abort with 1024 bit keys. David can you please post some more details on what you already checked.
David Martin (martin-lp) wrote : Posted in a previous version of this proposal | # |
Hi,
On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
> review reject
>
> I think you should really give this another try and dig deeper. Find out
> which function is failing with some printfs, debug some more.
like I said it's RSA_generate_key() from the OpenSSL library which fails.
> You could also try to update the check framework in scratchbox and/or OpenSSL.
This may be worth a try, I'll have a look.
>> --- test/lib/
>> +++ test/lib/
>> @@ -64,10 +64,20 @@
>> +/*
>> + * create_rsa_key() fails for key sizes greater than 1024 on both the
>> + * N900 and the respective scratchbox environment.
>> + * Use HAVE_TCASE_
>> + * the version of check on the scratchbox environment.
>> + */
>> +#ifdef HAVE_TCASE_
>> + int bits[3] = { 1024, 2048, 3072 };
>> +#else
>> + int bits[1] = { 1024 };
>> +#endif
>> + RSA *key = NULL, *key_deserialized = NULL;
>> + EVP_PKEY *key_a = NULL, *key_b = NULL;
>> unsigned char *keyrr;
>> struct hip_host_id_priv hostid;
>
> Not a pretty sight ;)
It wasn't supposed to win any beauty contests and that's why I ask
for a better way to deal with it. :p
On Thu, Sep 22, 2011 at 7:14 PM, Miika Komu <email address hidden> wrote:
> Please show the "backtrace" of gdb.
There's no real backtrace as it is openssl's function that fails.
Nevertheless here is the output:
> debug(test/
> lib/core/hostid
> debug(test/
> debug(lib/
>
>
> debug(lib/
> debug(lib/
> debug(lib/
>
>
> debug(lib/
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> (gdb) bt
> No stack.
(before and after generating key are debug prints added by me in the lines before
and after the RSA_generate_key() call in /lib/core/crypto.c create_rsa_key())
On Fri, Sep 23, 2011 at 7:52 PM, Stefan Götz <email address hidden> wrote:
> Maybe https:/
I doubt it. :) It does not seem to be a problem with our code but with
OpenSSL or maybe our use of OpenSSL.
David Martin (martin-lp) wrote : Posted in a previous version of this proposal | # |
Hi,
On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> On Mon, 26 Sep 2011 15:56:24 +0200, David Martin
> <email address hidden> wrote:
>
>>> Program terminated with signal SIGKILL, Killed.
>>> The program no longer exists.
>>> (gdb) bt
>>> No stack.
>
> Can you dump the rlimits inside the test? See getrlimit(2).
Not sure but I'll have a look at it.
> If all else fails, you may try building OpenSSL yourself with debug symbols
> so you can circumscript the culprit even further (or learn ASM :) ). Also,
> did you try strace yet?
Nope, didn't try strace yet. Thanks for the hint. Building OpenSSL is my current
plan as our version of OpenSSL already is the most recent from the Maemo repos.
Diego Biurrun (diego-biurrun) wrote : Posted in a previous version of this proposal | # |
On Mon, Sep 26, 2011 at 01:56:24PM +0000, David Martin wrote:
>
> On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
> > review reject
> >
> > I think you should really give this another try and dig deeper. Find out
> > which function is failing with some printfs, debug some more.
>
> like I said it's RSA_generate_key() from the OpenSSL library which fails.
As a next step, print out the values that we pass to that function, maybe
some are invalid, NULL or whatever. If pointers are passed, dereference
them and print their values, maybe they point at invalid values.
Diego
David Martin (martin-lp) wrote : Posted in a previous version of this proposal | # |
Hi,
did some more research on this and did actually find something.
On Tue, Sep 27, 2011 at 9:51 AM, Diego Biurrun <email address hidden> wrote:
> On Mon, Sep 26, 2011 at 01:56:24PM +0000, David Martin wrote:
>>
>> On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
>> > review reject
>> >
>> > I think you should really give this another try and dig deeper. Find out
>> > which function is failing with some printfs, debug some more.
>>
>> like I said it's RSA_generate_key() from the OpenSSL library which fails.
>
> As a next step, print out the values that we pass to that function, maybe
> some are invalid, NULL or whatever. If pointers are passed, dereference
> them and print their values, maybe they point at invalid values.
We don't pass any pointers (well, NULL) and our values are fine. :)
On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> Can you dump the rlimits inside the test? See getrlimit(2).
Any specific rlimit you had in mind? RLIMIT_CPU for cpu time does not seem to be
set and limiting it by hand (eg. to a second and choosing a stronger key) results
in a graceful exit with the respective signal (cpu time exceeded).
On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> If all else fails, you may try building OpenSSL yourself with debug symbols
> so you can circumscript the culprit even further (or learn ASM :) ). Also,
> did you try strace yet?
Did not try the newer OpenSSL yet but strace is neat and helped to narrow down
the problem.
Both the N900 and the version compiled on passion set a SIGALRM with rt_sigaction
(which I think is responsible for killing it).
The N900 binary in addition sets a timer to 3 seconds with setitimer. This one
sends the SIGALRM when the timer reaches zero and this is what happens before
it breaks.
I'm not sure yet who is responsible for setting the timer. Is this platform
specific? I think Android does something like this where a process gets killed
if it's unresponsive for more than a few seconds.
Should I upload the traces? I don't see a button for this in launchpad (I'm pretty
sure to have done that before, though).
David Martin (martin-lp) wrote : Posted in a previous version of this proposal | # |
Hi,
seems like our problem was with the check framework which sets a default timeout
of 4 seconds for tests. I'll test which limit works and request another review
of the merge request.
On Wed, Oct 5, 2011 at 6:06 PM, Christof Mroz <email address hidden> wrote:
> On Wed, 05 Oct 2011 16:44:24 +0200, David Martin
> <email address hidden> wrote:
>
>> I'm not sure yet who is responsible for setting the timer.
>
> Then set a BP on setitimer() in gdb, instruct it to BT and CONT on each hit.
> This could give you a rough idea who set the timer (even more useful with a
> debug build of openssl, of course).
>
> You can also try to ignore the timer using LD_PRELOAD and a crafted SO.
> IIRC, just create a C file with your fake setitimer() implementation
> (signature must match, adding a debug print statement is a good idea),
> compile with -fpic and -shared and preload the resulting binary to see if it
> keeps getting killed.
Dang, you come up with the craziest debugging ideas! ;) Think that's not
necessary anymore.
>> Is this platform specific? I think Android does something like this where
>> a process gets killed if it's unresponsive for more than a few seconds.
>
> That's a very good idea, did you try adding a sleep(INT_MAX) to confirm
> this?
Using sleep was a good idea. It's not platform specific but the check framework
which has timeouts for the tests. If you force it to run longer on passion it
exits gracefully with a proper message:
> test/lib/
Seems like the check version on the scratchbox environment is not that convenient
and simply breaks.
Diego Biurrun (diego-biurrun) wrote : | # |
review needs-info
On Fri, Oct 07, 2011 at 09:20:31AM +0000, David Martin wrote:
>
> --- test/lib/
> +++ test/lib/
> @@ -65,7 +65,7 @@
> START_TEST(
> {
> unsigned int i, keyrr_len = 0;
> - int bits[3] = {1024, 2048, 3072};
> + int bits[3] = { 1024, 2048, 3072 };
> RSA *key = NULL, *key_deserialized = NULL;
This you can push anytime.
> @@ -126,6 +126,11 @@
>
> TCase *tc_core = tcase_create(
> + // the default test timeout of 4 seconds is too short,
> + // generating keys in scratchbox or on the N900 takes
> + // a while
nit: no need to break the second line and /* */ comments are preferred
> + tcase_set_
I wonder if we should not set this in a slightly more global place.
Diego
David Martin (martin-lp) wrote : | # |
Hi,
On Fri, Oct 7, 2011 at 11:28 AM, Diego Biurrun <email address hidden> wrote:
> On Fri, Oct 07, 2011 at 09:20:31AM +0000, David Martin wrote:
>>
>> --- test/lib/
>> +++ test/lib/
>> @@ -65,7 +65,7 @@
>> START_TEST(
>> {
>> unsigned int i, keyrr_len = 0;
>> - int bits[3] = {1024, 2048, 3072};
>> + int bits[3] = { 1024, 2048, 3072 };
>> RSA *key = NULL, *key_deserialized = NULL;
>
> This you can push anytime.
kk, will do so later.
>> @@ -126,6 +126,11 @@
>>
>> TCase *tc_core = tcase_create(
>> + // the default test timeout of 4 seconds is too short,
>> + // generating keys in scratchbox or on the N900 takes
>> + // a while
>
> nit: no need to break the second line and /* */ comments are preferred
will fix that.
>> + tcase_set_
>
> I wonder if we should not set this in a slightly more global place.
Well, we can set it in a global variable (see the manual
http://
The key generation seems to be the exception though with taking such
a long time.
Christof Mroz (christof-mroz) wrote : | # |
I wouldn't set a global time limit: this way we can still catch poorly performing functions in other tests. So I'm fine with the new approach.
Christof Mroz (christof-mroz) wrote : | # |
Semi-related: is there a trivial way to change the "killed" message, as printed in case of a timeout, to something more helpful? Without modifying the check framework, I mean.
- 6083. By David Martin
-
Cosmetics: Use /* */ for multiline comment in hostid unit test.
David Martin (martin-lp) wrote : | # |
On Fri, Oct 7, 2011 at 1:21 PM, Christof Mroz <email address hidden> wrote:
> I wouldn't set a global time limit: this way we can still catch poorly performing functions in other tests. So I'm fine with the new approach.
Sounds reasonable.
> Semi-related: is there a trivial way to change the "killed" message, as printed in case of a timeout, to something more helpful? Without modifying the check framework, I mean.
To be honest, no idea. Should I set up a bug report for it and attach the strace logs?
On Fri, Oct 7, 2011 at 11:28 AM, Diego Biurrun <email address hidden> wrote:
> On Fri, Oct 07, 2011 at 09:20:31AM +0000, David Martin wrote:
>> + tcase_set_
>
> I wonder if we should not set this in a slightly more global place.
Any more thoughts on this? I've fixed the cosmetics and it would be nice to get
this fix into trunk.
Diego Biurrun (diego-biurrun) wrote : | # |
review approve
On Mon, Oct 10, 2011 at 12:20:38PM +0000, David Martin wrote:
> On Fri, Oct 7, 2011 at 1:21 PM, Christof Mroz <email address hidden> wrote:
> > I wouldn't set a global time limit: this way we can still catch poorly performing functions in other tests. So I'm fine with the new approach.
>
> Sounds reasonable.
>
> On Fri, Oct 7, 2011 at 11:28 AM, Diego Biurrun <email address hidden> wrote:
> > On Fri, Oct 07, 2011 at 09:20:31AM +0000, David Martin wrote:
>
> >> + tcase_set_
> >
> > I wonder if we should not set this in a slightly more global place.
>
> Any more thoughts on this? I've fixed the cosmetics and it would be nice to get
> this fix into trunk.
I tend to suspect that we may run into this problem in other places in
the future, so I'm inclined to set this (more) globally, but I don't
much mind either way. Push in whichever way you prefer.
Diego
René Hummen (rene-hummen) wrote : | # |
Please add a bug to the bugtracker for future reference. Thanks to everyone for solving this.
Christof Mroz (christof-mroz) wrote : | # |
On Mon, 10 Oct 2011 14:35:06 +0200, Diego Biurrun <email address hidden> wrote:
>>>> + tcase_set_
>>>
>>> I wonder if we should not set this in a slightly more global place.
>>
>> Any more thoughts on this? I've fixed the cosmetics and it would be
>> nice to get
>> this fix into trunk.
>
> I tend to suspect that we may run into this problem in other places in
> the future, so I'm inclined to set this (more) globally, but I don't
> much mind either way.
As a compromise, we could do that the next time we encounter a timeout on
the n900. But with a lower timeout, since 120 seconds are excessive even
on a phone for most operations. In this case (if I see it correctly),
multiple tests were bundled into one which stretched the runtime even
further.
David Martin (martin-lp) wrote : | # |
Hi,
On Mon, Oct 10, 2011 at 3:52 PM, Christof Mroz <email address hidden> wrote:
> On Mon, 10 Oct 2011 14:35:06 +0200, Diego Biurrun <email address hidden> wrote:
>
>>>>> + tcase_set_
>>>>
>>>> I wonder if we should not set this in a slightly more global place.
>>>
>>> Any more thoughts on this? I've fixed the cosmetics and it would be
>>> nice to get
>>> this fix into trunk.
>>
>> I tend to suspect that we may run into this problem in other places in
>> the future, so I'm inclined to set this (more) globally, but I don't
>> much mind either way.
>
> As a compromise, we could do that the next time we encounter a timeout on
> the n900.
Think so too.
> But with a lower timeout, since 120 seconds are excessive even
> on a phone for most operations. In this case (if I see it correctly),
> multiple tests were bundled into one which stretched the runtime even
> further.
It's just the key generation that takes so long, the rest is more or less
done instantly. The 1024 bit key takes a moment, 2048 bit takes a while
and 3072 bit already feels like ages on the N900 (long enough that you start
to wonder if it crapped out on you ;)). Not sure if it's the limited CPU (600MHz?)
or the OpenSSL implementation though, maybe a bit of both.
Preview Diff
1 | === modified file 'test/lib/core/hostid.c' |
2 | --- test/lib/core/hostid.c 2011-10-10 11:37:16 +0000 |
3 | +++ test/lib/core/hostid.c 2011-10-10 12:08:13 +0000 |
4 | @@ -126,6 +126,10 @@ |
5 | Suite *s = suite_create("lib/core/hostid"); |
6 | |
7 | TCase *tc_core = tcase_create("Core"); |
8 | + /* the default test timeout of 4 seconds is too short, |
9 | + * generating keys in scratchbox or on the N900 takes a while */ |
10 | + tcase_set_timeout(tc_core, 120); |
11 | + |
12 | tcase_add_test(tc_core, test_serialize_deserialize_rsa_keys); |
13 | tcase_add_test(tc_core, test_serialize_deserialize_keys); |
14 | tcase_add_test(tc_core, test_invalid_serialize_deserialize); |
Please show the "backtrace" of gdb.