Merge lp:~martin-lp/hipl/n900-build-fix into lp:hipl

Proposed by David Martin
Status: Superseded
Proposed branch: lp:~martin-lp/hipl/n900-build-fix
Merge into: lp:hipl
Diff against target: 24 lines (+6/-1)
1 file modified
test/lib/core/hostid.c (+6/-1)
To merge this branch: bzr merge lp:~martin-lp/hipl/n900-build-fix
Reviewer Review Type Date Requested Status
HIPL core team Pending
Review via email: mp+76570@code.launchpad.net

This proposal has been superseded by a proposal from 2011-10-07.

Description of the change

In short:
One of the recently added unit tests breaks on the N900 and its scratchbox
environment. As a result the deb package build fails. This branch introduces
a hack to skip the troublesome part of the test.

In longer:
The problem lies with the RSA key generation. For whatever reason
RSA_generate_key() dies ungracefully (or is aborted ungracefully)
when it is supposed to create keys with more than 1024 bits.

The output looks like this:
debug(test/lib/core/hostid.c:74@test_serialize_deserialize_rsa_: Trying to serialize and deserialize some RSA keys.
debug(lib/core/crypto.c:768@create_rsa_key): *****************Creating RSA of 1024 bits
debug(lib/core/crypto.c:768@create_rsa_key): *****************Creating RSA of 2048 bits
Killed

Some research on google and gdb haven't really helped find the
source of the problem or a possible solution. As a workaround
this branch simply skips creating keys greater than 1024 bit.

Purpose of the review:
Is using this define a good way to handle this? Anybody know what the problem
is on the platform and the key generation? Any better way to deal with this?

To post a comment you must log in.
Revision history for this message
Miika Komu (miika-iki) wrote :

Please show the "backtrace" of gdb.

Revision history for this message
Stefan Götz (stefan.goetz-deactivatedaccount) wrote :
Revision history for this message
René Hummen (rene-hummen) wrote :

I also get abort with 1024 bit keys. David can you please post some more details on what you already checked.

Revision history for this message
David Martin (martin-lp) wrote :

Hi,

On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
> review reject
>
> I think you should really give this another try and dig deeper. Find out
> which function is failing with some printfs, debug some more.

like I said it's RSA_generate_key() from the OpenSSL library which fails.

> You could also try to update the check framework in scratchbox and/or OpenSSL.

This may be worth a try, I'll have a look.

>> --- test/lib/core/hostid.c 2011-09-05 08:46:53 +0000
>> +++ test/lib/core/hostid.c 2011-09-22 11:59:39 +0000
>> @@ -64,10 +64,20 @@
>> +/*
>> + * create_rsa_key() fails for key sizes greater than 1024 on both the
>> + * N900 and the respective scratchbox environment.
>> + * Use HAVE_TCASE_ADD_EXIT_TEST as a kludge as it is not defined for
>> + * the version of check on the scratchbox environment.
>> + */
>> +#ifdef HAVE_TCASE_ADD_EXIT_TEST
>> + int bits[3] = { 1024, 2048, 3072 };
>> +#else
>> + int bits[1] = { 1024 };
>> +#endif
>> + RSA *key = NULL, *key_deserialized = NULL;
>> + EVP_PKEY *key_a = NULL, *key_b = NULL;
>> unsigned char *keyrr;
>> struct hip_host_id_priv hostid;
>
> Not a pretty sight ;)

It wasn't supposed to win any beauty contests and that's why I ask
for a better way to deal with it. :p

On Thu, Sep 22, 2011 at 7:14 PM, Miika Komu <email address hidden> wrote:
> Please show the "backtrace" of gdb.

There's no real backtrace as it is openssl's function that fails.
Nevertheless here is the output:

> debug(test/lib/core/crypto.c:242@test_invalid_impl_ecdsa_sign_v: Successfully passed test on lowlevel sign, verify operations with invalid inputs.
> lib/core/hostid
> debug(test/lib/core/hostid.c:74@test_serialize_deserialize_rsa_: Trying to serialize and deserialize some RSA keys.
> debug(lib/core/crypto.c:768@create_rsa_key): *****************Creating RSA of 1024 bits
>
>
> debug(lib/core/crypto.c:769@create_rsa_key): before generating key
> debug(lib/core/crypto.c:771@create_rsa_key): after generating key
> debug(lib/core/crypto.c:768@create_rsa_key): *****************Creating RSA of 2048 bits
>
>
> debug(lib/core/crypto.c:769@create_rsa_key): before generating key
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> (gdb) bt
> No stack.

(before and after generating key are debug prints added by me in the lines before
and after the RSA_generate_key() call in /lib/core/crypto.c create_rsa_key())

On Fri, Sep 23, 2011 at 7:52 PM, Stefan Götz <email address hidden> wrote:
> Maybe https://bugs.launchpad.net/hipl/+bug/788799 plays into this?

I doubt it. :) It does not seem to be a problem with our code but with
OpenSSL or maybe our use of OpenSSL.

Revision history for this message
David Martin (martin-lp) wrote :

Hi,

On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> On Mon, 26 Sep 2011 15:56:24 +0200, David Martin
> <email address hidden> wrote:
>
>>> Program terminated with signal SIGKILL, Killed.
>>> The program no longer exists.
>>> (gdb) bt
>>> No stack.
>
> Can you dump the rlimits inside the test? See getrlimit(2).

Not sure but I'll have a look at it.

> If all else fails, you may try building OpenSSL yourself with debug symbols
> so you can circumscript the culprit even further (or learn ASM :) ). Also,
> did you try strace yet?

Nope, didn't try strace yet. Thanks for the hint. Building OpenSSL is my current
plan as our version of OpenSSL already is the most recent from the Maemo repos.

Revision history for this message
Diego Biurrun (diego-biurrun) wrote :

On Mon, Sep 26, 2011 at 01:56:24PM +0000, David Martin wrote:
>
> On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
> > review reject
> >
> > I think you should really give this another try and dig deeper. Find out
> > which function is failing with some printfs, debug some more.
>
> like I said it's RSA_generate_key() from the OpenSSL library which fails.

As a next step, print out the values that we pass to that function, maybe
some are invalid, NULL or whatever. If pointers are passed, dereference
them and print their values, maybe they point at invalid values.

Diego

Revision history for this message
David Martin (martin-lp) wrote :

Hi,
did some more research on this and did actually find something.

On Tue, Sep 27, 2011 at 9:51 AM, Diego Biurrun <email address hidden> wrote:
> On Mon, Sep 26, 2011 at 01:56:24PM +0000, David Martin wrote:
>>
>> On Thu, Sep 22, 2011 at 5:35 PM, Diego Biurrun <email address hidden> wrote:
>> > review reject
>> >
>> > I think you should really give this another try and dig deeper. Find out
>> > which function is failing with some printfs, debug some more.
>>
>> like I said it's RSA_generate_key() from the OpenSSL library which fails.
>
> As a next step, print out the values that we pass to that function, maybe
> some are invalid, NULL or whatever. If pointers are passed, dereference
> them and print their values, maybe they point at invalid values.

We don't pass any pointers (well, NULL) and our values are fine. :)

On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> Can you dump the rlimits inside the test? See getrlimit(2).

Any specific rlimit you had in mind? RLIMIT_CPU for cpu time does not seem to be
set and limiting it by hand (eg. to a second and choosing a stronger key) results
in a graceful exit with the respective signal (cpu time exceeded).

On Mon, Sep 26, 2011 at 4:32 PM, Christof Mroz <email address hidden> wrote:
> If all else fails, you may try building OpenSSL yourself with debug symbols
> so you can circumscript the culprit even further (or learn ASM :) ). Also,
> did you try strace yet?

Did not try the newer OpenSSL yet but strace is neat and helped to narrow down
the problem.

Both the N900 and the version compiled on passion set a SIGALRM with rt_sigaction
(which I think is responsible for killing it).

The N900 binary in addition sets a timer to 3 seconds with setitimer. This one
sends the SIGALRM when the timer reaches zero and this is what happens before
it breaks.

I'm not sure yet who is responsible for setting the timer. Is this platform
specific? I think Android does something like this where a process gets killed
if it's unresponsive for more than a few seconds.

Should I upload the traces? I don't see a button for this in launchpad (I'm pretty
sure to have done that before, though).

Revision history for this message
David Martin (martin-lp) wrote :

Hi,
seems like our problem was with the check framework which sets a default timeout
of 4 seconds for tests. I'll test which limit works and request another review
of the merge request.

On Wed, Oct 5, 2011 at 6:06 PM, Christof Mroz <email address hidden> wrote:
> On Wed, 05 Oct 2011 16:44:24 +0200, David Martin
> <email address hidden> wrote:
>
>> I'm not sure yet who is responsible for setting the timer.
>
> Then set a BP on setitimer() in gdb, instruct it to BT and CONT on each hit.
> This could give you a rough idea who set the timer (even more useful with a
> debug build of openssl, of course).
>
> You can also try to ignore the timer using LD_PRELOAD and a crafted SO.
> IIRC, just create a C file with your fake setitimer() implementation
> (signature must match, adding a debug print statement is a good idea),
> compile with -fpic and -shared and preload the resulting binary to see if it
> keeps getting killed.

Dang, you come up with the craziest debugging ideas! ;) Think that's not
necessary anymore.

>> Is this platform specific? I think Android does something like this where
>> a process gets killed if it's unresponsive for more than a few seconds.
>
> That's a very good idea, did you try adding a sleep(INT_MAX) to confirm
> this?

Using sleep was a good idea. It's not platform specific but the check framework
which has timeouts for the tests. If you force it to run longer on passion it
exits gracefully with a proper message:

> test/lib/core/hostid.c:39:E:Core:test_serialize_deserialize_rsa_keys:0: (after this point) Test timeout expired

Seems like the check version on the scratchbox environment is not that convenient
and simply breaks.

lp:~martin-lp/hipl/n900-build-fix updated
6081. By David Martin

Revert previous commit: Do not generate RSA keys > 1024 bits.

6082. By David Martin

Increase hostid unit test timeout to 120s.

Key generation on the scratchbox environment or the N900 can take
quite a while. The default timeout of 4 seconds is definitively
not sufficient, therefore increase it.

6083. By David Martin

Cosmetics: Use /* */ for multiline comment in hostid unit test.

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'test/lib/core/hostid.c'
2--- test/lib/core/hostid.c 2011-09-05 08:46:53 +0000
3+++ test/lib/core/hostid.c 2011-10-07 09:07:25 +0000
4@@ -65,7 +65,7 @@
5 START_TEST(test_serialize_deserialize_rsa_keys)
6 {
7 unsigned int i, keyrr_len = 0;
8- int bits[3] = {1024, 2048, 3072};
9+ int bits[3] = { 1024, 2048, 3072 };
10 RSA *key = NULL, *key_deserialized = NULL;
11 EVP_PKEY *key_a = NULL, *key_b = NULL;
12 unsigned char *keyrr;
13@@ -126,6 +126,11 @@
14 Suite *s = suite_create("lib/core/hostid");
15
16 TCase *tc_core = tcase_create("Core");
17+ // the default test timeout of 4 seconds is too short,
18+ // generating keys in scratchbox or on the N900 takes
19+ // a while
20+ tcase_set_timeout(tc_core, 120);
21+
22 tcase_add_test(tc_core, test_serialize_deserialize_rsa_keys);
23 tcase_add_test(tc_core, test_serialize_deserialize_keys);
24 tcase_add_test(tc_core, test_invalid_serialize_deserialize);

Subscribers

People subscribed via source and target branches

to all changes: