Minor segfault on failed startup

Bug #790679 reported by Jookos
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
HIPL
Fix Committed
Undecided
Unassigned

Bug Description

If the kernel module checks during startup fails, hipd segfaults instead of exiting nicely. This because the hip_xfrmapi_nl_ipsec (lib/tool/xfrmapi.c:49) hasn't been initialized before it is used in hip_delete_hit_sp_pair() [which is called on exit].

quick & dirty patch attached

Related branches

Revision history for this message
Jookos (jookos) wrote :
Revision history for this message
Miika Komu (miika-iki) wrote :

Is it because of a previous call to hip_delete_default_prefix_sp_pair()? If yes, I would suggest to move the fix there. Otherwise, the patch seems ok and seems to have minimal impact to the rest of the code.

Revision history for this message
René Hummen (rene-hummen) wrote :

I agree with Miika to catch this exception as close to the root as possible. hip_delete_default_prefix_sp_pair() might not be the only exit path reaching the critical location in the future.

Revision history for this message
Jookos (jookos) wrote :

AFAI suspect, this happens as the init procedure hasn't reached a point where that data structure would be initialized (before shutting down, where it assumes that it has been). But then again, I just checked out hipl on my laptop and couldn't get a segfault (even though hip_xfrmapi_nl_ipsec is null when going into hip_delete_default_prefix_sp_pair()), so I'm not sure if my assumptions are right (I'm not that familiar with the hipl code).

This is a rare and only sort of a cosmetic problem anyway so I wouldn't encourage people to spend too much time on it. Unless there's something else with potentially more serious consequences that might not get de-inited properly if the init is cut short.

Revision history for this message
Miika Komu (miika-iki) wrote :

But you can repeat the problem on one machine (other than your laptop?)?

Revision history for this message
Jookos (jookos) wrote :

This has been an interesting morning.. At first I couldn't, but I was starting hipd through sudo which seems (?) to supress those. Now, running hipd on my laptop logged in as as su (not through sudo) I get it here also. So yes, I'm able to repeat it on any machine I try it on.

Here's a gdb trace:

GNU gdb (GDB) 7.2-debian
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/jookos/projects/hipl/new-lp/hipl/hipd/hipd...done.
(gdb) r
Starting program: /home/jookos/projects/hipl/new-lp/hipl/hipd/hipd
info(hipd/hipd.c:338@hipd_main): hipd pid=18353 starting
error(hipd/init.c:402@hip_probe_kernel_modules): Unable to load crypto_null!
error(hipd/init.c:1053@hipd_init): Unable to load the required kernel modules!
error(hipd/hipd.c:345@hipd_main): hipd_init() failed!

Program received signal SIGSEGV, Segmentation fault.
netlink_talk (nl=0x0, n=0xbffff0d0, peer=0, groups=0, answer=0x0, junk=0, arg=0x0) at lib/tool/nlink.c:251
251 n->nlmsg_seq = seq = ++nl->seq;
(gdb) bt
#0 netlink_talk (nl=0x0, n=0xbffff0d0, peer=0, groups=0, answer=0x0, junk=0, arg=0x0) at lib/tool/nlink.c:251
#1 0x080901a4 in hip_xfrm_policy_delete (rth=0x0, hit_our=<value optimized out>, hit_peer=0xbffff190, dir=0,
    hit_prefix=<value optimized out>, preferred_family=<value optimized out>) at lib/tool/xfrmapi.c:381
#2 0x0809022a in hip_delete_hit_sp_pair (src_hit=0xbffff190, dst_hit=0xbffff180, use_full_prefix=0) at lib/tool/xfrmapi.c:820
#3 0x080902be in hip_delete_default_prefix_sp_pair () at lib/tool/xfrmapi.c:838
#4 0x0805dd8b in hip_exit () at hipd/init.c:882
#5 0x0805afc1 in hipd_main (argc=1, argv=0xbffff394) at hipd/hipd.c:416
#6 main (argc=1, argv=0xbffff394) at hipd/hipd.c:464
(gdb)

Revision history for this message
Miika Komu (miika-iki) wrote :

Does the following change fix the problem?

hipd/init.c
+ if (hip_xfrmapi_nl_ipsec) {
     hip_delete_default_prefix_sp_pair ()
+}

If yes, just make the commit?

Revision history for this message
Jookos (jookos) wrote :

Probably not, as hip_xfrmapi_nl_ipsec is static to xfrmapi.c ;)

But ok, another option is to re-organize the initialization so that that variable gets set before it is possible to abort the initialization. Basically move the hip_xfrm_set_nl_ipsec() call to somewhere before the first HIP_IFEL.

.. I'll attach a patch now that I actually tried it out.

Revision history for this message
David Martin (martin-lp) wrote :

Hi,

> But ok, another option is to re-organize the initialization so that that
> variable gets set before it is possible to abort the initialization.
> Basically move the hip_xfrm_set_nl_ipsec() call to somewhere before the
> first HIP_IFEL.
>
> .. I'll attach a patch now that I actually tried it out.

I would say that moving the initialization only fixes the sympton and not the problem. :) What if the initialization for what you are uninitializing fails? It should be robust in any case! In addition the documentation states that hip_xfrm_set_nl_ipsec() gets an initialized netlink socket. If you move this call further forward it will definitely not get an initialized socket because this happens in rtnl_open_byproto() right before it.

What happens here is the following:
We have got a rtnl_handle struct to store the field descriptor and sequence number and stuff for our netlink socket. It is static and located in hipd.c. Our IPsec implementation is in xfrmapi.c and has no direct access to it. What it has instead is a static pointer that gets pointed to the handle struct in the initialization. This does not happen if anything in the initialization already fails before.

The problem here is not really the uninitialization. The null pointer gets passed through all functions that try to remove the default policies that would usually be set up on startup. We could do sanity checks on the pointers in these functions or in the exit call like Miika proposes but this should not really be necessary. In most cases the sockets are correctly initialized and it does not really matter for the calls.

As it turns out netlink_talk() that sends messages on the netlink sockets is not robust. It assumes to get a valid netlink socket and as a consequence segfaults if it doesn't. This is what happens here. I have committed a fix in trunk revision 5950.

Jookos, please check if it works for you.

Revision history for this message
Jookos (jookos) wrote :

Oops, missed the initialization of that struct. Although it stopped crashing, it must have been sending its messages to pretty strange places..

And I'm sure your fix works just fine, thanks.

Revision history for this message
David Martin (martin-lp) wrote :

Your patch works as it makes the pointer point to the struct correctly. It's just that on
exiting it is not yet set up yet and all values initialized to 0 (gcc default for static
structs or all static variables? I'm not sure there). That's why it won't send any messages
with your patch either:

error(hipd/hipd.c:345@hipd_main): hipd_init() failed!
error(lib/tool/nlink.c:279@netlink_talk): Cannot talk to rtnetlinkSocket operation on non-socket
error(lib/tool/xfrmapi.c:382@hip_xfrm_policy_delete): Security policy deletion failed.

And here comes in what Miika and René said. Your patch fixes one wrong call which is nice.
But making netlink_talk() more robust so that it can deal with any wrong call is more reliable.

Changed in hipl:
status: New → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.