Trusty: multipathd SIGSEGV on path addition or removal

Bug #1628723 reported by Christopher Unkel
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
multipath-tools (Ubuntu)
Fix Released
High
Unassigned
Trusty
Fix Released
High
Nish Aravamudan

Bug Description

[Impact]

 * In a system test that involves the repeated addition and removal of iSCSI targets that form multipath devices, multipathd exits with SIGSEGV.

[Test Case]

 * Repeatedly add and remove iSCSI targets that are part of multipath devices. multipathd will segmentation fault without the fix.

[Regression Potential]

 * The fixes in question for this are two use-after-free coding errors upstream. Both have been fixed upstream and this is a backport of the upstream fixes. There should be no functional change from this, purely a bugfix, so I believe the regression potential is very low.

---

In a system test that involves the repeated addition and removal of iSCSI
targets that form multipath devices, I am observing multipathd exiting with
SIGSEGV.

The issue is reproducible on Trusty with multipath-tools 0.4.9-3ubuntu7.13
as well as when built from source for 0.4.9-3ubuntu7.14.

The following is a typical backtrace from a resulting core file:

Core was generated by `/sbin/multipathd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 malloc_consolidate (av=av@entry=0x7fe0bc000020) at malloc.c:4151
4151 malloc.c: No such file or directory.
(gdb) bt
#0 malloc_consolidate (av=av@entry=0x7fe0bc000020) at malloc.c:4151
#1 0x00007fe0c6f82ce8 in _int_malloc (av=0x7fe0bc000020, bytes=16384) at malloc.c:3423
#2 0x00007fe0c6f856c0 in __GI___libc_malloc (bytes=16384) at malloc.c:2891
#3 0x00007fe0c79924d7 in dm_task_run () from /lib/x86_64-linux-gnu/libdevmapper.so.1.02.1
#4 0x00007fe0c72d7e58 in dm_map_present (str=0x7fe0bc5a8730 "mpath10p1") at devmapper.c:304
#5 0x0000000000404a77 in ev_add_map (dev=0x7fe0c0019a53 "dm-13", alias=0x7fe0bc5a8730 "mpath10p1", vecs=0x22da100) at main.c:256
#6 0x0000000000404a3c in uev_add_map (uev=0x7fe0c00199d0, vecs=0x22da100) at main.c:243
#7 0x00000000004061ed in uev_trigger (uev=0x7fe0c00199d0, trigger_data=0x22da100) at main.c:755
#8 0x00007fe0c72f6939 in service_uevq (tmpq=0x7fe0c7f8fde0) at uevent.c:118
#9 0x00007fe0c72f6b48 in uevent_dispatch (uev_trigger=0x406130 <uev_trigger>, trigger_data=0x22da100) at uevent.c:167
#10 0x0000000000406436 in uevqloop (ap=0x22da100) at main.c:814
#11 0x00007fe0c7bac184 in start_thread (arg=0x7fe0c7f90700) at pthread_create.c:312
#12 0x00007fe0c6ffd37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

After debugging with valgrind/memcheck, I have traced the errors reported by
valgrind down to two use-after-free issues that have been resolved in the
upstream multipath-tools but are not included in multipath-tools
0.4.9-3ubuntu7.14.

The first was in commit 828d2fbaab304d1ec7db2f563a59eaf2c7a516ea, which
resolves a bug in which the result value of realloc is assigned to the wrong
place, resulting in continued use of now-freed original block.

The second was in commit cb0f7127ba90ab5e8e71fc534a0a16cdbe96a88f, which
resolves a bug in which a result value from udev_device_get_sysattr_value is
used after the underlying struct udev_device has been released with
udev_unref_device. This also results in a use-after-free.

After applying these patches, running my system stress test no longer results
in SIGSEGV or errors detected by valgrind/memcheck.

I suggest that these two commits be backported.

Revision history for this message
Christopher Unkel (dscunkel) wrote :
Revision history for this message
Christopher Unkel (dscunkel) wrote :
description: updated
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch #1 from upstream multipath-tools git" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank you so much already for the debugging and identifying the patches!

To continue integrating and testing and finally SRUing the patches one needs to be able to reproduce the test. So far you just said "repeated addition and removal of iSCSI
targets".
Would you have a series of commands at hand from your testing that one can use to recreate this?

tags: added: server-next
Changed in multipath-tools (Ubuntu):
status: New → Triaged
status: Triaged → Incomplete
importance: Undecided → High
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Could you check if the recent release for bug 1535898 fixed the issue for you.
It seems so similar.

Revision history for this message
Christopher Unkel (dscunkel) wrote :

Thanks for the reply.

As regards 1535989, as I understand that fix is included in 0.4.9-3ubuntu7.14 and I am able to reproduce the issue in that build.

When it comes to reproducing the issue, currently I have something that cannot easily be replicated outside my test environment. Basically my environment has a number (specifically 4) of iSCSI targets serving the same drives available on the network. This test is a loop that randomly either does an iSCSI login to all the paths available for a drive that is available, or does an iSCSI logout to all paths on the device. In turn multipathd processes the path additions or removals triggering the issue.

I think that the issue in commit cb0f7127ba90ab5e8e71fc534a0a16cdbe96a88f is triggered by multipathd processing a path addition to any iSCSI device. I don't think a SIGSEGV is reproducible from this issue alone, but the valgrind/memcheck report of free-after-use is.

I think that the issue in commit 828d2fbaab304d1ec7db2f563a59eaf2c7a516ea would be triggered by mulitpathd processing any path removal from a multipath with multiple paths and is not about iSCSI in particular. In the iSCSI context the path removal is a consequence of an iSCSI logout rather than a physical reconfiguration. I think this is the issue causing the SIGSEGV in practice.

Let me see if I can isolate a test script than can be used standalone and ideally be incorporated into any regression suite you have. It may take a couple of business days.

Revision history for this message
Christopher Unkel (dscunkel) wrote :
Revision history for this message
Christopher Unkel (dscunkel) wrote :
Revision history for this message
Christopher Unkel (dscunkel) wrote :
Download full text (5.1 KiB)

I've attached two test scripts that configure the local machine to serve as an iSCSI target and use then use those as targets to attach and detach.

The best way to observe the issue is to use the first version of the script with a single target and single iteration of the test loop, while running multipathd under valgrind, i.e.:

    service multipath-tools stop
    valgrind multipathd -d &
    bug1628723 1 1

When I do this, I observe the following in valgrind output:

==31667== Invalid read of size 1
==31667== at 0x4C2E4E1: __GI_strncpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31667== by 0x56FD3C9: sysfs_get_tgt_nodename (discovery.c:257)
==31667== by 0x56FDF98: scsi_sysfs_pathinfo (discovery.c:494)
==31667== by 0x56FEA01: sysfs_pathinfo (discovery.c:696)
==31667== by 0x56FF174: pathinfo (discovery.c:830)
==31667== by 0x56FC775: store_pathinfo (discovery.c:57)
==31667== by 0x40504E: uev_add_path (main.c:386)
==31667== by 0x4062EC: uev_trigger (main.c:777)
==31667== by 0x5713938: service_uevq (uevent.c:118)
==31667== by 0x5713B47: uevent_dispatch (uevent.c:167)
==31667== by 0x406435: uevqloop (main.c:814)
==31667== by 0x4E3F183: start_thread (pthread_create.c:312)
==31667== Address 0x7176790 is 0 bytes inside a block of size 37 free'd
==31667== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31667== by 0x54D7A1B: ??? (in /lib/x86_64-linux-gnu/libudev.so.1.3.5)
==31667== by 0x54D7ADD: ??? (in /lib/x86_64-linux-gnu/libudev.so.1.3.5)
==31667== by 0x54D84B6: udev_device_unref (in /lib/x86_64-linux-gnu/libudev.so.1.3.5)
==31667== by 0x56FD3AA: sysfs_get_tgt_nodename (discovery.c:255)
==31667== by 0x56FDF98: scsi_sysfs_pathinfo (discovery.c:494)
==31667== by 0x56FEA01: sysfs_pathinfo (discovery.c:696)
==31667== by 0x56FF174: pathinfo (discovery.c:830)
==31667== by 0x56FC775: store_pathinfo (discovery.c:57)
==31667== by 0x40504E: uev_add_path (main.c:386)
==31667== by 0x4062EC: uev_trigger (main.c:777)
==31667== by 0x5713938: service_uevq (uevent.c:118)

Which is due to the issue resolved in commit cb0f7127ba90ab5e8e71fc534a0a16cdbe96a88f,
and later:

==31667== Invalid read of size 8
==31667== at 0x56FC388: find_path_by_dev (structs.c:322)
==31667== by 0x404FBF: uev_add_path (main.c:376)
==31667== by 0x4062EC: uev_trigger (main.c:777)
==31667== by 0x5713938: service_uevq (uevent.c:118)
==31667== by 0x5713B47: uevent_dispatch (uevent.c:167)
==31667== by 0x406435: uevqloop (main.c:814)
==31667== by 0x4E3F183: start_thread (pthread_create.c:312)
==31667== by 0x5A2B37C: clone (clone.S:111)
==31667== Address 0x72508a0 is 0 bytes inside a block of size 40 free'd
==31667== at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.
so)
==31667== by 0x56F4162: vector_del_slot (vector.c:99)
==31667== by 0x571EF72: verify_paths (structs_vec.c:477)
==31667== by 0x405397: ev_add_path (main.c:446)
==31667== by 0x4050C7: uev_add_path (main.c:398)
==31667== by 0x4062EC: uev_trigger (main.c:777)
==31667== by 0x5713938: service_uevq (uevent.c:118)
==31667== by 0x5713B47: uevent_di...

Read more...

Nish Aravamudan (nacc)
Changed in multipath-tools (Ubuntu):
status: Incomplete → New
status: New → Triaged
Revision history for this message
Nish Aravamudan (nacc) wrote :

Both Xenial & Yakkety have 0.5.0-based multipath-tools, and the referred to fixes are already present.

Changed in multipath-tools (Ubuntu):
status: Triaged → Fix Released
Changed in multipath-tools (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Revision history for this message
Nish Aravamudan (nacc) wrote :

Hello, please test a Trusty build at:

https://launchpad.net/~nacc/+archive/ubuntu/lp1628723

submitted just now, which should include the fixes requested! Please test the PPA and report back, thanks!

Nish Aravamudan (nacc)
Changed in multipath-tools (Ubuntu Trusty):
assignee: nobody → Nish Aravamudan (nacc)
Revision history for this message
Christopher Unkel (dscunkel) wrote :

So far so good with the PPA: it is clean according to valgrind in my test environment, and has not crashed so far. I will continue to run it overnight to be sure and report again tomorrow.

Thanks!

Revision history for this message
Christopher Unkel (dscunkel) wrote :

The PPA has now run for a full day in my test environment without issue. I would regard it as a success.

Nish Aravamudan (nacc)
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Christopher, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.15 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Christopher Unkel (dscunkel) wrote :

Tested multipath-tools 0.4.9-3ubuntu7.15 (arch amd64) from trusty-proposed successfully in my test environment; no crashes and valgrind clean. (kpartx 0.4.9-3ubuntu7.15 also installed.)

Changed tag from verification-needed to verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.15

---------------
multipath-tools (0.4.9-3ubuntu7.15) trusty; urgency=medium

  * d/p/0047-Add-existing-multipath-devices-to-wwids-file-on.patch:
    Fix multipathd which does not update /etc/multipath/wwids file
    when reconfigure is invoked. (LP: #1621835)

  [ Dragan Stancevic ]
  * d/p/0048-multipathd-delay-free-pathvec.patch :
    Fix SEGV on multipathd shutdown (LP: #1616213)

  [ Nishanth Aravamudan ]
  * debian/patches/fix_use_after_free.patch: Fix use-after-free bugs.
    Thanks to Christof Schmitt <email address hidden> and
    Benjamin Marzinski <email address hidden>. Closes LP: #1628723.

 -- Louis Bouchard <email address hidden> Mon, 12 Sep 2016 10:43:19 +0200

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.