Comment 12 for bug 1469214

Ming Lei (tom-leiming) wrote : Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

On Mon, Jul 6, 2015 at 9:28 PM, Colin Ian King
<email address hidden> wrote:
> I re-ran this today with the following script as a non-root user:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t | sudo tee /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> and hit this issue:
>
> [14098.848615] urandom
> [14111.696335] irqbalance[828]: unhandled level 2 translation fault (11) at 0x00004f64, esr 0x92000006
> [14111.696341] pgd = ffffffcfef71b000
> [14111.737149] [00004f64] *pgd=0000004fef1f3003, *pud=0000004fef1f3003, *pmd=0000000000000000
>

As I suggested, it would help to provide /proc/$(pidof
irqbalance)/maps; otherwise we can't tell where the faulting
address and the PC are located.

I have finally found a simple way to reproduce the issue:

1) apply the attached debug patch to stress-ng

2) run the following script:

sudo cat /proc/$(pidof irqbalance)/maps
/home/ubuntu/git/stress-ng/stress-ng --sequential 0 --seq-start 80 \
    --seq-end 84 -t 60 --syslog --metrics --times -v

The above command just runs the following 4 stressors, for 4 minutes in total:

    stress-ng: info: [1067] dispatching hogs: 8 tsearch, 8 udp, 8 udp-flood,
    8 urandom

3) the above may trigger the following fault from irqbalance with
~3/4 probability. The faulting address looks like a heap address and
the PC points into the code of libglib-2.0.so, so could this be a
use-after-free in irqbalance or libglib? Nothing indicates that it is
related to the kernel, and the four stressors are quite simple and
shouldn't cause trouble for the kernel.

# irqbalance memory maps
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
00419000-0041a000 r-xp 00009000 08:02 10496929 /usr/sbin/irqbalance
0041a000-0041b000 rwxp 0000a000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fbf9000-7f8fbfb000 rwxp 00000000 00:00 0
7f8fbfb000-7f8fc11000 r-xp 00000000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc11000-7f8fc20000 ---p 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc20000-7f8fc21000 r-xp 00015000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc21000-7f8fc22000 rwxp 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc22000-7f8fc26000 rwxp 00000000 00:00 0
7f8fc26000-7f8fc7f000 r-xp 00000000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc7f000-7f8fc8f000 ---p 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc8f000-7f8fc90000 r-xp 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc90000-7f8fc91000 rwxp 0005a000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc91000-7f8fdc1000 r-xp 00000000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdc1000-7f8fdd0000 ---p 00130000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd0000-7f8fdd4000 r-xp 0012f000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd4000-7f8fdd6000 rwxp 00133000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd6000-7f8fdda000 rwxp 00000000 00:00 0
7f8fdda000-7f8fde3000 r-xp 00000000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fde3000-7f8fdf2000 ---p 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf2000-7f8fdf3000 r-xp 00008000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf3000-7f8fdf4000 rwxp 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf4000-7f8fdf8000 rwxp 00000000 00:00 0
7f8fdf8000-7f8fe89000 r-xp 00000000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe89000-7f8fe98000 ---p 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe98000-7f8fe99000 r-xp 00090000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe99000-7f8fe9a000 rwxp 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff8c000-7f8ff9c000 ---p 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9c000-7f8ff9d000 r-xp 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9d000-7f8ff9e000 rwxp 000f3000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9e000-7f8ff9f000 rwxp 00000000 00:00 0
7f8ff9f000-7f8ffa3000 r-xp 00000000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffa3000-7f8ffb2000 ---p 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb2000-7f8ffb3000 r-xp 00003000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb3000-7f8ffb4000 rwxp 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb4000-7f8ffd0000 r-xp 00000000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffd0000-7f8ffd3000 rwxp 00000000 00:00 0
7f8ffdc000-7f8ffde000 rwxp 00000000 00:00 0
7f8ffde000-7f8ffdf000 r--p 00000000 00:00 0 [vvar]
7f8ffdf000-7f8ffe0000 r-xp 00000000 00:00 0 [vdso]
7f8ffe0000-7f8ffe1000 r-xp 0001c000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffe1000-7f8ffe3000 rwxp 0001d000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7fecdb1000-7fecdd2000 rw-p 00000000 00:00 0 [stack]
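
For reference, the lookup of an address against a maps dump like the one
above could be sketched in Python roughly as follows. This is only an
illustration: it assumes the usual /proc/PID/maps column layout with one
entry per line, the names Mapping, parse_maps and resolve are made up here,
and the excerpt holds just a few of the entries listed above. The PC and
fault address are the ones from the fault log that follows.

```python
import re
from typing import List, NamedTuple, Optional

# start-end perms offset dev inode [path]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+([0-9a-f]+)\s+\S+\s+\d+\s*(.*)$"
)


class Mapping(NamedTuple):
    start: int   # inclusive
    end: int     # exclusive
    perms: str
    offset: int  # file offset that `start` corresponds to
    path: str    # "" for anonymous mappings


def parse_maps(text: str) -> List[Mapping]:
    """Parse maps text, skipping lines that are not mapping entries."""
    mappings = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line.strip())
        if m:
            start, end, perms, offset, path = m.groups()
            mappings.append(
                Mapping(int(start, 16), int(end, 16), perms, int(offset, 16), path)
            )
    return mappings


def resolve(addr: int, mappings: List[Mapping]) -> Optional[Mapping]:
    """Return the mapping containing addr, or None if it is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# A few entries copied from the dump above (illustrative subset only).
EXCERPT = """\
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7fecdb1000-7fecdd2000 rw-p 00000000 00:00 0 [stack]
"""

maps = parse_maps(EXCERPT)
for name, addr in [("pc", 0x7F8FF02834), ("fault", 0x00162A54)]:
    hit = resolve(addr, maps)
    if hit is None:
        print(f"{name} {addr:#x}: not in any listed mapping")
    else:
        file_off = addr - hit.start + hit.offset
        print(f"{name} {addr:#x}: {hit.path or '[anon]'} {hit.perms} "
              f"file offset {file_off:#x}")
```

With these inputs the PC resolves to the r-xp text mapping of libglib at
file offset 0x68834, which could be passed to a tool such as addr2line if
matching debug symbols are installed. The fault address 0x00162a54 resolves
to nothing: it lies below the lowest mapping (0x00400000) and only resembles
the [heap] range starting at 0x16294000.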

[ 250.276095] irqbalance[779]: unhandled level 2 translation fault (11) at 0x00162a54, esr 0x92000006
[ 250.276103] pgd = ffffffc0ff812000
[ 250.316917] [00162a54] *pgd=00000040ffa6b003, *pud=00000040ffa6b003, *pmd=0000000000000000
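
As a cross-check of the message above, the esr value can be split into its
fields by hand. The sketch below follows the ARMv8 ESR_ELx layout (EC in
bits [31:26], IL in bit 25, and for data aborts WnR in bit 6 and DFSC in
bits [5:0]); decode_esr and the small lookup tables are illustrative names
and cover only a handful of encodings, not the full architecture tables.

```python
# Partial tables, enough for the value in this log.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort without a change in Exception level",
}

DFSC_NAMES = {}
for level in range(4):
    DFSC_NAMES[0x04 + level] = f"translation fault, level {level}"
for level in range(1, 4):
    DFSC_NAMES[0x08 + level] = f"access flag fault, level {level}"
    DFSC_NAMES[0x0C + level] = f"permission fault, level {level}"


def decode_esr(esr: int) -> dict:
    """Split a data-abort ESR value into the fields used above."""
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 0x1       # 1 = 32-bit instruction
    dfsc = esr & 0x3F            # data fault status code
    wnr = (esr >> 6) & 0x1       # 1 = write, 0 = read
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il": il,
        "dfsc": dfsc,
        "fault": DFSC_NAMES.get(dfsc, "unknown"),
        "write": bool(wnr),
    }


info = decode_esr(0x92000006)
print(info)
```

For 0x92000006 this gives EC 0x24 (a data abort taken from user space),
DFSC 0x06 (translation fault, level 2) and WnR 0, i.e. a read. That agrees
with the "level 2 translation fault" text and with *pmd being zero in the
page-table walk; the "(11)" in the message is the signal number, SIGSEGV.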

[ 250.416447] CPU: 5 PID: 779 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[ 250.416450] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 250.416452] task: ffffffcfb46cc980 ti: ffffffc0feba0000 task.ti: ffffffc0feba0000
[ 250.416464] PC is at 0x7f8ff02834
[ 250.416467] LR is at 0x7f8ff027f4
[ 250.416469] pc : [<0000007f8ff02834>] lr : [<0000007f8ff027f4>] pstate: 80000000
[ 250.416471] sp : 0000007fecdd1480
[ 250.416472] x29: 0000007fecdd1480 x28: 000000000041a000
[ 250.416476] x27: 000000000041a000 x26: 00000000004094e0
[ 250.416478] x25: 0000000000000001 x24: 0000000000000010
[ 250.416481] x23: 00000000162948a0 x22: 0000000016294880
[ 250.416484] x21: 0000000000000018 x20: 0000007f8ff9e000
[ 250.416486] x19: 0000000000000002 x18: 0000000000000000
[ 250.416489] x17: 0000007f8fc088ec x16: 0000007f8ff9d2e0
[ 250.416491] x15: 0000000000000020 x14: 0000000000000000
[ 250.416494] x13: 0000000000000000 x12: 0000000000000000
[ 250.416496] x11: 0000007fecdceff0 x10: 0000000000000010
[ 250.416499] x9 : 00000000000000a0 x8 : 0000000000000007
[ 250.416501] x7 : 0000000000000033 x6 : 0000000016294c80
[ 250.416504] x5 : 0000000000000001 x4 : 0000007f8fc212a0
[ 250.416506] x3 : 0000000016294880 x2 : 0000000000000001
[ 250.416509] x1 : 00000000000003fa x0 : 0000000000162a4c
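
One more observation on the register dump, written as a small Python check
so the arithmetic is explicit. The values are copied from the log and the
maps above; HEAP merges the two adjacent [heap] ranges, and the names
(HEAP, REGS, in_heap) are just for this sketch. What the numbers mean is an
interpretation, not something the kernel reports.

```python
# Both [heap] entries are adjacent, so treat them as one range.
HEAP = (0x16294000, 0x162CE000)

FAULT_ADDR = 0x00162A54

# A subset of the registers from the dump above.
REGS = {
    "x0": 0x0000000000162A4C,
    "x3": 0x0000000016294880,
    "x6": 0x0000000016294C80,
    "x22": 0x0000000016294880,
    "x23": 0x00000000162948A0,
}


def in_heap(value: int) -> bool:
    """True if value lies inside the merged [heap] range."""
    return HEAP[0] <= value < HEAP[1]


heap_regs = sorted(name for name, value in REGS.items() if in_heap(value))
offset_from_x0 = FAULT_ADDR - REGS["x0"]

print("registers pointing into the heap:", heap_regs)
print("x0 points into the heap:", in_heap(REGS["x0"]))
print("fault address - x0 =", offset_from_x0)
```

x3, x6, x22 and x23 hold valid heap pointers, while x0 does not, and the
fault address is exactly x0 + 8. That would be consistent with a load from
offset 8 of an object whose pointer (x0) is stale or corrupted, but the log
alone does not prove it.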