Comment 9 for bug 1469214

Ming Lei (tom-leiming) wrote : Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

Hi Colin,

That looks like progress, though it still takes time to reproduce.
I will use your new approach to reproduce it.

While you are doing that, could you dump
/proc/$(pidof irqbalance)/maps so that we can see where the faulting
address is in the process's VM space?
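
In case it helps, the lookup could be done with something like the
following. This is only a rough sketch: the function name find_mapping
and the /tmp snapshot path are placeholders of mine, and the snapshot
has to be taken while irqbalance is still running, since the maps file
disappears once the process is killed.

```shell
#!/bin/bash
# find_mapping MAPS_FILE ADDRESS   (hypothetical helper, not an existing tool)
# Print the line of MAPS_FILE (in /proc/<pid>/maps format) whose address
# range contains ADDRESS, or report that no mapping covers it.
find_mapping() {
    local maps_file=$1
    local addr=$(( $2 ))            # accepts 0x-prefixed hex or decimal
    local range rest start end

    while read -r range rest; do
        [[ $range == *-* ]] || continue      # skip blank or malformed lines
        start=$(( 16#${range%-*} ))          # hex start, inclusive
        end=$(( 16#${range#*-} ))            # hex end, exclusive
        if (( addr >= start && addr < end )); then
            printf '%s %s\n' "$range" "$rest"
            return 0
        fi
    done < "$maps_file"

    printf 'unmapped: 0x%x\n' "$addr"
    return 1
}

# Example use (snapshot path is an assumption; address is from the oops):
#   cat /proc/$(pidof irqbalance)/maps > /tmp/irqbalance.maps
#   find_mapping /tmp/irqbalance.maps 0x002d6da4
```

Running it against the PC and LR values from the oops as well should
show which library the faulting code belongs to.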

thanks,

On Sat, Jul 4, 2015 at 4:10 AM, Colin Ian King
<email address hidden> wrote:
> Running the following:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
>     echo "$t"
>     # mark the kernel log so a fault can be tied to the running stressor
>     echo "$t" > /dev/kmsg
>     ./stress-ng "--$t" 0 -v -t 60
> done
>
> eventually tripped the translation fault in irqbalance. I ran this
> after a clean reboot.
>
> [ 4901.799846] timerfd
> [ 4961.807050] tsearch
> [ 5021.884456] udp
> [ 5081.895058] udp-flood
> [ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at 0x002d6da4, esr 0x92000006
> [ 5141.674376] pgd = ffffffcfb51a0000
> [ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003, *pmd=0000000000000000
>
> [ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
> [ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> [ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti: ffffffcfab710000
> [ 5141.816206] PC is at 0x7f88287834
> [ 5141.816208] LR is at 0x7f882877f4
> [ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate: 80000000
> [ 5141.816212] sp : 0000007ff2e46b30
> [ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
> [ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
> [ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
> [ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
> [ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
> [ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
> [ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
> [ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
> [ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
> [ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
> [ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
> [ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
> [ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
> [ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
> [ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c
>
> [ 5141.907792] urandom
> [ 5201.928712] utime
> [ 5261.934534] vecmath
> [ 5321.940302] vfork
> [ 5381.947904] vm
> [ 5441.991784] vm-rw
> [ 5502.017614] vm-splice
> [ 5562.023334] wcs
> [ 5622.037054] wait
> [ 5682.043302] yield
> [ 5742.056595] xattr
> [ 5802.075772] zero
> [ 5862.087396] zombie
>
> https://bugs.launchpad.net/bugs/1469214
>
> Title:
> HP ProLiant m400 Server crashes with unhandled level 3 translation
> fault
>
> Status in linux package in Ubuntu:
> Triaged
>
> Bug description:
> Running stress-ng on an HP ProLiant m400 server can cause unhandled
> level 3 translation faults:
>
> use stress-ng from git://kernel.ubuntu.com/cking/stress-ng
>
> ./stress-ng --seq 0 -t 60 -v
>
> and after some time this trips the following:
>
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]