HP ProLiant m400 Server crashes with unhandled level 3 translation fault

Bug #1469214 reported by Colin Ian King
This bug affects 10 people
Affects              Status        Importance  Assigned to   Milestone
irqbalance (Ubuntu)  Fix Released  Medium      dann frazier
Trusty               Fix Released  Medium      dann frazier
Utopic               Won't Fix     Medium      dann frazier
Vivid                Fix Released  Medium      dann frazier
Wily                 Fix Released  Medium      dann frazier

Bug Description

[Impact]
irqbalance can crash with a segmentation fault (SIGSEGV) on trusty, utopic, vivid and wily.

[Test Case]
stress-ng --seq 0 -t 60 --syslog --metrics --times -v

[Regression Potential]
The proposed patch has been merged upstream in irqbalance 1.0.7, so there should be little potential for regression.

https://github.com/Irqbalance/irqbalance/commit/a3c812eb6cd627cd3fae45b8345538558b86973c

[Other Info]

See below for the segmentation fault log.

Running stress-ng on an HP ProLiant m400 server can cause unhandled level 3 translation faults:

use stress-ng from git://kernel.ubuntu.com/cking/stress-ng

./stress-ng --seq 0 -t 60 -v

and after some time this trips the following:

Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
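
For readers decoding the fault line above: the `esr` value is the arm64 Exception Syndrome Register. The following is a rough Python sketch, not part of the original report, that extracts the exception class and the fault level from the two values that appear in this bug (0x92000007 and 0x92000006). It assumes the ESR_ELx layout from the Arm architecture (EC in bits [31:26], IL in bit 25, and for data aborts the fault status code in bits [5:0]); the lookup tables are deliberately incomplete and the function name `decode_esr` is made up for illustration.

```python
# Sketch: decode the fields of an arm64 ESR value as printed in the kernel log.
# Assumption: ESR_ELx layout with EC in bits [31:26], IL in bit 25 and, for
# data aborts, the data fault status code (DFSC) in bits [5:0].

# Only the exception classes relevant to this bug are listed.
EXCEPTION_CLASSES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}

# Only the translation-fault codes are listed.
FAULT_STATUS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR value into exception class, IL bit and fault status."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    dfsc = esr & 0x3F
    return {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "other (not listed in this sketch)"),
        "il": il,
        "dfsc": dfsc,
        "fault": FAULT_STATUS.get(dfsc, "other (not listed in this sketch)"),
    }
```

Calling `decode_esr(0x92000007)` and `decode_esr(0x92000006)` gives the same exception class for both values and differs only in the fault level, which matches the "level 3" and "level 2" wording the kernel prints next to them.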

Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
assignee: Colin Ian King (colin-king) → dann frazier (dannf)
summary: - HP ProLiant m400 Server
+ HP ProLiant m400 Server crashes with unhandled level 3 translation fault
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1469214

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
dann frazier (dannf) wrote :

fyi, I ran this in a loop over the weekend and the issue has not reproduced.

Colin Ian King (colin-king) wrote :

Hrm, OK, I'll see if I can find a better reproducer.

Ming Lei (tom-leiming) wrote :

I can't reproduce it after running for half a day on ms10-36, and OOM is often triggered.

Ming Lei (tom-leiming) wrote :

Oops, the test result in #4 is for LP1469218 instead of this one.

Ming Lei (tom-leiming) wrote : Re: [Bug 1469214] [NEW] HP ProLiant m400 Server crashes with unhandled level 3 translation fault

This one looks like a problem in systemd-timesyncd. From the pmap log [1],
neither the PC nor the faulting address is valid: both fall in the heap area,
but the faulting address (0x7fa8ea6008) shouldn't have been allocated
and is far away from the start address (0x7f9eb27000) of the heap area.

[1] pmap log

ubuntu@ms10-37-mcdivittB0:~$ ps -ax | grep systemd-timesyncd
  412 ? Ssl 0:00 /lib/systemd/systemd-timesyncd
18058 pts/2 S+ 0:00 grep --color=auto systemd-timesyncd
ubuntu@ms10-37-mcdivittB0:~$ sudo pmap 412 | tail -n 10
0000007f82730000 108K r-x-- systemd-timesyncd
0000007f8274e000 16K rw--- [ anon ]
0000007f82757000 8K rw--- [ anon ]
0000007f82759000 4K r---- [ anon ]
0000007f8275a000 4K r-x-- [ anon ]
0000007f8275b000 4K r---- systemd-timesyncd
0000007f8275c000 4K rw--- systemd-timesyncd
0000007f9eb27000 132K rw--- [ anon ]
0000007fd5e13000 132K rw--- [ stack ]
 total 77176K
ubuntu@ms10-37-mcdivittB0:~$ sudo pmap 412
412: /lib/systemd/systemd-timesyncd
0000007f7c000000 132K rw--- [ anon ]
0000007f7c021000 65404K ----- [ anon ]
0000007f81c29000 16K r-x-- libnss_dns-2.21.so
0000007f81c2d000 64K ----- libnss_dns-2.21.so
0000007f81c3d000 4K r---- libnss_dns-2.21.so
0000007f81c3e000 4K rw--- libnss_dns-2.21.so
0000007f81c3f000 4K ----- [ anon ]
0000007f81c40000 8188K rw--- [ anon ]
0000007f8243f000 40K r-x-- libnss_files-2.21.so
0000007f82449000 60K ----- libnss_files-2.21.so
0000007f82458000 4K r---- libnss_files-2.21.so
0000007f82459000 4K rw--- libnss_files-2.21.so
0000007f8245a000 36K r-x-- libnss_nis-2.21.so
0000007f82463000 60K ----- libnss_nis-2.21.so
0000007f82472000 4K r---- libnss_nis-2.21.so
0000007f82473000 4K rw--- libnss_nis-2.21.so
0000007f82474000 72K r-x-- libnsl-2.21.so
0000007f82486000 60K ----- libnsl-2.21.so
0000007f82495000 4K r---- libnsl-2.21.so
0000007f82496000 4K rw--- libnsl-2.21.so
0000007f82497000 8K rw--- [ anon ]
0000007f82499000 24K r-x-- libnss_compat-2.21.so
0000007f8249f000 64K ----- libnss_compat-2.21.so
0000007f824af000 4K r---- libnss_compat-2.21.so
0000007f824b0000 4K rw--- libnss_compat-2.21.so
0000007f824b1000 580K r-x-- libm-2.21.so
0000007f82542000 60K ----- libm-2.21.so
0000007f82551000 4K r---- libm-2.21.so
0000007f82552000 4K rw--- libm-2.21.so
0000007f82553000 16K r-x-- libcap.so.2.24
0000007f82557000 60K ----- libcap.so.2.24
0000007f82566000 4K r---- libcap.so.2.24
0000007f82567000 4K rw--- libcap.so.2.24
0000007f82568000 68K r-x-- libresolv-2.21.so
0000007f82579000 64K ----- libresolv-2.21.so
0000007f82589000 4K r---- libresolv-2.21.so
0000007f8258a000 4K rw--- libresolv-2.21.so
0000007f8258b000 8K rw--- [ anon ]
0000007f8258d000 1216K r-x-- libc-2.21.so
0000007f826bd000 60K ----- libc-2.21.so
0000007f826cc000 16K r---- libc-2.21.so
0000007f826d0000 8K rw--- libc-2.21.so
0000007f826d2000 16K rw--- [ anon ]
0000007f826d6000 88K r-x-- libpthread-2.21.so
0000007f826ec000 60K ----- libpthread-2.21.so
0000007f826fb000 4K r---- libpthr...

Colin Ian King (colin-king) wrote :

I was able to hit the following translation fault running sudo ./stress-ng --seq 0 -t 60 --syslog --metrics --times -v

[90103.913447] irqbalance[807]: unhandled level 2 translation fault (11) at 0x001754a4, esr 0x92000006
[90103.913454] pgd = ffffffcfb5926000
[90103.954271] [001754a4] *pgd=0000004fb5a8b003, *pud=0000004fb5a8b003, *pmd=0000000000000000

[90104.053696] CPU: 1 PID: 807 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[90104.053698] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[90104.053701] task: ffffffcfb59c4980 ti: ffffffcfb5814000 task.ti: ffffffcfb5814000
[90104.053717] PC is at 0x7f95548834
[90104.053719] LR is at 0x7f955487f4
[90104.053721] pc : [<0000007f95548834>] lr : [<0000007f955487f4>] pstate: 80000000
[90104.053723] sp : 0000007fcf72a410
[90104.053725] x29: 0000007fcf72a410 x28: 00000000004095a0
[90104.053728] x27: 0000000000409548 x26: 000000000041a000
[90104.053731] x25: 0000000000000001 x24: 0000000000000010
[90104.053733] x23: 00000000175398a0 x22: 0000000017539880
[90104.053736] x21: 0000000000000018 x20: 0000007f955e4000
[90104.053738] x19: 0000000000000002 x18: 0000000000000000
[90104.053741] x17: 0000007f9524e8ec x16: 0000007f955e32e0
[90104.053743] x15: 0000000000000020 x14: 0000000000000001
[90104.053745] x13: 0000000000000000 x12: 0000000000000000
[90104.053748] x11: 0000007fcf727f80 x10: 0000000000000010
[90104.053750] x9 : 00000000000000a0 x8 : 0000000000000007
[90104.053753] x7 : 0000000000000033 x6 : 0000000017539c80
[90104.053755] x5 : 0000000000000001 x4 : 0000007f952672a0
[90104.053758] x3 : 0000000017539880 x2 : 0000000000000001
[90104.053760] x1 : 00000000000003fa x0 : 000000000017549c
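
When comparing several of these reports it can help to pull the fields out of the first line of each fault. The sketch below is a Python illustration added for that purpose, not something used in the original investigation. The regular expression is written against the log lines in this report only, and the reading of the number in parentheses as the signal being delivered (7 for SIGBUS, 11 for SIGSEGV, using Linux numbering) is an assumption based on how the arm64 fault handler formats this message. The names `FAULT_RE` and `parse_fault` are hypothetical.

```python
import re

# Pattern for the "unhandled ... fault" line as it appears in this report.
FAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: unhandled (?P<fault>.+?) "
    r"\((?P<sig>\d+)\) at 0x(?P<addr>[0-9a-fA-F]+), esr 0x(?P<esr>[0-9a-fA-F]+)"
)

# Assumption: the parenthesised number is a Linux signal number.
# Only the two values seen in this bug are mapped.
SIGNALS = {7: "SIGBUS", 11: "SIGSEGV"}


def parse_fault(line):
    """Return the fields of a kernel fault line, or None if it does not match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    sig = int(match.group("sig"))
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "fault": match.group("fault"),
        "signal": SIGNALS.get(sig, str(sig)),
        "address": int(match.group("addr"), 16),
        "esr": int(match.group("esr"), 16),
    }
```

Feeding it the irqbalance line above yields the process name, PID 807 and the faulting address 0x001754a4 as an integer, which makes it easier to check the address against a memory map later.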

Colin Ian King (colin-king) wrote :

Running the following:

#!/bin/bash
tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"

for t in $tests
do
        echo $t
        echo $t > /dev/kmsg
        ./stress-ng --$t 0 -v -t 60
done

eventually tripped the translation fault in irqbalance. I ran this after a clean reboot.

[ 4901.799846] timerfd
[ 4961.807050] tsearch
[ 5021.884456] udp
[ 5081.895058] udp-flood
[ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at 0x002d6da4, esr 0x92000006
[ 5141.674376] pgd = ffffffcfb51a0000
[ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003, *pmd=0000000000000000

[ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti: ffffffcfab710000
[ 5141.816206] PC is at 0x7f88287834
[ 5141.816208] LR is at 0x7f882877f4
[ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate: 80000000
[ 5141.816212] sp : 0000007ff2e46b30
[ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
[ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
[ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
[ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
[ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
[ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
[ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
[ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
[ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
[ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
[ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
[ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
[ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
[ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
[ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c

[ 5141.907792] urandom
[ 5201.928712] utime
[ 5261.934534] vecmath
[ 5321.940302] vfork
[ 5381.947904] vm
[ 5441.991784] vm-rw
[ 5502.017614] vm-splice
[ 5562.023334] wcs
[ 5622.037054] wait
[ 5682.043302] yield
[ 5742.056595] xattr
[ 5802.075772] zero
[ 5862.087396] zombie

Ming Lei (tom-leiming) wrote : Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

Hi Colin,

That looks like progress, but it still takes time to reproduce,
and I will use your new approach to try to reproduce it.

When you are doing that, could you dump the file /proc/$(pidof
irqbalance)/maps so that we can see where the faulting address is
in the process's VM space?

thanks,

On Sat, Jul 4, 2015 at 4:10 AM, Colin Ian King
<email address hidden> wrote:
> Running the following:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t > /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> eventually tripped the translation fault in irqbalance. I ran this
> after a clean reboot.
>
> [ 4901.799846] timerfd
> [ 4961.807050] tsearch
> [ 5021.884456] udp
> [ 5081.895058] udp-flood
> [ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at 0x002d6da4, esr 0x92000006
> [ 5141.674376] pgd = ffffffcfb51a0000
> [ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003, *pmd=0000000000000000
>
> [ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
> [ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> [ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti: ffffffcfab710000
> [ 5141.816206] PC is at 0x7f88287834
> [ 5141.816208] LR is at 0x7f882877f4
> [ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate: 80000000
> [ 5141.816212] sp : 0000007ff2e46b30
> [ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
> [ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
> [ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
> [ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
> [ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
> [ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
> [ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
> [ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
> [ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
> [ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
> [ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
> [ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
> [ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
> [ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
> [ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c
>
> [ 5141.907792] urandom
> [ 5201.928712] utime
> [ 5261.934534] vecmath
> [ 5321.940302] vfork
> [ 5381.947904] vm
> [ 5441.991784] vm-rw
> [ 5502.017614] vm-splice
> [ 5562.023334] wcs
> [ 5622.037054] wait
> [ 5682.043302] yield
> ...

Ming Lei (tom-leiming) wrote :

Hi Colin,

On Sat, Jul 4, 2015 at 12:43 AM, Colin Ian King
<email address hidden> wrote:
> I was able to hit the following translation fault running sudo ./stress-
> ng --seq 0 -t 60 --syslog --metrics --times -v

I suggest not running stress-ng as root; otherwise the issue may be
less serious, because:

  - the root user can easily do bad things, and it is quite easy to
    kill any process
  - in reality most loads are run as non-root

If some system processes (irqbalance, systemd-*) are only killed
because stress-ng is running as root, it can be a low priority issue.
Otherwise we need to pay close attention to the issue.

I always run 'stress-ng' as the ubuntu user without sudo, which may
be the reason why it is difficult for me to reproduce this.

Even with the two new approaches, it is still not easy for me to
reproduce this. I have seen only one translation fault with your
first approach (./stress-ng --seq 0 ...) in 6 hours, and can't trigger
it with your second approach (the bash script).

The log [1] I triggered follows, and I think it is very likely a
userspace issue. Using the irqbalance-dbgsym package, we can easily
find that 'PC is at 0x406078' is an address in the text section, and it
should be inside the function 'place_irq_in_node', because the
executable isn't built as relocatable. One thing I still can't
understand is why the fault address is '0x00000040' in this context.

[1]
[ 3616.333392] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[ 3616.333393] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[ 5316.367265] irqbalance[1457]: unhandled level 2 translation fault (11) at 0x00000040, esr 0x92000006
[ 5316.476937] pgd = ffffffcfb5478000
[ 5316.520692] [00000040] *pgd=0000004fb4a3c003, *pud=0000004fb4a3c003, *pmd=0000000000000000
[ 5316.620270]
[ 5316.638140] CPU: 7 PID: 1457 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[ 5316.733212] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 5316.806382] task: ffffffcfb55e6e40 ti: ffffffcfa72b0000 task.ti: ffffffcfa72b0000
[ 5316.896258] PC is at 0x406078
[ 5316.931865] LR is at 0x404100
[ 5316.967457] pc : [<0000000000406078>] lr : [<0000000000404100>] pstate: 20000000
[ 5317.056268] sp : 0000007fc07ff2d0
[ 5317.096038] x29: 0000007fc07ff2d0 x28: 00000000004095a0
[ 5317.160023] x27: 0000000000409548 x26: 000000000041a000
[ 5317.223897] x25: 0000000000405000 x24: 000000000041acf8
[ 5317.287868] x23: 000000000041a000 x22: 000000000041a000
[ 5317.351841] x21: 000000002e0d6050 x20: 000000000041a000
[ 5317.415744] x19: 000000002e0e9020 x18: 0000000000000000
[ 5317.479620] x17: 0000007fb5ac287c x16: 000000000041a188
[ 5317.543490] x15: 003bdd2370f74a1c x14: 2030203020302030
[ 5317.607373] x13: 2030203020302030 x12: 2030203020302030
[ 5317.671263] x11: 2030203020302030 x10: 2030203020302030
[ 5317.735137] x9 : 00000000000000a0 x8 : 0000000000000001
[ 5317.799113] x7 : 0000000000000033 x6 : 000000002e0d6e08
[ 5317.862983] x5 : 0000000000000040 x4 : 0000000000000000
[ 5317.926867] x3 : 000000002e0d7008 x2 : 0000...

Colin Ian King (colin-king) wrote :

I re-ran this today with the following script as a non-root user:

#!/bin/bash
tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"

for t in $tests
do
        echo $t
        echo $t | sudo tee /dev/kmsg
        ./stress-ng --$t 0 -v -t 60
done

and hit this issue:

[14098.848615] urandom
[14111.696335] irqbalance[828]: unhandled level 2 translation fault (11) at 0x00004f64, esr 0x92000006
[14111.696341] pgd = ffffffcfef71b000
[14111.737149] [00004f64] *pgd=0000004fef1f3003, *pud=0000004fef1f3003, *pmd=0000000000000000

[14111.836705] CPU: 0 PID: 828 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[14111.836707] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[14111.836710] task: ffffffcfefb0bd40 ti: ffffffcfb452c000 task.ti: ffffffcfb452c000
[14111.836723] PC is at 0x7fb1061834
[14111.836725] LR is at 0x7fb10617f4
[14111.836728] pc : [<0000007fb1061834>] lr : [<0000007fb10617f4>] pstate: 80000000
[14111.836729] sp : 0000007fc7cef6e0
[14111.836731] x29: 0000007fc7cef6e0 x28: 00000000004095a0
[14111.836735] x27: 0000000000409548 x26: 000000000041a000
[14111.836737] x25: 0000000000000001 x24: 0000000000000010
[14111.836740] x23: 00000000004e58a0 x22: 00000000004e5880
[14111.836750] x21: 0000000000000018 x20: 0000007fb10fd000
[14111.836762] x19: 0000000000000002 x18: 0000000000000000
[14111.836765] x17: 0000007fb0d678ec x16: 0000007fb10fc2e0
[14111.836768] x15: 0000000000000020 x14: 0000000000000001
[14111.836770] x13: 0000000000000000 x12: 0000000000000000
[14111.836773] x11: 0000007fc7ced250 x10: 0000000000000010
[14111.836775] x9 : 00000000000000a0 x8 : 0000000000000007
[14111.836778] x7 : 0000000000000033 x6 : 00000000004e5c80
[14111.836780] x5 : 0000000000000001 x4 : 0000007fb0d802a0
[14111.836783] x3 : 00000000004e5880 x2 : 0000000000000001
[14111.836785] x1 : 00000000000003fa x0 : 0000000000004f5c

Ming Lei (tom-leiming) wrote :

On Mon, Jul 6, 2015 at 9:28 PM, Colin Ian King
<email address hidden> wrote:
> I re-ran this today with the following script as a non-root user:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t | sudo tee /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> and hit this issue:
>
> [14098.848615] urandom
> [14111.696335] irqbalance[828]: unhandled level 2 translation fault (11) at 0x00004f64, esr 0x92000006
> [14111.696341] pgd = ffffffcfef71b000
> [14111.737149] [00004f64] *pgd=0000004fef1f3003, *pud=0000004fef1f3003, *pmd=0000000000000000
>

As I suggested, it would be helpful to provide /proc/$(pidof
irqbalance)/maps; otherwise we can't know where the faulting
address and the PC are.
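
To illustrate why the maps file matters, here is a small Python sketch (an editorial illustration, not a tool used in this thread) that parses /proc/PID/maps text and reports which mapping, if any, contains a given address. The sample lines are copied from the irqbalance maps listing posted in this thread; the PC 0x406078 and the fault address 0x40 come from the earlier irqbalance log. Those values were captured in different runs, so pairing them here is only for illustration. The function names `parse_maps` and `find_region` are hypothetical.

```python
def parse_maps(text):
    """Return (start, end, perms, name) tuples parsed from /proc/PID/maps text."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        # Anonymous mappings have no sixth field.
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], name))
    return regions


def find_region(regions, address):
    """Return the mapping that contains address, or None if it is unmapped."""
    for start, end, perms, name in regions:
        if start <= address < end:
            return (start, end, perms, name)
    return None


# Sample lines taken from the irqbalance maps listing in this thread.
SAMPLE = """\
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
00419000-0041a000 r-xp 00009000 08:02 10496929 /usr/sbin/irqbalance
0041a000-0041b000 rwxp 0000a000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
"""
```

With this sample, `find_region(parse_maps(SAMPLE), 0x406078)` lands in the executable `r-xp` mapping of /usr/sbin/irqbalance, while `find_region(parse_maps(SAMPLE), 0x40)` returns `None`, meaning the address is not mapped at all. On a live system the same functions could be fed the contents of the real maps file instead of `SAMPLE`.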

Finally I have figured out one simple way to reproduce the issue:

1) apply the attached debug patch to stress-ng

2) run the following script:

sudo cat /proc/$(pidof irqbalance)/maps
/home/ubuntu/git/stress-ng/stress-ng --sequential 0 --seq-start 80 \
    --seq-end 84 -t 60 --syslog --metrics --times -v

The above command just runs the following 4 stressors in 4 minutes:

    stress-ng: info: [1067] dispatching hogs: 8 tsearch, 8 udp, 8 udp-flood,
    8 urandom

3) the above may trigger the following faults in irqbalance with
~3/4 probability. The faulting address is in the heap and the PC points
into libglib-2.0.so code, so it looks like a use-after-free in irqbalance
or libglib? Nothing indicates that it is related to the kernel, and
the four stressors are quite simple and shouldn't cause trouble for the
kernel.

# irqbalance memory maps
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
00419000-0041a000 r-xp 00009000 08:02 10496929 /usr/sbin/irqbalance
0041a000-0041b000 rwxp 0000a000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fbf9000-7f8fbfb000 rwxp 00000000 00:00 0
7f8fbfb000-7f8fc11000 r-xp 00000000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc11000-7f8fc20000 ---p 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc20000-7f8fc21000 r-xp 00015000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc21000-7f8fc22000 rwxp 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc22000-7f8fc26000 rwxp 00000000 00:00 0
7f8fc26000-7f8fc7f000 r-xp 00000000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc7f000-7f8fc8f000 ---p 00059000 08:02 4718668 /lib/aarch64-linux-gnu...


tags: added: patch
Colin Ian King (colin-king) wrote :

captured irqbalance segfaulting:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000408f8c in place_irq_in_node (info=0x2c3d0050, data=0x0) at placement.c:145
145 if (irq_numa_node(info)->number != -1) {
(gdb) where
#0 0x0000000000408f8c in place_irq_in_node (info=0x2c3d0050, data=0x0) at placement.c:145
#1 0x0000000000405154 in for_each_irq (list=0x2c3df660, cb=0x408f4c <place_irq_in_node>, data=0x0)
    at classify.c:508
#2 0x000000000040923c in calculate_placement () at placement.c:196
#3 0x0000000000407800 in main (argc=2, argv=0x7fcd014928) at irqbalance.c:372

(gdb) print info
$1 = (struct irq_info *) 0x2c3d0050
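
The backtrace above shows both the faulting PC (0x408f8c) and the entry address of place_irq_in_node (cb=0x408f4c). Without gdb, the same mapping from a raw PC to "function + offset" can be done against a sorted symbol table. The Python sketch below illustrates the idea; it is not part of the original debugging session. Only the place_irq_in_node entry comes from the backtrace, the two neighbouring symbols and their addresses are invented so that the lookup has something to search, and the function name `resolve` is hypothetical.

```python
import bisect


def resolve(pc, symbols):
    """Map pc to (name, offset) using (start_address, name) pairs sorted by address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, pc) - 1
    if index < 0:
        return None  # pc lies below the first known symbol
    start, name = symbols[index]
    return name, pc - start


SYMBOLS = [
    (0x408E00, "hypothetical_previous_function"),  # made up for illustration
    (0x408F4C, "place_irq_in_node"),               # entry address from the backtrace
    (0x409100, "hypothetical_next_function"),      # made up for illustration
]
```

`resolve(0x408f8c, SYMBOLS)` returns the name place_irq_in_node together with the byte offset of the faulting instruction from the function entry. In practice the table would come from the symbols of the actual binary or its debug package rather than from a hand-written list.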

Ming Lei (tom-leiming) wrote :

On Tue, Jul 7, 2015 at 2:37 AM, Colin Ian King
<email address hidden> wrote:
> captured irqbalance segfaulting:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000408f8c in place_irq_in_node (info=0x2c3d0050, data=0x0) at placement.c:145
> 145 if (irq_numa_node(info)->number != -1) {
> (gdb) where
> #0 0x0000000000408f8c in place_irq_in_node (info=0x2c3d0050, data=0x0) at placement.c:145
> #1 0x0000000000405154 in for_each_irq (list=0x2c3df660, cb=0x408f4c <place_irq_in_node>, data=0x0)
> at classify.c:508
> #2 0x000000000040923c in calculate_placement () at placement.c:196
> #3 0x0000000000407800 in main (argc=2, argv=0x7fcd014928) at irqbalance.c:372
>
> (gdb) print info
> $1 = (struct irq_info *) 0x2c3d0050

Assuming info is an address in the heap, it is valid, so the segfault
should be caused by an invalid info->numa_node.

Thanks

Ming Lei (tom-leiming) wrote :

It looks like there are two kinds of translation fault from irqbalance:

1) the first happens in place_irq_in_node() and can be reproduced with the vivid package

2) the second happens in glib2, with an irqbalance that I built myself, because
irqbalance can use its own local glib if glib2 isn't available,
and glib2 does exist on the server where I built irqbalance.

Ming Lei (tom-leiming) wrote :

On Tue, Jul 7, 2015 at 11:16 AM, Ming Lei <email address hidden> wrote:
> Looks there are two kinds of translation fault from irqbalance:
>
> 1) happend in place_irq_in_node() which can reproduce in vivid package
>
> 2) the 2nd one happened in glib2, which is built by myself, because
> irqbalance can choose to use its own local glib if there isn't glib2 available,
> and the glib2 does exist in my server in which I build irqbalance.

Both of the above reports can be fixed by the following irqbalance commit:

NUMA is not available fix

https://github.com/Irqbalance/irqbalance/commit/a3c812eb6cd627cd3fae45b8345538558b86973c

It looks like stress-ng can find not only kernel bugs but also
userspace issues. :-)

Thanks,
Ming

Colin Ian King (colin-king) wrote :

Thanks Ming for finding the fix. I was going to do a bisect on the upstream code but ran out of time last night. Nice find!

Colin

Andrew Cloke (andrew-cloke) wrote :

Following Ming's identification of an irqbalance patch that fixes this issue, I'm marking the "Affected" status on "linux (Ubuntu)" as being "invalid".

Changed in linux (Ubuntu Trusty):
status: New → Invalid
Changed in linux (Ubuntu Utopic):
status: New → Invalid
Changed in linux (Ubuntu Vivid):
status: New → Invalid
Changed in linux (Ubuntu Wily):
status: Triaged → Invalid
Changed in irqbalance (Ubuntu Vivid):
status: New → In Progress
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "0001-stress-ng-support-sequential-range.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

Ming Lei (tom-leiming)
Changed in irqbalance (Ubuntu Vivid):
status: In Progress → Confirmed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in irqbalance (Ubuntu Trusty):
status: New → Confirmed
Changed in irqbalance (Ubuntu Utopic):
status: New → Confirmed
Changed in irqbalance (Ubuntu):
status: New → Confirmed
Changed in irqbalance (Ubuntu Trusty):
assignee: nobody → dann frazier (dannf)
Changed in irqbalance (Ubuntu Utopic):
assignee: nobody → dann frazier (dannf)
Changed in irqbalance (Ubuntu Vivid):
assignee: nobody → dann frazier (dannf)
Changed in irqbalance (Ubuntu Wily):
assignee: nobody → dann frazier (dannf)
Changed in irqbalance (Ubuntu Trusty):
importance: Undecided → Medium
Changed in irqbalance (Ubuntu Utopic):
importance: Undecided → Medium
Changed in irqbalance (Ubuntu Vivid):
importance: Undecided → Medium
Changed in irqbalance (Ubuntu Wily):
importance: Undecided → Medium
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Trusty)
no longer affects: linux (Ubuntu Utopic)
no longer affects: linux (Ubuntu Vivid)
no longer affects: linux (Ubuntu Wily)
tags: added: trusty utopic vivid wily
Changed in irqbalance (Ubuntu Trusty):
status: Confirmed → Triaged
Changed in irqbalance (Ubuntu Utopic):
status: Confirmed → Triaged
Changed in irqbalance (Ubuntu Vivid):
status: Confirmed → Triaged
Changed in irqbalance (Ubuntu Wily):
status: Confirmed → Triaged
Ming Lei (tom-leiming)
description: updated
Revision history for this message
dann frazier (dannf) wrote :

On Tue, Jul 7, 2015 at 2:25 AM, Ming Lei <email address hidden> wrote:
> On Tue, Jul 7, 2015 at 11:16 AM, Ming Lei <email address hidden> wrote:
>> It looks like there are two kinds of translation fault from irqbalance:
>>
>> 1) happened in place_irq_in_node(), which can be reproduced with the vivid package
>>
>> 2) the 2nd one happened in glib2, which is built by myself, because
>> irqbalance can choose to use its own local glib if there isn't glib2 available,
>> and the glib2 does exist in my server in which I build irqbalance.
>
>
> Both of the above reports can be fixed by the following irqbalance commit:
>
> NUMA is not available fix
>
> https://github.com/Irqbalance/irqbalance/commit/a3c812eb6cd627cd3fae45b8345538558b86973c
>
> It looks like stress-ng can find not only kernel bugs but also
> userspace issues :-)

I was looking to upload a fix for wily, but I haven't been able to
reproduce the crash in order to verify the fix. I ran 'stress-ng --seq 0
-t 60 --syslog --metrics --times -v' overnight in a loop, but
irqbalance never crashed. How long should I expect this to take on
average? Does it usually crash in a single run?

Revision history for this message
Ming Lei (tom-leiming) wrote :

Dann,

Please follow the steps in #12; with those you should trigger the crash in 4 minutes.

BTW, it looks like the wily kernel can't boot to a shell prompt on mcdivitt.

Thanks,

Revision history for this message
dann frazier (dannf) wrote :

On Mon, Jul 13, 2015 at 9:27 AM, Ming Lei <email address hidden> wrote:
> Dann,
>
> Please follow the steps in #12; with those you should trigger the crash in
> 4 minutes.

I've been running that in a loop and I'm currently on iteration #76
w/o a crash :(

Maybe it's
Linux ms10-33-mcdivittB0 3.19.0-22-generic #22-Ubuntu SMP Tue Jun 16
17:18:17 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux

> BTW, it looks like the wily kernel can't boot to a shell prompt on mcdivitt.

OK - mind filing a separate bug for that?

Revision history for this message
Ming Lei (tom-leiming) wrote :

> BTW, it looks like the wily kernel can't boot to a shell prompt on mcdivitt.

That kernel (v4.0) isn't the final kernel for wily, so do we need to pay attention to that?

Revision history for this message
Ming Lei (tom-leiming) wrote :

> On Mon, Jul 13, 2015 at 9:27 AM, Ming Lei <email address hidden> wrote:
>> Dann,
>>
>> Please follow the steps in #12; with those you should trigger the crash in
>> 4 minutes.
>
> I've been running that in a loop and I'm currently on iteration #76
> w/o a crash :(

The issue has nothing to do with the kernel. Please make sure first that
irqbalance is actually running.

I can reproduce the issue on trusty, utopic and vivid easily with the approach in #12.
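
For anyone trying to repeat this, the "run it in a loop and check that irqbalance is still alive" procedure discussed above could be sketched roughly as follows. This is only an illustration, not the steps from #12 (which are not shown here): the helper names are hypothetical, the default command is the one quoted earlier in this thread, and the liveness check assumes `pgrep` is installed.

```python
import shlex
import subprocess

# Command quoted earlier in this thread; the exact steps from comment #12
# are not reproduced here, so treat this default as an assumption.
STRESS_CMD = "stress-ng --seq 0 -t 60 --syslog --metrics --times -v"


def irqbalance_running():
    """Return True if a process named exactly 'irqbalance' exists (via pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", "irqbalance"],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def run_stress(cmd=STRESS_CMD):
    """Run one stress-ng pass; its exit status is deliberately ignored."""
    subprocess.run(shlex.split(cmd), check=False)


def reproduce(max_iterations, stress=run_stress, alive=irqbalance_running):
    """Return the iteration after which irqbalance was gone, or None.

    Raises RuntimeError if irqbalance is not running before the first pass,
    since a run without the daemon cannot show the crash.
    """
    if not alive():
        raise RuntimeError("irqbalance is not running; start it before testing")
    for iteration in range(1, max_iterations + 1):
        stress()
        if not alive():
            return iteration
    return None
```

Calling `reproduce(100)` as root would run up to 100 passes. If the init system respawns irqbalance after a crash, the process check could miss it, so checking the kernel log for "unhandled ... fault" lines is a useful cross-check.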

Revision history for this message
dann frazier (dannf) wrote :

Ming was able to help me reliably reproduce this with the command:
  stress-ng --sequential 0 --seq-start 86 --seq-end 90 -t 60 --syslog --metrics --times -v

I prepared a wily package w/ the proposed upstream backport for testing:
   lp:~dannf/ubuntu/wily/irqbalance/lp1469214

Unfortunately, I'm still seeing irqbalance crash even with this backport:

[ 2461.635168] irqbalance[558]: unhandled input address range fault (11) at 0x20202020202034, esr 0x92000004
[ 2461.635175] pgd = ffffffcfab3f3000
[ 2461.675979] [20202020202034] *pgd=0000000000000000

[ 2461.733566] CPU: 4 PID: 558 Comm: irqbalance Not tainted 3.13.0-57-generic #95-Ubuntu
[ 2461.733570] task: ffffffcfa9cdcd00 ti: ffffffcfa9df8000 task.ti: ffffffcfa9df8000
[ 2461.733577] PC is at 0x40605c
[ 2461.733580] LR is at 0x4040e4
[ 2461.733582] pc : [<000000000040605c>] lr : [<00000000004040e4>] pstate: 80000000
[ 2461.733584] sp : 0000007fd95cf7a0
[ 2461.733585] x29: 0000007fd95cf7a0 x28: 000000000041a000
[ 2461.733588] x27: 000000000041a000 x26: 0000000000409510
[ 2461.733591] x25: 000000000041a000 x24: 0000000000405000
[ 2461.733593] x23: 000000000041acf8 x22: 000000000041a000
[ 2461.733596] x21: 0000000014ab0130 x20: 000000000041a000
[ 2461.733598] x19: 0000000014a9f0e0 x18: 0000000000000000
[ 2461.733601] x17: 0000007fa72118ec x16: 0000007fa75a72e0
[ 2461.733603] x15: 003bcfb11b54656b x14: 2030203020302030
[ 2461.733606] x13: 2030203020302030 x12: 2030203020302030
[ 2461.733608] x11: 2030203020302030 x10: 2030203020302030
[ 2461.733611] x9 : 2030203020302030 x8 : 0000000014a9bc80
[ 2461.733613] x7 : 0000000000000020 x6 : 0000000014a9bc90
[ 2461.733616] x5 : 0000000000000001 x4 : 0000007fa722a2a0
[ 2461.733618] x3 : 0000000014a9b880 x2 : 0000000000000001
[ 2461.733620] x1 : 4320202020202020 x0 : 000000003355000a
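
As a reading aid for the dump above, here is a small sketch that parses the fault line and prints a few register values as ASCII. The function names are made up for illustration. The ESR field layout (exception class in bits 31:26, fault status code in the low 6 bits for data aborts) follows the ARMv8 architecture, and treating the number in parentheses as the signal number is my assumption. Several registers decode to spaces and '0' characters, which may hint that text data ended up where a pointer was expected, but nothing in this thread confirms that.

```python
import re

# Fault line copied from the log above.
FAULT_LINE = ("[ 2461.635168] irqbalance[558]: unhandled input address range "
              "fault (11) at 0x20202020202034, esr 0x92000004")

FAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: unhandled (?P<kind>.+?) \((?P<sig>\d+)\) "
    r"at 0x(?P<addr>[0-9a-f]+), esr 0x(?P<esr>[0-9a-f]+)"
)


def parse_fault(line):
    """Split an 'unhandled ... fault' kernel log line into its fields."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "kind": m.group("kind"),
        # Assumed to be the signal number (11 is SIGSEGV on Linux).
        "signal": int(m.group("sig")),
        "addr": int(m.group("addr"), 16),
        "esr": int(m.group("esr"), 16),
    }


def decode_esr(esr):
    """Extract the exception class and the data fault status code."""
    return {"ec": esr >> 26, "dfsc": esr & 0x3F}


def as_ascii(value, width=8):
    """Render a register value as little-endian bytes, '.' for non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


fault = parse_fault(FAULT_LINE)
print(fault["comm"], fault["pid"], hex(fault["addr"]), decode_esr(fault["esr"]))
for name, value in [("x9", 0x2030203020302030), ("x1", 0x4320202020202020)]:
    print(name, repr(as_ascii(value)))
```

For this log the exception class comes out as 0x24 (a data abort taken from a lower exception level) and the fault status code as 0x04.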

Revision history for this message
Ming Lei (tom-leiming) wrote :

> I prepared a wily package w/ the proposed upstream backport for testing:
> lp:~dannf/ubuntu/wily/irqbalance/lp1469214

> Unfortunately, I'm still seeing irqbalance crash even with this backport:

I guess you are still testing irqbalance on c33. It looks like the upgrade from trusty
didn't go well: I can see lots of this kind of fault in different processes (sshd, stress-ng,
systemd...) just after a fresh boot with irqbalance disabled (see attachment), which sounds like a bad upgrade.

If you verify the patch on trusty/utopic/vivid, it does fix the issue according to my tests.

Revision history for this message
Ming Lei (tom-leiming) wrote :

Dann,

I have worked out patches to fix the wily kernel; see the following link:

         https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1474171/comments/4

so you can reproduce the issue on a totally clean wily distribution, :-)

dann frazier (dannf)
Changed in irqbalance (Ubuntu Wily):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.0.6-3ubuntu3

---------------
irqbalance (1.0.6-3ubuntu3) wily; urgency=medium

  * d/p/NUMA-is-not-available-fix.patch: Avoid crashes when NUMA
    is not available. (LP: #1469214)

 -- dann frazier <email address hidden> Wed, 09 Sep 2015 17:35:26 -0600

Changed in irqbalance (Ubuntu Wily):
status: In Progress → Fix Released
dann frazier (dannf)
Changed in irqbalance (Ubuntu Vivid):
status: Triaged → In Progress
Changed in irqbalance (Ubuntu Trusty):
status: Triaged → In Progress
Changed in irqbalance (Ubuntu Utopic):
status: Triaged → Won't Fix
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Colin, or anyone else affected,

Accepted irqbalance into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/irqbalance/1.0.6-3ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in irqbalance (Ubuntu Vivid):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in irqbalance (Ubuntu Trusty):
status: In Progress → Fix Committed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Hello Colin, or anyone else affected,

Accepted irqbalance into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/irqbalance/1.0.6-2ubuntu0.14.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi there,

I need access to the machine to test this, any hints on the machine name and how to access it would be useful. Thanks.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Hi Colin, I believe you now have access to the necessary hardware, but please let me know if this is still an issue. Thanks.

Revision history for this message
Colin Ian King (colin-king) wrote :

I've tested 1.0.6-2ubuntu0.14.04.4 for several hours and the problem is fixed, I can't reproduce this at all.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Great! Many thanks...

Revision history for this message
Colin Ian King (colin-king) wrote :

Andrew, I'm still testing it for vivid, will be done in a few hours.

Revision history for this message
Colin Ian King (colin-king) wrote :

Tested with vivid 1.0.6-3ubuntu1.1, bug is fixed

Revision history for this message
Colin Ian King (colin-king) wrote :

Tested with wily 1.0.6-3ubuntu3, bug fixed.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.0.6-2ubuntu0.14.04.4

---------------
irqbalance (1.0.6-2ubuntu0.14.04.4) trusty; urgency=medium

  * d/p/NUMA-is-not-available-fix.patch: Avoid crashes when NUMA
    is not available. (LP: #1469214)

 -- dann frazier <email address hidden> Thu, 10 Sep 2015 13:11:21 -0600

Changed in irqbalance (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for irqbalance has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.0.6-3ubuntu1.1

---------------
irqbalance (1.0.6-3ubuntu1.1) vivid; urgency=medium

  * d/p/NUMA-is-not-available-fix.patch: Avoid crashes when NUMA
    is not available. (LP: #1469214)

 -- dann frazier <email address hidden> Thu, 10 Sep 2015 13:01:56 -0600

Changed in irqbalance (Ubuntu Vivid):
status: Fix Committed → Fix Released