Hi Colin,
On Sat, Jul 4, 2015 at 12:43 AM, Colin Ian King
<email address hidden> wrote:
> I was able to hit the following translation fault running sudo ./stress-
> ng --seq 0 -t 60 --syslog --metrics --times -v
I suggest not running stress-ng as root; faults hit that way are less
serious because:
- the root user can easily do bad things, and it is quite easy for it
to kill any process
- in reality, most workloads run as non-root
If system processes (irqbalance, systemd-*) are only being killed
because stress-ng runs as root, this can be treated as a low-priority
issue. Otherwise we need to pay close attention to it.
I always run 'stress-ng' as the ubuntu user without sudo, which may
be why it is difficult for me to reproduce this. Even with the two new
approaches it is still not easy: I saw the translation fault only once
in 6 hours with your first approach (./stress-ng --seq 0 ...), and I
cannot trigger it at all with your second approach (the bash script).
The log I triggered follows at [1], and I think this is very likely a
userspace issue. Using the irqbalance-dbgsym package, we can see that
'PC is at 0x406078' is an address in the text section, and it should
fall inside the function place_irq_in_node, because the executable is
not built as relocatable, so runtime addresses match the link-time
ones. One thing I still can't explain is why the fault address is
'0x00000040' in this context.
[1]
[ 3616.333392] Bits 55-60 of /proc/PID/pagemap entries are about to
stop being page-shift some time soon. See the
linux/Documentation/vm/pagemap.txt for details.
[ 3616.333393] Bits 55-60 of /proc/PID/pagemap entries are about to
stop being page-shift some time soon. See the
linux/Documentation/vm/pagemap.txt for details.
[ 5316.367265] irqbalance[1457]: unhandled level 2 translation fault
(11) at 0x00000040, esr 0x92000006
[ 5316.476937] pgd = ffffffcfb5478000
[ 5316.520692] [00000040] *pgd=0000004fb4a3c003,
*pud=0000004fb4a3c003, *pmd=0000000000000000
[ 5316.620270]
[ 5316.638140] CPU: 7 PID: 1457 Comm: irqbalance Not tain-21-generic #21-Ubuntu
[ 5316.733212] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 5316.806382] task: ffffffcfb55e6e40 ti: ffffffcfa72b0000 task.ti:
ffffffcfa72b0000
[ 5316.896258] PC is at 0x406078
[ 5316.931865] LR is at 0x404100
[ 5316.967457] pc : [<0000000000406078>] lr : [<0000000000404100>]
pstate: 20000000
[ 5317.056268] sp : 0000007fc07ff2d0
[ 5317.096038] x29: 0000007fc07ff2d0 x28: 00000000004095a0
[ 5317.160023] x27: 0000000000409548 x26: 000000000041a000
[ 5317.223897] x25: 0000000000405000 x24: 000000000041acf8
[ 5317.287868] x23: 000000000041a000 x22: 000000000041a000
[ 5317.351841] x21: 000000002e0d6050 x20: 000000000041a000
[ 5317.415744] x19: 000000002e0e9020 x18: 0000000000000000
[ 5317.479620] x17: 0000007fb5ac287c x16: 000000000041a188
[ 5317.543490] x15: 003bdd2370f74a1c x14: 2030203020302030
[ 5317.607373] x13: 2030203020302030 x12: 2030203020302030
[ 5317.671263] x11: 2030203020302030 x10: 2030203020302030
[ 5317.735137] x9 : 00000000000000a0 x8 : 0000000000000001
[ 5317.799113] x7 : 0000000000000033 x6 : 000000002e0d6e08
[ 5317.862983] x5 : 0000000000000040 x4 : 0000000000000000
[ 5317.926867] x3 : 000000002e0d7008 x2 : 0000000000000000
[ 5317.990840] x1 : 000000000000002c x0 : 0000000000000003
[ 5318.054713]