ubuntu_ftrace_smoke_test is a bit flaky especially on ARM64 node wright

Bug #1829963 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Fix Released
High
Colin Ian King

Bug Description

Issue found on these nodes / releases:
 * X-hwe (P8 "modoc" , fail rate 1 out of 1 attempts)
 * X-hwe (ARM64 "wirght", fail rate 1 out of 1 attempts)
 * X-hwe (AMD64 "onibi" , fail rate 1 out of 2 attempts)
 * B (ARM64 "wirght", fail rate 1 out of 2 attempts)
 * B (i386 "pepe" , fail rate 1 out of 3 attempts)
 * B-hwe (ARM64 "wirght", fail rate 2 out of 3 attempts)

Test failed with:
   FAILED (SyS_write or ksys_write traces found must be > 32, got 0)
   FAILED (vfs_write traces found must be > 32, got 0)

(sometimes it's just SyS_write that's failing)

Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Colin Ian King (colin-king) wrote :
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Committed
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Repo updated on the jenkins server, commencing retest on all these affected nodes.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hello Colin,
this test will timeout with node wright on X-hwe:
 START ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1558610023 localtime=May 23 11:13:43
 Persistent state client._record_indent now set to 2
 Persistent state client.unexpected_reboot now set to ('ubuntu_ftrace_smoke_test.ftrace-smoke-test', 'ubuntu_ftrace_smoke_test.ftrace-smoke-test')
 Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
 PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-4.15.0-51-generic)
 PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-4.15.0-51-generic)
 PASSED (CONFIG_STACK_TRACER=y in /boot/config-4.15.0-51-generic)
 PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-4.15.0-51-generic)
 PASSED all expected /sys/kernel/debug/tracing files exist
 PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
 PASSED (function in /sys/kernel/debug/tracing/available_tracers)
 PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
 PASSED (function can be enabled)
 TIMER END Thu May 23 11:14:45 UTC 2019
 TIMEOUT
 FAILED: aborting, timeout, took way too long to complete
Build was aborted

Changed in ubuntu-kernel-tests:
status: Fix Committed → Incomplete
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Now I can see this hang with node wright:

 Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
 PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-4.18.0-25-generic)
 PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-4.18.0-25-generic)
 PASSED (CONFIG_STACK_TRACER=y in /boot/config-4.18.0-25-generic)
 PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-4.18.0-25-generic)
 PASSED all expected /sys/kernel/debug/tracing files exist
 PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
 PASSED (function in /sys/kernel/debug/tracing/available_tracers)
 PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
 PASSED (function can be enabled)
 TIMER END Wed Jul 3 01:49:00 UTC 2019
 TIMEOUT
 FAILED: aborting, timeout, took way too long to complete

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This is the syslog for node wright being tested with 4.18 Bionic, with the smoke test SRU test plan.

tags: added: arm64 ubuntu-ftrace-smoke-test
Revision history for this message
Colin Ian King (colin-king) wrote :

is this still an issue?

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hi,
yes it's still there.

Sorry for the delay,
I can still see this failing on Bionic 4.15.0-58 with node wright for a test finished 30hrs ago.

 Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
 [stdout] PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-4.15.0-58-generic)
 [stdout] PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-4.15.0-58-generic)
 [stdout] PASSED (CONFIG_STACK_TRACER=y in /boot/config-4.15.0-58-generic)
 [stdout] PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-4.15.0-58-generic)
 [stdout] PASSED all expected /sys/kernel/debug/tracing files exist
 [stdout] PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
 [stdout] PASSED (function in /sys/kernel/debug/tracing/available_tracers)
 [stdout] PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
 [stdout] PASSED (function can be enabled)
 [stdout] TIMER END Mon Aug 12 07:36:20 UTC 2019
 [stdout] TIMEOUT
 [stdout] FAILED: aborting, timeout, took way too long to complete

Will get you access to it.

Changed in ubuntu-kernel-tests:
status: Incomplete → Confirmed
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

BTW, it has passed with B-hwe, X-hwe this time.

Revision history for this message
Colin Ian King (colin-king) wrote :

Clean boot, I can run this test 50+ times without it failing, so I'm thinking it's potentially an earlier test that also contributes to this broken state, for example:

Jul 3 01:32:31 wright-kernel kernel: [ 4069.650382] EXT4-fs (zram0): mounted filesystem with ordered data mode. Opts: (null)
Jul 3 01:35:15 wright-kernel kernel: [ 4233.613182] INFO: task sync:19483 blocked for more than 120 seconds.
Jul 3 01:35:15 wright-kernel kernel: [ 4233.619632] Tainted: P W OE 4.18.0-25-generic #26~18.04.1-Ubuntu
Jul 3 01:35:15 wright-kernel kernel: [ 4233.627312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635254] sync D 0 19483 19256 0x00000000
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635262] Call trace:
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635274] __switch_to+0x94/0xd8
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635287] __schedule+0x2fc/0x8d8
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635296] schedule+0x2c/0x88
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635302] wb_wait_for_completion+0x8c/0xb8
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635305] sync_inodes_sb+0xb0/0x248
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635310] sync_inodes_one_sb+0x28/0x38
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635315] iterate_supers+0x9c/0x130
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635319] ksys_sync+0x50/0xc0
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635322] sys_sync+0x14/0x20
Jul 3 01:35:15 wright-kernel kernel: [ 4233.635325] el0_svc_naked+0x30/0x34
Jul 3 01:35:59 wright-kernel kernel: [ 4278.202106] zram0: detected capacity change from 43612372992 to 0
Jul 3 01:36:05 wright-kernel kernel: [ 4283.621628] zram0: detected capacity change from 0 to 43613421568

Changed in ubuntu-kernel-tests:
status: Confirmed → In Progress
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hmm,
maybe we can stress it with the smoke_tests test suite that we use for SRU.

Revision history for this message
Colin Ian King (colin-king) wrote :

I'm trying something like that now.

Perhaps you can point me to the exact tests we use for SRUs.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This is the full list for the smoke test we run:

    'smoke' : [
        'ubuntu_aufs_smoke_test',
        'ubuntu_stress_smoke_test',
        'ubuntu_zfs_smoke_test',
        'ubuntu_fan_smoke_test',
        'ubuntu_squashfs_smoke_test',
        'ubuntu_loop_smoke_test',
        'ubuntu_lttng_smoke_test',
        'ubuntu_blktrace_smoke_test',
        'ubuntu_sysdig_smoke_test',
        'ubuntu_kvm_smoke_test',
        'ubuntu_docker_smoke_test',
        'ubuntu_cve_kernel',
        'ubuntu_zram_smoke_test',
        'ubuntu_nbd_smoke_test',
        'ubuntu_quota_smoke_test',
        'ubuntu_ftrace_smoke_test',
        'ubuntu_overlayfs_smoke_test',
    ],

Unluckily, we can only skip the ubuntu_overlayfs_smoke_test.
From the kernel message in #9, it might be a good idea to start with ubuntu_zram_smoke_test and then this one. (Or just run the whole test suite)

Revision history for this message
Colin Ian King (colin-king) wrote :

I've re-run with those specific tests several times now and cannot trigger this issue. I'll keep on trying to see if I can get a reproducer on this.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

In this cycle I can see it failing with:
Bionic i386 generic / lowlatency with node pepe
Bionic ARM64 generic with node wright

08/16 16:49:53 DEBUG| utils:0116| Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
08/16 16:49:53 DEBUG| utils:0153| [stdout] PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-4.15.0-59-generic)
08/16 16:49:53 DEBUG| utils:0153| [stdout] PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-4.15.0-59-generic)
08/16 16:49:53 DEBUG| utils:0153| [stdout] PASSED (CONFIG_STACK_TRACER=y in /boot/config-4.15.0-59-generic)
08/16 16:49:53 DEBUG| utils:0153| [stdout] PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-4.15.0-59-generic)
08/16 16:49:55 DEBUG| utils:0153| [stdout] PASSED all expected /sys/kernel/debug/tracing files exist
08/16 16:49:55 DEBUG| utils:0153| [stdout] PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
08/16 16:49:55 DEBUG| utils:0153| [stdout] PASSED (function in /sys/kernel/debug/tracing/available_tracers)
08/16 16:49:55 DEBUG| utils:0153| [stdout] PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
08/16 16:49:58 DEBUG| utils:0153| [stdout] PASSED (function can be enabled)
08/16 16:50:55 DEBUG| utils:0153| [stdout] TIMER END Fri Aug 16 16:50:55 UTC 2019
08/16 16:50:55 DEBUG| utils:0153| [stdout] TIMEOUT
08/16 16:50:55 DEBUG| utils:0153| [stdout] FAILED: aborting, timeout, took way too long to complete

tags: added: sru-20190812
Revision history for this message
Sean Feole (sfeole) wrote :

Sam is this something that you have tried in Canonistack on an RM64 instance? I believe the hardware backend should be of the same vendor/type

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Canonicalstack? No these were gathered from the SRU test result.
So it is the ThunderX ARM64 node.

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Sam, can I get access on the ThunderX ARM64 node so I can debug this further?

Revision history for this message
Colin Ian King (colin-king) wrote :
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.