pm_test failed to reboot system in graphics/2_auto_switch_card test case

Bug #1866597 reported by Alex Tu
Affects | Status | Importance | Assigned to | Milestone
Next Generation Checkbox (CLI) | Fix Released | High | Sylvain Pineau |
OEM Priority Project | Fix Released | Critical | Alex Tu |

Bug Description

On the daily build plainbox-provider-certification-client 0.39.0+202002190811+111~ubuntu20.04.1, I ran the Checkbox test plan "com.canonical.certification::client-cert-auto" through testflinger (as far as I know, this goes through checkbox remote).

pm_test failed to reboot the system in the graphics/2_auto_switch_card test case.

This issue does not happen when the plan "com.canonical.certification::client-cert-auto" is run locally.

04:06:11 -------------[ Running job 43 / 130. Estimated time left: unknown ]-------------
04:06:11 ---------[ Switch GPU to NVIDIA Corporation PCI ID 0x1d11 and reboot ]----------
04:06:11 ID: com.canonical.certification::graphics/2_auto_switch_card_PCI_ID_0x1d11
04:06:11 Category: com.canonical.plainbox::graphics
04:06:11 Waiting for the system to shut down or reboot...
04:06:11 ... 8< -------------------------------------------------------------------------
04:06:11 Info: selecting the nvidia profile
04:06:11 No protocol specified
04:06:11 Unable to init server: Could not connect: Connection refused
04:06:11 No protocol specified
04:06:11 Unable to init server: Could not connect: Connection refused
04:06:11 DEBUG Invoking username: ubuntu
04:06:11 DEBUG Arguments: Namespace(append=False, check_hardware_list=False, checkbox_respawn_cmd='/home/ubuntu/.cache/plainbox/sessions/generated_launcher-2020-03-08T20.02.27.session/CHECKBOX_DATA/__respawn_checkbox', fwts=False, hardware_delay=30, log_dir='/home/ubuntu/.cache/plainbox/sessions/generated_launcher-2020-03-08T20.02.27.session/CHECKBOX_DATA', log_filename='/home/ubuntu/.cache/plainbox/sessions/generated_launcher-2020-03-08T20.02.27.session/CHECKBOX_DATA/pm_test.reboot.1.log', log_level=10, log_level_str='debug', max_pm_time=300, min_pm_time=0, pm_delay=5, pm_operation='reboot', pm_timestamp=0, repetitions=1, silent=True, start=1583697950, suspends_before_reboot=0, total=1, wakeup=0)
04:06:11 DEBUG Extra Arguments: []
04:06:11 DEBUG Enabling autologin for this user...
04:06:11 DEBUG Enabling user to execute test as root...
04:06:11 DEBUG Executing: "sed -i -e '$a# Automatically added by pm.py\\nubuntu ALL=NOPASSWD: /usr/bin/python3' /etc/sudoers"...
04:06:11 DEBUG Writing desktop file ('/home/ubuntu/.config/autostart/pm_test.desktop')...
04:06:11 DEBUG
04:06:11 [Desktop Entry]
04:06:11 Name=reboot test
04:06:11 Comment=Verify reboot works properly
04:06:11 Exec=sudo /usr/bin/python3 /usr/lib/plainbox-provider-checkbox/bin/pm_test -r 0 -w 0 --hardware-delay 30 --pm-delay 5 --min-pm-time 0 --max-pm-time 300 --append --total 1 --start 1583697950 --pm-timestamp 1583697950 --silent --log-level=debug --log-dir=/home/ubuntu/.cache/plainbox/sessions/generated_launcher-2020-03-08T20.02.27.session/CHECKBOX_DATA --suspends-before-reboot=0 --checkbox-respawn-cmd=/home/ubuntu/.cache/plainbox/sessions/generated_launcher-2020-03-08T20.02.27.session/CHECKBOX_DATA/__respawn_checkbox reboot
04:06:11 Type=Application
04:06:11 X-GNOME-Autostart-enabled=true
04:06:11 Hidden=false
04:06:11
04:06:11 INFO reboot operations remaining: 1
06:06:20
06:06:20 ERROR: Output timeout reached! (7200s)
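
The last two timestamps in the log are consistent with the reported 7200 s output timeout: nothing was printed after the "reboot operations remaining" line until the runner gave up. A quick check of the gap, assuming both timestamps fall on the same day:

```python
from datetime import datetime

# Timestamps copied from the log above; assumed to be on the same day.
last_output = datetime.strptime("04:06:11", "%H:%M:%S")
timeout_reported = datetime.strptime("06:06:20", "%H:%M:%S")

# Seconds of silence between the last pm_test output and the timeout error.
silence = (timeout_reported - last_output).total_seconds()
print(silence)  # 7209.0
```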

Alex Tu (alextu)
description: updated
Changed in oem-priority:
assignee: nobody → Alex Tu (alextu)
importance: Undecided → Critical
Alex Tu (alextu) wrote :

I suspect these error messages are the clue:

04:06:11 No protocol specified
04:06:11 Unable to init server: Could not connect: Connection refused
04:06:11 No protocol specified
04:06:11 Unable to init server: Could not connect: Connection refused

Alex Tu (alextu) wrote :

So far, it looks like pm_test simply stops at [1]:
"
class CountdownDialog(Gtk.Dialog):
....
        super(CountdownDialog, self).__init__(title=title,
                                              buttons=buttons)
"

This issue does not happen with com.canonical.certification::sru, because that plan does not use pm_test.

[1] https://git.launchpad.net/plainbox-provider-checkbox/tree/bin/pm_test#n477
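
The "Unable to init server" errors suggest that the dialog is being constructed in a process that cannot reach a display server. One way to guard against that is to check for a display before building any GTK widget and to fall back to a plain-text countdown. This is a minimal sketch only: `has_display` and `text_countdown` are hypothetical helpers, not part of pm_test, and this is not the fix that landed in checkbox-ng.

```python
import os
import sys
import time


def has_display(environ=None):
    """Return True if a display server appears to be configured.

    This only inspects environment variables. It does not prove that
    the server accepts the connection (X authorization can still fail).
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def text_countdown(seconds, out=sys.stdout, sleep=time.sleep):
    """Print a countdown to `out` instead of showing a dialog."""
    for remaining in range(seconds, 0, -1):
        out.write("Continuing in {} s\n".format(remaining))
        out.flush()
        sleep(1)
```

A caller would attempt the dialog only when `has_display()` is true and call `text_countdown(...)` otherwise. Since "No protocol specified" typically indicates an X authorization failure, `DISPLAY` may well be set in the failing case, so an environment check alone would probably not be sufficient here.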

Alex Tu (alextu) wrote :

It also does not appear to be related to the value of 'normal_user'.
I copied existing jobs and wrote a simple job that simulates running pm_test through checkbox remote. It reproduces the issue: the run fails under checkbox remote but passes locally.

job:

plugin: shell
category_id: com.canonical.plainbox::stress
id: stress/reboot_10
requires: executable.name == 'fwts'
 executable.name == 'x-terminal-emulator'
command:
 set -x
 pm_test --checkbox-respawn-cmd $PLAINBOX_SESSION_SHARE/__respawn_checkbox -r 10 --silent --log-level=notset reboot --log-dir=$PLAINBOX_SESSION_SHARE
flags: noreturn
estimated_duration: 900
user: root
environ: PLAINBOX_SESSION_SHARE PM_TEST_DRY_RUN
_description:
 Stress reboot system (10 cycles)

plan:
id: stress-10-reboot-automated
unit: test plan
_name: Power Management reboot and power off stress tests (automated)
_description: Power Management reboot and power off stress tests (automated)
include:
    stress/reboot_10 certification-status=blocker
    stress/reboot_10_log

launcher and config:
https://pastebin.canonical.com/p/8sRyWWsbNt/

Alex Tu (alextu) wrote :

It seems that pm_test was not designed to be run over ssh, checkbox remote, or testflinger.

When I run pm_test over ssh, these errors are shown:
No protocol specified
Unable to init server: Could not connect: Connection refused

But if I enable ForwardX11Trusted and ForwardX11, the pm_test dialog is displayed on my local machine and the remote machine reboots successfully.
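
For reference, the X11 forwarding workaround can be expressed as an ssh command line. The sketch below only builds the argument list and does not run it. `ForwardX11` and `ForwardX11Trusted` are standard OpenSSH client options; the host name and the remote command are placeholders, not values taken from this report.

```python
import shlex


def build_ssh_command(host, remote_command):
    """Build an ssh argv that enables trusted X11 forwarding."""
    return [
        "ssh",
        "-o", "ForwardX11=yes",
        "-o", "ForwardX11Trusted=yes",
        host,
        remote_command,
    ]


# "ubuntu@sut.example" and the pm_test arguments are illustrative only.
argv = build_ssh_command("ubuntu@sut.example", "pm_test --silent reboot")
print(" ".join(shlex.quote(arg) for arg in argv))
```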

Alex Tu (alextu) wrote :

After talking with spineau on IRC: rebooting a classic SUT remotely is not yet supported.

Alex Tu (alextu)
tags: added: lp1849446
Alex Tu (alextu)
Changed in oem-priority:
status: New → Triaged
Changed in plainbox-provider-certification-client:
assignee: nobody → Sylvain Pineau (sylvain-pineau)
status: New → In Progress
affects: plainbox-provider-certification-client → checkbox-ng
Changed in checkbox-ng:
importance: Undecided → High
Changed in checkbox-ng:
status: In Progress → Fix Committed
milestone: none → 1.8.0
Changed in checkbox-ng:
status: Fix Committed → Fix Released
Alex Tu (alextu) wrote :

After talking with spineau on IRC:

Rebooting the target machine with pm_test through checkbox remote is not working yet.
This is a test case: https://pastebin.canonical.com/p/PKVYbSY9cc/

The normal_user is not preserved after the slave resumes, so the master keeps printing the error message "Unable to determine invoking user" [1].

[1] https://git.launchpad.net/checkbox-ng/tree/plainbox/impl/execution.py#n281
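
The general difficulty is that any state held only in memory is lost when the machine reboots. Below is a generic sketch of writing the invoking user to the session directory before the reboot and reading it back on resume. The file name and both function names are hypothetical; they do not reflect how checkbox-ng stores session state or how the fix was eventually implemented.

```python
import json
import os
import tempfile

STATE_FILE = "normal_user.json"  # hypothetical file name


def save_normal_user(session_dir, user):
    """Persist the invoking user so it survives a reboot."""
    path = os.path.join(session_dir, STATE_FILE)
    with open(path, "w") as f:
        json.dump({"normal_user": user}, f)


def load_normal_user(session_dir):
    """Return the persisted user, or None if nothing was saved."""
    path = os.path.join(session_dir, STATE_FILE)
    try:
        with open(path) as f:
            return json.load(f).get("normal_user")
    except FileNotFoundError:
        return None


# A temporary directory stands in for the real session directory.
with tempfile.TemporaryDirectory() as session_dir:
    save_normal_user(session_dir, "ubuntu")
    restored = load_normal_user(session_dir)
    print(restored)
```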

Changed in checkbox-ng:
status: Fix Released → Confirmed
Changed in checkbox-ng:
status: Confirmed → In Progress
milestone: 1.8.0 → 1.9.0
Changed in checkbox-ng:
status: In Progress → Fix Committed
Pierre Equoy (pieq) wrote :

I used the launcher provided by Sylvain in

https://code.launchpad.net/~sylvain-pineau/checkbox-ng/+git/checkbox-ng/+merge/384713

and tried it with the version of Checkbox currently in the Testing PPA (i.e. the one that includes the required fixes), and it worked as expected:

==================================[ Results ]===================================
32.0kB [00:00, 712kB/s, file=python://stdout]
  job passed : Enumerate available system executables
  job passed : power-management/fwts_wakealarm
  job passed : power-management/poweroff
  job passed : power-management/poweroff-log-attach
  job passed : power-management/reboot
  job passed : power-management/reboot-log-attach

Marking as `cqa-verified`.

tags: added: cqa-verified
Changed in checkbox-ng:
status: Fix Committed → Fix Released
Rex Tsai (chihchun)
Changed in oem-priority:
status: Triaged → Fix Committed
Rex Tsai (chihchun)
tags: added: oem-priority
Rex Tsai (chihchun)
Changed in oem-priority:
status: Fix Committed → Fix Released