console stops echoing text and is wonky during checkbox-certification-server run

Bug #1164028 reported by Jeff Lane 
Affects: Checkbox
Status: Fix Released
Importance: Medium
Assigned to: Jeff Lane

Bug Description

Discovered this while doing test runs to try out the Xen tests added to checkbox.

Basically, I have run checkbox-certification-server and deselected all tests except for the power-management and virtualization suites.

What happens is that checkbox runs the power-management tests. It then runs virtualization/kvm_ok and then virtualization/kvm_check_vm.

Now, after THIS test, what SHOULD happen is that a manual reboot test appears so the tester can reboot into the Xen hypervisor and complete testing.

What ACTUALLY happens is that the console kinda dies. The manual test appears and is automatically skipped, causing the xen_ok test to fail because we haven't rebooted into Xen yet.

I've tested this by skipping the KVM tests altogether. When I skip them, the reboot test runs correctly and I am able to boot into Xen and do the xen tests and get passing results. As soon as I add KVM tests back in, I get the console failure.

This is seen on my 1U Supermicro server, using Ubuntu 12.04.1 and checkbox built in my personal PPA to test the packages and their installation.

I made no changes to the core of checkbox and checkbox-certification that would cause this.

It also occurs any time I run the KVM tests in checkbox, regardless of whether I run the Xen and reboot tests or not. The issue appears to be something that the kvm_check_vm test is doing.

As a workaround, I have done the following:

1.) Log into a console and start checkbox-certification-server.

2.) Log into a different console and find the PID of the main checkbox-cert process:

ps axf |grep checkbox

3.) On that second console, run this, replacing PID with that process ID:

watch -n1 pstree -a PID

4.) Switch back to the first console and continue with checkbox. At the test selection screen, de-select everything except for "power-management" and "virtualization".

5.) Once checkbox starts testing, switch to the second console and watch the output of pstree until the KVM tasks are no longer running (the line that says "python3 /usr/share/checkbox/scripts/virtualization kvm --debug" should disappear).

6.) Once that happens, I am able to successfully switch back to the first console, execute the reboot test, and go about my merry way.

So this seems to only happen if you stay active on the console that checkbox-certification-server is running on throughout the KVM testing.
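
The manual wait in the workaround (watching pstree until the "scripts/virtualization kvm" line disappears) could also be scripted. The following is only a sketch, not part of checkbox: the function names are made up for illustration, and it assumes a Linux system where process command lines are readable under /proc.

```python
import glob
import time


def running_cmdlines():
    """Return the command line of every process visible under /proc (Linux only)."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path, "rb") as handle:
                raw = handle.read()
        except OSError:
            continue  # the process exited between listing and reading
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        lines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return lines


def wait_until_gone(pattern, poll=1.0, timeout=600.0, list_cmdlines=running_cmdlines):
    """Block until no command line contains `pattern`; return False on timeout.

    `list_cmdlines` is injectable so the loop can be exercised without /proc.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not any(pattern in line for line in list_cmdlines()):
            return True
        time.sleep(poll)
    return False
```

On the second console this would be called as wait_until_gone("scripts/virtualization kvm"). Note that if the pattern is typed on the command line of the watcher itself (for example via python3 -c), the watcher matches its own entry and never returns True.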


Jeff Lane  (bladernr) wrote :

I tried a couple things... if I run the kvm test script from console manually, it works fine and the console is not hosed.

However, as soon as I introduce the kvm_check_vm test into the test run, the console stops responding as described above while the kvm_check_vm test executes.

If I don't run that test, I can do the Xen test successfully. Otherwise, as mentioned above, kvm_check_vm runs and the console goes nuts, which seems to trigger a keypress that causes the manual reboot test to be skipped.

Jeff Lane  (bladernr) wrote :

The main issue appears to be that the KVM instance is being launched via Popen using shell=True, which isn't necessary. This causes it to launch in a subshell that grabs user input from the calling shell (the one running checkbox) and never releases it (because of the secondary issue below). I've fixed this by simply removing shell=True from the bit that launches the KVM instance of the Ubuntu cloud image.
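
For illustration only, here is one common way to keep a child process from reading the parent's terminal regardless of the shell setting: give it an empty stdin explicitly. This is a sketch and not the code that went into checkbox; launch_without_stdin is a hypothetical helper name.

```python
import subprocess


def launch_without_stdin(argv, **popen_kwargs):
    """Start `argv` directly (no shell) with stdin connected to the null device.

    The child then sees end-of-file on any read from stdin, so it cannot
    consume keystrokes meant for the program that launched it.
    """
    return subprocess.Popen(argv, stdin=subprocess.DEVNULL, **popen_kwargs)
```

Passing the command as a list rather than a string is what lets shell=True be dropped, and subprocess.DEVNULL is part of the standard library from Python 3.3 onward.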

Changed in checkbox:
assignee: nobody → Jeff Lane (bladernr)
status: New → Incomplete
status: Incomplete → Triaged
importance: Undecided → Medium
Jeff Lane  (bladernr) wrote :

The second issue, discovered after fixing the first one, is that the KVM instance is never killed.

The code looks like this:
    if instance is not False:
        time.sleep(self.timeout)
        # Check to be sure VM boot was successful
        if "END SSH HOST KEY KEYS" \
                in open(self.debug_file, 'r').read():
            print("Booted successfully", file=sys.stderr)
            status = 0
        else:
            print("KVM instance failed to boot", file=sys.stderr)
        self.process.terminate()
else:
    print("Could not find: {}".format(self.image), file=sys.stderr)

self.process.terminate() isn't actually shutting down the VM. Also, the first part of that isn't working either, so status is never set to 0 and we never discover if the boot was successful.

I'm working on debugging this now.
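
One possible contributor, not confirmed anywhere in this report: when a process is started with shell=True, Popen.terminate() signals the shell, which is not necessarily the VM process itself. Independently of that, terminate() only sends the signal and does not wait for the process to exit. A hedged sketch of a stop routine that waits and then escalates (stop_process is a hypothetical helper, not checkbox code):

```python
import subprocess


def stop_process(process, grace=5.0):
    """Ask `process` to exit, escalating to kill() if it ignores the request.

    Returns the exit code once the process is confirmed gone.
    """
    if process.poll() is None:          # still running
        process.terminate()             # SIGTERM on POSIX
        try:
            process.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            process.kill()              # SIGKILL on POSIX
            process.wait()
    return process.returncode
```

This only reaches the VM if `process` is the VM itself, which is the case when it was launched from an argument list without a shell.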

Changed in checkbox:
status: Triaged → In Progress
Jeff Marcom (jeffmarcom) wrote :

I'm seeing the same failure here: the console is being hijacked, and that is causing an issue with checkbox.

With that said,

1.) The shell=True argument is in fact necessary, otherwise the generated VM boot messages are never forwarded to the virt.debug log.
2.) As a result of point 1, the test can never succeed, because without shell=True the VM console output is not available anywhere.
3.) Killing the VM unfortunately doesn't fix any of this.

So, with that said, we have a few options:

1.) Figure out where we can redirect the console output of the VM without disrupting or hijacking the console terminal in which the python script runs.
2.) Configure the cloud data supplied to VM to create an ssh key and IP address so that we can connect to it later, or some other cloud-init verification means.
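
A rough sketch of option 1, assuming the VM's console output arrives on the child's stdout/stderr: Popen can write that output straight to a log file without a shell, and the log can be polled for the boot marker instead of sleeping for the full timeout. The names run_and_watch and boot_marker_seen are hypothetical, and this is not the fix that was eventually released.

```python
import subprocess
import time


def boot_marker_seen(log_path, marker, timeout, poll=1.0):
    """Poll `log_path` until `marker` appears; return False once `timeout` passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(log_path, "r", errors="replace") as log:
                if marker in log.read():
                    return True
        except FileNotFoundError:
            pass  # log not created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def run_and_watch(argv, log_path, marker, timeout, poll=1.0):
    """Run `argv` with output sent to `log_path` and stdin detached, then watch the log."""
    with open(log_path, "wb") as log:
        # The child keeps its own copy of the file descriptor after we close ours.
        process = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    try:
        return boot_marker_seen(log_path, marker, timeout, poll)
    finally:
        process.terminate()
        process.wait()
```

With the marker from the snippet quoted earlier in this bug, the call would look like run_and_watch(argv, debug_file, "END SSH HOST KEY KEYS", timeout), where argv is whatever argument list launches the VM.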

Jeff Lane  (bladernr) wrote :

Fixed after a lot of debugging and trial and error to figure out why the KVM test was hijacking the console stdin.

Changed in checkbox:
status: In Progress → Fix Committed
Changed in checkbox:
status: Fix Committed → Fix Released