qemu-guest-agent shutdown never reached - hangs

Bug #1881668 reported by Matthias Ferdinand
This bug affects 1 person
Affects         Status       Importance   Assigned to   Milestone
qemu (Debian)   New          Unknown
qemu (Ubuntu)   Incomplete   Undecided    Unassigned

Bug Description

(probably the same as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951781)

package qemu-guest-agent 1:4.2-3ubuntu6.1, Ubuntu 20.04

When initiating a VM shutdown via the virt-manager menu (Virtual Machine / Shutdown / Shutdown), qemu-guest-agent inside the VM hits an assert and systemd never sees the unit as stopped. CPU usage jumps to 100% and the shutdown does not proceed. To stop the VM, you have to use "Force Stop".

Log messages:
    Jun 1 23:06:26 cvs qemu-ga: info: guest-shutdown called, mode: powerdown
    Jun 1 23:06:26 cvs qemu-ga[3480]: **
=> Jun 1 23:06:26 cvs qemu-ga[3480]: ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
    Jun 1 23:06:26 cvs kernel: [84364.248385] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
=> Jun 1 23:06:26 cvs qemu-ga[3480]: Bail out! ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
    Jun 1 23:06:26 cvs kernel: [84364.264870] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
    Jun 1 23:06:26 cvs systemd[1]: qemu-guest-agent.service: Succeeded.
    Jun 1 23:06:26 cvs systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
    Jun 1 23:07:27 cvs kernel: [84425.268423] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
    Jun 1 23:07:27 cvs kernel: [84425.284391] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
    Jun 1 23:07:27 cvs kernel: [84425.300401] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
    Jun 1 23:07:27 cvs kernel: [84425.316382] systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
=> Jun 1 23:07:44 cvs systemd[1]: message repeated 2041319 times: [ qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.]
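
A minimal sketch for counting the two message types marked above in a saved log excerpt. The function name summarize_qga_log is hypothetical, and the match patterns are simply copied from the log lines in this report. A syslog "message repeated N times" line is counted as one line, so the stop_pending count is a lower bound.

```shell
#!/bin/sh
# Sketch: summarize a syslog excerpt read from stdin.
# Prints "assert=<n> stop_pending=<n>", the number of lines matching
# each pattern. Patterns are taken from the messages quoted in this bug.
summarize_qga_log() {
    awk '
        /send_response: assertion failed/ { a++ }
        /Stop job pending for unit/       { p++ }
        END { printf "assert=%d stop_pending=%d\n", a, p }
    '
}
```

Example use inside the guest, assuming the messages ended up in /var/log/syslog: summarize_qga_log < /var/log/syslog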

affects: sendmail (Ubuntu) → qemu (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

Interesting Matthias, thanks for the report.

First I was trying to reproduce the case, taking a Focal system with the version of qemu-guest-agent you reported and then making sure it is installed, up and running:

● qemu-guest-agent.service - QEMU Guest Agent
     Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; static; vendor preset: enabled)
     Active: active (running) since Tue 2020-06-02 05:25:04 UTC; 3min 8s ago
   Main PID: 624 (qemu-ga)
      Tasks: 1 (limit: 533)
     Memory: 900.0K
     CGroup: /system.slice/qemu-guest-agent.service
             └─624 /usr/sbin/qemu-ga

Jun 02 05:25:04 focal systemd[1]: Started QEMU Guest Agent.

I was tailing the main console, journal, dmesg and htop and passed a shutdown command through virsh.
  $ virsh shutdown focal --mode agent
The guest almost immediately went down, without issues.

In fact it was so fast that none of the outputs I was tracking spit out a message like the one you reported.

I was checking the logs afterwards and found:
Jun 02 05:25:04 focal systemd[1]: Started QEMU Guest Agent.
...
Jun 02 05:29:54 focal qemu-ga[624]: info: guest-shutdown called, mode: powerdown
Jun 02 05:29:54 focal qemu-ga[624]: **
Jun 02 05:29:54 focal qemu-ga[624]: ERROR:/build/qemu-74sXTC/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Jun 02 05:29:54 focal qemu-ga[624]: Bail out! ERROR:/build/qemu-74sXTC/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Jun 02 05:29:55 focal systemd[1]: Stopping QEMU Guest Agent...
Jun 02 05:29:55 focal systemd[1]: qemu-guest-agent.service: Succeeded.
Jun 02 05:29:55 focal systemd[1]: Stopped QEMU Guest Agent.

So yes, there seems to be an issue with this assertion, but it might be a red herring for the hang on shutdown in your case. On my system, at least, things work fine even when it is triggered.

We'll have to find what is different in yours to then cause the hang.
Would you have a guest-xml of your guest so I can try to spot differences?

Christian Ehrhardt  (paelzer) wrote :
Changed in qemu (Ubuntu):
status: New → Incomplete
Christian Ehrhardt  (paelzer) wrote :

Unless we find that they are really the same issue, let us track the assertion in bug 1878973 and use this bug for the hang.

Matthias Ferdinand (mf+ubuntu1) wrote :

Hi,

thanks for looking into this.

Apparently it is some race condition, but not a true heisenbug. I scripted vm start and shutdown (script attached), and I get a hang in about 1/3 of the cases.

Note: the script uses /usr/bin/retry from the package "retry".
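
For readers without the attachment, a start/shutdown loop of this kind could look roughly like the sketch below. This is not the attached script: it replaces /usr/bin/retry with a small wait_for helper, and the domain name, try count, timeouts, and the use of the agent's guest-ping command to wait for the agent are all assumptions.

```shell
#!/bin/sh
# Sketch of a start/shutdown loop; NOT the script attached to this bug.
# DOMAIN, TRIES, POLL and the timeouts are hypothetical defaults.
DOMAIN=${DOMAIN:-cvs}
TRIES=${TRIES:-30}
AGENT_TIMEOUT=${AGENT_TIMEOUT:-120}
SHUTDOWN_TIMEOUT=${SHUTDOWN_TIMEOUT:-120}
POLL=${POLL:-1}

# wait_for SECONDS CMD...: rerun CMD every $POLL seconds until it
# succeeds; return 1 once SECONDS have been waited without success.
wait_for() {
    limit=$1; shift
    waited=0
    until "$@" >/dev/null 2>&1; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep "$POLL"
        waited=$((waited + POLL))
    done
}

# Succeeds once the guest agent answers (assumption: guest-ping is used
# as the readiness check).
agent_up() { virsh qemu-agent-command "$DOMAIN" '{"execute":"guest-ping"}'; }

# Succeeds once libvirt reports the domain as shut off.
is_off() { [ "$(virsh domstate "$DOMAIN")" = "shut off" ]; }

run_loop() {
    hangs=0; success=0; i=0
    while [ "$i" -lt "$TRIES" ]; do
        i=$((i + 1))
        virsh start "$DOMAIN" >/dev/null || return 1
        wait_for "$AGENT_TIMEOUT" agent_up || return 1
        virsh shutdown "$DOMAIN" --mode agent >/dev/null
        if wait_for "$SHUTDOWN_TIMEOUT" is_off; then
            success=$((success + 1))
        else
            hangs=$((hangs + 1))
            virsh destroy "$DOMAIN" >/dev/null   # equivalent of "Force Stop"
        fi
    done
    echo "# hangs: $hangs, success: $success"
}
```

Calling run_loop on the host after sourcing the file prints a single summary line with the hang and success counts.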

Matthias Ferdinand (mf+ubuntu1) wrote :

and here is the libvirt xml for the VM

Changed in qemu (Debian):
status: Unknown → New
Christian Ehrhardt  (paelzer) wrote :

I was testing (for bug 1878973) this kind of shutdown quite often on multiple Ubuntu releases.
I generally do shutdown loops (just not with guest-agent) and in none of them did I reproduce the hang :-/

Our guest definitions aren't too different; mine is newer (not using the trusty type), but there are not many other differences. The /dev/k2/* devices maybe?

To be sure I was using your loop (thanks for providing) with slight modifications as I didn't have the retry tool around. But I gave up at:
  # hangs: 0, success: 24

I still consider it unlikely, but can you check if using the qemu from [1] in the guest makes any difference for you? I created it for the other bug anyway and it might be worth a try.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4081

summary: - qemu-guest-agent asserts on shutdown, shutdown never reached
+ qemu-guest-agent shutdown never reached - hangs
Matthias Ferdinand (mf+ubuntu1) wrote :

Hi,

using qemu-guest-agent from the given ppa, I did not get a single hang in 30 tries.

Re-installed the version from the Ubuntu focal repo, and I am back to
    # hangs: 10, success: 25

Host VG /dev/k2/ is quite stacked, perhaps making it easier to run into some race condition: lvm on luks on bcache on mdraid. There are some other guest VMs that may generate quite a bit of I/O and CPU load at times, while the "cvs" VM is usually idle.

Christian Ehrhardt  (paelzer) wrote :

Thanks for checking the PPA.

OK, if the PPA really helps you then we can declare this bug a dup of 1878973.
The work on resolving that assert has progressed further there.

Please participate there once we have a final fix to test, that would be awesome.
