Libvirt does not follow RESUME qemu monitor events. VMs remain in "paused" state forever

Bug #1097824 reported by Andres Lagar-Cavilla
Affects            Status         Importance   Assigned to    Milestone
libvirt (Ubuntu)   Fix Released   High         Serge Hallyn
  Precise          Fix Released   High         Serge Hallyn
  Quantal          Won't Fix      High         Unassigned

Bug Description

=================================
SRU Justification:
1. Impact: if a VM is paused over the monitor and then resumed, libvirt will continue to report the running VM as paused.
2. Development fix: add a hook to follow the resume event
3. Stable fix: same as development fix
4. Test case: see below
5. Regression potential: an error in the hook could turn the above situation into a crash instead of libvirt following the VM resume. All regression tests passed with this fix.
=================================
If a qemu/KVM VM is paused through a monitor by manually issuing the "stop" command, the state of the VM in libvirtd's view will transition to "paused". This is because libvirtd listens to "STOP" events in the JSON monitor. However, libvirt does not listen to RESUME events on any monitor. So, when the VM is resumed by manually issuing "cont", the internal state will remain "paused" even though the VM is running.

Libvirt maintains its internal view of the state in sync for migration, etc. But without listening to RESUME events it cannot correctly cope with third parties issuing stop commands (such as GDB, virsh qemu-monitor-command, or software opening another QMP monitor).
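
The event handling described above can be sketched as a small state machine. This is an illustrative Python model only, not libvirt's code (the real handler is C code in libvirt's qemu driver). The class name DomainStateTracker and the handle_resume flag are hypothetical; STOP and RESUME are the QMP event names discussed in this report.

```python
import json


class DomainStateTracker:
    """Hypothetical model of how a manager tracks VM state from QMP events."""

    def __init__(self, handle_resume=True):
        self.state = "running"
        # handle_resume=False models the buggy behaviour described above.
        self.handle_resume = handle_resume

    def feed(self, line):
        """Consume one JSON line from the monitor and return the tracked state."""
        msg = json.loads(line)
        event = msg.get("event")  # command replies carry no "event" key
        if event == "STOP":
            self.state = "paused"
        elif event == "RESUME" and self.handle_resume:
            self.state = "running"
        return self.state


# A third party issues "stop" and then "cont"; qemu emits STOP, then RESUME.
SAMPLE_EVENTS = [
    '{"timestamp": {"seconds": 1, "microseconds": 0}, "event": "STOP"}',
    '{"return": {}}',
    '{"timestamp": {"seconds": 2, "microseconds": 0}, "event": "RESUME"}',
]


def final_state(handle_resume):
    """Replay SAMPLE_EVENTS through a fresh tracker and return its end state."""
    tracker = DomainStateTracker(handle_resume=handle_resume)
    for line in SAMPLE_EVENTS:
        tracker.feed(line)
    return tracker.state
```

With handle_resume=False the model ends up in "paused" although the guest is running, which is the mismatch reported here; with handle_resume=True it returns to "running".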

This is verified to happen on Precise and Quantal's libvirt versions. Since it's a bug in upstream, I expect it to be faulty in Raring as well.

The upshot in OpenStack is that VMs, even though running, will be reported as paused to nova. Due to https://bugs.launchpad.net/nova/+bug/1097806, nova-compute will erroneously destroy them. This is a nova-compute problem that is exacerbated by this bug.

Steps to Reproduce:
# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000020 running

# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-10"}

# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000020 paused

# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-11"}

# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000020 paused

(the state should be "running")
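
The manual steps above could be automated roughly as follows. This is a minimal sketch, assuming virsh is on the PATH and the caller may use qemu-monitor-command; the helper names are hypothetical, and the run parameter exists only so the subprocess call can be replaced in a dry run.

```python
import json
import subprocess


def monitor_command(domain, command, run=subprocess.run):
    """Send a bare QMP command to the domain through virsh."""
    payload = json.dumps({"execute": command})
    run(["virsh", "qemu-monitor-command", domain, payload],
        capture_output=True, text=True, check=True)


def domain_state(domain, run=subprocess.run):
    """Return the state libvirt reports for the domain, e.g. 'running'."""
    result = run(["virsh", "domstate", domain],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def reproduces_bug(domain, run=subprocess.run):
    """Stop and resume the guest behind libvirt's back, then check its view.

    Returns True if libvirt does not report 'running' after the resume.
    """
    monitor_command(domain, "stop", run)
    monitor_command(domain, "cont", run)
    return domain_state(domain, run) != "running"
```

Calling reproduces_bug("1") against the domain from the transcript should return True on an affected libvirt and False once RESUME events are handled. Only run it against a guest that can safely be paused for a moment.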

Another way to reproduce this is to attach GDB to qemu and start single-stepping: libvirt will drop dozens of RESUME events and be mightily confused.

Client software like OpenStack will tag the VM as paused.

Upstream:
Reported to libvirt upstream: https://bugzilla.redhat.com/show_bug.cgi?id=892791
Fixed in libvirt's master git: http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f266668a39fd655574fe34f1265

I will attach a backport of the master branch fix to 0.9.13-0ubuntu12~cloud0

Andres Lagar-Cavilla (andreslc-x) wrote :

With the above patch:
# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 running

# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-12"}

# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 paused

# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-13"}

# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 running

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "handle_resume_0.9.13-0ubuntu12~cloud0.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Serge Hallyn (serge-hallyn) wrote :

Thanks very much for the info, Andres.

Changed in libvirt (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in libvirt (Ubuntu Precise):
status: New → Triaged
Changed in libvirt (Ubuntu Quantal):
status: New → Triaged
Changed in libvirt (Ubuntu Precise):
importance: Undecided → High
Changed in libvirt (Ubuntu Quantal):
importance: Undecided → High
Andres Lagar-Cavilla (andreslc-x) wrote :

Serge, no problem. What is the status for Raring? The upstream commit is not in 1.0.0. I bet the patch as is won't apply, should I rebase?

Thanks

Andres Lagar-Cavilla (andreslc-x) wrote :

The backport needs a small tweak for Raring/1.0.0

Andres Lagar-Cavilla (andreslc-x) wrote :

Serge, additional (and unexpected!) motivation to include this patch
http://www.redhat.com/archives/libvir-list/2013-January/msg01049.html

Thanks
Andres

Changed in libvirt (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.0.0-0ubuntu5

---------------
libvirt (1.0.0-0ubuntu5) raring; urgency=low

  * handle_resume_1.0.0-0ubuntu4.patch: Add RESUME event listener to qemu
    monitor (LP: #1097824)
  * build-work-around-broken-kernel-header: work around FTBFS due to a
    broken linux/if_bridge.h.
 -- Serge Hallyn <email address hidden> Wed, 16 Jan 2013 09:15:20 -0600

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Serge Hallyn (serge-hallyn) wrote :

(Due to upcoming 12.04.2 release on Feb 14, I've marked this to work on it on Feb 15)

Changed in libvirt (Ubuntu Precise):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
description: updated
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Andres, or anyone else affected,

Accepted libvirt into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libvirt/0.9.8-2ubuntu17.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Serge Hallyn (serge-hallyn) wrote :

verified in precise.

tags: added: verification-done
removed: verification-needed
Brian Murray (brian-murray) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.8-2ubuntu17.8

---------------
libvirt (0.9.8-2ubuntu17.8) precise-proposed; urgency=low

  [ Adam Conrad ]
  * libvirt-bin.postinst: also put admin group members into the libvirtd
    group, to support systems installed before precise. (LP: #1124127)
  * libvirt-bin.postinst: use getent group instead of grep /etc/group

  [ Serge Hallyn ]
  * Update README.Debian:
    - we use libvirtd, not libvirt group (LP: #1095140)
    - we add users from sudo, not admin group, to libvirtd.
  * Handle two usb devices with same vendor/id (LP: #1082213)
    - ubuntu/qemu-Keep-list-of-USB-devices-attached-to-domains
    - ubuntu/usb-create-functions-to-search-usb-device
    - ubuntu/qemu-call-usb-search-function-for-hostdev

  [ Andres Lagar-Cavilla ]
  * Add RESUME event listener to qemu monitor (LP: #1097824)
    - ubuntu/handle_resume.patch

  [ Kirill Zaborsky ]
  * Add proper handling for EINTR signal (LP: #1092826)
    - ubuntu/fix-poll.patch
 -- Serge Hallyn <email address hidden> Thu, 21 Feb 2013 08:38:35 -0600

Changed in libvirt (Ubuntu Precise):
status: Fix Committed → Fix Released
Rolf Leggewie (r0lf) wrote :

quantal has seen the end of its life and is no longer receiving any updates. Marking the quantal task for this ticket as "Won't Fix".

Changed in libvirt (Ubuntu Quantal):
status: Triaged → Won't Fix