Precise to Trusty live migration failing

Bug #1536331 reported by Jacob Godin
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
qemu (Ubuntu)    Fix Released   Medium       Unassigned
  Trusty         Fix Released   High         Unassigned

Bug Description

[Impact]

 * Migration from Precise to Trusty not working

 * Fix by dropping a duplicate alias which causes the Trusty qemu to fail
   to properly detect and handle the type of the incoming guest.

[Test Case]

 * Prepare a Precise and a Trusty host, both capable of running KVM and
   able to reach each other for live migration.
 * Spawn a guest on Precise e.g. with uvt
   $ uvt-kvm create --password=ubuntu kvmguest-precise release=precise \
     arch=amd64 label=daily
 * Migrate that guest to the Trusty host like:
     virsh migrate --live kvmguest-precise "qemu+ssh://10.0.4.193/system"
 * Note, here is a more complex test for it that takes care of setting up
   the rest of the details that can be needed (Host/Env setup): https://code.launchpad.net/~ubuntu-server/ubuntu/+source/qemu-migration-test/+git/qemu-migration-test/+ref/master

[Regression Potential]

 * We are modifying machine types which can affect:
   - migrations from former / to future types
   - upgrades (starting a guest formerly working fails after upgrade)
   - guest view of the virtual system as defined by the machine type

 * We tested for all of these, see the logs attached - but there might
   always be a miss so the more testing the better.

[Other Info]

 * n/a

---

Even though #1291321 was supposed to fix this issue, I'm running into live migration issues when attempting to move a VM from a Precise host to a Trusty host. The VM is using the pc-1.0-qemu-kvm machine type as required.

Process:
/usr/bin/kvm -name instance-## -S -machine pc-1.0-qemu-kvm,accel=kvm -cpu kvm64,+lahf_lm,+rdtscp,+avx,+xsave,+aes,+tsc-deadline,+popcnt,+x2apic,+sse4.2,+sse4.1,+ssse3,+pclmuldq -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid XX -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2013.1.5,serial=4c4c4544-0038-4810-804d-b6c04f515731,uuid=XX -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-##.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-kvm-pit-reinjection -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/XX/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/disk/by-id/dm-uuid-mpath-3600144f07a6e924e000056950d010025,if=none,id=drive-virtio-disk1,format=raw,serial=7903c916-8aba-4ea3-b38c-964d104cdada,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/var/lib/nova/instances/XX/disk.config,if=none,id=drive-ide0-1-1,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=88,id=hostnet0,vhost=on,vhostfd=90 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:c8:14:90,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/XX/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:53 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

Source version: 1.0+noroms-0ubuntu14.22
Target version: 2.0.0+dfsg-2ubuntu1.21

Error when migrating:
...
char device redirected to /dev/pts/3 (label charserial1)
Length mismatch: vga.vram: 1000000 in != 800000
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
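
For reference, the two lengths in the "Length mismatch" line can be converted to MiB. This is only a minimal sketch, assuming both values are hexadecimal byte counts (that is how qemu usually prints RAM block lengths, but treat it as an assumption here). hex_to_mib is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: convert a hexadecimal byte count (without 0x prefix) to MiB.
# Assumption: the numbers in the "Length mismatch" message are hex byte counts.
hex_to_mib() {
    echo $(( 0x$1 / 1048576 ))
}

# Incoming (source) vga.vram length vs. the length expected on the target.
echo "incoming: $(hex_to_mib 1000000) MiB, expected: $(hex_to_mib 800000) MiB"
```

Under that assumption the incoming stream carries 16 MiB of vga.vram while the target machine type expects 8 MiB, which would fit the two hosts disagreeing on the machine type definition.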

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1536331] [NEW] Precise to Trusty live migration failing

Thanks for reporting this bug - I will work on reproducing later today.

Revision history for this message
Jacob Godin (jacobgodin) wrote :

Hi Serge,

Were you able to reproduce? Let me know if you need any more info.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Confirmed. The same thing happens for me, but I don't see why. I see nothing in the debdiff for qemu to explain it.

Changed in qemu (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Reverting to older qemu and libvirt on the target trusty machine also does not help.

Revision history for this message
Jacob Godin (jacobgodin) wrote :

Serge, wanted to see if you've made any headway here. It's still a blocker for upgrading

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1536331] Re: Precise to Trusty live migration failing

I'm afraid I've not yet made any headway.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Ok, so reverting the qemu-kvm on the precise source host to 1.0+noroms-0ubuntu14.18 seems to fix it. (And then using pc-1.0 machine type and setting allow_incoming_qemukvm = 1 on the destination)

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Well this is unnerving. After re-installing the current versions of qemu-kvm on the source host, it continues to work.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Ok, so two things I've found are that

1. tcg emulated vms can't be migrated (this doesn't apply to you)
2. pc-1.0-qemu-kvm as the source machine type doesn't actually work. What works for me is setting the machine type to pc-1.0, then setting allow_incoming_qemukvm=1 in /etc/libvirt/qemu.conf on the target host.

Can you try and see whether that helps in your case?

Changed in qemu (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In recent testing I ran into the same issue - setting the recommended machine type and qemu.conf option did not fix the issue for me.

Changed in qemu (Ubuntu):
assignee: ChristianEhrhardt (paelzer) → nobody
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Old bug was bug 1291321
That was closed and passed regression testing and all that, but it seems something was lost.
Quoting Serge from the old closed bug:

"This doesn't really make sense, but the patch adding the pc-1.0-qemu-kvm machine
type seems to be also adding pc-1.0-qemu-kvm as an alias for pc-1.0, all the way
back to the original version where we introduced the patch. Which doesn't really make sense as the package did pass SRU testing.

When I build a new package without that extra alias, the migration gets further, though it then stops on

Unknown savevm section or instance 'kvm-tpr-opt' 0"

We will keep the discussion in this bug - that is why I transferred the latest statement of Serge here.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I analyzed the case myself now; here are my notes as documentation.

The default type in Precise is (now): pc-1.0-qemu-kvm
The logic in Trusty's libvirt to receive is in src/qemu/qemu_migration.c:
  IF TYPE = pc-1.0 && allow_incoming_qemukvm
    THEN SET TYPE => pc-1.0-precise

But as shown above the type is pc-1.0-qemu-kvm in Precise, so the logic above doesn't apply.
The type is defined in the patch: debian/patches/define-qemu-kvm-mt
It has been in since 1.0+noroms-0ubuntu14.18 due to LP: #1374612

That would still be fine as on Precise pc-1.0-qemu-kvm is a type with alias pc-1.0-precise.
So this is the Distribution specific type. Only Trusty has to agree on the definition of the type.
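
To make the rule above easier to follow, here is a minimal shell sketch of it. It only mirrors the condition as described in my notes, it is not libvirt's actual code, and remap_incoming_machine_type is a hypothetical name.

```shell
# Hypothetical illustration of the receive-side rule described above:
# only an incoming pc-1.0 guest is remapped, and only if the option is enabled.
# $1 = incoming machine type, $2 = value of allow_incoming_qemukvm (0 or 1)
remap_incoming_machine_type() {
    if [ "$1" = "pc-1.0" ] && [ "$2" = "1" ]; then
        echo "pc-1.0-precise"
    else
        echo "$1"
    fi
}

remap_incoming_machine_type pc-1.0 1            # remapped
remap_incoming_machine_type pc-1.0-qemu-kvm 1   # passed through unchanged
```

The second call shows the case from this bug: a guest of type pc-1.0-qemu-kvm is not touched by the rule, so the target qemu has to resolve that name on its own.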

On Trusty this seems wrong, here I see double pc-1.0-qemu-kvm:
kvm -M ? | egrep '(precise|1.0)'
pc-1.0-qemu-kvm Standard PC (i440FX + PIIX, 1996) (alias of pc-1.0)
pc-1.0 Standard PC (i440FX + PIIX, 1996)
pc-1.0-precise Standard PC (i440FX + PIIX, 1996) (alias of pc-1.0-qemu-kvm)
pc-1.0-qemu-kvm Standard PC (i440FX + PIIX, 1996)

One is the alias of pc-1.0, and one is a type on its own.
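
A quick way to spot such a duplicate without reading the list by eye is to look for names that occur more than once. This is a minimal sketch, assuming the machine name is the first field of each line as in the listing above; find_duplicate_machine_types is a hypothetical helper name.

```shell
# Hypothetical helper: read a machine type listing on stdin and
# print every machine name (first field) that appears more than once.
find_duplicate_machine_types() {
    awk 'NF { print $1 }' | sort | uniq -d
}

# The Trusty listing from above as sample input.
printf '%s\n' \
    'pc-1.0-qemu-kvm Standard PC (i440FX + PIIX, 1996) (alias of pc-1.0)' \
    'pc-1.0 Standard PC (i440FX + PIIX, 1996)' \
    'pc-1.0-precise Standard PC (i440FX + PIIX, 1996) (alias of pc-1.0-qemu-kvm)' \
    'pc-1.0-qemu-kvm Standard PC (i440FX + PIIX, 1996)' |
    find_duplicate_machine_types
```

On a real host the input would come from the same command as above, e.g. kvm -M ? | find_duplicate_machine_types.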
Remember on Precise this looks like:
pc-1.0-precise Ubuntu 12.04 Standard PC (alias of pc-1.0-qemu-kvm)
pc-1.0-qemu-kvm Ubuntu 12.04 Standard PC (default)
pc Standard PC (alias of pc-1.0)
pc-1.0 Standard PC

Which one is it picking on migration?
This code for Trusty is at: debian/patches/ubuntu/add-machine-type-pc-1.0-qemu-kvm-for-live-migrate-co.patch
And this is in since: 2.0.0+dfsg-2ubuntu1.6 due to LP: #1374612

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I tried setting pc-1.0 as the machine type on the migration source, as suggested by Serge.
But that didn't get it running for me with the current 14.30 version.
I then did the same with 1.0+noroms-0ubuntu14.18, built locally and inserted into the test container.
But I still see the same migration failure - default machine types look the same as with 14.30.
I spawned "new" guests on 14.18 with default pc-1.0-qemu-kvm and pc-1.0 - but all those failed as well.

Then, like Serge, I also tried rebuilding qemu on Trusty without that double alias.
Still having 14.18 on the source I tried to migrate all three guests I had already (created with 14.30 modified to pc-1.0, created with 14.18 and created as pc-1.0 with 14.18) again, but they all fail.

I wanted to make sure my overall setup at least worked with the last known good states.
And last good state is what was published for bug 1374612.
P: 1.0+noroms-0ubuntu14.18
T: 2.0.0+dfsg-2ubuntu1.6
To ensure we cover all installed qemu packages, I installed them with (on the Trusty and the Precise host respectively):
apt-get install $(dpkg -l '*qemu*' | awk '$1 == "ii" && $2 ~ /^qemu/ {printf("%s=2.0.0+dfsg-2ubuntu1.6 ", $2)}')
apt-get install $(dpkg -l '*qemu*' | awk '$1 == "ii" && $2 ~ /^qemu/ {printf("%s=1.0+noroms-0ubuntu14.18 ", $2)}')
That would be exactly the combination that the old bug was verified with.
But even that was failing as well for me.

To round things off I checked which libvirt versions were active back then and prepared matching old libvirt packages as well.
P: 0.9.8-2ubuntu17.19
T: 1.2.2-0ubuntu13.1.6
Note: has to be built with DEB_BUILD_OPTIONS="nocheck" these days.
apt-get install $(dpkg -l '*libvirt*' | awk '$1 == "ii" && $2 ~ /^libvirt/ {printf("%s=0.9.8-2ubuntu17.19 ", $2)}')
apt-get install $(dpkg -l '*libvirt*' | awk '$1 == "ii" && $2 ~ /^libvirt/ {printf("%s=1.2.2-0ubuntu13.1.6 ", $2)}')
This still failed to migrate any of my three guests.
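
The pinning commands above all follow one pattern, which can be wrapped in a small function. This is a minimal sketch, assuming the usual dpkg -l layout (state in the first column, package name in the second); pin_installed_packages is a hypothetical name.

```shell
# Hypothetical helper: read `dpkg -l` style output on stdin and print
# "name=version " for every installed ("ii") package whose name starts
# with the given prefix.
# $1 = package name prefix (e.g. qemu), $2 = version to pin to
pin_installed_packages() {
    awk -v prefix="$1" -v version="$2" \
        '$1 == "ii" && index($2, prefix) == 1 { printf("%s=%s ", $2, version) }'
}

# Sample input only; on a real host pipe in: dpkg -l '*qemu*'
printf '%s\n' \
    'ii  qemu-kvm      x  amd64  sample line' \
    'rc  qemu-removed  x  amd64  sample line' |
    pin_installed_packages qemu 2.0.0+dfsg-2ubuntu1.6
```

The result can then be passed on as before, e.g. apt-get install $(dpkg -l '*qemu*' | pin_installed_packages qemu 2.0.0+dfsg-2ubuntu1.6). Matching on the fields instead of a fixed string keeps it independent of how many spaces dpkg puts between the columns.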

I was afraid I might miss something obvious and checked my setup once more.
And I was right: I found that kvm-ipxe-precise was not installed, as it is only a Suggests on Trusty.
That could explain my failures, trying with that installed next.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

OTOH it might be good that I set up all these combinations already before thinking of kvm-ipxe-precise - that provided good coverage now :-)

So with that set up and kvm-ipxe-precise installed on the target I get this combination of test results. Always keeping the combination of qemu/libvirt that belonged together - so old means qemu&libvirt of the 1374612 fix, while new is as of today.

Define:
P(old) libvirt 0.9.8-2ubuntu17.19 qemu 1.0+noroms-0ubuntu14.18
P(new) libvirt 0.9.8-2ubuntu17.23 qemu 1.0+noroms-0ubuntu14.30
T(old) libvirt 1.2.2-0ubuntu13.1.6 qemu 2.0.0+dfsg-2ubuntu1.6
T(new) libvirt 1.2.2-0ubuntu13.1.19 qemu 2.0.0+dfsg-2ubuntu1.28
T(no alias) libvirt 1.2.2-0ubuntu13.1.19 qemu 2.0.0+dfsg-2ubuntu1.29
.29 is a local build dropping the double alias in Trusty qemu

Migration results per guest type: converted-to-pc-1.0 / created-as-pc-1.0 / pc-1.0-qemu-kvm
P(old) -> T(old):      working / working / fail
P(old) -> T(new):      working / working / fail
P(new) -> T(new):      working / working / fail
P(new) -> T(no alias): working / working / working
So at least it does not seem to be a regression.
Instead, when the migration of the former default pc-1.0 type was fixed back then, it was missed that the newly introduced pc-1.0-qemu-kvm/pc-1.0-precise type did not migrate.
That would also explain why things calmed down (all previously started guests migrated fine).
But reports came in later - as guests started afterwards got the new pc-1.0-qemu-kvm type by default.

Fortunately it seems that dropping the false alias in Trusty fixes the migration for all cases.
Building that in a ppa now and preparing for a wider migration test run (e.g. from T->X with that) using my migration testsuite at https://code.launchpad.net/~ubuntu-server/ubuntu/+source/qemu-migration-test/+git/qemu-migration-test/+ref/master

ppa builds will be in https://launchpad.net/~paelzer/+archive/ubuntu/qemu-machine-type-dev

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It seems to work through all my tests.
But I don't want to rush it into the weekend.
If anyone can try the ppa at https://launchpad.net/~paelzer/+archive/ubuntu/qemu-machine-type-dev please do so.

I'll get back to this on Monday.

description: updated
Changed in qemu (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Logs contain:
migrations inside trusty, pre/post upgrade, start/stop pre post upgrade and type checks:
- test_x86_bug_1536331-migration.*

migrating cross releases
- test_x86_bug_1536331-crossmigration.*

qa test from the test ppa (those can be a bit unstable at times, but no new issues added - therefore pre/post logs):
- test_x86_bug_1536331-qa-.*

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I uploaded for review by the SRU Team - here as reference the debdiff

Changed in qemu (Ubuntu):
status: In Progress → Fix Committed
status: Fix Committed → In Progress
tags: added: patch
Revision history for this message
Robie Basak (racb) wrote :

I created the Trusty task for you. If you can't do it yourself (I think maybe you don't have the right permission bits?) please ask in #ubuntu-bugs.

This is unfortunately blocked by qemu 2.0.0+dfsg-2ubuntu1.28 which is still in trusty-proposed in bug 1606940.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Robie, I didn't realize that .28 is still blocking.
Thanks for pinging on it - I hope Ryan still has the test env to get this unblocked.

And yes, I was only able to create tasks >=X - thanks for pointing me to where to ask for it.

Changed in qemu (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We were able to clear bug 1606940, time to get this one rolling to fully unblock the Trusty queue for qemu.

Since this was blocked before, it might be skipped when the SRU team goes through the unapproved queue.
Therefore subscribing the SRU Team to make them aware that this can now progress.

Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Jacob, or anyone else affected,

Accepted qemu into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.29 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.29

---------------
qemu (2.0.0+dfsg-2ubuntu1.29) trusty; urgency=medium

  * Drop pc-1.0-qemu-kvm alias to pc-1.0, which is a duplicate id to the
    pc-1.0-qemu-kvm type, to fix migration from precise (LP: #1536331).

 -- Christian Ehrhardt <email address hidden> Mon, 10 Oct 2016 09:06:28 +0200

Changed in qemu (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in qemu (Ubuntu):
status: In Progress → Fix Released