port allocator allocates the same SPICE port for multiple guests (race condition)

Bug #1697729 reported by Aaron Johnson
This bug affects 3 people
Affects                 Status         Importance  Assigned to         Milestone
Ubuntu Cloud Archive    Fix Released   High        Unassigned
  Ocata                 Fix Released   High        Unassigned
  Pike                  Fix Released   High        Unassigned
libvirt (Ubuntu)        Fix Released   High        Christian Ehrhardt
  Zesty                 Fix Released   High        Christian Ehrhardt
  Artful                Fix Released   High        Christian Ehrhardt

Bug Description

[Impact]

 * VMs start to fail depending on a race around spice port allocation

 * The solution is a backport of an upstream fix that avoids a double
   release of the ports

[Test Case]

 * Prepare a set of VMs using spice and start them concurrently.
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=xenial
$ sed 's/vnc/spice/' /usr/share/uvtool/libvirt/template.xml > spice-template.xml
$ for idx in {1..20}; do uvt-kvm create --template spice-template.xml --password=ubuntu test-${idx} release=xenial arch=amd64 label=daily; done
$ for idx in {1..20}; do virsh shutdown test-${idx}; done
# wait until all are gone
$ for idx in {1..20}; do (virsh start test-${idx} &); done
$ for idx in {1..20}; do virsh domdisplay test-${idx} ; done | sort

* expectation - all work, ports are used one by one
* current status - failing to initialize:
  error: internal error: process exited while connecting to monitor: ((null):31733): Spice-Warning **: reds.c:2493:reds_init_socket: reds_init_socket: binding socket to 127.0.0.1:5901 failed
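
The output of the last command can also be checked mechanically. The following is a minimal sketch, assuming `virsh domdisplay` prints one display URI per running guest and that a guest which failed to start contributes no URI. The function name `check_displays` is made up for this example.

```shell
# check_displays EXPECTED: read display URIs on stdin (one per line, blank
# lines ignored) and verify that there are EXPECTED of them, all distinct.
check_displays() {
    expected=$1
    uris=$(grep -v '^$' || true)
    total=$(printf '%s\n' "$uris" | grep -c . || true)
    unique=$(printf '%s\n' "$uris" | sort -u | grep -c . || true)
    if [ "$total" -ne "$expected" ]; then
        echo "expected $expected displays, got $total"
        return 1
    fi
    if [ "$unique" -ne "$total" ]; then
        echo "duplicate displays: $((total - unique))"
        return 1
    fi
    echo "ok: $total unique displays"
}
```

Usage with the guests from the test case: `for idx in {1..20}; do virsh domdisplay test-${idx}; done | check_displays 20`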

[Regression Potential]

 * It is a race after all, so we might miss some corner cases in
   testing, but based on a review of the patch and the verifications so
   far it should be safe. The patch changes the flow as follows:
     Old: Spice-Init -> Cleanup -> Release [...] QemuStop -> Release
                                             ^
                If a new allocation happened in this window, the port
                was released unintentionally
     New: Spice-Init -> Fail [...] QemuStop -> Release
   This eliminates the race, but still releases the port as intended.

 * This change only affects users of spice ports.
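
The Old/New flow above can be replayed with a toy model. This is only an illustrative sketch, not libvirt code: the allocator is a space-delimited list of used ports, the three-port range and the guests A, B and C are made up, and the interleaving is fixed by hand so that guest B starts inside the window marked with `^`.

```shell
used=" "   # space-delimited list of allocated ports, e.g. " 5900 5901 "

alloc_port() {   # sets PORT to the lowest free port and marks it used
    for PORT in 5900 5901 5902; do
        case "$used" in
            *" $PORT "*) ;;                      # taken, try the next one
            *) used="$used$PORT "; return 0 ;;
        esac
    done
    return 1
}

release_port() { # frees the port no matter which guest currently holds it
    case "$used" in
        *" $1 "*) used="${used%% $1 *} ${used#* $1 }" ;;
    esac
}

simulate() {     # simulate old|new: prints the ports given to guests B and C
    used=" "
    alloc_port; port_a=$PORT           # guest A gets 5900, then SPICE init fails
    if [ "$1" = old ]; then
        release_port "$port_a"         # old flow: cleanup releases immediately
    fi
    alloc_port; port_b=$PORT           # guest B starts during A's teardown
    release_port "$port_a"             # QemuStop for A releases A's port
    alloc_port; port_c=$PORT           # guest C starts next
    echo "B=$port_b C=$port_c"
}

simulate old   # prints "B=5900 C=5900": C is handed the port B is using
simulate new   # prints "B=5901 C=5900": every guest has its own port
```

In the old flow the second release frees a port that already belongs to B, so C is given the same port and its bind would fail, which matches the error seen in the test case.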

[Other Info]

 * n/a

---

Using the UCA ocata release of libvirt we sporadically receive this error message in nova-compute.log:

2017-06-12 14:32:54.359 19007 ERROR nova.compute.manager [instance: d1af2a13-0a53-4d9c-ada3-683e4973f28a] libvirtError: internal error: process exited while connecting to monitor: ((null):63256): Spice-Warning **: reds.c:2463:reds_init_socket: reds_init_socket: binding socket to 10.141.112.21:5900 failed
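
To see how often this happens and which addresses are involved, the log can be summarized. A minimal sketch, assuming the message format shown above; the function name `spice_bind_failures` is made up for this example, and the log path in the usage line is only a guess at a typical location.

```shell
# spice_bind_failures [FILE...]: print "count address:port" for every SPICE
# bind failure found in the given files (or on stdin if no file is given).
spice_bind_failures() {
    grep -o 'binding socket to [^ ]* failed' "$@" \
        | awk '{ print $4 }' \
        | sort | uniq -c \
        | awk '{ print $1, $2 }'
}
```

Usage: `spice_bind_failures /var/log/nova/nova-compute.log`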

Please backport the fix for the following bug into UCA ocata/pike releases:
https://bugzilla.redhat.com/show_bug.cgi?id=1397440

The patch is documented here:
https://www.spinics.net/linux/fedora/libvir/msg144093.html

We've tested backporting this same fix using the ocata UCA libvirt 2.5.0-3ubuntu5~cloud0 source package and it fixes the problem for us.

Revision history for this message
James Page (james-page) wrote :

Checked libvirt source from Artful:

$ fgrep -r "virPortAllocatorRelease(driver->remotePorts, port);" *
src/qemu/qemu_process.c: virPortAllocatorRelease(driver->remotePorts, port);

Changed in libvirt (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in libvirt (Ubuntu Zesty):
status: New → Confirmed
importance: Undecided → High
James Page (james-page) wrote :

Also confirmed on Zesty libvirt

Changed in libvirt (Ubuntu Zesty):
status: Confirmed → Triaged
Changed in libvirt (Ubuntu Artful):
status: Confirmed → Triaged
James Page (james-page)
tags: added: openstack
Christian Ehrhardt  (paelzer) wrote :

Thanks for all the pre-work Aaron!
Thanks James for refiling against libvirt and pinging me.

Let's test in a PPA and run all sorts of regression tests against it this week.
If all works out I should be able to push to Artful early next week and consider SRUs from there.

For now I am building the fix in a PPA at [1] (not finished yet).
If someone could test that PPA against the original case, that would be great.

Also, if you want to help, please help me add an SRU template here. You can find an empty one at [2], and every bit you can fill in helps me get this fixed sooner.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2815
[2]: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Changed in libvirt (Ubuntu Artful):
assignee: nobody → ChristianEhrhardt (paelzer)
status: Triaged → In Progress
Aaron Johnson (acjohnson) wrote :

[Impact]

 * libvirt sporadically throws the following error, which can be seen in nova-compute.log:

2017-06-12 14:32:54.359 19007 ERROR nova.compute.manager [instance: d1af2a13-0a53-4d9c-ada3-683e4973f28a] libvirtError: internal error: process exited while connecting to monitor: ((null):63256): Spice-Warning **: reds.c:2463:reds_init_socket: reds_init_socket: binding socket to 10.141.112.21:5900 failed

 * This should be backported to improve the supportability of the SPICE console in OpenStack

[Test Case]

 * Create multiple instances with nova-compute (via horizon or openstack cli) using the spice-html5 console and watch your instances attempt to re-use already bound ports...

[Regression Potential]

 * Unknown regression potential but worth pointing out that this patch was used to fix this bug in the libvirt-3.1.0-1.el7 package release...

[Other Info]

 * We ran into this bug as a result of the openstack ansible project using spice-html5 as the default console viewer in ocata (not sure when they switched from novnc to spice as default).

Christian Ehrhardt  (paelzer) wrote :

In theory this could exist as far back as Xenial (the double cleanup is there), but the cleanup code was changed later, which could affect the race. Also, so far we have no reports of this before Ocata/Zesty, so for now we will not push further back than Zesty.
The upstream/Red Hat tests also confirmed that it is not reproducible before libvirt 2.4, which supports our timeline of Zesty/Artful, but not further back for now.

description: updated
Christian Ehrhardt  (paelzer) wrote :

In Artful-proposed [1] now, and Zesty SRU is prepared in [2] for regression tests series.

[1]: http://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses.html
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2825

Christian Ehrhardt  (paelzer) wrote :

Manual tests on Zesty succeeded, regression tests running.

description: updated
Christian Ehrhardt  (paelzer) wrote :

Actually as lessons learned we need a somewhat extended concurrent start test - with more guest options ... looking into that now.

James Page (james-page) wrote : Please test proposed package

Hello Aaron, or anyone else affected,

Accepted libvirt into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-pike-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 2.5.0-3ubuntu9

---------------
libvirt (2.5.0-3ubuntu9) artful; urgency=medium

  * d/p/ubuntu/qemu_process-spice-don-t-release-used-port.patch: qemu_process
    spice: don't release used port (LP: #1697729) - upstream in libvirt 3.1.

 -- Christian Ehrhardt <email address hidden> Wed, 14 Jun 2017 14:49:16 +0200

Changed in libvirt (Ubuntu Artful):
status: In Progress → Fix Released
Christian Ehrhardt  (paelzer) wrote :

FYI - the dependent cockpit test had a few unrelated hiccups. I discussed this with pitti, who maintains cockpit, and we agreed that retrying until it passes is OK in the short term, but he will look into it. We shared a bit of the local repro experience, and it passed for artful on the second retry.

Christian Ehrhardt  (paelzer) wrote :

FYI - the qemu/libvirt tests now include this and a few other concurrent start/stop tests to check for known (this one) and unknown races. The architecture should be easy to extend as we want to add more cases (modify uvt template + function = test).
Stage 3 is not yet part of the daily runs but will be added once it has reached a certain maturity level.
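
For illustration only, a concurrent start check of this kind can be as small as the sketch below. It is not the code used in the qemu/libvirt regression tests. The function name `start_concurrently` is made up, and the start command is passed in as a parameter so that the helper can be exercised without libvirt.

```shell
# start_concurrently CMD GUEST...: run "CMD GUEST" for all guests in parallel,
# wait for all of them, and print the number of failed starts.
start_concurrently() {
    cmd=$1; shift
    pids=""
    for guest in "$@"; do
        $cmd "$guest" >/dev/null 2>&1 &   # CMD is split on whitespace on purpose
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

Usage: `start_concurrently "virsh start" test-1 test-2 test-3` prints 0 when every guest came up.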

Changed in libvirt (Ubuntu Zesty):
status: Triaged → In Progress
assignee: nobody → ChristianEhrhardt (paelzer)
Christian Ehrhardt  (paelzer) wrote :

Tests on Zesty are good as well, we have:
- regression tests on ppa ok
- bug fix on ppa confirmed
- added case to regular testing
- dep8's seem to be happy as well [1]
- SRU Template is complete

That said pushing for SRU review now [2].

[1]: https://bileto.ubuntu.com/excuses/2825/zesty.html
[2]: https://launchpad.net/ubuntu/zesty/+queue?queue_state=1

Chris Halse Rogers (raof) wrote :

Hello Aaron, or anyone else affected,

Accepted libvirt into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/2.5.0-3ubuntu5.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed
Christian Ehrhardt  (paelzer) wrote :

Tested proposed, working as it did in the PPA.
Setting verification-done.

@Aaron - if you can, please verify in your context as well.

tags: added: verification-done
removed: verification-needed
Aaron Johnson (acjohnson) wrote :

I would like to test this but right now we are running Xenial...

Is there any chance you could get this pushed into xenial-proposed as well?

Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1697729] Re: port allocator allocates the same SPICE port for multiple guests (race condition)

On Wed, Jun 21, 2017 at 6:49 PM, Aaron Johnson <email address hidden>
wrote:

> Is there any chance you could get this pushed into xenial-proposed as
> well?
>

Hi Aaron,
Xenial itself is not affected - you might be on an Ubuntu Cloud Archive on
Xenial and you can then test that.
James made one of them available already (Pike) - Ocata might follow soon.

Alexander J. Maidak (ajmaidak) wrote :

Hi,

I recreated the issue in our environment with 2.5.0-3ubuntu5~cloud0. The first time I attempted to provision 10 instances at the same time, many errored out with the same error described in the bug report (qemu-system-x86_64: failed to initialize spice server).

I upgraded to: 2.5.0-3ubuntu9~cloud0 from pike-proposed.

I attempted to recreate the issue by provisioning 10 instances at once, and did this twice. All instances provisioned successfully without error. I believe 2.5.0-3ubuntu9~cloud0 has resolved this issue for us. It would be nice to see it go back to UCA ocata.
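
Whether a host already runs a package with the fix can be checked by comparing version strings with `dpkg --compare-versions`. A minimal sketch: the function name `has_spice_fix` is made up for this example, and the fixed version has to be chosen per series from the versions named in this bug (2.5.0-3ubuntu9~cloud0 for pike, 2.5.0-3ubuntu5.2~cloud0 for ocata).

```shell
# has_spice_fix INSTALLED FIXED: succeeds if INSTALLED is at least FIXED
# according to Debian version ordering.
has_spice_fix() {
    dpkg --compare-versions "$1" ge "$2"
}
```

Usage, assuming the library package is named libvirt0 on your system: `has_spice_fix "$(dpkg-query -W -f='${Version}' libvirt0)" 2.5.0-3ubuntu5.2~cloud0 && echo fixed`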

Christian Ehrhardt  (paelzer) wrote :

Per former comment also setting verification-done for Pike.
Also making clear in the tags that the general v-d was for zesty.

Thanks Alexander for checking.

tags: added: verification-pike-done verification-zesty-done
removed: verification-pike-needed
James Page (james-page) wrote : Please test proposed package

Hello Aaron, or anyone else affected,

Accepted libvirt into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ocata-needed
James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package libvirt - 2.5.0-3ubuntu9~cloud0
---------------

 libvirt (2.5.0-3ubuntu9~cloud0) xenial-pike; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 libvirt (2.5.0-3ubuntu9) artful; urgency=medium
 .
   * d/p/ubuntu/qemu_process-spice-don-t-release-used-port.patch: qemu_process
     spice: don't release used port (LP: #1697729) - upstream in libvirt 3.1.

tags: removed: verification-done
Łukasz Zemczak (sil2100) wrote : Change of SRU verification policy

As part of a recent change in the Stable Release Update verification policy we would like to inform you that for a bug to be considered verified for a given release, a verification-done-$RELEASE tag needs to be added to the bug, where $RELEASE is the name of the series in which the package was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Jamie Strandboge (jdstrand) wrote :

I can say that the package in zesty-proposed resolves the issue for me. I did not go through the test case but was happy to see this bug fixed in proposed (for some reason today I started losing the race all the time; annoying but thankful at the same time! :).

Alexander J. Maidak (ajmaidak) wrote :

Confirmed 2.5.0-3ubuntu9~cloud0 from ocata-proposed works for me.

Alexander J. Maidak (ajmaidak) wrote :

Sorry, my previous comment had the wrong package version. I'm running 2.5.0-3ubuntu5.2~cloud0 from ocata-proposed and it fixes the issue for me.

James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package libvirt - 2.5.0-3ubuntu5.2~cloud0
---------------

 libvirt (2.5.0-3ubuntu5.2~cloud0) xenial-ocata; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 libvirt (2.5.0-3ubuntu5.2) zesty; urgency=medium
 .
   * d/p/ubuntu/qemu_process-spice-don-t-release-used-port.patch: qemu_process
     spice: don't release used port (LP: #1697729) - upstream in libvirt 3.1.

tags: added: verification-ocata-done
removed: verification-ocata-needed
Christian Ehrhardt  (paelzer) wrote :

untwisted my verification tag - sorry

tags: added: verification-done-zesty
removed: verification-zesty-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 2.5.0-3ubuntu5.2

---------------
libvirt (2.5.0-3ubuntu5.2) zesty; urgency=medium

  * d/p/ubuntu/qemu_process-spice-don-t-release-used-port.patch: qemu_process
    spice: don't release used port (LP: #1697729) - upstream in libvirt 3.1.

 -- Christian Ehrhardt <email address hidden> Mon, 19 Jun 2017 07:52:32 +0200

Changed in libvirt (Ubuntu Zesty):
status: Fix Committed → Fix Released