vhost-user: qemu stops processing packets under high load of traffic

Bug #1556306 reported by Vincent JARDIN
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
qemu (Ubuntu)        Fix Released  High        Unassigned
  Trusty             Won't Fix     High        Unassigned
  Wily               Won't Fix     High        Unassigned
qemu-kvm (Ubuntu)    Fix Released  High        Unassigned
  Precise            Invalid       High        Unassigned

Bug Description

[SRU Justification]
Impact: qemu stops processing traffic
Test case: see below (stress test using udp traffic generator)
Fix: cherrypick of upstream patch to fix this problem.

Description of problem:
- qemu socket becomes full, causing qemu to send incomplete
SET_VRING_CALL messages to vhost-user backend (without proper fd set in
ancillary data).
- after some time, some interrupts are lost, causing the VM to stop
transmitting packets.
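
As an illustration of the failure mode described above (not the upstream patch), here is a minimal Python sketch of a message whose file descriptor travels as SCM_RIGHTS ancillary data over a non-blocking UNIX socket. The helper names send_vring_call and recv_vring_call are hypothetical, and the message layout is only loosely modelled on the vhost-user header. The point is that a sender which ignores the result of a non-blocking send cannot know whether the message and its fd reached the backend.

```python
import array
import socket
import struct

# Assumed layout, loosely modelled on a vhost-user message:
# request (u32), flags (u32), payload size (u32), u64 payload.
HEADER = struct.Struct("<IIIQ")
SET_VRING_CALL = 13  # request number used by the vhost-user protocol


def send_vring_call(sock, vring_index, call_fd):
    """Send one message with call_fd attached as ancillary data.

    Returns True only if the whole message left in a single sendmsg().
    False means the socket was full (nothing sent) or the send was
    partial; either way the caller must not assume delivery.
    """
    msg = HEADER.pack(SET_VRING_CALL, 0x1, 8, vring_index)
    fds = array.array("i", [call_fd])
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)]
    try:
        sent = sock.sendmsg([msg], ancillary)
    except BlockingIOError:
        # Non-blocking socket with a full send buffer.
        return False
    return sent == len(msg)


def recv_vring_call(sock):
    """Receive one message and return (request, vring_index, fds)."""
    fds = array.array("i")
    data, ancdata, _msg_flags, _addr = sock.recvmsg(
        HEADER.size, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    request, _flags, _size, index = HEADER.unpack(data)
    return request, index, list(fds)
```

On a blocking socketpair the message and its descriptor arrive together. Once the sender is switched to non-blocking mode and its buffer is full, send_vring_call returns False, which is the condition a caller has to handle rather than drop.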

How reproducible:
Run a stress test of a vhost-user interface using a UDP
traffic generator. The traffic generator (IXIA) was connected to 2 physical ports that are in turn connected to 2 virtio ports through a Linux bridge, with the VM
(running Linux) routing packets between the 2 virtio ports.
When traffic reaches high pps rates of small packets, the VM stops transmitting.

Actual results:
- VM stops transmitting packets

Expected results:
- VM should never stop transmitting packets

Additional info:
We do propose a fix at:
  http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg00652.html

Vincent JARDIN (vincent-jardin) wrote :
mst (mst-0) wrote : Re: [Bug 1556306] Re: vhost-user: qemu stops processing packets under high load of traffic

On Fri, Mar 11, 2016 at 10:51:33PM -0000, Vincent JARDIN wrote:
> for tracking,
> http://git.qemu.org/?p=qemu.git;a=patch;h=5669655aafdb88a8797c74a989dd0c0ebb1349fa
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/1556306
>
> Title:
> vhost-user: qemu stops processing packets under high load of traffic
>
> Status in QEMU:
> New

I presume you'll also close this bug at some point?
It's fixed in upstream QEMU.


Vincent JARDIN (vincent-jardin) wrote :

Correct, it is fixed in upstream QEMU. It just needs to get into Ubuntu.

Vincent JARDIN (vincent-jardin) wrote :

Let's close it. Sorry, it should have been opened in:
  https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/

cargonza (cargonza) wrote :

You can also add the project 'qemu-kvm' to the bug in order to get it into the Ubuntu qemu-kvm bug list.

cargonza (cargonza) wrote :

Apologies, I was corrected: for qemu issues, the bug should be filed against the following:

Distribution: ubuntu
package: qemu <--instead of project.

I will correct this in the bug.

affects: qemu → qemu (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. I'll push into the xenial package today.

Changed in qemu (Ubuntu):
importance: Undecided → High
Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Vincent JARDIN (vincent-jardin) wrote :

Side question, will you apply it to qemu-kvm from
  https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/mitaka-staging/+files/qemu-kvm_2.5+dfsg-5ubuntu5~cloud0_amd64.deb
too?

or should I open another bug?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu6

---------------
qemu (1:2.5+dfsg-5ubuntu6) xenial; urgency=medium

  * Cherrypick upstream patch vhost-user-interrupt-management-fixes.patch
    (LP: #1556306)

 -- Serge Hallyn <email address hidden> Wed, 16 Mar 2016 16:35:22 -0700

Changed in qemu (Ubuntu):
status: Confirmed → Fix Released
cargonza (cargonza) wrote :

It should also be fixed in the qemu-kvm package. No additional bug needed as this bug covers both qemu and qemu-kvm packages.

Serge - any chance you can also push the patch into the qemu-kvm package?

Vincent JARDIN (vincent-jardin) wrote :
Serge Hallyn (serge-hallyn) wrote :
no longer affects: qemu (Ubuntu Precise)
no longer affects: qemu-kvm (Ubuntu Trusty)
no longer affects: qemu-kvm (Ubuntu Wily)
Changed in qemu (Ubuntu Trusty):
importance: Undecided → High
Changed in qemu (Ubuntu Wily):
importance: Undecided → High
Changed in qemu-kvm (Ubuntu Precise):
importance: Undecided → High
Changed in qemu-kvm (Ubuntu):
status: Confirmed → Fix Released
Vincent JARDIN (vincent-jardin) wrote :

Great, thanks for your acknowledgment that the update is available for ppc.

Serge Hallyn (serge-hallyn) wrote :

This is marked as affecting precise, but has anyone reproduced this with qemu-kvm 1.0+noroms-0ubuntu14.27 ?

The patch is completely inapplicable to that code base, so it would need to be rewritten from scratch if so.

Changed in qemu-kvm (Ubuntu Precise):
status: New → Invalid
Serge Hallyn (serge-hallyn) wrote :

(if someone says they have reproduced it on 1.0+noroms-0ubuntu14.27 I'll unmark it invalid.)

Serge Hallyn (serge-hallyn) wrote :

Actually even porting to trusty is complicated by a set of endianness patches.

description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Vincent, or anyone else affected,

Accepted qemu into wily-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.3+dfsg-5ubuntu9.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Wily):
status: New → Fix Committed
tags: added: verification-needed
Serge Hallyn (serge-hallyn) wrote :

The wily SRU has been waiting for validation for quite some time. I'm wondering whether that is because no one is using wily, or because it's not high priority?

The patch does not apply cleanly to trusty. In particular, the chunk in ./hw/net/vhost_net.c.rej is quite obsolete in the trusty source. So I'd like to hear from someone that they are hitting this before risking an erroneous backport.

Christian Ehrhardt  (paelzer) wrote :

Hi,
cleaning up old issues.
In all this time we have had no confirmed report on trusty. Also, as Serge outlined in c#19, the backport would be much harder and would therefore carry more risk for the SRU.
Since wily was hanging in verification for so long and is now EOL, this is dead.

I'm cleaning up the bug states accordingly.

Changed in qemu (Ubuntu Wily):
status: Fix Committed → Won't Fix
Changed in qemu (Ubuntu Trusty):
status: New → Invalid
status: Invalid → Won't Fix