Openvswitch 2.5 + dpdk 2.2 totally failing for virtio PMD

Bug #1559912 reported by Christian Ehrhardt 
This bug affects 2 people
Affects                     Status      Importance  Assigned to         Milestone
dpdk (Ubuntu)               Triaged     Critical    Christian Ehrhardt
openvswitch-dpdk (Ubuntu)   Confirmed   Undecided   Unassigned

Bug Description

*To start, this is the same text I sent to the mailing lists of the projects involved.*

I was trying to replicate under KVM a setup that I have working on physical
devices (ixgbe), since there is a virtio PMD driver.

TL;DR:
- under KVM with virtio-pci (working on baremetal with ixgbe cards)
- adding dpdk port to ovs fails with memzone <port0_tvq0> already exists
and causes a segfault
- I couldn't find a solution in similar mails that popped up here recently,
any help or pointer appreciated.

## Details ##
I thought I had read that others have this working, and that it would be a
great way to gain more debug control of the environment, but something
seems to be eluding me.

There were quite a few similar mails on the list recently, but none seemed
to hit the same issue as I do. At least none of the tunings/workarounds
seemed to apply to me.
The versions I use are Open vSwitch 2.5, DPDK 2.2, QEMU 2.5, kernel 4.4, so a
fairly recent software stack.

The super-short repro summary is:
1. starting ovs-dpdk like
ovs-vswitchd --dpdk -c 0x1 -n 4 --pci-blacklist 0000:00:03.0 -m 2048 --
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info
--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
2. add a bridge and an OVS DPDK port
ovs-vsctl add-br ovsdpdkbr0 -- set bridge ovsdpdkbr0 datapath_type=netdev
ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk

The log of the initialization after #1 looks good to me - I can see two of
my three virtio devices recognized and one blacklisted.
Memory allocation looks good, ... I'll attach the log at the end of the mail

## ISSUE ##
But when I add a port and refer to one of the dpdk ports it fails with the
following:
ovs-vsctl[14023]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port
ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone
<port0_tvq0> already exists
ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone
<port0_tvq0_hdrzone> already exists
ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone
<port0_rvq0> already exists
kernel: show_signal_msg: 18 callbacks suppressed
kernel: pmd12[14025]: segfault at 2 ip 00007f3eb205eab2 sp 00007f3e3dffa590
error 4 in libdpdk.so.0[7f3eb1fdf000+1e9000]
ovs-vswitchd[13902]: ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid
13903 died, killed (Segmentation fault), core dumped, restarting
systemd-udevd[14040]: Could not generate persistent MAC address for
ovs-netdev: No such file or directory
kernel: device ovs-netdev entered promiscuous mode
ovs-vswitchd[14036]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone
<RG_MP_ovs_mp_1500_0_262144> already exists
ovs-vswitchd[14036]: RING: Cannot reserve memory
kernel: device ovsdpdkbr0 entered promiscuous mode
ovs-vswitchd[14036]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone
<RG_MP_ovs_mp_1500_0_262144> already exists
ovs-vswitchd[14036]: RING: Cannot reserve memory

## Experiments (failed) ##
I thought it could be related to all the multiqueue changes that recently
went in.
My usual setup has 4 vCPUs and 4 queues per virtio-net device.
I tried them with only 1 of 4 queues, also with only 1 queue defined and
only 1 CPU - but all fail the same way.
I have testpmd and l2fwd on the same devices working, so I hope they are
not totally set up badly.

I also tried odd things like reassigning to uio_pci_generic beforehand,
but it is virtio_pmd in the end anyway, so it made no difference.

From how it appears, I felt that it could be related to the old discussions
around
[1] http://dpdk.org/ml/archives/dev/2015-May/017589.html
[2] http://openvswitch.org/pipermail/dev/2015-March/052344.html
But they are (partially) applied upstream already and the issue doesn't
100% match the old discussions.

## Logs ##
[3] log of openvswitch start (attached for more readability)

This needs to be debugged and fixed, or openvswitch-dpdk will be unusable within virtio environments.

Upstream discussion started:
http://dpdk.org/ml/archives/dev/2016-March/036021.html
http://openvswitch.org/pipermail/discuss/2016-March/020488.html

Christian Ehrhardt  (paelzer) wrote :

Bug 1559408 reports that this also affects BNX2X_PMD.

description: updated
Christian Ehrhardt  (paelzer) wrote :

systemd[1]: Starting Open vSwitch Internal Unit...
ovs-ctl[13868]: * Starting ovsdb-server
ovs-vsctl[13893]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait --
init -- set Open_vSwitch . db-version=7.12.1
ovs-vsctl[13898]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set
Open_vSwitch . ovs-version=2.5.0
"external-ids:system-id=\"8ddb892c-53a5-410d-a765-0031ad6eb1be\""
"system-type=\"Ubuntu\"" "system-version=\"16.04-xenial\""
ovs-ctl[13868]: * Configuring Open vSwitch system IDs
ovs-ctl[13868]: 2016-03-18T14:28:28Z|00001|dpdk|INFO|No -vhost_sock_dir
provided - defaulting to /var/run/openvswitch
ovs-vswitchd[13900]: ovs|00001|dpdk|INFO|No -vhost_sock_dir provided -
defaulting to /var/run/openvswitch
ovs-ctl[13868]: EAL: Detected lcore 0 as core 0 on socket 0
ovs-ctl[13868]: EAL: Detected lcore 1 as core 0 on socket 0
ovs-ctl[13868]: EAL: Detected lcore 2 as core 0 on socket 0
ovs-ctl[13868]: EAL: Detected lcore 3 as core 0 on socket 0
ovs-ctl[13868]: EAL: Support maximum 128 logical core(s) by configuration.
ovs-ctl[13868]: EAL: Detected 4 lcore(s)
ovs-ctl[13868]: EAL: No free hugepages reported in hugepages-1048576kB
ovs-ctl[13868]: EAL: VFIO modules not all loaded, skip VFIO support...
ovs-ctl[13868]: EAL: Setting up physically contiguous memory...
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3eaf800000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x5ac00000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e54a00000 (size =
0x5ac00000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0xc00000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e53c00000 (size = 0xc00000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x1200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e52800000 (size = 0x1200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e52200000 (size = 0x400000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51e00000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51800000 (size = 0x400000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51400000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0xc00000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e50600000 (size = 0xc00000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x1000000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4f400000 (size = 0x1000000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4f000000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4ec00000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4e800000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4e400000 (size = 0x200000)
ovs-ctl[13868]: EAL: Ask a virtual ...


Changed in dpdk (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → ChristianEhrhardt (paelzer)
Changed in openvswitch-dpdk (Ubuntu):
status: New → Triaged
status: Triaged → New
Christian Ehrhardt  (paelzer) wrote :

Info on progress.
No feedback from communities so far.
After various other DPDK fixes took 3/4 of my day, I set up a more thorough debugging environment derived from my testing environment, but the real debugging has to start tomorrow morning.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openvswitch-dpdk (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

After quite some analysis and debugging I could identify an issue with the device being configured twice.
Daniele replied to my mailing list request and suggested backporting a virtio fix that came after 2.2.

That worked well: with the following commits backported I can get it to work now.
Thanks Daniele.

commit 3b1e3e4e362453df8cecbc6d481444be8b84326e
    virtio: fix descriptors pointing to the same buffer
commit c680a4a88c4312068f60937a7ba51eac8211c9a6
    virtio: fix crash in statistics functions
commit 9a0615af7746485d73d10561cc0743bc2fcd4bf7
    virtio: fix restart

Of those,
commit 9a0615af7746485d73d10561cc0743bc2fcd4bf7
    virtio: fix restart
is directly tied to this bug, while the other two will become part of bug 1559981 (the general bug for cherry-picking stability fixes).
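
To illustrate why configuring a device twice can produce the "already exists" errors seen earlier, here is a toy model in Python. It is not DPDK code and does not reproduce the content of the "virtio: fix restart" commit; `MemzoneRegistry`, `configure_naive`, and `configure_idempotent` are hypothetical names. It only shows the general pattern: a name-keyed allocator rejects a second reservation under the same name, while a setup routine that looks up an existing zone first can be run again safely.

```python
class MemzoneRegistry:
    """Toy name-keyed allocator. An assumption for illustration, not DPDK's implementation."""

    def __init__(self):
        self._zones = {}

    def reserve(self, name, size):
        # Reserving the same name twice is an error, as in the log messages.
        if name in self._zones:
            raise FileExistsError(f"memzone <{name}> already exists")
        self._zones[name] = bytearray(size)
        return self._zones[name]

    def lookup(self, name):
        # Returns the existing zone, or None if the name is unknown.
        return self._zones.get(name)


def configure_naive(registry, port, size=4096):
    """Always reserves a fresh zone, so a second call for the same port fails."""
    return registry.reserve(f"port{port}_tvq0", size)


def configure_idempotent(registry, port, size=4096):
    """Reuses the zone left by an earlier configuration when there is one."""
    name = f"port{port}_tvq0"
    zone = registry.lookup(name)
    if zone is None:
        zone = registry.reserve(name, size)
    return zone


if __name__ == "__main__":
    registry = MemzoneRegistry()
    configure_naive(registry, 0)
    try:
        configure_naive(registry, 0)  # second configuration of port 0
    except FileExistsError as err:
        print("naive setup failed:", err)

    first = configure_idempotent(registry, 1)
    second = configure_idempotent(registry, 1)
    print("idempotent setup reused zone:", first is second)
```

In this model the failed reservation raises immediately; in the report the failure was followed by a segfault, which suggests the error path left the driver with unusable queue state, though the sketch does not attempt to model that.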
