Under Yakkety, network test script does not adequately shut down network devices

Bug #1636301 reported by Rod Smith
Affects: Checkbox Provider - Base
Status: Fix Released
Importance: Undecided
Assigned to: Rod Smith
Milestone: (none listed)

Bug Description

Under Yakkety (the 4.8.0 kernel, or perhaps some other system component), the "network" test script does not adequately shut down network devices that are not under test, enabling network traffic to pass over an undesired interface. Symptoms include:

* On Lenovo x3550 and x3650 servers, the pseudo-network device that links to the BMC
  passes network tests, despite the fact that it doesn't directly connect to anything.
  (See https://certification.canonical.com/hardware/201605-22234/submission/113461/test/66780/result/8299870/)
* On a system with different-speed NICs connected to two different networks, the slower
  NIC may pass with a speed over 100% of its theoretical maximum. See, for instance,
  https://certification.canonical.com/hardware/201003-5452/submission/113548/test-results/;
  eno1 in that test is a 1Gbps NIC that produced an alleged speed of 9030Mbps (903%
  of its theoretical maximum).
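The percentage in the second symptom is just measured throughput divided by link speed. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not taken from the "network" script. Any result above 100% suggests that traffic went over some other interface.

```python
def percent_of_max(measured_mbps, link_speed_mbps):
    """Return measured throughput as a percentage of the link's rated speed."""
    if link_speed_mbps <= 0:
        raise ValueError("link speed must be positive")
    # Multiply before dividing so whole-number inputs give an exact result.
    return 100 * measured_mbps / link_speed_mbps


def is_implausible(measured_mbps, link_speed_mbps):
    """True if the measurement exceeds what the NIC could physically carry."""
    return percent_of_max(measured_mbps, link_speed_mbps) > 100
```

With the eno1 figures above, percent_of_max(9030, 1000) gives 903.0, and is_implausible(9030, 1000) is True.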

Using ping and its -I option seems to correlate with these failures, too. For instance, on the Lenovo servers, "ping -I enp0s20u13u5 10.1.10.1", where enp0s20u13u5 is the BMC interface and 10.1.10.1 is an address on a real Ethernet network, fails under Xenial but succeeds under Yakkety.

It's unclear to me if this constitutes a new bug in the kernel or if the kernel developers have deliberately broadened the scope of the way the kernel directs traffic over its network interfaces in order to improve reliability. Completely shutting down the network interface with "ifdown" causes "ping -I..." to fail in an expected way (that is, Yakkety acts more like Xenial). Thus, if this is NOT a new kernel bug, re-writing the "network" script may fix this problem.
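For anybody who wants to repeat the "ping -I" check from a script, here is a rough sketch. The helper names are hypothetical, and it assumes the iputils ping found on Ubuntu (the -c, -W and -I options). It only reports whether ping exited successfully; it says nothing about which interface the traffic really used.

```python
import subprocess


def build_ping_command(interface, address, count=3, timeout_s=2):
    """Build the argv for a ping bound to one interface (iputils options assumed)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), "-I", interface, address]


def ping_via(interface, address, count=3, timeout_s=2):
    """Return True if ping through the given interface got a reply."""
    result = subprocess.run(
        build_ping_command(interface, address, count, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # ping exits with status 0 only when at least one reply was received.
    return result.returncode == 0
```

On the Lenovo servers described above, ping_via("enp0s20u13u5", "10.1.10.1") would be expected to return False under Xenial and True under Yakkety.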

Rod Smith (rodsmith) wrote :

Further testing has revealed that this issue affects "normal" network devices *IF* each device connects to its own LAN:

* On betelnut in Lexington, using both 1Gbps and 10Gbps devices can
  produce ~900% of theoretical throughput on the 1Gbps NIC if an
  attempt is made to test against an iperf server accessible only
  from the 10Gbps device.
* On a system on my home network with a built-in 1Gbps link and a
  100Mbps USB-to-Ethernet dongle, each connected to a separate LAN,
  the 100Mbps device can produce ~900% of theoretical maximum
  throughput if tested against an iperf server on the 1Gbps LAN.

In both cases, traffic seems to be passing through whichever device can actually reach the iperf server, as a sort of fallback when normal connections fail. If both devices are connected to the same LAN, both produce the correct results.

I've also tried installing the Yakkety (4.8.0) kernel to a Xenial system. This test did NOT reproduce the problem, so I don't think it's the kernel that's the root cause; there must be something else in Yakkety that's changed.

Rod Smith (rodsmith)
Changed in plainbox-provider-checkbox:
status: New → Fix Released
assignee: nobody → Rod Smith (rodsmith)
Rod Smith (rodsmith) wrote :

The "related branches" fix is more of a workaround than a fix; it checks that the iperf Target is on a network that's accessible to the NIC that's being tested. If it is, the test proceeds; if not, the next Target is tried. (If no Target is on the NIC's network, the test fails.)
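The idea behind the workaround can be sketched roughly as below. This is not the code from the related branch, only an illustration of the check using Python's standard ipaddress module. The function names are made up, and it assumes the NIC's address is known in CIDR form and that the Targets are IP addresses rather than hostnames.

```python
import ipaddress


def target_on_nic_network(nic_cidr, target_ip):
    """True if target_ip lies inside the subnet configured on the NIC."""
    # ip_interface("10.1.10.25/24").network is the network 10.1.10.0/24.
    network = ipaddress.ip_interface(nic_cidr).network
    return ipaddress.ip_address(target_ip) in network


def pick_target(nic_cidr, targets):
    """Return the first Target reachable on the NIC's own network, else None."""
    for target in targets:
        if target_on_nic_network(nic_cidr, target):
            return target
    # No Target shares a network with this NIC, so the test should fail.
    return None
```

For a NIC configured as 10.1.10.25/24 and Targets 192.168.1.5 and 10.1.10.1, pick_target() skips the first Target and returns 10.1.10.1. For a NIC whose subnet contains none of the Targets, it returns None.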
