Merge lp:~rodsmith/checkbox/network-fixes into lp:checkbox

Proposed by Rod Smith
Status: Merged
Approved by: Daniel Manrique
Approved revision: 3155
Merged at revision: 3155
Proposed branch: lp:~rodsmith/checkbox/network-fixes
Merge into: lp:checkbox
Diff against target: 201 lines (+66/-25)
1 file modified
providers/plainbox-provider-checkbox/bin/network (+66/-25)
To merge this branch: bzr merge lp:~rodsmith/checkbox/network-fixes
Reviewer Review Type Date Requested Status
Daniel Manrique (community) Approve
Zygmunt Krynicki (community) Needs Fixing
Review via email: mp+228923@code.launchpad.net

Commit message

Fixes problems with failing network tests and trashed routing tables in network tests.

Description of the change

Fixes for bug 1329029:

First, the script now tries to ping the iperf or FTP server for up to four minutes before proceeding. It does not proceed with tests until the server begins responding to pings. If the server doesn't respond to pings after ~4 minutes, the tests fail. This resolves problems with some smart switches, which can be sluggish to restore connections when they're restored via "ip link set dev eth0 up" (or similar for other interfaces). The original script used a 10-second pause, which is inadequate in these cases. Note that if the config file lacks an IP address for the test server, this will add a ~4-minute delay to each NIC's test time. (The tests would fail either way, of course; only the extra delay is new.)

Second, the script now saves the routing table (via "ip route save table all") to a temporary file (/tmp/plainbox-network.route) before running each test, then restores the routing table (via "ip route restore") from the same file after each test. This prevents the default route from being lost when its interface is tested; such loss has caused problems with running the virtualization test and submitting results once tests are run.

To post a comment you must log in.
Revision history for this message
Daniel Manrique (roadmr) wrote :

This all looks good, I suggested a few improvements but they are just style fixes, which I think would be nice to have but not mandatory. Let me know, if you're pressed for time we can merge as-is.

lp:~rodsmith/checkbox/network-fixes updated
3153. By Rod Smith

Implemented (most of) the suggested changes

Revision history for this message
Rod Smith (rodsmith) wrote :

I've implemented most of the changes suggested by Daniel. The one exception was a change to make a loop a bit more "Pythonic." My level of Python-fu is still fairly low, and the suggested changes were a little over my comfort level. Since it was described as being of little real benefit and the effort to figure it out would take me a while, I'm passing on that change for now.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Some more reviews from me inline below

review: Needs Fixing
Revision history for this message
Daniel Manrique (roadmr) wrote :

Thanks! Time permitting, do have a look at Python's iterator handling, you seldom need to manage incrementing a counter yourself. If you ever have some spare time and want to refactor this to use what I suggested, it would be a fun exercise.

http://www.diveintopython.net/file_handling/for_loops.html

Also, bear in mind that opening any file can throw an exception. The assumption that /dev/null can be successfully opened is reasonable, as is the one about TemporaryFile() being mostly likely to succeed; if they do fail, a very clear trace will be thrown and that will help debug the problem. But usually it's good practice to try to catch those file opening exceptions. I won't block this merge on that though, so we're golden in that respect.

With that in mind, there's only one tiny change I'd ask you to make (it slipped by both of us!). See below. Once that's in, we can proceed to approve and merge this.

review: Needs Fixing
lp:~rodsmith/checkbox/network-fixes updated
3154. By Rod Smith

Fixes in response to Zygmunt's and Daniel's comments

Revision history for this message
Rod Smith (rodsmith) wrote :

I've implemented changes in response to Zygmunt's comments, as well as Daniel's final one.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Almost okay.

One last comment: run 'python3 -m flake8' on that file and try to fix anything reported (haven't tried but there's always some things mentioned). Ping me on IRC if you have any questions about that please.

review: Needs Fixing
Revision history for this message
Daniel Manrique (roadmr) wrote :

The part about not using TemporaryFile as context manager is my fault, I told Rod it was OK not to do that as he would use the file descriptor later (one write, one read). Maybe I was mistaken :(

lp:~rodsmith/checkbox/network-fixes updated
3155. By Rod Smith

Fixes for pre-existing flake8-identified issues

Revision history for this message
Rod Smith (rodsmith) wrote :

These changes were all to fix pre-existing problems identified by flake8. The temp file access is being left without context protection because protecting it would cause the whole test to fail if the temp file couldn't be created; but the point of storing the routing table in the temp file is to fix a minor annoyance (a missing default route after the test completes), and a failed test would be a bigger annoyance than that. Also, if the temp file can't be created, that implies that some other test -- probably a disk test -- would fail spectacularly, so the underlying problem can be left to that test for debugging.

Revision history for this message
Daniel Manrique (roadmr) wrote :

Thanks! Let's approve this.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'providers/plainbox-provider-checkbox/bin/network'
--- providers/plainbox-provider-checkbox/bin/network 2014-07-21 14:45:11 +0000
+++ providers/plainbox-provider-checkbox/bin/network 2014-07-31 17:16:53 +0000
@@ -23,7 +23,6 @@
23 ArgumentParser,23 ArgumentParser,
24 RawTextHelpFormatter24 RawTextHelpFormatter
25)25)
26import configparser
27import fcntl26import fcntl
28import ftplib27import ftplib
29from ftplib import FTP28from ftplib import FTP
@@ -34,6 +33,7 @@
34import socket33import socket
35import struct34import struct
36import subprocess35import subprocess
36import tempfile
37from subprocess import (37from subprocess import (
38 CalledProcessError,38 CalledProcessError,
39 check_call,39 check_call,
@@ -71,12 +71,12 @@
71 if self.iface.max_speed == 0:71 if self.iface.max_speed == 0:
72 logging.warning("No max speed detected, assuming Wireless device "72 logging.warning("No max speed detected, assuming Wireless device "
73 "and continuing with test.")73 "and continuing with test.")
74 # Otherwise, no sense in running if we're not running at full speed. 74 # Otherwise, no sense in running if we're not running at full speed.
75 elif self.iface.max_speed > self.iface.link_speed:75 elif self.iface.max_speed > self.iface.link_speed:
76 logging.error("Detected link speed (%s) is lower than "76 logging.error("Detected link speed (%s) is lower than "
77 "detected max speed (%s)" %77 "detected max speed (%s)" %
78 (self.iface.link_speed, self.iface.max_speed))78 (self.iface.link_speed, self.iface.max_speed))
79 logging.error("Please check your device configuration and try again")79 logging.error("Check your device configuration and try again")
80 return 180 return 1
8181
82 cmd = "timeout 180 iperf -c {} -n {}".format(self.target, self.mbytes)82 cmd = "timeout 180 iperf -c {} -n {}".format(self.target, self.mbytes)
@@ -116,7 +116,7 @@
116 scaled_throughput /= 1000116 scaled_throughput /= 1000
117 try:117 try:
118 percent = scaled_throughput / int(self.iface.max_speed) * 100118 percent = scaled_throughput / int(self.iface.max_speed) * 100
119 except (ZeroDivisionError, TypeError) as error:119 except (ZeroDivisionError, TypeError):
120 # Catches a condition where the interface functions fine but120 # Catches a condition where the interface functions fine but
121 # ethtool fails to properly report max speed. In this case121 # ethtool fails to properly report max speed. In this case
122 # it's up to the reviewer to pass or fail.122 # it's up to the reviewer to pass or fail.
@@ -134,7 +134,8 @@
134 return 0134 return 0
135 # Below is guaranteed to not throw an exception because we'll135 # Below is guaranteed to not throw an exception because we'll
136 # have exited above if it did.136 # have exited above if it did.
137 print("{:03.2f}% of theoretical max {} Mb/s\n".format(percent, int(self.iface.max_speed)))137 print("{:03.2f}% of theoretical max {} Mb/s\n".format(percent,
138 int(self.iface.max_speed)))
138 if percent < self.fail_threshold:139 if percent < self.fail_threshold:
139 logging.warn("Poor network performance detected")140 logging.warn("Poor network performance detected")
140 return 30141 return 30
@@ -224,7 +225,8 @@
224 self.remote.set_debuglevel(2)225 self.remote.set_debuglevel(2)
225 self.remote.set_pasv(True)226 self.remote.set_pasv(True)
226 except socket.error as connect_exception:227 except socket.error as connect_exception:
227 logging.error("Failed to connect to: %s", self.target)228 logging.error("Failed to connect to: %s: %s", self.target,
229 connect_exception)
228 return False230 return False
229231
230 logging.debug("Logging in")232 logging.debug("Logging in")
@@ -233,7 +235,8 @@
233 try:235 try:
234 self.remote.login(self.username, self.password)236 self.remote.login(self.username, self.password)
235 except ftplib.error_perm as login_exception:237 except ftplib.error_perm as login_exception:
236 logging.error("failed to log into target: %s", self.target)238 logging.error("failed to log into target: %s: %s", self.target,
239 login_exception)
237 return False240 return False
238241
239 default_out_dir = ""242 default_out_dir = ""
@@ -380,12 +383,12 @@
380 # Parse ethtool data for max speed since /sys/class/net/DEV/speed only383 # Parse ethtool data for max speed since /sys/class/net/DEV/speed only
381 # reports link speed.384 # reports link speed.
382385
383 # Search for things that look like 100baseSX, 386 # Search for things that look like 100baseSX,
384 #40000baseNX, 10000baseT387 # 40000baseNX, 10000baseT
385 try:388 try:
386 ethinfo = check_output(['ethtool', self.interface],389 ethinfo = check_output(['ethtool', self.interface],
387 universal_newlines = True,390 universal_newlines=True,
388 stderr=STDOUT).split(' ')391 stderr=STDOUT).split(' ')
389 except FileNotFoundError:392 except FileNotFoundError:
390 logging.warning('ethtool not found! Unable to get max speed')393 logging.warning('ethtool not found! Unable to get max speed')
391 ethinfo = None394 ethinfo = None
@@ -450,8 +453,28 @@
450 return params453 return params
451454
452455
456def can_ping(the_interface, test_target):
457 working_interface = False
458 num_loops = 0
459 while (not working_interface) and (num_loops < 48):
460 working_interface = True
461
462 try:
463 check_call(["ping", "-I", the_interface, "-c", "1", test_target],
464 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
465 except CalledProcessError as excp:
466 working_interface = False
467 logging.warning("Ping failure on %s (%s)", the_interface, excp)
468
469 if not working_interface:
470 time.sleep(5)
471 num_loops += 1
472
473 return working_interface
474
475
453def interface_test(args):476def interface_test(args):
454 if not "test_type" in vars(args):477 if not ("test_type" in vars(args)):
455 return478 return
456479
457 # Get the actual test data from one of two possible sources480 # Get the actual test data from one of two possible sources
@@ -489,16 +512,20 @@
489512
490 # Testing begins here!513 # Testing begins here!
491 #514 #
492 # Check and make sure that interface is indeed connected515 # Make sure that the interface is indeed connected
493 try:516 try:
494 cmd = "ip link set dev %s up" % args.interface517 check_call(["ip", "link", "set", "dev", args.interface, "up"])
495 check_call(shlex.split(cmd))
496 except CalledProcessError as interface_failure:518 except CalledProcessError as interface_failure:
497 logging.error("Failed to use %s:%s", cmd, interface_failure)519 logging.error("Failed to use %s:%s", args.interface, interface_failure)
498 return 1520 return 1
499521
500 # Give interface enough time to get DHCP address522 # Back up routing table, since network down/up process
501 time.sleep(10)523 # tends to trash it....
524 temp = tempfile.TemporaryFile()
525 try:
526 check_call(["ip", "route", "save", "table", "all"], stdout=temp)
527 except CalledProcessError as route_error:
528 logging.warning("Unable to save routing table: %s", route_error)
502529
503 result = 0530 result = 0
504 # Stop all other interfaces531 # Stop all other interfaces
@@ -509,13 +536,18 @@
509 for iface in extra_interfaces:536 for iface in extra_interfaces:
510 logging.debug("Shutting down interface:%s", iface)537 logging.debug("Shutting down interface:%s", iface)
511 try:538 try:
512 cmd = "ip link set dev %s down" % iface539 check_call(["ip", "link", "set", "dev", iface, "down"])
513 check_call(shlex.split(cmd))
514 except CalledProcessError as interface_failure:540 except CalledProcessError as interface_failure:
515 logging.error("Failed to use %s:%s", cmd, interface_failure)541 logging.error("Failed to shut down %s:%s", iface, interface_failure)
516 result = 3542 result = 3
517543
518 if result == 0:544 if result == 0:
545 # Ensure that interface is fully up by waiting until it can
546 # ping the test server
547 if not can_ping(args.interface, test_target):
548 logging.error("Can't ping test server on %s", args.interface)
549 return 1
550
519 # Execute FTP transfer benchmarking test551 # Execute FTP transfer benchmarking test
520 if args.test_type.lower() == "ftp":552 if args.test_type.lower() == "ftp":
521 ftp_benchmark = FTPPerformanceTest(553 ftp_benchmark = FTPPerformanceTest(
@@ -526,7 +558,7 @@
526 result = ftp_benchmark.run()558 result = ftp_benchmark.run()
527559
528 elif args.test_type.lower() == "iperf":560 elif args.test_type.lower() == "iperf":
529 iperf_benchmark = IPerfPerformanceTest(args.interface, test_target, 561 iperf_benchmark = IPerfPerformanceTest(args.interface, test_target,
530 args.fail_threshold)562 args.fail_threshold)
531 result = iperf_benchmark.run()563 result = iperf_benchmark.run()
532564
@@ -538,12 +570,21 @@
538 for iface in extra_interfaces:570 for iface in extra_interfaces:
539 logging.debug("Restoring interface:%s", iface)571 logging.debug("Restoring interface:%s", iface)
540 try:572 try:
541 cmd = "ip link set dev %s up" % iface573 check_call(["ip", "link", "set", "dev", iface, "up"])
542 check_call(shlex.split(cmd))
543 except CalledProcessError as interface_failure:574 except CalledProcessError as interface_failure:
544 logging.error("Failed to use %s:%s", cmd, interface_failure)575 logging.error("Failed to restore %s:%s", iface, interface_failure)
545 result = 3576 result = 3
546577
578 # Restore routing table to original state
579 temp.seek(0)
580 try:
581 # Harmless "RTNETLINK answers: File exists" messages on stderr
582 check_call(["ip", "route", "restore"], stdin=temp,
583 stderr=subprocess.DEVNULL)
584 except CalledProcessError as restore_failure:
585 logging.warning("Unable to restore routing table: %s", restore_failure)
586 temp.close()
587
547 return result588 return result
548589
549590

Subscribers

People subscribed via source and target branches