Checkbox

[network_bandwidth_test] network scan can be overzealous causing script to freeze

Bug #1009658 reported by Jeff Lane on 2012-06-06

Affects		Status	Importance	Assigned to	Milestone
	Checkbox	Fix Released	High	Brendan Donegan

Bug Description

While running the cert tests on a remote network, I discovered a problem with the network_bandwidth_test script that can cause it to freeze (or die).

The script, while looking for targets, does an nmap ping sweep of the network the target ethernet device is attached to:

class PingScan(Command):

    name = "nmap -n -sP"

    argument_count = 1

    def parse_lines(self, lines):
        hosts = []
        host_lines = []
        # Skip header lines
        for line in lines[1:]:
            host_lines.append(line)
            if line.startswith("MAC Address"):
                host = PingHost().parse_lines(host_lines)
                hosts.append(host)
                host_lines = []

        return hosts
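
To make the grouping logic in parse_lines easier to follow, here is a minimal standalone sketch of the same idea: collect lines until one starts with "MAC Address", then close the block. The function name group_host_blocks and the sample lines are hypothetical; the sample is made-up placeholder text, not captured nmap output, and the sketch returns raw line groups instead of calling PingHost.

```python
def group_host_blocks(lines, terminator="MAC Address"):
    """Split scan output into per-host blocks, closing a block at each terminator line."""
    blocks = []
    current = []
    # Skip the first (header) line, as the original parse_lines does
    for line in lines[1:]:
        current.append(line)
        if line.startswith(terminator):
            blocks.append(current)
            current = []
    # Lines collected after the last terminator are discarded, as in the original
    return blocks


# Made-up placeholder lines (documentation addresses), not real scan output
sample = [
    "header line",
    "report for 192.0.2.1",
    "MAC Address: 00:00:5E:00:53:01",
    "report for 192.0.2.2",
    "MAC Address: 00:00:5E:00:53:02",
    "report for 192.0.2.3",
]
blocks = group_host_blocks(sample)
```

Note that with this logic a trailing host that has no "MAC Address" line never ends up in the result.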

Ordinarily, this works fine on a network with a standard class C (/24) subnet, which spans at most 256 addresses.

However, it does not scale. When run on a network with a standard class B (/16) subnet, it attempts a ping sweep of 256*256 = 65536 addresses.

A good example is this pair of devices found on a server I was recently testing in a remote lab:

eth0 Link encap:Ethernet HWaddr 80:c1:6e:63:80:c0
inet addr:155.208.98.66 Bcast:155.208.255.255 Mask:255.255.0.0
--
eth7 Link encap:Ethernet HWaddr e8:39:35:00:55:4f
inet addr:155.208.98.46 Bcast:155.208.255.255 Mask:255.255.0.0

That's two devices being tested, and each one would trigger its own nmap scan of 65536 IPs :( Needless to say, this caused nmap to freeze and I had to skip the test.
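
The arithmetic can be checked with Python's standard ipaddress module. This is only a sketch: the helper scan_size and the MAX_SCAN_ADDRESSES guard are hypothetical and are not part of the checkbox script; the first address and netmask are taken from the eth0 output above, and the /24 comparison uses a documentation address.

```python
import ipaddress

# Hypothetical cap on how many addresses a sweep may cover
MAX_SCAN_ADDRESSES = 1024


def scan_size(address, netmask):
    """Return the enclosing network and the number of addresses in it."""
    # strict=False lets us pass a host address instead of the network address
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return network, network.num_addresses


def sweep_is_reasonable(address, netmask, limit=MAX_SCAN_ADDRESSES):
    """True if a full sweep of the interface's network stays under the cap."""
    _, count = scan_size(address, netmask)
    return count <= limit


net_b, count_b = scan_size("155.208.98.66", "255.255.0.0")
net_c, count_c = scan_size("192.0.2.10", "255.255.255.0")
```

A guard of this kind would let a script refuse, or narrow, a sweep before launching it on a /16.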

Related branches

lp:~brendan-donegan/checkbox-certification/bug1009658

Merged into lp:checkbox-certification at revision 541

Zygmunt Krynicki (community): Approve on 2012-09-27

lp:~brendan-donegan/checkbox/bug1009658

Merged into lp:checkbox at revision 1713

Zygmunt Krynicki (community): Approve on 2012-09-27

lp:~sylvain-pineau/checkbox/0.15

Merged into lp:checkbox at revision 1876

Daniel Manrique (community): Approve on 2013-01-09

lp:ubuntu/raring-proposed/checkbox

Daniel Manrique (roadmr) wrote on 2012-09-07:

OK, so the reason to do the nmap scan is to determine pingable IP addresses. Since this bug report shows that the technique does not scale, we could switch to other techniques; they won't be as thorough as brute-forcing every single IP in the range, but they can yield reasonable results.

Sciri showed us a Fluke network diagnostics tool; what it does, essentially, is ping either the gateway or the DHCP server. Another possible technique is to send a broadcast ping to populate the ARP table and then take an IP from that. The internet_test script already contains code to do this, so it could be reused.

A "best of both worlds" approach for network_bandwidth_test would be to try the above-mentioned techniques first, and fall back to nmap only if no usable IPs are found.
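
A rough sketch of that ordering, assuming a Linux host where the ARP table is readable as text from /proc/net/arp. This is not the code from internet_test; the names arp_candidates and pick_target are hypothetical, the fallback is any callable supplied by the caller (for example, one wrapping the nmap sweep), and the sample table uses documentation addresses.

```python
def arp_candidates(arp_text, device):
    """Return IPs with a completed ARP entry on the given device."""
    candidates = []
    # First line of /proc/net/arp is a column header
    for line in arp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, _mac, _mask, dev = fields[:6]
        # Flag bit 0x2 marks a completed entry; 0x0 means no reply was seen
        if dev == device and int(flags, 16) & 0x2:
            candidates.append(ip)
    return candidates


def pick_target(arp_text, device, fallback):
    """Prefer an ARP-table neighbour; call the fallback only if none exists."""
    candidates = arp_candidates(arp_text, device)
    if candidates:
        return candidates[0]
    return fallback()


# Made-up table in the /proc/net/arp column layout
sample = """IP address       HW type     Flags       HW address            Mask     Device
192.0.2.1        0x1         0x2         00:00:5e:00:53:01     *        eth0
192.0.2.7        0x1         0x0         00:00:00:00:00:00     *        eth0
192.0.2.9        0x1         0x2         00:00:5e:00:53:09     *        eth7
"""
from_arp = pick_target(sample, "eth0", lambda: "from-fallback")
from_fallback = pick_target(sample, "eth1", lambda: "from-fallback")
```

On a real system the text would come from reading /proc/net/arp after the broadcast ping has populated it.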

Also, tweaking nmap so it's faster (I see from the manpage that even with -sP it sends two or three probes per IP), or even using the "timeout" command to limit the time it takes to find hosts, could make it more usable.
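
The time limit could also be applied from inside a Python script rather than through the "timeout" command. The sketch below assumes Python 3.5 or later for subprocess.run; run_with_timeout is a hypothetical helper, and short Python child processes stand in for the scan so the example runs without nmap installed.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run a command; return its output lines, or None if it exceeds the limit."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.splitlines()


# Stand-ins for a slow and a fast scan
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('done')"]
timed_out = run_with_timeout(slow, 0.5)
finished = run_with_timeout(fast, 10)
```

For the real scan, the command list would hold the nmap invocation (the script uses "nmap -n -sP" plus the target network), and a None result would tell the caller to skip the test or try another technique instead of hanging.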

I'll set this as triaged so we can start working on it.

Changed in checkbox:
status:	New → Triaged
importance:	Undecided → Medium
importance:	Medium → High

Daniel Manrique (roadmr) wrote on 2012-09-20:

After a lot of discussion this test has been deemed not useful, so the actual solution is to remove it. Changing statement of work to: Remove this test from the jobs, remove the script, remove from all whitelists.

Brendan Donegan (brendan-donegan) on 2012-09-27

Changed in checkbox:
status:	Triaged → In Progress
assignee:	nobody → Brendan Donegan (brendan-donegan)

Brendan Donegan (brendan-donegan) on 2012-09-27

Changed in checkbox:
status:	In Progress → Fix Committed

Marc Tardif (cr3) wrote on 2012-10-09:

Just in case this bug is revisited, I would advise against the solution proposed by Sciri to ping either the gateway or the DHCP server. The problem is that if all systems under test ping the same host at the same time, this may affect the results. The solution proposed by Daniel to populate the ARP table seems much more reasonable.

Brendan Donegan (brendan-donegan) on 2012-11-01

Changed in checkbox:
status:	Fix Committed → Fix Released
