Merge lp:~pwlars/lava-dispatcher/move-network-checks into lp:lava-dispatcher

Proposed by Paul Larson
Status: Merged
Merged at revision: 13
Proposed branch: lp:~pwlars/lava-dispatcher/move-network-checks
Merge into: lp:lava-dispatcher
Diff against target: 79 lines (+25/-15)
2 files modified
lava/actions/deploy.py (+1/-15)
lava/client.py (+24/-0)
To merge this branch: bzr merge lp:~pwlars/lava-dispatcher/move-network-checks
Reviewer Review Type Date Requested Status
Paul Larson (community) Needs Resubmitting
Review via email: mp+51364@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Loïc Minier (lool) wrote :

- self.wait_for_network()
+ if self.client.wait_network_up() is False:
+ print "Failed to bring up network on master image"
+ return False

You're changing cmd_deploy_linaro_image.run() to return False, but not other cmd_*.run() returns anything; I think all cmd_*.run() should implement the same interface

+ master_str = 'root@master:'

right, I also included this fix in http://bazaar.launchpad.net/~lool/lava/misc-cleanups/revision/10

Otherwise, the move looks good

Some comments about the expect ping test:
* I'm not entirely comfortable with using expect; I'm not sure how we can make sure the ping process has been created; of course ping is relatively harmless, but it seems this could result in 20 ping commands being queued; do we need to pass some timeout?
* default ping timeout seems to be relatively random and might be high; I tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
* maybe testing whether there's a default route up is enough/better?
* parsing the ping textual output makes me think that LC_ALL=C should be set in the environment in the case where the output gets translated (apparently it isn't translated on my system which French locales, but still sounds like a good idea)
* it seems pexpect has a 30 seconds timeout and the other outputs aren't tested so that even if ping times out quickly, we have to wait for a 30s expect timeout before retrying a ping
* "64 bytes" sounds like a ping implementation detail; perhaps "bytes from" would be less risky, or "1 received"?

Anyway, the implementation is probably just fine at this stage; just dumping my thoughts :-)

Revision history for this message
Spring Zhang (qzhang) wrote :

> + id = self.proc.expect(["64 bytes from", "Network is unreachable"])
>
Do we need to list all of the possibilities? Like timeout error: "100%
packet loss"
--
Best wishes,
Spring Zhang

Revision history for this message
Paul Larson (pwlars) wrote :

> You're changing cmd_deploy_linaro_image.run() to return False, but not other
> cmd_*.run() returns anything; I think all cmd_*.run() should implement the
> same interface
Good point, although some of these run() commands are not yet implemented, so I'm not too concerned with those just yet. Another thing I had considered was raising an exception instead. Do you have a preference?

> Some comments about the expect ping test:
> * I'm not entirely comfortable with using expect; I'm not sure how we can make
> sure the ping process has been created; of course ping is relatively harmless,
> but it seems this could result in 20 ping commands being queued; do we need to
> pass some timeout?
We do, in wait_network_up, which calls check_network_up

> * default ping timeout seems to be relatively random and might be high; I
> tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
Adjusted ping timeout to 4 seconds, timeout the expect command for 5

> * maybe testing whether there's a default route up is enough/better?
We can do that, but it's an extra check and we really need to know if we can get to the control server.

> * parsing the ping textual output makes me think that LC_ALL=C should be set
Added

> * it seems pexpect has a 30 seconds timeout and the other outputs aren't
> tested so that even if ping times out quickly, we have to wait for a 30s
> expect timeout before retrying a ping
I think this was addressed previously.

> * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> would be less risky, or "1 received"?
Changed to have it look for 1 received, or 0 received. I think that should cover everything unless there's some strange failure case that doesn't mention 0 received.

12. By Paul Larson

Fix up some issues with ping

13. By Paul Larson

Adjust ping timeout

Revision history for this message
Loïc Minier (lool) wrote :

On Tue, Mar 01, 2011, Paul Larson wrote:
> Good point, although some of these run() commands are not yet
> implemented, so I'm not too concerned with those just yet. Another
> thing I had considered was raising an exception instead. Do you have
> a preference?

 Raising an exception seems better here, yeah

> > Some comments about the expect ping test:
> > * I'm not entirely comfortable with using expect; I'm not sure how we can make
> > sure the ping process has been created; of course ping is relatively harmless,
> > but it seems this could result in 20 ping commands being queued; do we need to
> > pass some timeout?
> We do, in wait_network_up, which calls check_network_up

 What I mean is that if you expect timeout is 2 seconds and your ping
 times out only after 10 seconds, then that loop in wait_network_up()
 might launch 5 pings (since expect will timeout before the ping does)
 before the first one finally returns, and then your console will be
 running these other pings when you are trying to run another command
 etc.

 So an expect timeout should be a fatal error.

 Ideally, we'd switch to something more solid than parsing the output of
 the console to decide whether something is finished or not; that might
 be for later though

> > * default ping timeout seems to be relatively random and might be high; I
> > tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
> Adjusted ping timeout to 4 seconds, timeout the expect command for 5

 Looks good; thanks

> > * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> > would be less risky, or "1 received"?
> Changed to have it look for 1 received, or 0 received. I think that
> should cover everything unless there's some strange failure case that
> doesn't mention 0 received.

 I was actually thinking "Heck, why don't we just test the ping exit
 code" which would indeed do the right thing from my quick local
 testing -- but we're parsing the console output, so ain't so easy. I
 guess we could:
  ping -q -W4 -c1 1.0.0.1 >/dev/null 2>&1 && echo worked || echo failed

 Anyway, it's probably ok to match "0 received" or "1 received" for now

 I see you added a _find_default_nic(); you probably want ip route show
 if you're looking for the default route (ip route show exact 0/0) or ip
 route get to 192.168.0.254 if you want the route to a specific host
 (such as the gateway)

--
Loïc Minier

14. By Paul Larson

Raise an exception when the network fails to come up rather than
returning an error, and remove an unintentional checkin

Revision history for this message
Paul Larson (pwlars) wrote :

> On Tue, Mar 01, 2011, Paul Larson wrote:
> > Good point, although some of these run() commands are not yet
> > implemented, so I'm not too concerned with those just yet. Another
> > thing I had considered was raising an exception instead. Do you have
> > a preference?
>
> Raising an exception seems better here, yeah
Done

> > > Some comments about the expect ping test:
> > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> make
> > > sure the ping process has been created; of course ping is relatively
> harmless,
> > > but it seems this could result in 20 ping commands being queued; do we
> need to
> > > pass some timeout?
> > We do, in wait_network_up, which calls check_network_up
>
> What I mean is that if you expect timeout is 2 seconds and your ping
> times out only after 10 seconds, then that loop in wait_network_up()
> might launch 5 pings (since expect will timeout before the ping does)
> before the first one finally returns, and then your console will be
> running these other pings when you are trying to run another command
> etc.
But I think I addressed this already by setting the ping timeout to 4 seconds and the expect timeout to 5. So if it passes check_network_up(), then it will come back right away most likely, and go through without another ping command being sent. If it fails the ping, it won't return from check_network_up() for 5 seconds, after which another one will be sent. I don't see where command stacking could occur here.

> Ideally, we'd switch to something more solid than parsing the output of
> the console to decide whether something is finished or not; that might
> be for later though
Yeah, I think we'll have a cleaner way to do this later. Using expect is a necessary evil of relying on the serial port for access to the device.

> I see you added a _find_default_nic(); you probably want ip route show
oops, that was a note to myself for something I wanted to do after I have this merged, and was clearly not intended to be something we want to include right now.

review: Needs Resubmitting
Revision history for this message
Loïc Minier (lool) wrote :

On Wed, Mar 02, 2011, Paul Larson wrote:
> > > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> > make
> > > > sure the ping process has been created; of course ping is relatively
> > harmless,
> > > > but it seems this could result in 20 ping commands being queued; do we
> > need to
> > > > pass some timeout?
> > > We do, in wait_network_up, which calls check_network_up
> >
> > What I mean is that if you expect timeout is 2 seconds and your ping
> > times out only after 10 seconds, then that loop in wait_network_up()
> > might launch 5 pings (since expect will timeout before the ping does)
> > before the first one finally returns, and then your console will be
> > running these other pings when you are trying to run another command
> > etc.
> But I think I addressed this already by setting the ping timeout to 4
> seconds and the expect timeout to 5. So if it passes
> check_network_up(), then it will come back right away most likely, and
> go through without another ping command being sent. If it fails the
> ping, it won't return from check_network_up() for 5 seconds, after
> which another one will be sent. I don't see where command stacking
> could occur here.

 Sure; since your comment was on my original review, I thought you meant
 it wasn't an issue in the first place; with the 4 / 5 seconds timeouts,
 it's much better. Still, there's room for ping not to timeout despite
 being told to, which is why I was suggesting that any expect timeout
 should be a fatal error. Anyway, the sooner we move away from expect,
 the better :-)

--
Loïc Minier

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lava/actions/deploy.py'
2--- lava/actions/deploy.py 2011-02-26 14:41:09 +0000
3+++ lava/actions/deploy.py 2011-03-02 20:03:22 +0000
4@@ -1,6 +1,5 @@
5 #!/usr/bin/python
6 from lava.actions import BaseAction
7-import time
8
9 class cmd_deploy_linaro_image(BaseAction):
10 def run(self, hwpack, rootfs):
11@@ -11,7 +10,7 @@
12 self.client.boot_master_image()
13
14 print "Waiting for network to come up"
15- self.wait_for_network()
16+ self.client.wait_network_up()
17
18 def generate_tarballs(self):
19 """
20@@ -22,19 +21,6 @@
21 For reference, see magma-chamber branch, extract-image script
22 """
23
24- def wait_for_network(self, timeout=60):
25- now = time.time()
26- while time.time() < now+timeout:
27- self.client.proc.sendline("ping -c1 192.168.1.10")
28- id = self.client.proc.expect(
29- ["64 bytes from", "Network is unreachable"])
30- if id == 0:
31- break
32- if id != 0:
33- print "Failed to bring up network on master image"
34- raise TimeoutError
35- self.client.proc.expect('root@master:')
36-
37 def deploy_linaro_rootfs(self, rootfs):
38 print "Deploying linaro image"
39 master_str = 'root@master:'
40
41=== modified file 'lava/client.py'
42--- lava/client.py 2011-02-26 14:48:34 +0000
43+++ lava/client.py 2011-03-02 20:03:22 +0000
44@@ -1,5 +1,6 @@
45 import pexpect
46 import sys
47+import time
48
49 """
50 This is an ugly hack, the uboot commands for a given board type and the board
51@@ -103,6 +104,29 @@
52 if response:
53 self.proc.expect(response, timeout=timeout)
54
55+ def check_network_up(self):
56+ self.proc.sendline("LC_ALL=C ping -W4 -c1 192.168.1.10")
57+ id = self.proc.expect(["1 received", "0 received"], timeout=5)
58+ if id == 0:
59+ return True
60+ else:
61+ return False
62+
63+ def wait_network_up(self, timeout=60):
64+ now = time.time()
65+ while time.time() < now+timeout:
66+ if self.check_network_up():
67+ return
68+ raise NetworkError
69+
70+
71+class NetworkError(Exception):
72+ """
73+ This is used when a network error occurs, such as failing to bring up
74+ the network interface on the client
75+ """
76+
77+
78 class OperationFailed(Exception):
79 pass
80

Subscribers

People subscribed via source and target branches