Merge lp:~pwlars/lava-dispatcher/move-network-checks into lp:lava-dispatcher
- move-network-checks
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 13 |
Proposed branch: | lp:~pwlars/lava-dispatcher/move-network-checks |
Merge into: | lp:lava-dispatcher |
Diff against target: |
79 lines (+25/-15) 2 files modified
lava/actions/deploy.py (+1/-15) lava/client.py (+24/-0) |
To merge this branch: | bzr merge lp:~pwlars/lava-dispatcher/move-network-checks |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Paul Larson (community) | Needs Resubmitting | ||
Review via email: mp+51364@code.launchpad.net |
Commit message
Description of the change
Loïc Minier (lool) wrote : | # |
Spring Zhang (qzhang) wrote : | # |
> + id = self.proc.
>
Do we need to list all of the possibilities? Like timeout error: "100%
packet loss"
--
Best wishes,
Spring Zhang
Paul Larson (pwlars) wrote : | # |
> You're changing cmd_deploy_
> cmd_*.run() returns anything; I think all cmd_*.run() should implement the
> same interface
Good point, although some of these run() commands are not yet implemented, so I'm not too concerned with those just yet. Another thing I had considered was raising an exception instead. Do you have a preference?
> Some comments about the expect ping test:
> * I'm not entirely comfortable with using expect; I'm not sure how we can make
> sure the ping process has been created; of course ping is relatively harmless,
> but it seems this could result in 20 ping commands being queued; do we need to
> pass some timeout?
We do, in wait_network_up, which calls check_network_up
> * default ping timeout seems to be relatively random and might be high; I
> tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
Adjusted ping timeout to 4 seconds, timeout the expect command for 5
> * maybe testing whether there's a default route up is enough/better?
We can do that, but it's an extra check and we really need to know if we can get to the control server.
> * parsing the ping textual output makes me think that LC_ALL=C should be set
Added
> * it seems pexpect has a 30 seconds timeout and the other outputs aren't
> tested so that even if ping times out quickly, we have to wait for a 30s
> expect timeout before retrying a ping
I think this was addressed previously.
> * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> would be less risky, or "1 received"?
Changed to have it look for 1 received, or 0 received. I think that should cover everything unless there's some strange failure case that doesn't mention 0 received.
- 12. By Paul Larson
-
Fix up some issues with ping
- 13. By Paul Larson
-
Adjust ping timeout
Loïc Minier (lool) wrote : | # |
On Tue, Mar 01, 2011, Paul Larson wrote:
> Good point, although some of these run() commands are not yet
> implemented, so I'm not too concerned with those just yet. Another
> thing I had considered was raising an exception instead. Do you have
> a preference?
Raising an exception seems better here, yeah
> > Some comments about the expect ping test:
> > * I'm not entirely comfortable with using expect; I'm not sure how we can make
> > sure the ping process has been created; of course ping is relatively harmless,
> > but it seems this could result in 20 ping commands being queued; do we need to
> > pass some timeout?
> We do, in wait_network_up, which calls check_network_up
What I mean is that if you expect timeout is 2 seconds and your ping
times out only after 10 seconds, then that loop in wait_network_up()
might launch 5 pings (since expect will timeout before the ping does)
before the first one finally returns, and then your console will be
running these other pings when you are trying to run another command
etc.
So an expect timeout should be a fatal error.
Ideally, we'd switch to something more solid than parsing the output of
the console to decide whether something is finished or not; that might
be for later though
> > * default ping timeout seems to be relatively random and might be high; I
> > tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
> Adjusted ping timeout to 4 seconds, timeout the expect command for 5
Looks good; thanks
> > * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> > would be less risky, or "1 received"?
> Changed to have it look for 1 received, or 0 received. I think that
> should cover everything unless there's some strange failure case that
> doesn't mention 0 received.
I was actually thinking "Heck, why don't we just test the ping exit
code" which would indeed do the right thing from my quick local
testing -- but we're parsing the console output, so ain't so easy. I
guess we could:
ping -q -W4 -c1 1.0.0.1 >/dev/null 2>&1 && echo worked || echo failed
Anyway, it's probably ok to match "0 received" or "1 received" for now
I see you added a _find_default_
if you're looking for the default route (ip route show exact 0/0) or ip
route get to 192.168.0.254 if you want the route to a specific host
(such as the gateway)
--
Loïc Minier
- 14. By Paul Larson
-
Raise an exception when the network fails to come up rather than
returning an error, and remove an unintentional checkin
Paul Larson (pwlars) wrote : | # |
> On Tue, Mar 01, 2011, Paul Larson wrote:
> > Good point, although some of these run() commands are not yet
> > implemented, so I'm not too concerned with those just yet. Another
> > thing I had considered was raising an exception instead. Do you have
> > a preference?
>
> Raising an exception seems better here, yeah
Done
> > > Some comments about the expect ping test:
> > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> make
> > > sure the ping process has been created; of course ping is relatively
> harmless,
> > > but it seems this could result in 20 ping commands being queued; do we
> need to
> > > pass some timeout?
> > We do, in wait_network_up, which calls check_network_up
>
> What I mean is that if you expect timeout is 2 seconds and your ping
> times out only after 10 seconds, then that loop in wait_network_up()
> might launch 5 pings (since expect will timeout before the ping does)
> before the first one finally returns, and then your console will be
> running these other pings when you are trying to run another command
> etc.
But I think I addressed this already by setting the ping timeout to 4 seconds and the expect timeout to 5. So if it passes check_network_up(), then it will come back right away most likely, and go through without another ping command being sent. If it fails the ping, it won't return from check_network_up() for 5 seconds, after which another one will be sent. I don't see where command stacking could occur here.
> Ideally, we'd switch to something more solid than parsing the output of
> the console to decide whether something is finished or not; that might
> be for later though
Yeah, I think we'll have a cleaner way to do this later. Using expect is a necessary evil of relying on the serial port for access to the device.
> I see you added a _find_default_
oops, that was a note to myself for something I wanted to do after I have this merged, and was clearly not intended to be something we want to include right now.
Loïc Minier (lool) wrote : | # |
On Wed, Mar 02, 2011, Paul Larson wrote:
> > > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> > make
> > > > sure the ping process has been created; of course ping is relatively
> > harmless,
> > > > but it seems this could result in 20 ping commands being queued; do we
> > need to
> > > > pass some timeout?
> > > We do, in wait_network_up, which calls check_network_up
> >
> > What I mean is that if you expect timeout is 2 seconds and your ping
> > times out only after 10 seconds, then that loop in wait_network_up()
> > might launch 5 pings (since expect will timeout before the ping does)
> > before the first one finally returns, and then your console will be
> > running these other pings when you are trying to run another command
> > etc.
> But I think I addressed this already by setting the ping timeout to 4
> seconds and the expect timeout to 5. So if it passes
> check_network_up(), then it will come back right away most likely, and
> go through without another ping command being sent. If it fails the
> ping, it won't return from check_network_up() for 5 seconds, after
> which another one will be sent. I don't see where command stacking
> could occur here.
Sure; since your comment was on my original review, I thought you meant
it wasn't an issue in the first place; with the 4 / 5 seconds timeouts,
it's much better. Still, there's room for ping not to timeout despite
being told to, which is why I was suggesting that any expect timeout
should be a fatal error. Anyway, the sooner we move away from expect,
the better :-)
--
Loïc Minier
Preview Diff
1 | === modified file 'lava/actions/deploy.py' |
2 | --- lava/actions/deploy.py 2011-02-26 14:41:09 +0000 |
3 | +++ lava/actions/deploy.py 2011-03-02 20:03:22 +0000 |
4 | @@ -1,6 +1,5 @@ |
5 | #!/usr/bin/python |
6 | from lava.actions import BaseAction |
7 | -import time |
8 | |
9 | class cmd_deploy_linaro_image(BaseAction): |
10 | def run(self, hwpack, rootfs): |
11 | @@ -11,7 +10,7 @@ |
12 | self.client.boot_master_image() |
13 | |
14 | print "Waiting for network to come up" |
15 | - self.wait_for_network() |
16 | + self.client.wait_network_up() |
17 | |
18 | def generate_tarballs(self): |
19 | """ |
20 | @@ -22,19 +21,6 @@ |
21 | For reference, see magma-chamber branch, extract-image script |
22 | """ |
23 | |
24 | - def wait_for_network(self, timeout=60): |
25 | - now = time.time() |
26 | - while time.time() < now+timeout: |
27 | - self.client.proc.sendline("ping -c1 192.168.1.10") |
28 | - id = self.client.proc.expect( |
29 | - ["64 bytes from", "Network is unreachable"]) |
30 | - if id == 0: |
31 | - break |
32 | - if id != 0: |
33 | - print "Failed to bring up network on master image" |
34 | - raise TimeoutError |
35 | - self.client.proc.expect('root@master:') |
36 | - |
37 | def deploy_linaro_rootfs(self, rootfs): |
38 | print "Deploying linaro image" |
39 | master_str = 'root@master:' |
40 | |
41 | === modified file 'lava/client.py' |
42 | --- lava/client.py 2011-02-26 14:48:34 +0000 |
43 | +++ lava/client.py 2011-03-02 20:03:22 +0000 |
44 | @@ -1,5 +1,6 @@ |
45 | import pexpect |
46 | import sys |
47 | +import time |
48 | |
49 | """ |
50 | This is an ugly hack, the uboot commands for a given board type and the board |
51 | @@ -103,6 +104,29 @@ |
52 | if response: |
53 | self.proc.expect(response, timeout=timeout) |
54 | |
55 | + def check_network_up(self): |
56 | + self.proc.sendline("LC_ALL=C ping -W4 -c1 192.168.1.10") |
57 | + id = self.proc.expect(["1 received", "0 received"], timeout=5) |
58 | + if id == 0: |
59 | + return True |
60 | + else: |
61 | + return False |
62 | + |
63 | + def wait_network_up(self, timeout=60): |
64 | + now = time.time() |
65 | + while time.time() < now+timeout: |
66 | + if self.check_network_up(): |
67 | + return |
68 | + raise NetworkError |
69 | + |
70 | + |
71 | +class NetworkError(Exception): |
72 | + """ |
73 | + This is used when a network error occurs, such as failing to bring up |
74 | + the network interface on the client |
75 | + """ |
76 | + |
77 | + |
78 | class OperationFailed(Exception): |
79 | pass |
80 |
- self.wait_ for_network( ) wait_network_ up() is False:
+ if self.client.
+ print "Failed to bring up network on master image"
+ return False
You're changing cmd_deploy_ linaro_ image.run( ) to return False, but not other cmd_*.run() returns anything; I think all cmd_*.run() should implement the same interface
+ master_str = 'root@master:'
right, I also included this fix in http:// bazaar. launchpad. net/~lool/ lava/misc- cleanups/ revision/ 10
Otherwise, the move looks good
Some comments about the expect ping test:
* I'm not entirely comfortable with using expect; I'm not sure how we can make sure the ping process has been created; of course ping is relatively harmless, but it seems this could result in 20 ping commands being queued; do we need to pass some timeout?
* default ping timeout seems to be relatively random and might be high; I tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
* maybe testing whether there's a default route up is enough/better?
* parsing the ping textual output makes me think that LC_ALL=C should be set in the environment in the case where the output gets translated (apparently it isn't translated on my system which French locales, but still sounds like a good idea)
* it seems pexpect has a 30 seconds timeout and the other outputs aren't tested so that even if ping times out quickly, we have to wait for a 30s expect timeout before retrying a ping
* "64 bytes" sounds like a ping implementation detail; perhaps "bytes from" would be less risky, or "1 received"?
Anyway, the implementation is probably just fine at this stage; just dumping my thoughts :-)