Merge into trunk : move-network-checks : Code : LAVA Dispatcher

Status:	Merged
Merged at revision:	13
Proposed branch:	lp:~pwlars/lava-dispatcher/move-network-checks
Merge into:	lp:lava-dispatcher
Diff against target:	79 lines (+25/-15) 2 files modified lava/actions/deploy.py (+1/-15) lava/client.py (+24/-0)
To merge this branch:	bzr merge lp:~pwlars/lava-dispatcher/move-network-checks
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
Paul Larson (community)			Needs Resubmitting on 2011-03-02
Review via email: mp+51364@code.launchpad.net

Revision history for this message

Loïc Minier (lool) wrote on 2011-02-26:

#

- self.wait_for_network()
+ if self.client.wait_network_up() is False:
+ print "Failed to bring up network on master image"
+ return False

You're changing cmd_deploy_linaro_image.run() to return False, but not other cmd_*.run() returns anything; I think all cmd_*.run() should implement the same interface

+ master_str = 'root@master:'

right, I also included this fix in http://bazaar.launchpad.net/~lool/lava/misc-cleanups/revision/10

Otherwise, the move looks good

Some comments about the expect ping test:
* I'm not entirely comfortable with using expect; I'm not sure how we can make sure the ping process has been created; of course ping is relatively harmless, but it seems this could result in 20 ping commands being queued; do we need to pass some timeout?
* default ping timeout seems to be relatively random and might be high; I tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
* maybe testing whether there's a default route up is enough/better?
* parsing the ping textual output makes me think that LC_ALL=C should be set in the environment in the case where the output gets translated (apparently it isn't translated on my system which French locales, but still sounds like a good idea)
* it seems pexpect has a 30 seconds timeout and the other outputs aren't tested so that even if ping times out quickly, we have to wait for a 30s expect timeout before retrying a ping
* "64 bytes" sounds like a ping implementation detail; perhaps "bytes from" would be less risky, or "1 received"?

Anyway, the implementation is probably just fine at this stage; just dumping my thoughts :-)

Revision history for this message

Spring Zhang (qzhang) wrote on 2011-02-27:

#

> + id = self.proc.expect(["64 bytes from", "Network is unreachable"])
>
Do we need to list all of the possibilities? Like timeout error: "100%
packet loss"
--
Best wishes,
Spring Zhang

Revision history for this message

Paul Larson (pwlars) wrote on 2011-03-01:

#

> You're changing cmd_deploy_linaro_image.run() to return False, but not other
> cmd_*.run() returns anything; I think all cmd_*.run() should implement the
> same interface
Good point, although some of these run() commands are not yet implemented, so I'm not too concerned with those just yet. Another thing I had considered was raising an exception instead. Do you have a preference?

> Some comments about the expect ping test:
> * I'm not entirely comfortable with using expect; I'm not sure how we can make
> sure the ping process has been created; of course ping is relatively harmless,
> but it seems this could result in 20 ping commands being queued; do we need to
> pass some timeout?
We do, in wait_network_up, which calls check_network_up

> * default ping timeout seems to be relatively random and might be high; I
> tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
Adjusted ping timeout to 4 seconds, timeout the expect command for 5

> * maybe testing whether there's a default route up is enough/better?
We can do that, but it's an extra check and we really need to know if we can get to the control server.

> * parsing the ping textual output makes me think that LC_ALL=C should be set
Added

> * it seems pexpect has a 30 seconds timeout and the other outputs aren't
> tested so that even if ping times out quickly, we have to wait for a 30s
> expect timeout before retrying a ping
I think this was addressed previously.

> * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> would be less risky, or "1 received"?
Changed to have it look for 1 received, or 0 received. I think that should cover everything unless there's some strange failure case that doesn't mention 0 received.

lp:~pwlars/lava-dispatcher/move-network-checks updated on 2011-03-01

12. By Paul Larson on 2011-03-01: Fix up some issues with ping
13. By Paul Larson on 2011-03-01: Adjust ping timeout

Revision history for this message

Loïc Minier (lool) wrote on 2011-03-01:

#

On Tue, Mar 01, 2011, Paul Larson wrote:
> Good point, although some of these run() commands are not yet
> implemented, so I'm not too concerned with those just yet. Another
> thing I had considered was raising an exception instead. Do you have
> a preference?

Raising an exception seems better here, yeah

> > Some comments about the expect ping test:
> > * I'm not entirely comfortable with using expect; I'm not sure how we can make
> > sure the ping process has been created; of course ping is relatively harmless,
> > but it seems this could result in 20 ping commands being queued; do we need to
> > pass some timeout?
> We do, in wait_network_up, which calls check_network_up

What I mean is that if you expect timeout is 2 seconds and your ping
times out only after 10 seconds, then that loop in wait_network_up()
might launch 5 pings (since expect will timeout before the ping does)
before the first one finally returns, and then your console will be
running these other pings when you are trying to run another command
etc.

So an expect timeout should be a fatal error.

Ideally, we'd switch to something more solid than parsing the output of
the console to decide whether something is finished or not; that might
be for later though

> > * default ping timeout seems to be relatively random and might be high; I
> > tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
> Adjusted ping timeout to 4 seconds, timeout the expect command for 5

Looks good; thanks

> > * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> > would be less risky, or "1 received"?
> Changed to have it look for 1 received, or 0 received. I think that
> should cover everything unless there's some strange failure case that
> doesn't mention 0 received.

I was actually thinking "Heck, why don't we just test the ping exit
code" which would indeed do the right thing from my quick local
testing -- but we're parsing the console output, so ain't so easy. I
guess we could:
ping -q -W4 -c1 1.0.0.1 >/dev/null 2>&1 && echo worked || echo failed

Anyway, it's probably ok to match "0 received" or "1 received" for now

I see you added a _find_default_nic(); you probably want ip route show
if you're looking for the default route (ip route show exact 0/0) or ip
route get to 192.168.0.254 if you want the route to a specific host
(such as the gateway)

--
Loïc Minier

On Tue, Mar 01, 2011, Paul Larson wrote:
> Good point, although some of these run() commands are not yet
> implemented, so I'm not too concerned with those just yet.  Another
> thing I had considered was raising an exception instead.  Do you have
> a preference?

Raising an exception seems better here, yeah

> > Some comments about the expect ping test:
> > * I'm not entirely comfortable with using expect; I'm not sure how we can make
> > sure the ping process has been created; of course ping is relatively harmless,
> > but it seems this could result in 20 ping commands being queued; do we need to
> > pass some timeout?
> We do, in wait_network_up, which calls check_network_up

What I mean is that if you expect timeout is 2 seconds and your ping
 times out only after 10 seconds, then that loop in wait_network_up()
 might launch 5 pings (since expect will timeout before the ping does)
 before the first one finally returns, and then your console will be
 running these other pings when you are trying to run another command
 etc.

So an expect timeout should be a fatal error.

Ideally, we'd switch to something more solid than parsing the output of
 the console to decide whether something is finished or not; that might
 be for later though

> > * default ping timeout seems to be relatively random and might be high; I
> > tried ping -c1 1.0.0.1 and it takes 20 seconds to timeout here
> Adjusted ping timeout to 4 seconds, timeout the expect command for 5

Looks good; thanks

> > * "64 bytes" sounds like a ping implementation detail; perhaps "bytes from"
> > would be less risky, or "1 received"?
> Changed to have it look for 1 received, or 0 received.  I think that
> should cover everything unless there's some strange failure case that
> doesn't mention 0 received.

I was actually thinking "Heck, why don't we just test the ping exit
 code" which would indeed do the right thing from my quick local
 testing -- but we're parsing the console output, so ain't so easy.  I
 guess we could:
  ping -q -W4 -c1 1.0.0.1 >/dev/null 2>&1 && echo worked || echo failed

Anyway, it's probably ok to match "0 received" or "1 received" for now

I see you added a _find_default_nic(); you probably want ip route show
 if you're looking for the default route (ip route show exact 0/0) or ip
 route get to 192.168.0.254 if you want the route to a specific host
 (such as the gateway)

-- 
Loïc Minier

lp:~pwlars/lava-dispatcher/move-network-checks updated on 2011-03-02

14. By Paul Larson on 2011-03-02: Raise an exception when the network fails to come up rather than
returning an error, and remove an unintentional checkin

Revision history for this message

Paul Larson (pwlars) wrote on 2011-03-02:

#

> On Tue, Mar 01, 2011, Paul Larson wrote:
> > Good point, although some of these run() commands are not yet
> > implemented, so I'm not too concerned with those just yet. Another
> > thing I had considered was raising an exception instead. Do you have
> > a preference?
>
> Raising an exception seems better here, yeah
Done

> > > Some comments about the expect ping test:
> > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> make
> > > sure the ping process has been created; of course ping is relatively
> harmless,
> > > but it seems this could result in 20 ping commands being queued; do we
> need to
> > > pass some timeout?
> > We do, in wait_network_up, which calls check_network_up
>
> What I mean is that if you expect timeout is 2 seconds and your ping
> times out only after 10 seconds, then that loop in wait_network_up()
> might launch 5 pings (since expect will timeout before the ping does)
> before the first one finally returns, and then your console will be
> running these other pings when you are trying to run another command
> etc.
But I think I addressed this already by setting the ping timeout to 4 seconds and the expect timeout to 5. So if it passes check_network_up(), then it will come back right away most likely, and go through without another ping command being sent. If it fails the ping, it won't return from check_network_up() for 5 seconds, after which another one will be sent. I don't see where command stacking could occur here.

> Ideally, we'd switch to something more solid than parsing the output of
> the console to decide whether something is finished or not; that might
> be for later though
Yeah, I think we'll have a cleaner way to do this later. Using expect is a necessary evil of relying on the serial port for access to the device.

> I see you added a _find_default_nic(); you probably want ip route show
oops, that was a note to myself for something I wanted to do after I have this merged, and was clearly not intended to be something we want to include right now.

> On Tue, Mar 01, 2011, Paul Larson wrote:
> > Good point, although some of these run() commands are not yet
> > implemented, so I'm not too concerned with those just yet.  Another
> > thing I had considered was raising an exception instead.  Do you have
> > a preference?
> 
>  Raising an exception seems better here, yeah
Done

> > > Some comments about the expect ping test:
> > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> make
> > > sure the ping process has been created; of course ping is relatively
> harmless,
> > > but it seems this could result in 20 ping commands being queued; do we
> need to
> > > pass some timeout?
> > We do, in wait_network_up, which calls check_network_up
> 
>  What I mean is that if you expect timeout is 2 seconds and your ping
>  times out only after 10 seconds, then that loop in wait_network_up()
>  might launch 5 pings (since expect will timeout before the ping does)
>  before the first one finally returns, and then your console will be
>  running these other pings when you are trying to run another command
>  etc.
But I think I addressed this already by setting the ping timeout to 4 seconds and the expect timeout to 5.  So if it passes check_network_up(), then it will come back right away most likely, and go through without another ping command being sent.  If it fails the ping, it won't return from check_network_up() for 5 seconds, after which another one will be sent.  I don't see where command stacking could occur here.

>  Ideally, we'd switch to something more solid than parsing the output of
>  the console to decide whether something is finished or not; that might
>  be for later though
Yeah, I think we'll have a cleaner way to do this later.  Using expect is a necessary evil of relying on the serial port for access to the device.

>  I see you added a _find_default_nic(); you probably want ip route show
oops, that was a note to myself for something I wanted to do after I have this merged, and was clearly not intended to be something we want to include right now.

review: Needs Resubmitting

Revision history for this message

Loïc Minier (lool) wrote on 2011-03-02:

#

On Wed, Mar 02, 2011, Paul Larson wrote:
> > > > * I'm not entirely comfortable with using expect; I'm not sure how we can
> > make
> > > > sure the ping process has been created; of course ping is relatively
> > harmless,
> > > > but it seems this could result in 20 ping commands being queued; do we
> > need to
> > > > pass some timeout?
> > > We do, in wait_network_up, which calls check_network_up
> >
> > What I mean is that if you expect timeout is 2 seconds and your ping
> > times out only after 10 seconds, then that loop in wait_network_up()
> > might launch 5 pings (since expect will timeout before the ping does)
> > before the first one finally returns, and then your console will be
> > running these other pings when you are trying to run another command
> > etc.
> But I think I addressed this already by setting the ping timeout to 4
> seconds and the expect timeout to 5. So if it passes
> check_network_up(), then it will come back right away most likely, and
> go through without another ping command being sent. If it fails the
> ping, it won't return from check_network_up() for 5 seconds, after
> which another one will be sent. I don't see where command stacking
> could occur here.

Sure; since your comment was on my original review, I thought you meant
it wasn't an issue in the first place; with the 4 / 5 seconds timeouts,
it's much better. Still, there's room for ping not to timeout despite
being told to, which is why I was suggesting that any expect timeout
should be a fatal error. Anyway, the sooner we move away from expect,
the better :-)

--
Loïc Minier

 === modified file 'lava/actions/deploy.py'
 --- lava/actions/deploy.py	2011-02-26 14:41:09 +0000
 +++ lava/actions/deploy.py	2011-03-02 20:03:22 +0000
@@ -1,6 +1,5 @@
  #!/usr/bin/python
  from lava.actions import BaseAction
--import time
  class cmd_deploy_linaro_image(BaseAction):
      def run(self, hwpack, rootfs):
@@ -11,7 +10,7 @@
          self.client.boot_master_image()
          print "Waiting for network to come up"
--        self.wait_for_network()
++        self.client.wait_network_up()
      def generate_tarballs(self):
          """
@@ -22,19 +21,6 @@
          For reference, see magma-chamber branch, extract-image script
          """
--    def wait_for_network(self, timeout=60):
--        now = time.time()
--        while time.time() < now+timeout:
--            self.client.proc.sendline("ping -c1 192.168.1.10")
--            id = self.client.proc.expect(
--                ["64 bytes from", "Network is unreachable"])
--            if id == 0:
--                break
--        if id != 0:
--            print "Failed to bring up network on master image"
--            raise TimeoutError
--        self.client.proc.expect('root@master:')
--
      def deploy_linaro_rootfs(self, rootfs):
          print "Deploying linaro image"
          master_str = 'root@master:'
 === modified file 'lava/client.py'
 --- lava/client.py	2011-02-26 14:48:34 +0000
 +++ lava/client.py	2011-03-02 20:03:22 +0000
@@ -1,5 +1,6 @@
  import pexpect
  import sys
++import time
  """
  This is an ugly hack, the uboot commands for a given board type and the board
@@ -103,6 +104,29 @@
          if response:
              self.proc.expect(response, timeout=timeout)
++    def check_network_up(self):
++        self.proc.sendline("LC_ALL=C ping -W4 -c1 192.168.1.10")
++        id = self.proc.expect(["1 received", "0 received"], timeout=5)
++        if id == 0:
++            return True
++        else:
++            return False
++
++    def wait_network_up(self, timeout=60):
++        now = time.time()
++        while time.time() < now+timeout:
++            if self.check_network_up():
++                return
++        raise NetworkError
++
++
++class NetworkError(Exception):
++    """
++    This is used when a network error occurs, such as failing to bring up
++    the network interface on the client
++    """
++
++
  class OperationFailed(Exception):
      pass

LAVA Dispatcher

Merge lp:~pwlars/lava-dispatcher/move-network-checks into lp:lava-dispatcher

Commit message

Description of the change

Preview Diff

Subscribers