Merge lp:~thumper/juju-core/fix-intermittent-failure into lp:~go-bot/juju-core/trunk

Proposed by Tim Penhey
Status: Merged
Approved by: Tim Penhey
Approved revision: no longer in the source branch.
Merged at revision: 1847
Proposed branch: lp:~thumper/juju-core/fix-intermittent-failure
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 93 lines (+28/-19)
1 file modified
environs/sshstorage/storage_test.go (+28/-19)
To merge this branch: bzr merge lp:~thumper/juju-core/fix-intermittent-failure
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+186690@code.launchpad.net

Commit message

Fix race condition in SSHStorage test

I found an intermittent failure in the synchronization
test for the SSHStorage. It only happened once, and not
again, but I felt it was worth fixing anyway.

The race was in the flock subprocess actually starting
before the following lines in the test. The lines following
expected the flock to be taken, but since the flock was
managed by an executed command, there is a race where it
may not have started.

I broke the synchronisation test into three as it was
really testing three distinct things.

The flock helper method now waits for the flock to be taken
by incrementally reading from stdout waiting for the initial
echo to be written out prior to the sleep.

The flock cleanup is also now handled by a cleanup method.

By breaking the test up, we no longer need to manually kill the
process as part of the test.

https://codereview.appspot.com/13799043/

Description of the change

Fix race condition in SSHStorage test

I found an intermittent failure in the synchronization
test for the SSHStorage. It only happened once, and not
again, but I felt it was worth fixing anyway.

The race was in the flock subprocess actually starting
before the following lines in the test. The lines following
expected the flock to be taken, but since the flock was
managed by an executed command, there is a race where it
may not have started.

I broke the synchronisation test into three as it was
really testing three distinct things.

The flock helper method now waits for the flock to be taken
by incrementally reading from stdout waiting for the initial
echo to be written out prior to the sleep.

The flock cleanup is also now handled by a cleanup method.

By breaking the test up, we no longer need to manually kill the
process as part of the test.

https://codereview.appspot.com/13799043/

To post a comment you must log in.
Revision history for this message
Tim Penhey (thumper) wrote :
Download full text (4.9 KiB)

Reviewers: mp+186690_code.launchpad.net,

Message:
Please take a look.

Description:
Fix race condition in SSHStorage test

I found an intermittent failure in the synchronization
test for the SSHStorage. It only happened once, and not
again, but I felt it was worth fixing anyway.

The race was in the flock subprocess actually starting
before the following lines in the test. The lines following
expected the flock to be taken, but since the flock was
managed by an executed command, there is a race where it
may not have started.

I broke the synchronisation test into three as it was
really testing three distinct things.

The flock helper method now waits for the flock to be taken
by incrementally reading from stdout waiting for the initial
echo to be written out prior to the sleep.

The flock cleanup is also now handled by a cleanup method.

By breaking the test up, we no longer need to manually kill the
process as part of the test.

https://code.launchpad.net/~thumper/juju-core/fix-intermittent-failure/+merge/186690

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/13799043/

Affected files (+33, -19 lines):
   A [revision details]
   M environs/sshstorage/storage_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20130919221201-urd9lbpjtto8a7pk
+New revision: <email address hidden>

Index: environs/sshstorage/storage_test.go
=== modified file 'environs/sshstorage/storage_test.go'
--- environs/sshstorage/storage_test.go 2013-09-18 22:54:32 +0000
+++ environs/sshstorage/storage_test.go 2013-09-19 23:29:41 +0000
@@ -244,33 +244,42 @@
   c.Assert(stor.DefaultConsistencyStrategy(), gc.Equals,
utils.AttemptStrategy{})
  }

-// flock is a test helper that flocks a file,
-// executes "sleep" with the specified duration,
-// and returns the *Cmd so it can be early terminated.
-func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string,
duration time.Duration) *os.Process {
- sleepcmd := fmt.Sprintf("sleep %vs", duration.Seconds())
+const defaultFlockTimeout = 5 * time.Second
+
+// flock is a test helper that flocks a file, executes "sleep" with the
+// specified duration, the command is terminated in the test tear down.
+func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string) {
+ sleepcmd := fmt.Sprintf("echo started && sleep %vs",
defaultFlockTimeout.Seconds())
   cmd := exec.Command(flockBin, "--nonblock", "--close", string(mode),
lockfile, "-c", sleepcmd)
+ stdout, err := cmd.StdoutPipe()
+ c.Assert(err, gc.IsNil)
   c.Assert(cmd.Start(), gc.IsNil)
- return cmd.Process
+ // Make sure the flock has been taken before returning by reading stdout
waiting for "started"
+ for count := len("started"); count > 0; {
+ result := make([]byte, count)
+ bytesRead, err := stdout.Read(result)
+ c.Assert(err, gc.IsNil)
+ count -= bytesRead
+ }
+ s.AddCleanup(func(*gc.C) {
+ cmd.Process.Kill()
+ cmd.Process.Wait()
+ })
  }

-const defaultFlockTimeout = 5 * time.Second
-
-func (s *sto...

Read more...

Revision history for this message
Andrew Wilkins (axwalk) wrote :

LGTM, thanks for fixing my crap.

https://codereview.appspot.com/13799043/diff/1/environs/sshstorage/storage_test.go
File environs/sshstorage/storage_test.go (right):

https://codereview.appspot.com/13799043/diff/1/environs/sshstorage/storage_test.go#newcode258
environs/sshstorage/storage_test.go:258: for count := len("started");
count > 0; {
I'd probably just use
err = io.ReadFull(stdout, make([]byte, len("started")))
c.Assert(err, gc.IsNil)

https://codereview.appspot.com/13799043/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'environs/sshstorage/storage_test.go'
--- environs/sshstorage/storage_test.go 2013-09-18 22:54:32 +0000
+++ environs/sshstorage/storage_test.go 2013-09-20 01:34:23 +0000
@@ -6,6 +6,7 @@
6import (6import (
7 "bytes"7 "bytes"
8 "fmt"8 "fmt"
9 "io"
9 "io/ioutil"10 "io/ioutil"
10 "os"11 "os"
11 "os/exec"12 "os/exec"
@@ -244,33 +245,38 @@
244 c.Assert(stor.DefaultConsistencyStrategy(), gc.Equals, utils.AttemptStrategy{})245 c.Assert(stor.DefaultConsistencyStrategy(), gc.Equals, utils.AttemptStrategy{})
245}246}
246247
247// flock is a test helper that flocks a file,248const defaultFlockTimeout = 5 * time.Second
248// executes "sleep" with the specified duration,249
249// and returns the *Cmd so it can be early terminated.250// flock is a test helper that flocks a file, executes "sleep" with the
250func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string, duration time.Duration) *os.Process {251// specified duration, the command is terminated in the test tear down.
251 sleepcmd := fmt.Sprintf("sleep %vs", duration.Seconds())252func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string) {
253 sleepcmd := fmt.Sprintf("echo started && sleep %vs", defaultFlockTimeout.Seconds())
252 cmd := exec.Command(flockBin, "--nonblock", "--close", string(mode), lockfile, "-c", sleepcmd)254 cmd := exec.Command(flockBin, "--nonblock", "--close", string(mode), lockfile, "-c", sleepcmd)
255 stdout, err := cmd.StdoutPipe()
256 c.Assert(err, gc.IsNil)
253 c.Assert(cmd.Start(), gc.IsNil)257 c.Assert(cmd.Start(), gc.IsNil)
254 return cmd.Process258 // Make sure the flock has been taken before returning by reading stdout waiting for "started"
259 _, err = io.ReadFull(stdout, make([]byte, len("started")))
260 c.Assert(err, gc.IsNil)
261 s.AddCleanup(func(*gc.C) {
262 cmd.Process.Kill()
263 cmd.Process.Wait()
264 })
255}265}
256266
257const defaultFlockTimeout = 5 * time.Second267func (s *storageSuite) TestCreateFailsIfFlockNotAvailable(c *gc.C) {
258
259func (s *storageSuite) TestSynchronisation(c *gc.C) {
260 storageDir := c.MkDir()268 storageDir := c.MkDir()
261 proc := s.flock(c, flockShared, storageDir, defaultFlockTimeout)269 s.flock(c, flockShared, storageDir)
262 defer proc.Wait()
263 defer proc.Kill()
264
265 // Creating storage requires an exclusive lock initially.270 // Creating storage requires an exclusive lock initially.
266 //271 //
267 // flock exits with exit code 1 if it can't acquire the272 // flock exits with exit code 1 if it can't acquire the
268 // lock immediately in non-blocking mode (which the tests force).273 // lock immediately in non-blocking mode (which the tests force).
269 _, err := NewSSHStorage("example.com", storageDir)274 _, err := NewSSHStorage("example.com", storageDir)
270 c.Assert(err, gc.ErrorMatches, "exit code 1")275 c.Assert(err, gc.ErrorMatches, "exit code 1")
276}
271277
272 proc.Kill()278func (s *storageSuite) TestWithSharedLocks(c *gc.C) {
273 proc.Wait()279 storageDir := c.MkDir()
274 stor, err := NewSSHStorage("example.com", storageDir)280 stor, err := NewSSHStorage("example.com", storageDir)
275 c.Assert(err, gc.IsNil)281 c.Assert(err, gc.IsNil)
276282
@@ -279,7 +285,7 @@
279 data := []byte("abc\000def")285 data := []byte("abc\000def")
280 c.Assert(ioutil.WriteFile(filepath.Join(storageDir, contentdir, "a"), data, 0644), gc.IsNil)286 c.Assert(ioutil.WriteFile(filepath.Join(storageDir, contentdir, "a"), data, 0644), gc.IsNil)
281287
282 proc = s.flock(c, flockShared, storageDir, defaultFlockTimeout)288 s.flock(c, flockShared, storageDir)
283 _, err = storage.Get(stor, "a")289 _, err = storage.Get(stor, "a")
284 c.Assert(err, gc.IsNil)290 c.Assert(err, gc.IsNil)
285 _, err = storage.List(stor, "")291 _, err = storage.List(stor, "")
@@ -287,12 +293,15 @@
287 c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)293 c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)
288 c.Assert(stor.Remove("a"), gc.NotNil)294 c.Assert(stor.Remove("a"), gc.NotNil)
289 c.Assert(stor.RemoveAll(), gc.NotNil)295 c.Assert(stor.RemoveAll(), gc.NotNil)
290 proc.Kill()296}
291 proc.Wait()
292297
298func (s *storageSuite) TestWithExclusiveLocks(c *gc.C) {
299 storageDir := c.MkDir()
300 stor, err := NewSSHStorage("example.com", storageDir)
301 c.Assert(err, gc.IsNil)
293 // None of the methods (apart from URL) should be able to do anything302 // None of the methods (apart from URL) should be able to do anything
294 // while an exclusive lock is held.303 // while an exclusive lock is held.
295 proc = s.flock(c, flockExclusive, storageDir, defaultFlockTimeout)304 s.flock(c, flockExclusive, storageDir)
296 _, err = stor.URL("a")305 _, err = stor.URL("a")
297 c.Assert(err, gc.IsNil)306 c.Assert(err, gc.IsNil)
298 c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)307 c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)

Subscribers

People subscribed via source and target branches

to status/vote changes: