Merge lp:~thumper/juju-core/fix-intermittent-failure into lp:~go-bot/juju-core/trunk

Proposed by Tim Penhey
Status: Merged
Approved by: Tim Penhey
Approved revision: no longer in the source branch.
Merged at revision: 1847
Proposed branch: lp:~thumper/juju-core/fix-intermittent-failure
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 93 lines (+28/-19)
1 file modified
environs/sshstorage/storage_test.go (+28/-19)
To merge this branch: bzr merge lp:~thumper/juju-core/fix-intermittent-failure
Reviewer            Review Type   Date Requested   Status
Juju Engineering                                   Pending
Review via email: mp+186690@code.launchpad.net

Commit message

Fix race condition in SSHStorage test

I found an intermittent failure in the synchronisation
test for SSHStorage. It only happened once, but I felt
it was worth fixing anyway.

The race was between the flock subprocess taking the lock
and the lines that followed it in the test. Those lines
expected the lock to be held, but since the lock was taken
by a separately executed command, that command might not
yet have started (let alone taken the lock) when they ran.

I split the synchronisation test into three, as it was
really testing three distinct things.

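The flock helper method now waits for the lock to be taken
by reading from the command's stdout until the initial
"started" echo, which is written before the sleep, has
arrived. Because flock only runs the command once it holds
the lock, seeing that output means the lock has been taken.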

Killing the flock process is now handled by a cleanup
function registered with AddCleanup.

Because the test is split up, we no longer need to kill the
process manually partway through the test.

https://codereview.appspot.com/13799043/

Tim Penhey (thumper) wrote:

Reviewers: mp+186690_code.launchpad.net,

Message:
Please take a look.

https://code.launchpad.net/~thumper/juju-core/fix-intermittent-failure/+merge/186690

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/13799043/

Affected files (+33, -19 lines):
   A [revision details]
   M environs/sshstorage/storage_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20130919221201-urd9lbpjtto8a7pk
+New revision: <email address hidden>

Index: environs/sshstorage/storage_test.go
=== modified file 'environs/sshstorage/storage_test.go'
--- environs/sshstorage/storage_test.go 2013-09-18 22:54:32 +0000
+++ environs/sshstorage/storage_test.go 2013-09-19 23:29:41 +0000
@@ -244,33 +244,42 @@
 	c.Assert(stor.DefaultConsistencyStrategy(), gc.Equals, utils.AttemptStrategy{})
 }

-// flock is a test helper that flocks a file,
-// executes "sleep" with the specified duration,
-// and returns the *Cmd so it can be early terminated.
-func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string, duration time.Duration) *os.Process {
-	sleepcmd := fmt.Sprintf("sleep %vs", duration.Seconds())
+const defaultFlockTimeout = 5 * time.Second
+
+// flock is a test helper that flocks a file, executes "sleep" with the
+// specified duration, the command is terminated in the test tear down.
+func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string) {
+	sleepcmd := fmt.Sprintf("echo started && sleep %vs", defaultFlockTimeout.Seconds())
 	cmd := exec.Command(flockBin, "--nonblock", "--close", string(mode), lockfile, "-c", sleepcmd)
+	stdout, err := cmd.StdoutPipe()
+	c.Assert(err, gc.IsNil)
 	c.Assert(cmd.Start(), gc.IsNil)
-	return cmd.Process
+	// Make sure the flock has been taken before returning by reading stdout waiting for "started"
+	for count := len("started"); count > 0; {
+		result := make([]byte, count)
+		bytesRead, err := stdout.Read(result)
+		c.Assert(err, gc.IsNil)
+		count -= bytesRead
+	}
+	s.AddCleanup(func(*gc.C) {
+		cmd.Process.Kill()
+		cmd.Process.Wait()
+	})
 }

-const defaultFlockTimeout = 5 * time.Second
-
-func (s *sto...

Andrew Wilkins (axwalk) wrote:

LGTM, thanks for fixing my crap.

https://codereview.appspot.com/13799043/diff/1/environs/sshstorage/storage_test.go
File environs/sshstorage/storage_test.go (right):

https://codereview.appspot.com/13799043/diff/1/environs/sshstorage/storage_test.go#newcode258
environs/sshstorage/storage_test.go:258: for count := len("started"); count > 0; {
I'd probably just use
_, err = io.ReadFull(stdout, make([]byte, len("started")))
c.Assert(err, gc.IsNil)

https://codereview.appspot.com/13799043/

Preview Diff

=== modified file 'environs/sshstorage/storage_test.go'
--- environs/sshstorage/storage_test.go 2013-09-18 22:54:32 +0000
+++ environs/sshstorage/storage_test.go 2013-09-20 01:34:23 +0000
@@ -6,6 +6,7 @@
 import (
 	"bytes"
 	"fmt"
+	"io"
 	"io/ioutil"
 	"os"
 	"os/exec"
@@ -244,33 +245,38 @@
 	c.Assert(stor.DefaultConsistencyStrategy(), gc.Equals, utils.AttemptStrategy{})
 }

-// flock is a test helper that flocks a file,
-// executes "sleep" with the specified duration,
-// and returns the *Cmd so it can be early terminated.
-func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string, duration time.Duration) *os.Process {
-	sleepcmd := fmt.Sprintf("sleep %vs", duration.Seconds())
+const defaultFlockTimeout = 5 * time.Second
+
+// flock is a test helper that flocks a file, executes "sleep" with the
+// specified duration, the command is terminated in the test tear down.
+func (s *storageSuite) flock(c *gc.C, mode flockmode, lockfile string) {
+	sleepcmd := fmt.Sprintf("echo started && sleep %vs", defaultFlockTimeout.Seconds())
 	cmd := exec.Command(flockBin, "--nonblock", "--close", string(mode), lockfile, "-c", sleepcmd)
+	stdout, err := cmd.StdoutPipe()
+	c.Assert(err, gc.IsNil)
 	c.Assert(cmd.Start(), gc.IsNil)
-	return cmd.Process
+	// Make sure the flock has been taken before returning by reading stdout waiting for "started"
+	_, err = io.ReadFull(stdout, make([]byte, len("started")))
+	c.Assert(err, gc.IsNil)
+	s.AddCleanup(func(*gc.C) {
+		cmd.Process.Kill()
+		cmd.Process.Wait()
+	})
 }

-const defaultFlockTimeout = 5 * time.Second
-
-func (s *storageSuite) TestSynchronisation(c *gc.C) {
+func (s *storageSuite) TestCreateFailsIfFlockNotAvailable(c *gc.C) {
 	storageDir := c.MkDir()
-	proc := s.flock(c, flockShared, storageDir, defaultFlockTimeout)
-	defer proc.Wait()
-	defer proc.Kill()
-
+	s.flock(c, flockShared, storageDir)
 	// Creating storage requires an exclusive lock initially.
 	//
 	// flock exits with exit code 1 if it can't acquire the
 	// lock immediately in non-blocking mode (which the tests force).
 	_, err := NewSSHStorage("example.com", storageDir)
 	c.Assert(err, gc.ErrorMatches, "exit code 1")
+}

-	proc.Kill()
-	proc.Wait()
+func (s *storageSuite) TestWithSharedLocks(c *gc.C) {
+	storageDir := c.MkDir()
 	stor, err := NewSSHStorage("example.com", storageDir)
 	c.Assert(err, gc.IsNil)

@@ -279,7 +285,7 @@
 	data := []byte("abc\000def")
 	c.Assert(ioutil.WriteFile(filepath.Join(storageDir, contentdir, "a"), data, 0644), gc.IsNil)

-	proc = s.flock(c, flockShared, storageDir, defaultFlockTimeout)
+	s.flock(c, flockShared, storageDir)
 	_, err = storage.Get(stor, "a")
 	c.Assert(err, gc.IsNil)
 	_, err = storage.List(stor, "")
@@ -287,12 +293,15 @@
 	c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)
 	c.Assert(stor.Remove("a"), gc.NotNil)
 	c.Assert(stor.RemoveAll(), gc.NotNil)
-	proc.Kill()
-	proc.Wait()
+}

+func (s *storageSuite) TestWithExclusiveLocks(c *gc.C) {
+	storageDir := c.MkDir()
+	stor, err := NewSSHStorage("example.com", storageDir)
+	c.Assert(err, gc.IsNil)
 	// None of the methods (apart from URL) should be able to do anything
 	// while an exclusive lock is held.
-	proc = s.flock(c, flockExclusive, storageDir, defaultFlockTimeout)
+	s.flock(c, flockExclusive, storageDir)
 	_, err = stor.URL("a")
 	c.Assert(err, gc.IsNil)
 	c.Assert(stor.Put("a", bytes.NewBuffer(nil), 0), gc.NotNil)
