Merge lp:~rogpeppe/juju-core/359-no-lax into lp:~go-bot/juju-core/trunk

Proposed by Roger Peppe
Status: Merged
Approved by: Roger Peppe
Approved revision: no longer in the source branch.
Merged at revision: 1625
Proposed branch: lp:~rogpeppe/juju-core/359-no-lax
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 216 lines (+19/-53)
5 files modified
state/machine_test.go (+4/-4)
state/service_test.go (+2/-2)
state/state_test.go (+3/-3)
state/testing/watcher.go (+8/-42)
state/unit_test.go (+2/-2)
To merge this branch: bzr merge lp:~rogpeppe/juju-core/359-no-lax
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+178357@code.launchpad.net

Commit message

state: hopefully make tests more reliable

We had a sporadic test failure in MachineSuite.TestWatchMachine:
the last AssertOneChange call received an extra event.

My analysis of the failure goes something like this.
There are two possible scenarios:

scenario 1:

 call Machine.Watch
 remove machine
 StartSync
 state/watcher polls, finds that object has disappeared
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces no value, because there's no object.
 test passes

scenario 2:

 call Machine.Watch
 remove machine
 StartSync
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces a value for the object
 state/watcher polls, finds that object has disappeared
 EntityWatcher receives watcher change and sends to Changes
 test reads unexpected value
 test fails

The problem is that when StartSync returns we have no idea
whether the state/watcher code has started to poll for new
changes or not.

By using Sync instead of StartSync, we ensure that at least
the state/watcher code is in a known state before continuing
to assert whether the watchers built on top of it it will or won't
send events.

The only reason for StartSync's existence, AFAIR, is to avoid
a possible deadlock situation where the state/watcher code
is trying to send on a watcher channel but the test code is
not ready to receive on that channel.

However, in all the places that use NotifyWatcherC and StringsWatcherC,
the state/watcher code is actually sending to a separate goroutine
which is always ready to receive any change. So it's safe
to use Sync rather than StartSync.

I do not understand the "hence cannot verify real-world coalescence behaviour"
comment in the NewLax* functions. If there is coalescence to be verified,
using StartSync rather than Sync is a poor way of verifying it, as
the only allowance that StartSync makes is a few nanoseconds and one or two context
switches - that's not much opportunity for coalescence to happen,
and all tests pass with this CL, which leads me to suspect smoke
and mirrors.

If anyone sees a MachineSuite.TestWatchMachine sporadic failure
with this CL applied, I'd very much like to know, as it means my
analysis is flawed :-).

FWIW, another possibility would be to make StartSync return when
it's certain that the watcher will start a poll before doing anything else.
This would involve a certain amount of restructuring of the
state/watcher code.

https://codereview.appspot.com/12352044/

Description of the change

state: hopefully make tests more reliable

We had a sporadic test failure in MachineSuite.TestWatchMachine:
the last AssertOneChange call received an extra event.

My analysis of the failure goes something like this.
There are two possible scenarios:

scenario 1:

 call Machine.Watch
 remove machine
 StartSync
 state/watcher polls, finds that object has disappeared
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces no value, because there's no object.
 test passes

scenario 2:

 call Machine.Watch
 remove machine
 StartSync
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces a value for the object
 state/watcher polls, finds that object has disappeared
 EntityWatcher receives watcher change and sends to Changes
 test reads unexpected value
 test fails

The problem is that when StartSync returns we have no idea
whether the state/watcher code has started to poll for new
changes or not.

By using Sync instead of StartSync, we ensure that at least
the state/watcher code is in a known state before continuing
to assert whether the watchers built on top of it it will or won't
send events.

The only reason for StartSync's existence, AFAIR, is to avoid
a possible deadlock situation where the state/watcher code
is trying to send on a watcher channel but the test code is
not ready to receive on that channel.

However, in all the places that use NotifyWatcherC and StringsWatcherC,
the state/watcher code is actually sending to a separate goroutine
which is always ready to receive any change. So it's safe
to use Sync rather than StartSync.

I do not understand the "hence cannot verify real-world coalescence behaviour"
comment in the NewLax* functions. If there is coalescence to be verified,
using StartSync rather than Sync is a poor way of verifying it, as
the only allowance that StartSync makes is a few nanoseconds and one or two context
switches - that's not much opportunity for coalescence to happen,
and all tests pass with this CL, which leads me to suspect smoke
and mirrors.

If anyone sees a MachineSuite.TestWatchMachine sporadic failure
with this CL applied, I'd very much like to know, as it means my
analysis is flawed :-).

FWIW, another possibility would be to make StartSync return when
it's certain that the watcher will start a poll before doing anything else.
This would involve a certain amount of restructuring of the
state/watcher code.

https://codereview.appspot.com/12352044/

To post a comment you must log in.
Revision history for this message
Roger Peppe (rogpeppe) wrote :
Download full text (9.8 KiB)

Reviewers: mp+178357_code.launchpad.net,

Message:
Please take a look.

Description:
state: hopefully make tests more reliable

We had a sporadic test failure in MachineSuite.TestWatchMachine:
the last AssertOneChange call received an extra event.

My analysis of the failure goes something like this.
There are two possible scenarios:

scenario 1:

 call Machine.Watch
 remove machine
 StartSync
 state/watcher polls, finds that object has disappeared
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces no value, because there's no object.
 test passes

scenario 2:

 call Machine.Watch
 remove machine
 StartSync
 EntityWatcher asks to watch object
 EntityWatcher sends initial value on channel
 state/watcher produces a value for the object
 state/watcher polls, finds that object has disappeared
 EntityWatcher receives watcher change and sends to Changes
 test reads unexpected value
 test fails

The problem is that when StartSync returns we have no idea
whether the state/watcher code has started to poll for new
changes or not.

By using Sync instead of StartSync, we ensure that at least
the state/watcher code is in a known state before continuing
to assert whether the watchers built on top of it it will or won't
send events.

The only reason for StartSync's existence, AFAIR, is to avoid
a possible deadlock situation where the state/watcher code
is trying to send on a watcher channel but the test code is
not ready to receive on that channel.

However, in all the places that use NotifyWatcherC and StringsWatcherC,
the state/watcher code is actually sending to a separate goroutine
which is always ready to receive any change. So it's safe
to use Sync rather than StartSync.

I do not understand the "hence cannot verify real-world coalescence
behaviour"
comment in the NewLax* functions. If there is coalescence to be
verified,
using StartSync rather than Sync is a poor way of verifying it, as
the only allowance that StartSync makes is a few nanoseconds and one or
two context
switches - that's not much opportunity for coalescence to happen,
and all tests pass with this CL, which leads me to suspect smoke
and mirrors.

If anyone sees a MachineSuite.TestWatchMachine sporadic failure
with this CL applied, I'd very much like to know, as it means my
analysis is flawed :-).

FWIW, another possibility would be to make StartSync return when
it's certain that the watcher will start a poll before doing anything
else.
This would involve a certain amount of restructuring of the
state/watcher code.

https://code.launchpad.net/~rogpeppe/juju-core/359-no-lax/+merge/178357

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/12352044/

Affected files:
   A [revision details]
   M state/machine_test.go
   M state/service_test.go
   M state/state_test.go
   M state/testing/watcher.go
   M state/unit_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20130802141811-3fnqb3k4j8yz9ea8
+New revision: roger.peppe@canonical...

Revision history for this message
Dimiter Naydenov (dimitern) wrote :
Revision history for this message
Martin Packman (gz) wrote :

LGTM. If I understand this proposal correctly, it should also fix the
bug I just filed due to a failure on the bot, even though that test was
not touched here:

<https://bugs.launchpad.net/juju-core/+bug/1209344>

https://codereview.appspot.com/12352044/

Revision history for this message
Andrew Wilkins (axwalk) wrote :

On 2013/08/02 18:37:40, rog wrote:
> Please take a look.

Unnecessary now, but I had a closer look this morning and LGTM.

https://codereview.appspot.com/12352044/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'state/machine_test.go'
2--- state/machine_test.go 2013-07-31 14:19:29 +0000
3+++ state/machine_test.go 2013-08-02 18:36:32 +0000
4@@ -598,7 +598,7 @@
5 // Start a watch on an empty machine; check no units reported.
6 w := s.machine.WatchPrincipalUnits()
7 defer testing.AssertStop(c, w)
8- wc := testing.NewLaxStringsWatcherC(c, s.State, w)
9+ wc := testing.NewStringsWatcherC(c, s.State, w)
10 wc.AssertChange()
11 wc.AssertNoChange()
12
13@@ -668,7 +668,7 @@
14 // Start a fresh watcher; check both principals reported.
15 w = s.machine.WatchPrincipalUnits()
16 defer testing.AssertStop(c, w)
17- wc = testing.NewLaxStringsWatcherC(c, s.State, w)
18+ wc = testing.NewStringsWatcherC(c, s.State, w)
19 wc.AssertChange("mysql/0", "mysql/1")
20 wc.AssertNoChange()
21
22@@ -693,7 +693,7 @@
23 // Start a watch on an empty machine; check no units reported.
24 w := s.machine.WatchUnits()
25 defer testing.AssertStop(c, w)
26- wc := testing.NewLaxStringsWatcherC(c, s.State, w)
27+ wc := testing.NewStringsWatcherC(c, s.State, w)
28 wc.AssertChange()
29 wc.AssertNoChange()
30
31@@ -763,7 +763,7 @@
32 // Start a fresh watcher; check all units reported.
33 w = s.machine.WatchUnits()
34 defer testing.AssertStop(c, w)
35- wc = testing.NewLaxStringsWatcherC(c, s.State, w)
36+ wc = testing.NewStringsWatcherC(c, s.State, w)
37 wc.AssertChange("mysql/0", "mysql/1", "logging/0")
38 wc.AssertNoChange()
39
40
41=== modified file 'state/service_test.go'
42--- state/service_test.go 2013-07-18 16:46:59 +0000
43+++ state/service_test.go 2013-08-02 18:36:32 +0000
44@@ -1292,7 +1292,7 @@
45 // TODO(fwereade) split this test up a bit.
46 w := s.mysql.WatchRelations()
47 defer testing.AssertStop(c, w)
48- wc := testing.StringsWatcherC{c, s.State, w, false}
49+ wc := testing.NewStringsWatcherC(c, s.State, w)
50 wc.AssertChange()
51 wc.AssertNoChange()
52
53@@ -1335,7 +1335,7 @@
54 rel2 := addRelation()
55 w = s.mysql.WatchRelations()
56 defer testing.AssertStop(c, w)
57- wc = testing.StringsWatcherC{c, s.State, w, false}
58+ wc = testing.NewStringsWatcherC(c, s.State, w)
59 wc.AssertChange(rel1.String(), rel2.String())
60 wc.AssertNoChange()
61
62
63=== modified file 'state/state_test.go'
64--- state/state_test.go 2013-08-01 15:41:09 +0000
65+++ state/state_test.go 2013-08-02 18:36:32 +0000
66@@ -1722,7 +1722,7 @@
67 // Check initial event.
68 w := s.State.WatchCleanups()
69 defer statetesting.AssertStop(c, w)
70- wc := statetesting.NewLaxNotifyWatcherC(c, s.State, w)
71+ wc := statetesting.NewNotifyWatcherC(c, s.State, w)
72 wc.AssertOneChange()
73
74 // Set up two relations for later use, check no events.
75@@ -1768,7 +1768,7 @@
76 // Check initial event.
77 w := s.State.WatchCleanups()
78 defer statetesting.AssertStop(c, w)
79- wc := statetesting.NewLaxNotifyWatcherC(c, s.State, w)
80+ wc := statetesting.NewNotifyWatcherC(c, s.State, w)
81 wc.AssertOneChange()
82
83 // Create two peer relations by creating their services.
84@@ -1799,7 +1799,7 @@
85 // Check initial event.
86 w := s.State.WatchMinUnits()
87 defer statetesting.AssertStop(c, w)
88- wc := statetesting.NewLaxStringsWatcherC(c, s.State, w)
89+ wc := statetesting.NewStringsWatcherC(c, s.State, w)
90 wc.AssertChange()
91 wc.AssertNoChange()
92
93
94=== modified file 'state/testing/watcher.go'
95--- state/testing/watcher.go 2013-07-22 10:29:03 +0000
96+++ state/testing/watcher.go 2013-08-02 18:36:32 +0000
97@@ -51,9 +51,8 @@
98 // the behaviour of any watcher that uses a <-chan struct{}.
99 type NotifyWatcherC struct {
100 *C
101- State *state.State
102- Watcher NotifyWatcher
103- FullSync bool
104+ State *state.State
105+ Watcher NotifyWatcher
106 }
107
108 // NewNotifyWatcherC returns a NotifyWatcherC that checks for aggressive
109@@ -66,20 +65,8 @@
110 }
111 }
112
113-// NewLaxNotifyWatcherC returns a NotifyWatcherC that runs a full watcher
114-// sync before reading from the watcher's Changes channel, and hence cannot
115-// verify real-world coalescence behaviour.
116-func NewLaxNotifyWatcherC(c *C, st *state.State, w NotifyWatcher) NotifyWatcherC {
117- return NotifyWatcherC{
118- C: c,
119- State: st,
120- Watcher: w,
121- FullSync: true,
122- }
123-}
124-
125 func (c NotifyWatcherC) AssertNoChange() {
126- c.State.StartSync()
127+ c.State.Sync()
128 select {
129 case _, ok := <-c.Watcher.Changes():
130 c.Fatalf("watcher sent unexpected change: (_, %v)", ok)
131@@ -88,11 +75,7 @@
132 }
133
134 func (c NotifyWatcherC) AssertOneChange() {
135- if c.FullSync {
136- c.State.Sync()
137- } else {
138- c.State.StartSync()
139- }
140+ c.State.Sync()
141 select {
142 case _, ok := <-c.Watcher.Changes():
143 c.Assert(ok, Equals, true)
144@@ -115,9 +98,8 @@
145 // the behaviour of any watcher that uses a <-chan []string.
146 type StringsWatcherC struct {
147 *C
148- State *state.State
149- Watcher StringsWatcher
150- FullSync bool
151+ State *state.State
152+ Watcher StringsWatcher
153 }
154
155 // NewStringsWatcherC returns a StringsWatcherC that checks for aggressive
156@@ -130,25 +112,13 @@
157 }
158 }
159
160-// NewLaxStringsWatcherC returns a StringsWatcherC that runs a full watcher
161-// sync before reading from the watcher's Changes channel, and hence cannot
162-// verify real-world coalescence behaviour.
163-func NewLaxStringsWatcherC(c *C, st *state.State, w StringsWatcher) StringsWatcherC {
164- return StringsWatcherC{
165- C: c,
166- State: st,
167- Watcher: w,
168- FullSync: true,
169- }
170-}
171-
172 type StringsWatcher interface {
173 Stop() error
174 Changes() <-chan []string
175 }
176
177 func (c StringsWatcherC) AssertNoChange() {
178- c.State.StartSync()
179+ c.State.Sync()
180 select {
181 case actual, ok := <-c.Watcher.Changes():
182 c.Fatalf("watcher sent unexpected change: (%v, %v)", actual, ok)
183@@ -159,11 +129,7 @@
184 // AssertChange asserts the given list of changes was reported by
185 // the watcher, but does not assume there are no following changes.
186 func (c StringsWatcherC) AssertChange(expect ...string) {
187- if c.FullSync {
188- c.State.Sync()
189- } else {
190- c.State.StartSync()
191- }
192+ c.State.Sync()
193 timeout := time.After(testing.LongWait)
194 var actual []string
195 loop:
196
197=== modified file 'state/unit_test.go'
198--- state/unit_test.go 2013-07-31 14:19:29 +0000
199+++ state/unit_test.go 2013-08-02 18:36:32 +0000
200@@ -953,7 +953,7 @@
201 func (s *UnitSuite) TestWatchSubordinates(c *C) {
202 w := s.unit.WatchSubordinateUnits()
203 defer testing.AssertStop(c, w)
204- wc := testing.NewLaxStringsWatcherC(c, s.State, w)
205+ wc := testing.NewStringsWatcherC(c, s.State, w)
206 wc.AssertChange()
207 wc.AssertNoChange()
208
209@@ -1006,7 +1006,7 @@
210 // Start a new watch, check Dead unit is reported.
211 w = s.unit.WatchSubordinateUnits()
212 defer testing.AssertStop(c, w)
213- wc = testing.NewLaxStringsWatcherC(c, s.State, w)
214+ wc = testing.NewStringsWatcherC(c, s.State, w)
215 wc.AssertChange(subUnits[0].Name())
216 wc.AssertNoChange()
217

Subscribers

People subscribed via source and target branches

to status/vote changes: