Merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive into lp:~go-bot/juju-core/trunk

Proposed by Andrew Wilkins
Status: Merged
Approved by: Andrew Wilkins
Approved revision: no longer in the source branch.
Merged at revision: 2624
Proposed branch: lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 565 lines (+373/-87)
5 files modified
cmd/juju/ensureavailability_test.go (+13/-0)
state/addmachine.go (+154/-50)
state/apiserver/client/client_test.go (+24/-20)
state/export_test.go (+2/-0)
state/state_test.go (+180/-17)
To merge this branch: bzr merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+215172@code.launchpad.net

Commit message

state: (demote)promote (un)available state servers

When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, then we either set WantsVote=false, or
remove it entirely if it doesn't want a vote.

If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.

Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.

https://codereview.appspot.com/86490043/

Description of the change

state: (demote)promote (un)available state servers

When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, then we either set WantsVote=false, or
remove it entirely if it doesn't want a vote.

If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.

Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.

https://codereview.appspot.com/86490043/

To post a comment you must log in.
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Reviewers: mp+215172_code.launchpad.net,

Message:
Please take a look.

Description:
state: (demote)promote (un)available state servers

When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, then we either set WantsVote=false, or
remove it entirely if it doesn't want a vote.

If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.

Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.

https://code.launchpad.net/~axwalk/juju-core/lp1271504-ensureavailability-agentalive/+merge/215172

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/86490043/

Affected files (+272, -61 lines):
   A [revision details]
   M cmd/juju/ensureavailability_test.go
   M state/addmachine.go
   M state/apiserver/client/client_test.go
   M state/export_test.go
   M state/state_test.go

Revision history for this message
Roger Peppe (rogpeppe) wrote :

LGTM modulo the following.
Very nice.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go
File state/addmachine.go (right):

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode531
state/addmachine.go:531: mdocs := make([]*machineDoc,
numStateServers-len(info.VotingMachineIds))
nice clear logic here.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode555
state/addmachine.go:555: if err != nil {
If err==txn.ErrAborted, I think it's only going to be because someone
else has run EnsureAvailability concurrently. Perhaps we should check if
the voting server count has changed - if it has, return an error,
otherwise return nil because our work will already have been done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode578
state/addmachine.go:578: var newVotingMachineIds, newMachineIds []string
rather than two variables, it might be clearer if we did:

var newInfo StateServerInfo

and then, e.g.
newInfo.MachineIds = append(newInfo.MachineIds, ...)

then it's more obvious when we're just building
up ids in the new info.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode589
state/addmachine.go:589: if !m.WantsVote() {
switch the sense of the condition to avoid the negative?

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode598
state/addmachine.go:598: // If the machine does want to vote, we simply
set novote and
Since the comments are inside the if statement, we can phrase them
unconditionally, which might read slightly more clearly.

// The machine wants to vote, so we simply set novote etc

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode605
state/addmachine.go:605: } else if !m.HasVote() {
perhaps invert the condition here to save negatives?

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode606
state/addmachine.go:606: // If the machine does not want to vote and
doesn't have a vote,
again, we can phrase it unconditionally.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode625
state/addmachine.go:625: Assert: bson.D{{"novote", bson.D{{"$ne",
false}}}},
I *think* this should be {{"$eq", true}}, because nil!=false
and legacy machines may have a nil novote field.

similarly below.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode655
state/addmachine.go:655: Update: bson.D{{"$pull", bson.D{{"jobs",
JobManageEnviron}}}},
perhaps set novote=false here too?

https://codereview.appspot.com/86490043/

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Please take a look.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go
File state/addmachine.go (right):

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode555
state/addmachine.go:555: if err != nil {
On 2014/04/11 10:09:17, rog wrote:
> If err==txn.ErrAborted, I think it's only going to be because someone
> else has run EnsureAvailability concurrently. Perhaps we should check
if the
> voting server count has changed - if it has, return an error,
otherwise return
> nil because our work will already have been done.

Done, I made it a transaction loop and added a few tests.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode578
state/addmachine.go:578: var newVotingMachineIds, newMachineIds []string
On 2014/04/11 10:09:17, rog wrote:
> rather than two variables, it might be clearer if we did:

> var newInfo StateServerInfo

> and then, e.g.
> newInfo.MachineIds = append(newInfo.MachineIds, ...)

> then it's more obvious when we're just building
> up ids in the new info.

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode589
state/addmachine.go:589: if !m.WantsVote() {
On 2014/04/11 10:09:17, rog wrote:
> switch the sense of the condition to avoid the negative?

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode598
state/addmachine.go:598: // If the machine does want to vote, we simply
set novote and
On 2014/04/11 10:09:17, rog wrote:
> Since the comments are inside the if statement, we can phrase them
> unconditionally, which might read slightly more clearly.

> // The machine wants to vote, so we simply set novote etc

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode605
state/addmachine.go:605: } else if !m.HasVote() {
On 2014/04/11 10:09:17, rog wrote:
> perhaps invert the condition here to save negatives?

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode606
state/addmachine.go:606: // If the machine does not want to vote and
doesn't have a vote,
On 2014/04/11 10:09:17, rog wrote:
> again, we can phrase it unconditionally.

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode625
state/addmachine.go:625: Assert: bson.D{{"novote", bson.D{{"$ne",
false}}}},
On 2014/04/11 10:09:17, rog wrote:
> I *think* this should be {{"$eq", true}}, because nil!=false
> and legacy machines may have a nil novote field.

> similarly below.

Done.

https://codereview.appspot.com/86490043/diff/1/state/addmachine.go#newcode655
state/addmachine.go:655: Update: bson.D{{"$pull", bson.D{{"jobs",
JobManageEnviron}}}},
On 2014/04/11 10:09:17, rog wrote:
> perhaps set novote=false here too?

Done.

https://codereview.appspot.com/86490043/

Revision history for this message
Go Bot (go-bot) wrote :
Download full text (107.9 KiB)

The attempt to merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive into lp:juju-core failed. Below is the output from the failed tests.

ok launchpad.net/juju-core 0.014s
ok launchpad.net/juju-core/agent 1.205s
ok launchpad.net/juju-core/agent/mongo 0.919s
ok launchpad.net/juju-core/agent/tools 0.174s
ok launchpad.net/juju-core/bzr 5.064s
ok launchpad.net/juju-core/cert 2.691s
ok launchpad.net/juju-core/charm 0.350s
? launchpad.net/juju-core/charm/hooks [no test files]
? launchpad.net/juju-core/charm/testing [no test files]
ok launchpad.net/juju-core/cloudinit 0.032s
ok launchpad.net/juju-core/cloudinit/sshinit 0.782s
ok launchpad.net/juju-core/cmd 0.197s
ok launchpad.net/juju-core/cmd/charm-admin 0.759s
? launchpad.net/juju-core/cmd/charmd [no test files]
? launchpad.net/juju-core/cmd/charmload [no test files]
ok launchpad.net/juju-core/cmd/envcmd 0.211s
ok launchpad.net/juju-core/cmd/juju 226.333s
ok launchpad.net/juju-core/cmd/jujud 77.802s
ok launchpad.net/juju-core/cmd/plugins/juju-metadata 9.276s
? launchpad.net/juju-core/cmd/plugins/juju-restore [no test files]
ok launchpad.net/juju-core/cmd/plugins/local 0.224s
? launchpad.net/juju-core/cmd/plugins/local/juju-local [no test files]
ok launchpad.net/juju-core/constraints 0.029s
ok launchpad.net/juju-core/container 0.039s
ok launchpad.net/juju-core/container/factory 0.054s
ok launchpad.net/juju-core/container/kvm 0.174s
ok launchpad.net/juju-core/container/kvm/mock 0.042s
? launchpad.net/juju-core/container/kvm/testing [no test files]
ok launchpad.net/juju-core/container/lxc 4.261s
? launchpad.net/juju-core/container/lxc/mock [no test files]
? launchpad.net/juju-core/container/lxc/testing [no test files]
? launchpad.net/juju-core/container/testing [no test files]
ok launchpad.net/juju-core/downloader 5.264s
ok launchpad.net/juju-core/environs 2.706s
ok launchpad.net/juju-core/environs/bootstrap 11.064s
ok launchpad.net/juju-core/environs/cloudinit 0.520s
ok launchpad.net/juju-core/environs/config 1.741s
ok launchpad.net/juju-core/environs/configstore 0.028s
ok launchpad.net/juju-core/environs/filestorage 0.027s
ok launchpad.net/juju-core/environs/httpstorage 0.706s
ok launchpad.net/juju-core/environs/imagemetadata 0.421s
? launchpad.net/juju-core/environs/imagemetadata/testing [no test files]
ok launchpad.net/juju-core/environs/instances 0.044s
ok launchpad.net/juju-core/environs/jujutest 0.176s
ok launchpad.net/juju-core/environs/manual 11.771s
ok launchpad.net/juju-core/environs/simplestreams 0.257s
? launchpad.net/juju-core/environs/simplestreams/testing [no test files]
ok launchpad.net/juju-core/environs/sshstorage 0.902s
ok launchpad.net/juju-core/environs/storage 0.968s
ok launchpad.net/juju-core/environs/sync 49.180s
ok launchpad.net/juju-core/environs/testing 0.123s
ok launchpad.net/juju-core/environs/tools 4.880s
? launchpad.net/juju-core/environs/tools/testing [no test files]
ok launchpad.net/juju-core/errors 0.017s
ok launchpad.net/juju-core/instance 0.021s
? launchpad.net/juju-core/instance/testing [no test files]
ok launchpad.net/juju...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'cmd/juju/ensureavailability_test.go'
2--- cmd/juju/ensureavailability_test.go 2014-03-28 09:55:10 +0000
3+++ cmd/juju/ensureavailability_test.go 2014-04-14 03:42:21 +0000
4@@ -83,6 +83,19 @@
5 func (s *EnsureAvailabilitySuite) TestEnsureAvailabilityMultiple(c *gc.C) {
6 err := runEnsureAvailability(c, "-n", "1")
7 c.Assert(err, gc.IsNil)
8+
9+ // make sure machine-0 remains alive for the second call to
10+ // EnsureAvailability, or machine-0 will get bumped down to
11+ // non-voting.
12+ m0, err := s.BackingState.Machine("0")
13+ c.Assert(err, gc.IsNil)
14+ pinger, err := m0.SetAgentAlive()
15+ c.Assert(err, gc.IsNil)
16+ defer pinger.Kill()
17+ s.BackingState.StartSync()
18+ err = m0.WaitAgentAlive(testing.LongWait)
19+ c.Assert(err, gc.IsNil)
20+
21 err = runEnsureAvailability(c, "-n", "3", "--constraints", "mem=4G")
22 c.Assert(err, gc.IsNil)
23
24
25=== modified file 'state/addmachine.go'
26--- state/addmachine.go 2014-04-10 14:40:34 +0000
27+++ state/addmachine.go 2014-04-14 03:42:21 +0000
28@@ -497,14 +497,6 @@
29 // EnsureAvailability adds state server machines as necessary to make
30 // the number of live state servers equal to numStateServers. The given
31 // constraints and series will be attached to any new machines.
32-//
33-// TODO(rog):
34-// If any current state servers are down, they will be
35-// removed from the current set of voting replica set
36-// peers (although the machines themselves will remain
37-// and they will still remain part of the replica set).
38-// Once a machine's voting status has been removed,
39-// the machine itself may be removed.
40 func (st *State) EnsureAvailability(numStateServers int, cons constraints.Value, series string) error {
41 if numStateServers%2 != 1 || numStateServers <= 0 {
42 return fmt.Errorf("number of state servers must be odd and greater than zero")
43@@ -512,46 +504,158 @@
44 if numStateServers > replicaset.MaxPeers {
45 return fmt.Errorf("state server count is too large (allowed %d)", replicaset.MaxPeers)
46 }
47- info, err := st.StateServerInfo()
48- if err != nil {
49- return err
50- }
51- if len(info.VotingMachineIds) == numStateServers {
52- // TODO(rog) #1271504 2014-01-22
53- // Find machines which are down, set
54- // their NoVote flag and add new machines to
55- // replace them.
56- return nil
57- }
58- if len(info.VotingMachineIds) > numStateServers {
59- return fmt.Errorf("cannot reduce state server count")
60- }
61- mdocs := make([]*machineDoc, 0, numStateServers-len(info.MachineIds))
62- var ops []txn.Op
63- for i := len(info.MachineIds); i < numStateServers; i++ {
64- template := MachineTemplate{
65- Series: series,
66- Jobs: []MachineJob{
67- JobHostUnits,
68- JobManageEnviron,
69- },
70- Constraints: cons,
71- }
72- mdoc, addOps, err := st.addMachineOps(template)
73- if err != nil {
74- return err
75- }
76- mdocs = append(mdocs, mdoc)
77- ops = append(ops, addOps...)
78- }
79- ssOps, err := st.maintainStateServersOps(mdocs, info)
80- if err != nil {
81- return fmt.Errorf("cannot prepare machine add operations: %v", err)
82- }
83- ops = append(ops, ssOps...)
84- err = st.runTransaction(ops)
85- if err != nil {
86- return fmt.Errorf("failed to create new state server machines: %v", err)
87- }
88- return nil
89+ for i := 0; i < 5; i++ {
90+ currentInfo, err := st.StateServerInfo()
91+ if err != nil {
92+ return err
93+ }
94+ if len(currentInfo.VotingMachineIds) > numStateServers {
95+ return fmt.Errorf("cannot reduce state server count")
96+ }
97+ removeOps, info, promotableMachines, err := st.updateAvailableStateServersOps(currentInfo)
98+ if err != nil {
99+ return err
100+ }
101+ if len(info.VotingMachineIds) == numStateServers && len(removeOps) == 0 {
102+ return nil
103+ }
104+ // Promote existing machines first.
105+ var ops []txn.Op
106+ if n := numStateServers - len(info.VotingMachineIds); n < len(promotableMachines) {
107+ promotableMachines = promotableMachines[:n]
108+ }
109+ for _, m := range promotableMachines {
110+ ops = append(ops, promoteStateServerOps(m)...)
111+ info.VotingMachineIds = append(info.VotingMachineIds, m.Id())
112+ }
113+ // Create new machines to make up the shortfall.
114+ mdocs := make([]*machineDoc, numStateServers-len(info.VotingMachineIds))
115+ for i := range mdocs {
116+ template := MachineTemplate{
117+ Series: series,
118+ Jobs: []MachineJob{
119+ JobHostUnits,
120+ JobManageEnviron,
121+ },
122+ Constraints: cons,
123+ }
124+ mdoc, addOps, err := st.addMachineOps(template)
125+ if err != nil {
126+ return err
127+ }
128+ mdocs[i] = mdoc
129+ ops = append(ops, addOps...)
130+ }
131+ ssOps, err := st.maintainStateServersOps(mdocs, currentInfo)
132+ if err != nil {
133+ return fmt.Errorf("cannot prepare machine add operations: %v", err)
134+ }
135+ ops = append(ops, removeOps...)
136+ ops = append(ops, ssOps...)
137+ err = st.runTransaction(ops)
138+ if err == nil {
139+ return nil
140+ } else if err != txn.ErrAborted {
141+ return fmt.Errorf("failed to create new state server machines: %v", err)
142+ }
143+ // The transaction will only be aborted if another call to
144+ // EnsureAvailability completed. Loop back around and try again.
145+ }
146+ return ErrExcessiveContention
147+}
148+
149+// stateServerAvailable returns true if the specified state server machine is
150+// available.
151+var stateServerAvailable = func(m *Machine) (bool, error) {
152+ // TODO(axw) #1271504 2014-01-22
153+ // Check the state server's associated mongo health;
154+ // requires coordination with worker/peergrouper.
155+ return m.AgentAlive()
156+}
157+
158+// updateAvailableStateServersOps checks the availability of state servers,
159+// demoting unavailable, voting machines;
160+// removing unavailable, non-voting, non-vote-holding machines;
161+// gathering available, non-voting machines that may be promoted;
162+// updating StateServerInfo to reflect reality.
163+func (st *State) updateAvailableStateServersOps(info *StateServerInfo) (ops []txn.Op, newInfo *StateServerInfo, promotableMachines []*Machine, err error) {
164+ newInfo = &StateServerInfo{}
165+ for _, mid := range info.MachineIds {
166+ m, err := st.Machine(mid)
167+ if err != nil {
168+ return nil, nil, nil, err
169+ }
170+ available, err := stateServerAvailable(m)
171+ if err != nil {
172+ return nil, nil, nil, err
173+ }
174+ if available {
175+ if m.WantsVote() {
176+ newInfo.VotingMachineIds = append(newInfo.VotingMachineIds, mid)
177+ } else {
178+ promotableMachines = append(promotableMachines, m)
179+ }
180+ newInfo.MachineIds = append(newInfo.MachineIds, mid)
181+ continue
182+ }
183+ if m.WantsVote() {
184+ // The machine wants to vote, so we simply set novote and allow it
185+ // to run its course to have its vote removed by the worker that
186+ // maintains the replicaset. We will replace it with an existing
187+ // non-voting state server if there is one, starting a new one if
188+ // not.
189+ ops = append(ops, demoteStateServerOps(m)...)
190+ newInfo.MachineIds = append(newInfo.MachineIds, mid)
191+ } else if m.HasVote() {
192+ // The machine still has a vote, so keep it in MachineIds for now.
193+ newInfo.MachineIds = append(newInfo.MachineIds, mid)
194+ } else {
195+ // The machine neither wants to nor has a vote, so remove its
196+ // JobManageEnviron job immediately.
197+ ops = append(ops, removeStateServerOps(m)...)
198+ }
199+ }
200+ return ops, newInfo, promotableMachines, nil
201+}
202+
203+func promoteStateServerOps(m *Machine) []txn.Op {
204+ return []txn.Op{{
205+ C: m.st.machines.Name,
206+ Id: m.doc.Id,
207+ Assert: bson.D{{"novote", true}},
208+ Update: bson.D{{"$set", bson.D{{"novote", false}}}},
209+ }, {
210+ C: m.st.stateServers.Name,
211+ Id: environGlobalKey,
212+ Update: bson.D{{"$addToSet", bson.D{{"votingmachineids", m.doc.Id}}}},
213+ }}
214+}
215+
216+func demoteStateServerOps(m *Machine) []txn.Op {
217+ return []txn.Op{{
218+ C: m.st.machines.Name,
219+ Id: m.doc.Id,
220+ Assert: bson.D{{"novote", false}},
221+ Update: bson.D{{"$set", bson.D{{"novote", true}}}},
222+ }, {
223+ C: m.st.stateServers.Name,
224+ Id: environGlobalKey,
225+ Update: bson.D{{"$pull", bson.D{{"votingmachineids", m.doc.Id}}}},
226+ }}
227+}
228+
229+func removeStateServerOps(m *Machine) []txn.Op {
230+ return []txn.Op{{
231+ C: m.st.machines.Name,
232+ Id: m.doc.Id,
233+ Assert: bson.D{{"novote", true}, {"hasvote", false}},
234+ Update: bson.D{
235+ {"$pull", bson.D{{"jobs", JobManageEnviron}}},
236+ {"$set", bson.D{{"novote", false}}},
237+ },
238+ }, {
239+ C: m.st.stateServers.Name,
240+ Id: environGlobalKey,
241+ Update: bson.D{{"$pull", bson.D{{"machineids", m.doc.Id}}}},
242+ }}
243 }
244
245=== modified file 'state/apiserver/client/client_test.go'
246--- state/apiserver/client/client_test.go 2014-04-11 15:16:00 +0000
247+++ state/apiserver/client/client_test.go 2014-04-14 03:42:21 +0000
248@@ -29,6 +29,7 @@
249 "launchpad.net/juju-core/state/api/params"
250 "launchpad.net/juju-core/state/api/usermanager"
251 "launchpad.net/juju-core/state/apiserver/client"
252+ "launchpad.net/juju-core/state/presence"
253 "launchpad.net/juju-core/state/statecmd"
254 coretesting "launchpad.net/juju-core/testing"
255 "launchpad.net/juju-core/utils"
256@@ -2132,17 +2133,24 @@
257 c.Assert(data["transient"], gc.Equals, true)
258 }
259
260+func (s *clientSuite) setAgentAlive(c *gc.C, machineId string) *presence.Pinger {
261+ m, err := s.BackingState.Machine(machineId)
262+ c.Assert(err, gc.IsNil)
263+ pinger, err := m.SetAgentAlive()
264+ c.Assert(err, gc.IsNil)
265+ s.BackingState.StartSync()
266+ err = m.WaitAgentAlive(coretesting.LongWait)
267+ c.Assert(err, gc.IsNil)
268+ return pinger
269+}
270+
271 func (s *clientSuite) TestClientEnsureAvailabilitySeries(c *gc.C) {
272- apiParams := []params.EnsureAvailability{{
273- NumStateServers: 1,
274- }, {
275- NumStateServers: 3,
276- Series: "non-default",
277- }}
278- for _, p := range apiParams {
279- err := s.APIState.Client().EnsureAvailability(p.NumStateServers, p.Constraints, p.Series)
280- c.Assert(err, gc.IsNil)
281- }
282+ err := s.APIState.Client().EnsureAvailability(1, constraints.Value{}, "")
283+ c.Assert(err, gc.IsNil)
284+ pinger := s.setAgentAlive(c, "0")
285+ defer pinger.Kill()
286+ err = s.APIState.Client().EnsureAvailability(3, constraints.Value{}, "non-default")
287+ c.Assert(err, gc.IsNil)
288 machines, err := s.State.AllMachines()
289 c.Assert(err, gc.IsNil)
290 c.Assert(machines, gc.HasLen, 3)
291@@ -2152,16 +2160,12 @@
292 }
293
294 func (s *clientSuite) TestClientEnsureAvailabilityConstraints(c *gc.C) {
295- apiParams := []params.EnsureAvailability{{
296- NumStateServers: 1,
297- }, {
298- NumStateServers: 3,
299- Constraints: constraints.MustParse("mem=4G"),
300- }}
301- for _, p := range apiParams {
302- err := s.APIState.Client().EnsureAvailability(p.NumStateServers, p.Constraints, p.Series)
303- c.Assert(err, gc.IsNil)
304- }
305+ err := s.APIState.Client().EnsureAvailability(1, constraints.Value{}, "")
306+ c.Assert(err, gc.IsNil)
307+ pinger := s.setAgentAlive(c, "0")
308+ defer pinger.Kill()
309+ err = s.APIState.Client().EnsureAvailability(3, constraints.MustParse("mem=4G"), "")
310+ c.Assert(err, gc.IsNil)
311 machines, err := s.State.AllMachines()
312 c.Assert(err, gc.IsNil)
313 c.Assert(machines, gc.HasLen, 3)
314
315=== modified file 'state/export_test.go'
316--- state/export_test.go 2014-03-25 15:06:31 +0000
317+++ state/export_test.go 2014-04-14 03:42:21 +0000
318@@ -259,3 +259,5 @@
319 func CheckUserExists(st *State, name string) (bool, error) {
320 return st.checkUserExists(name)
321 }
322+
323+var StateServerAvailable = &stateServerAvailable
324
325=== modified file 'state/state_test.go'
326--- state/state_test.go 2014-04-10 14:40:34 +0000
327+++ state/state_test.go 2014-04-14 03:42:21 +0000
328@@ -6,7 +6,6 @@
329 import (
330 "fmt"
331 "net/url"
332- "sort"
333 "strconv"
334 "strings"
335 "time"
336@@ -1735,6 +1734,10 @@
337 VotingMachineIds: []string{"0"},
338 })
339
340+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
341+ return true, nil
342+ })
343+
344 err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
345 c.Assert(err, gc.IsNil)
346
347@@ -2889,6 +2892,11 @@
348 }
349
350 func (s *StateSuite) TestEnsureAvailabilityAddsNewMachines(c *gc.C) {
351+ // Don't use agent presence to decide on machine availability.
352+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
353+ return true, nil
354+ })
355+
356 ids := make([]string, 3)
357 m0, err := s.State.AddMachine("quantal", state.JobHostUnits, state.JobManageEnviron)
358 c.Assert(err, gc.IsNil)
359@@ -2898,12 +2906,7 @@
360 _, err = s.State.AddMachine("quantal", state.JobHostUnits)
361 c.Assert(err, gc.IsNil)
362
363- info, err := s.State.StateServerInfo()
364- c.Assert(err, gc.IsNil)
365- c.Assert(info, gc.DeepEquals, &state.StateServerInfo{
366- MachineIds: []string{m0.Id()},
367- VotingMachineIds: []string{m0.Id()},
368- })
369+ s.assertStateServerInfo(c, []string{"0"}, []string{"0"})
370
371 cons := constraints.Value{
372 Mem: newUint64(100),
373@@ -2924,22 +2927,182 @@
374 c.Assert(m.WantsVote(), jc.IsTrue)
375 ids[i] = m.Id()
376 }
377- sort.Strings(ids)
378-
379- info, err = s.State.StateServerInfo()
380- c.Assert(err, gc.IsNil)
381- sort.Strings(info.MachineIds)
382- sort.Strings(info.VotingMachineIds)
383- c.Assert(info, gc.DeepEquals, &state.StateServerInfo{
384- MachineIds: ids,
385- VotingMachineIds: ids,
386- })
387+ s.assertStateServerInfo(c, ids, ids)
388 }
389
390 func newUint64(i uint64) *uint64 {
391 return &i
392 }
393
394+func (s *StateSuite) assertStateServerInfo(c *gc.C, machineIds []string, votingMachineIds []string) {
395+ info, err := s.State.StateServerInfo()
396+ c.Assert(err, gc.IsNil)
397+ c.Assert(info.MachineIds, jc.SameContents, machineIds)
398+ c.Assert(info.VotingMachineIds, jc.SameContents, votingMachineIds)
399+}
400+
401+func (s *StateSuite) TestEnsureAvailabilityDemotesUnavailableMachines(c *gc.C) {
402+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
403+ c.Assert(err, gc.IsNil)
404+ s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"})
405+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
406+ return m.Id() != "0", nil
407+ })
408+ err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
409+ c.Assert(err, gc.IsNil)
410+
411+ // New state server machine "3" is created; "0" still exists in MachineIds,
412+ // but no longer in VotingMachineIds.
413+ s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"})
414+ m0, err := s.State.Machine("0")
415+ c.Assert(err, gc.IsNil)
416+ c.Assert(m0.WantsVote(), jc.IsFalse)
417+ c.Assert(m0.IsManager(), jc.IsTrue) // job still intact for now
418+ m3, err := s.State.Machine("3")
419+ c.Assert(err, gc.IsNil)
420+ c.Assert(m3.WantsVote(), jc.IsTrue)
421+ c.Assert(m3.IsManager(), jc.IsTrue)
422+}
423+
424+func (s *StateSuite) TestEnsureAvailabilityPromotesAvailableMachines(c *gc.C) {
425+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
426+ c.Assert(err, gc.IsNil)
427+ s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"})
428+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
429+ return m.Id() != "0", nil
430+ })
431+ err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
432+ c.Assert(err, gc.IsNil)
433+
434+ // New state server machine "3" is created; "0" still exists in MachineIds,
435+ // but no longer in VotingMachineIds.
436+ s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"})
437+ m0, err := s.State.Machine("0")
438+ c.Assert(err, gc.IsNil)
439+ c.Assert(m0.WantsVote(), jc.IsFalse)
440+
441+ // Mark machine 0 as having a vote, so it doesn't get removed, and make it
442+ // available once more.
443+ err = m0.SetHasVote(true)
444+ c.Assert(err, gc.IsNil)
445+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
446+ return true, nil
447+ })
448+ err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
449+ c.Assert(err, gc.IsNil)
450+ // No change; we've got as many voting machines as we need.
451+ s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"})
452+
453+ // Make machine 3 unavailable; machine 0 should be promoted, and two new
454+ // machines created.
455+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
456+ return m.Id() != "3", nil
457+ })
458+ err = s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
459+ c.Assert(err, gc.IsNil)
460+ s.assertStateServerInfo(c, []string{"0", "1", "2", "3", "4", "5"}, []string{"0", "1", "2", "4", "5"})
461+ err = m0.Refresh()
462+ c.Assert(err, gc.IsNil)
463+ c.Assert(m0.WantsVote(), jc.IsTrue)
464+}
465+
466+func (s *StateSuite) TestEnsureAvailabilityRemovesUnavailableMachines(c *gc.C) {
467+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
468+ c.Assert(err, gc.IsNil)
469+ s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"})
470+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
471+ return m.Id() != "0", nil
472+ })
473+ err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
474+ c.Assert(err, gc.IsNil)
475+ s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"})
476+ // machine 0 does not have a vote, so another call to EnsureAvailability
477+ // will remove machine 0's JobEnvironManager job.
478+ err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
479+ c.Assert(err, gc.IsNil)
480+ s.assertStateServerInfo(c, []string{"1", "2", "3"}, []string{"1", "2", "3"})
481+ m0, err := s.State.Machine("0")
482+ c.Assert(err, gc.IsNil)
483+ c.Assert(m0.IsManager(), jc.IsFalse)
484+}
485+
486+func (s *StateSuite) TestEnsureAvailabilityConcurrentSame(c *gc.C) {
487+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
488+ return true, nil
489+ })
490+
491+ defer state.SetBeforeHooks(c, s.State, func() {
492+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
493+ c.Assert(err, gc.IsNil)
494+ // The outer EnsureAvailability call will allocate IDs 0..2,
495+ // and the inner one 3..5.
496+ expected := []string{"3", "4", "5"}
497+ s.assertStateServerInfo(c, expected, expected)
498+ }).Check()
499+
500+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
501+ c.Assert(err, gc.IsNil)
502+ s.assertStateServerInfo(c, []string{"3", "4", "5"}, []string{"3", "4", "5"})
503+
504+ // Machine 0 should never have been created.
505+ _, err = s.State.Machine("0")
506+ c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
507+}
508+
509+func (s *StateSuite) TestEnsureAvailabilityConcurrentLess(c *gc.C) {
510+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
511+ return true, nil
512+ })
513+
514+ defer state.SetBeforeHooks(c, s.State, func() {
515+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
516+ c.Assert(err, gc.IsNil)
517+ // The outer EnsureAvailability call will initially allocate IDs 0..4,
518+ // and the inner one 5..7.
519+ expected := []string{"5", "6", "7"}
520+ s.assertStateServerInfo(c, expected, expected)
521+ }).Check()
522+
523+ // This call to EnsureAvailability will initially attempt to allocate
524+ // machines 0..4, and fail due to the concurrent change. It will then
525+ // allocate machines 8..9 to make up the difference from the concurrent
526+ // EnsureAvailability call.
527+ err := s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
528+ c.Assert(err, gc.IsNil)
529+ expected := []string{"5", "6", "7", "8", "9"}
530+ s.assertStateServerInfo(c, expected, expected)
531+
532+ // Machine 0 should never have been created.
533+ _, err = s.State.Machine("0")
534+ c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
535+}
536+
537+func (s *StateSuite) TestEnsureAvailabilityConcurrentMore(c *gc.C) {
538+ s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
539+ return true, nil
540+ })
541+
542+ defer state.SetBeforeHooks(c, s.State, func() {
543+ err := s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
544+ c.Assert(err, gc.IsNil)
545+ // The outer EnsureAvailability call will allocate IDs 0..2,
546+ // and the inner one 3..7.
547+ expected := []string{"3", "4", "5", "6", "7"}
548+ s.assertStateServerInfo(c, expected, expected)
549+ }).Check()
550+
551+ // This call to EnsureAvailability will initially attempt to allocate
552+ // machines 0..2, and fail due to the concurrent change. It will then
553+ // find that the number of voting machines in state is greater than
554+ // what we're attempting to ensure, and fail.
555+ err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
556+ c.Assert(err, gc.ErrorMatches, "cannot reduce state server count")
557+
558+ // Machine 0 should never have been created.
559+ _, err = s.State.Machine("0")
560+ c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
561+}
562+
563 func (s *StateSuite) TestStateServingInfo(c *gc.C) {
564 info, err := s.State.StateServingInfo()
565 c.Assert(info, jc.DeepEquals, params.StateServingInfo{})

Subscribers

People subscribed via source and target branches

to status/vote changes: