Merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive into lp:~go-bot/juju-core/trunk
- lp1271504-ensureavailability-agentalive
- Merge into trunk
Status: Merged
Approved by: Andrew Wilkins
Approved revision: no longer in the source branch.
Merged at revision: 2624
Proposed branch: lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 565 lines (+373/-87), 5 files modified
  cmd/juju/ensureavailability_test.go (+13/-0)
  state/addmachine.go (+154/-50)
  state/apiserver/client/client_test.go (+24/-20)
  state/export_test.go (+2/-0)
  state/state_test.go (+180/-17)
To merge this branch: bzr merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive
Related bugs:
Reviewer: Juju Engineering | Review type: (unspecified) | Status: Pending
Review via email: mp+215172@code.launchpad.net
Commit message
state: (demote)promote (un)available state servers
When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, we either set WantsVote=false (if it
currently wants a vote) or remove it entirely (if it
neither wants nor holds a vote).
If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.
Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.
Description of the change
state: (demote)promote (un)available state servers
When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, we either set WantsVote=false (if it
currently wants a vote) or remove it entirely (if it
neither wants nor holds a vote).
If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.
Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.
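The per-machine decision described above can be sketched as a small pure function. This is a simplification of the merged updateAvailableStateServersOps; the function name, inputs, and string results here are illustrative stand-ins, not juju-core's actual API:

```go
package main

import "fmt"

// decideStateServer mirrors the logic described above: available
// voters are kept, available non-voters become promotion candidates,
// unavailable voters are demoted (novote=true), and unavailable
// machines that neither want nor hold a vote are removed entirely.
func decideStateServer(available, wantsVote, hasVote bool) string {
	if available {
		if wantsVote {
			return "keep-voting"
		}
		return "promotable"
	}
	if wantsVote {
		return "demote" // set novote; the replicaset worker strips its vote
	}
	if hasVote {
		return "keep-for-now" // still holds a vote, cannot remove yet
	}
	return "remove" // drop JobManageEnviron and forget the machine
}

func main() {
	cases := []struct{ available, wantsVote, hasVote bool }{
		{true, true, true},
		{true, false, false},
		{false, true, true},
		{false, false, true},
		{false, false, false},
	}
	for _, c := range cases {
		fmt.Printf("available=%v wantsVote=%v hasVote=%v -> %s\n",
			c.available, c.wantsVote, c.hasVote,
			decideStateServer(c.available, c.wantsVote, c.hasVote))
	}
}
```

Note the "keep-for-now" case: an unavailable machine that still holds a vote stays in MachineIds until the replicaset worker has actually taken its vote away.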
Andrew Wilkins (axwalk) wrote:
Roger Peppe (rogpeppe) wrote:
LGTM modulo the following.
Very nice.
File state/addmachine.go (right):
state/addmachin
numStateServers
nice clear logic here.
state/addmachin
If err==txn.ErrAborted, that means someone
else has run EnsureAvailability concurrently. Perhaps we should check if
the voting server count has changed - if it has, return an error,
otherwise return nil because our work will already have been done.
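The retry approach suggested here (and adopted in the merged code below) can be sketched as a bounded loop that rebuilds and re-runs the transaction whenever it aborts. The names errAborted, errExcessiveContention, and runTxn are stand-ins for txn.ErrAborted, state's contention error, and the ops-building step; the attempt count of 5 matches the merged code:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for txn.ErrAborted and the contention error.
var (
	errAborted             = errors.New("transaction aborted")
	errExcessiveContention = errors.New("state changing too quickly; try again soon")
)

// ensureWithRetry retries a transaction whose assertions may be
// invalidated by a concurrent EnsureAvailability call. runTxn stands
// in for re-reading state, rebuilding the ops, and running them.
func ensureWithRetry(runTxn func() error) error {
	for i := 0; i < 5; i++ {
		err := runTxn()
		if err == nil {
			return nil
		} else if err != errAborted {
			return fmt.Errorf("failed to create new state server machines: %v", err)
		}
		// Aborted: another caller changed the state servers
		// concurrently. Loop around and try again on fresh state.
	}
	return errExcessiveContention
}

func main() {
	aborts := 2
	err := ensureWithRetry(func() error {
		if aborts > 0 {
			aborts-- // simulate two concurrent writers
			return errAborted
		}
		return nil
	})
	fmt.Println("result:", err)
}
```

The key point is that the loop re-reads state each iteration, so if the concurrent caller already did the work, the rebuilt transaction becomes a no-op and the loop exits cleanly.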
state/addmachin
rather than two variables, it might be clearer if we did:
var newInfo StateServerInfo
and then, e.g.
newInfo.MachineIds = append(
then it's more obvious when we're just building
up ids in the new info.
state/addmachin
switch the sense of the condition to avoid the negative?
state/addmachin
set novote and
Since the comments are inside the if statement, we can phrase them
unconditionally, which might read slightly more clearly.
// The machine wants to vote, so we simply set novote etc
state/addmachin
perhaps invert the condition here to save negatives?
state/addmachin
doesn't have a vote,
again, we can phrase it unconditionally.
state/addmachin
false}}}},
I *think* this should be {{"$eq", true}}, because nil!=false
and legacy machines may have a nil novote field.
similarly below.
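The nil!=false point can be illustrated with a toy model of MongoDB's matching semantics (this models the behaviour only; it is not mgo/txn itself): an equality assert on novote=false never matches a legacy document where the field was never set, while a negated match on true does.

```go
package main

import "fmt"

type doc map[string]interface{}

// matchesEq models the assert {"novote": v}: the field must exist
// and equal v. A missing field does not match false.
func matchesEq(d doc, field string, v interface{}) bool {
	got, ok := d[field]
	return ok && got == v
}

// matchesNe models {"novote": {"$ne": v}}: matches when the field
// is missing or differs from v, so legacy documents still match.
func matchesNe(d doc, field string, v interface{}) bool {
	got, ok := d[field]
	return !ok || got != v
}

func main() {
	legacy := doc{}                // pre-upgrade machine: no novote field
	modern := doc{"novote": false} // explicitly voting machine

	fmt.Println(matchesEq(legacy, "novote", false)) // assert would abort
	fmt.Println(matchesNe(legacy, "novote", true))  // legacy still matches
	fmt.Println(matchesEq(modern, "novote", false))
}
```

So a transaction asserting novote=false would spuriously abort on legacy machines, which is why the review suggests phrasing the assertion in terms of true instead.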
state/addmachin
JobManageEnviro
perhaps set novote=false here too?
Andrew Wilkins (axwalk) wrote:
Please take a look.
File state/addmachine.go (right):
state/addmachin
On 2014/04/11 10:09:17, rog wrote:
> If err==txn.ErrAborted, that means someone else has run
> EnsureAvailability concurrently. Perhaps we should check
if the
> voting server count has changed - if it has, return an error,
otherwise return
> nil because our work will already have been done.
Done, I made it a transaction loop and added a few tests.
state/addmachin
On 2014/04/11 10:09:17, rog wrote:
> rather than two variables, it might be clearer if we did:
> var newInfo StateServerInfo
> and then, e.g.
> newInfo.MachineIds = append(
> then it's more obvious when we're just building
> up ids in the new info.
Done.
state/addmachin
On 2014/04/11 10:09:17, rog wrote:
> switch the sense of the condition to avoid the negative?
Done.
state/addmachin
set novote and
On 2014/04/11 10:09:17, rog wrote:
> Since the comments are inside the if statement, we can phrase them
> unconditionally, which might read slightly more clearly.
> // The machine wants to vote, so we simply set novote etc
Done.
state/addmachin
On 2014/04/11 10:09:17, rog wrote:
> perhaps invert the condition here to save negatives?
Done.
state/addmachin
doesn't have a vote,
On 2014/04/11 10:09:17, rog wrote:
> again, we can phrase it unconditionally.
Done.
state/addmachin
false}}}},
On 2014/04/11 10:09:17, rog wrote:
> I *think* this should be {{"$eq", true}}, because nil!=false
> and legacy machines may have a nil novote field.
> similarly below.
Done.
state/addmachin
JobManageEnviro
On 2014/04/11 10:09:17, rog wrote:
> perhaps set novote=false here too?
Done.
Go Bot (go-bot) wrote:
The attempt to merge lp:~axwalk/juju-core/lp1271504-ensureavailability-agentalive into lp:juju-core failed. Below is the output from the failed tests.
[per-package "ok"/"?" test results omitted: the launchpad.net/juju-core package names were truncated in the page capture]
Preview Diff
1 | === modified file 'cmd/juju/ensureavailability_test.go' | |||
2 | --- cmd/juju/ensureavailability_test.go 2014-03-28 09:55:10 +0000 | |||
3 | +++ cmd/juju/ensureavailability_test.go 2014-04-14 03:42:21 +0000 | |||
4 | @@ -83,6 +83,19 @@ | |||
5 | 83 | func (s *EnsureAvailabilitySuite) TestEnsureAvailabilityMultiple(c *gc.C) { | 83 | func (s *EnsureAvailabilitySuite) TestEnsureAvailabilityMultiple(c *gc.C) { |
6 | 84 | err := runEnsureAvailability(c, "-n", "1") | 84 | err := runEnsureAvailability(c, "-n", "1") |
7 | 85 | c.Assert(err, gc.IsNil) | 85 | c.Assert(err, gc.IsNil) |
8 | 86 | |||
9 | 87 | // make sure machine-0 remains alive for the second call to | ||
10 | 88 | // EnsureAvailability, or machine-0 will get bumped down to | ||
11 | 89 | // non-voting. | ||
12 | 90 | m0, err := s.BackingState.Machine("0") | ||
13 | 91 | c.Assert(err, gc.IsNil) | ||
14 | 92 | pinger, err := m0.SetAgentAlive() | ||
15 | 93 | c.Assert(err, gc.IsNil) | ||
16 | 94 | defer pinger.Kill() | ||
17 | 95 | s.BackingState.StartSync() | ||
18 | 96 | err = m0.WaitAgentAlive(testing.LongWait) | ||
19 | 97 | c.Assert(err, gc.IsNil) | ||
20 | 98 | |||
21 | 86 | err = runEnsureAvailability(c, "-n", "3", "--constraints", "mem=4G") | 99 | err = runEnsureAvailability(c, "-n", "3", "--constraints", "mem=4G") |
22 | 87 | c.Assert(err, gc.IsNil) | 100 | c.Assert(err, gc.IsNil) |
23 | 88 | 101 | ||
24 | 89 | 102 | ||
25 | === modified file 'state/addmachine.go' | |||
26 | --- state/addmachine.go 2014-04-10 14:40:34 +0000 | |||
27 | +++ state/addmachine.go 2014-04-14 03:42:21 +0000 | |||
28 | @@ -497,14 +497,6 @@ | |||
29 | 497 | // EnsureAvailability adds state server machines as necessary to make | 497 | // EnsureAvailability adds state server machines as necessary to make |
30 | 498 | // the number of live state servers equal to numStateServers. The given | 498 | // the number of live state servers equal to numStateServers. The given |
31 | 499 | // constraints and series will be attached to any new machines. | 499 | // constraints and series will be attached to any new machines. |
32 | 500 | // | ||
33 | 501 | // TODO(rog): | ||
34 | 502 | // If any current state servers are down, they will be | ||
35 | 503 | // removed from the current set of voting replica set | ||
36 | 504 | // peers (although the machines themselves will remain | ||
37 | 505 | // and they will still remain part of the replica set). | ||
38 | 506 | // Once a machine's voting status has been removed, | ||
39 | 507 | // the machine itself may be removed. | ||
40 | 508 | func (st *State) EnsureAvailability(numStateServers int, cons constraints.Value, series string) error { | 500 | func (st *State) EnsureAvailability(numStateServers int, cons constraints.Value, series string) error { |
41 | 509 | if numStateServers%2 != 1 || numStateServers <= 0 { | 501 | if numStateServers%2 != 1 || numStateServers <= 0 { |
42 | 510 | return fmt.Errorf("number of state servers must be odd and greater than zero") | 502 | return fmt.Errorf("number of state servers must be odd and greater than zero") |
43 | @@ -512,46 +504,158 @@ | |||
44 | 512 | if numStateServers > replicaset.MaxPeers { | 504 | if numStateServers > replicaset.MaxPeers { |
45 | 513 | return fmt.Errorf("state server count is too large (allowed %d)", replicaset.MaxPeers) | 505 | return fmt.Errorf("state server count is too large (allowed %d)", replicaset.MaxPeers) |
46 | 514 | } | 506 | } |
89 | 515 | info, err := st.StateServerInfo() | 507 | for i := 0; i < 5; i++ { |
90 | 516 | if err != nil { | 508 | currentInfo, err := st.StateServerInfo() |
91 | 517 | return err | 509 | if err != nil { |
92 | 518 | } | 510 | return err |
93 | 519 | if len(info.VotingMachineIds) == numStateServers { | 511 | } |
94 | 520 | // TODO(rog) #1271504 2014-01-22 | 512 | if len(currentInfo.VotingMachineIds) > numStateServers { |
95 | 521 | // Find machines which are down, set | 513 | return fmt.Errorf("cannot reduce state server count") |
96 | 522 | // their NoVote flag and add new machines to | 514 | } |
97 | 523 | // replace them. | 515 | removeOps, info, promotableMachines, err := st.updateAvailableStateServersOps(currentInfo) |
98 | 524 | return nil | 516 | if err != nil { |
99 | 525 | } | 517 | return err |
100 | 526 | if len(info.VotingMachineIds) > numStateServers { | 518 | } |
101 | 527 | return fmt.Errorf("cannot reduce state server count") | 519 | if len(info.VotingMachineIds) == numStateServers && len(removeOps) == 0 { |
102 | 528 | } | 520 | return nil |
103 | 529 | mdocs := make([]*machineDoc, 0, numStateServers-len(info.MachineIds)) | 521 | } |
104 | 530 | var ops []txn.Op | 522 | // Promote existing machines first. |
105 | 531 | for i := len(info.MachineIds); i < numStateServers; i++ { | 523 | var ops []txn.Op |
106 | 532 | template := MachineTemplate{ | 524 | if n := numStateServers - len(info.VotingMachineIds); n < len(promotableMachines) { |
107 | 533 | Series: series, | 525 | promotableMachines = promotableMachines[:n] |
108 | 534 | Jobs: []MachineJob{ | 526 | } |
109 | 535 | JobHostUnits, | 527 | for _, m := range promotableMachines { |
110 | 536 | JobManageEnviron, | 528 | ops = append(ops, promoteStateServerOps(m)...) |
111 | 537 | }, | 529 | info.VotingMachineIds = append(info.VotingMachineIds, m.Id()) |
112 | 538 | Constraints: cons, | 530 | } |
113 | 539 | } | 531 | // Create new machines to make up the shortfall. |
114 | 540 | mdoc, addOps, err := st.addMachineOps(template) | 532 | mdocs := make([]*machineDoc, numStateServers-len(info.VotingMachineIds)) |
115 | 541 | if err != nil { | 533 | for i := range mdocs { |
116 | 542 | return err | 534 | template := MachineTemplate{ |
117 | 543 | } | 535 | Series: series, |
118 | 544 | mdocs = append(mdocs, mdoc) | 536 | Jobs: []MachineJob{ |
119 | 545 | ops = append(ops, addOps...) | 537 | JobHostUnits, |
120 | 546 | } | 538 | JobManageEnviron, |
121 | 547 | ssOps, err := st.maintainStateServersOps(mdocs, info) | 539 | }, |
122 | 548 | if err != nil { | 540 | Constraints: cons, |
123 | 549 | return fmt.Errorf("cannot prepare machine add operations: %v", err) | 541 | } |
124 | 550 | } | 542 | mdoc, addOps, err := st.addMachineOps(template) |
125 | 551 | ops = append(ops, ssOps...) | 543 | if err != nil { |
126 | 552 | err = st.runTransaction(ops) | 544 | return err |
127 | 553 | if err != nil { | 545 | } |
128 | 554 | return fmt.Errorf("failed to create new state server machines: %v", err) | 546 | mdocs[i] = mdoc |
129 | 555 | } | 547 | ops = append(ops, addOps...) |
130 | 556 | return nil | 548 | } |
131 | 549 | ssOps, err := st.maintainStateServersOps(mdocs, currentInfo) | ||
132 | 550 | if err != nil { | ||
133 | 551 | return fmt.Errorf("cannot prepare machine add operations: %v", err) | ||
134 | 552 | } | ||
135 | 553 | ops = append(ops, removeOps...) | ||
136 | 554 | ops = append(ops, ssOps...) | ||
137 | 555 | err = st.runTransaction(ops) | ||
138 | 556 | if err == nil { | ||
139 | 557 | return nil | ||
140 | 558 | } else if err != txn.ErrAborted { | ||
141 | 559 | return fmt.Errorf("failed to create new state server machines: %v", err) | ||
142 | 560 | } | ||
143 | 561 | // The transaction will only be aborted if another call to | ||
144 | 562 | // EnsureAvailability completed. Loop back around and try again. | ||
145 | 563 | } | ||
146 | 564 | return ErrExcessiveContention | ||
147 | 565 | } | ||
148 | 566 | |||
149 | 567 | // stateServerAvailable returns true if the specified state server machine is | ||
150 | 568 | // available. | ||
151 | 569 | var stateServerAvailable = func(m *Machine) (bool, error) { | ||
152 | 570 | // TODO(axw) #1271504 2014-01-22 | ||
153 | 571 | // Check the state server's associated mongo health; | ||
154 | 572 | // requires coordination with worker/peergrouper. | ||
155 | 573 | return m.AgentAlive() | ||
156 | 574 | } | ||
157 | 575 | |||
158 | 576 | // updateAvailableStateServersOps checks the availability of state servers, | ||
159 | 577 | // demoting unavailable, voting machines; | ||
160 | 578 | // removing unavailable, non-voting, non-vote-holding machines; | ||
161 | 579 | // gathering available, non-voting machines that may be promoted; | ||
162 | 580 | // updating StateServerInfo to reflect reality. | ||
163 | 581 | func (st *State) updateAvailableStateServersOps(info *StateServerInfo) (ops []txn.Op, newInfo *StateServerInfo, promotableMachines []*Machine, err error) { | ||
164 | 582 | newInfo = &StateServerInfo{} | ||
165 | 583 | for _, mid := range info.MachineIds { | ||
166 | 584 | m, err := st.Machine(mid) | ||
167 | 585 | if err != nil { | ||
168 | 586 | return nil, nil, nil, err | ||
169 | 587 | } | ||
170 | 588 | available, err := stateServerAvailable(m) | ||
171 | 589 | if err != nil { | ||
172 | 590 | return nil, nil, nil, err | ||
173 | 591 | } | ||
174 | 592 | if available { | ||
175 | 593 | if m.WantsVote() { | ||
176 | 594 | newInfo.VotingMachineIds = append(newInfo.VotingMachineIds, mid) | ||
177 | 595 | } else { | ||
178 | 596 | promotableMachines = append(promotableMachines, m) | ||
179 | 597 | } | ||
180 | 598 | newInfo.MachineIds = append(newInfo.MachineIds, mid) | ||
181 | 599 | continue | ||
182 | 600 | } | ||
183 | 601 | if m.WantsVote() { | ||
184 | 602 | // The machine wants to vote, so we simply set novote and allow it | ||
185 | 603 | // to run its course to have its vote removed by the worker that | ||
186 | 604 | // maintains the replicaset. We will replace it with an existing | ||
187 | 605 | // non-voting state server if there is one, starting a new one if | ||
188 | 606 | // not. | ||
189 | 607 | ops = append(ops, demoteStateServerOps(m)...) | ||
190 | 608 | newInfo.MachineIds = append(newInfo.MachineIds, mid) | ||
191 | 609 | } else if m.HasVote() { | ||
192 | 610 | // The machine still has a vote, so keep it in MachineIds for now. | ||
193 | 611 | newInfo.MachineIds = append(newInfo.MachineIds, mid) | ||
194 | 612 | } else { | ||
195 | 613 | // The machine neither wants to nor has a vote, so remove its | ||
196 | 614 | // JobManageEnviron job immediately. | ||
197 | 615 | ops = append(ops, removeStateServerOps(m)...) | ||
198 | 616 | } | ||
199 | 617 | } | ||
200 | 618 | return ops, newInfo, promotableMachines, nil | ||
201 | 619 | } | ||
202 | 620 | |||
203 | 621 | func promoteStateServerOps(m *Machine) []txn.Op { | ||
204 | 622 | return []txn.Op{{ | ||
205 | 623 | C: m.st.machines.Name, | ||
206 | 624 | Id: m.doc.Id, | ||
207 | 625 | Assert: bson.D{{"novote", true}}, | ||
208 | 626 | Update: bson.D{{"$set", bson.D{{"novote", false}}}}, | ||
209 | 627 | }, { | ||
210 | 628 | C: m.st.stateServers.Name, | ||
211 | 629 | Id: environGlobalKey, | ||
212 | 630 | Update: bson.D{{"$addToSet", bson.D{{"votingmachineids", m.doc.Id}}}}, | ||
213 | 631 | }} | ||
214 | 632 | } | ||
215 | 633 | |||
216 | 634 | func demoteStateServerOps(m *Machine) []txn.Op { | ||
217 | 635 | return []txn.Op{{ | ||
218 | 636 | C: m.st.machines.Name, | ||
219 | 637 | Id: m.doc.Id, | ||
220 | 638 | Assert: bson.D{{"novote", false}}, | ||
221 | 639 | Update: bson.D{{"$set", bson.D{{"novote", true}}}}, | ||
222 | 640 | }, { | ||
223 | 641 | C: m.st.stateServers.Name, | ||
224 | 642 | Id: environGlobalKey, | ||
225 | 643 | Update: bson.D{{"$pull", bson.D{{"votingmachineids", m.doc.Id}}}}, | ||
226 | 644 | }} | ||
227 | 645 | } | ||
228 | 646 | |||
229 | 647 | func removeStateServerOps(m *Machine) []txn.Op { | ||
230 | 648 | return []txn.Op{{ | ||
231 | 649 | C: m.st.machines.Name, | ||
232 | 650 | Id: m.doc.Id, | ||
233 | 651 | Assert: bson.D{{"novote", true}, {"hasvote", false}}, | ||
234 | 652 | Update: bson.D{ | ||
235 | 653 | {"$pull", bson.D{{"jobs", JobManageEnviron}}}, | ||
236 | 654 | {"$set", bson.D{{"novote", false}}}, | ||
237 | 655 | }, | ||
238 | 656 | }, { | ||
239 | 657 | C: m.st.stateServers.Name, | ||
240 | 658 | Id: environGlobalKey, | ||
241 | 659 | Update: bson.D{{"$pull", bson.D{{"machineids", m.doc.Id}}}}, | ||
242 | 660 | }} | ||
243 | 557 | } | 661 | } |
244 | 558 | 662 | ||
245 | === modified file 'state/apiserver/client/client_test.go' | |||
246 | --- state/apiserver/client/client_test.go 2014-04-11 15:16:00 +0000 | |||
247 | +++ state/apiserver/client/client_test.go 2014-04-14 03:42:21 +0000 | |||
248 | @@ -29,6 +29,7 @@ | |||
249 | 29 | "launchpad.net/juju-core/state/api/params" | 29 | "launchpad.net/juju-core/state/api/params" |
250 | 30 | "launchpad.net/juju-core/state/api/usermanager" | 30 | "launchpad.net/juju-core/state/api/usermanager" |
251 | 31 | "launchpad.net/juju-core/state/apiserver/client" | 31 | "launchpad.net/juju-core/state/apiserver/client" |
252 | 32 | "launchpad.net/juju-core/state/presence" | ||
253 | 32 | "launchpad.net/juju-core/state/statecmd" | 33 | "launchpad.net/juju-core/state/statecmd" |
254 | 33 | coretesting "launchpad.net/juju-core/testing" | 34 | coretesting "launchpad.net/juju-core/testing" |
255 | 34 | "launchpad.net/juju-core/utils" | 35 | "launchpad.net/juju-core/utils" |
256 | @@ -2132,17 +2133,24 @@ | |||
257 | 2132 | c.Assert(data["transient"], gc.Equals, true) | 2133 | c.Assert(data["transient"], gc.Equals, true) |
258 | 2133 | } | 2134 | } |
259 | 2134 | 2135 | ||
260 | 2136 | func (s *clientSuite) setAgentAlive(c *gc.C, machineId string) *presence.Pinger { | ||
261 | 2137 | m, err := s.BackingState.Machine(machineId) | ||
262 | 2138 | c.Assert(err, gc.IsNil) | ||
263 | 2139 | pinger, err := m.SetAgentAlive() | ||
264 | 2140 | c.Assert(err, gc.IsNil) | ||
265 | 2141 | s.BackingState.StartSync() | ||
266 | 2142 | err = m.WaitAgentAlive(coretesting.LongWait) | ||
267 | 2143 | c.Assert(err, gc.IsNil) | ||
268 | 2144 | return pinger | ||
269 | 2145 | } | ||
270 | 2146 | |||
271 | 2135 | func (s *clientSuite) TestClientEnsureAvailabilitySeries(c *gc.C) { | 2147 | func (s *clientSuite) TestClientEnsureAvailabilitySeries(c *gc.C) { |
282 | 2136 | apiParams := []params.EnsureAvailability{{ | 2148 | err := s.APIState.Client().EnsureAvailability(1, constraints.Value{}, "") |
283 | 2137 | NumStateServers: 1, | 2149 | c.Assert(err, gc.IsNil) |
284 | 2138 | }, { | 2150 | pinger := s.setAgentAlive(c, "0") |
285 | 2139 | NumStateServers: 3, | 2151 | defer pinger.Kill() |
286 | 2140 | Series: "non-default", | 2152 | err = s.APIState.Client().EnsureAvailability(3, constraints.Value{}, "non-default") |
287 | 2141 | }} | 2153 | c.Assert(err, gc.IsNil) |
278 | 2142 | for _, p := range apiParams { | ||
279 | 2143 | err := s.APIState.Client().EnsureAvailability(p.NumStateServers, p.Constraints, p.Series) | ||
280 | 2144 | c.Assert(err, gc.IsNil) | ||
281 | 2145 | } | ||
288 | 2146 | machines, err := s.State.AllMachines() | 2154 | machines, err := s.State.AllMachines() |
289 | 2147 | c.Assert(err, gc.IsNil) | 2155 | c.Assert(err, gc.IsNil) |
290 | 2148 | c.Assert(machines, gc.HasLen, 3) | 2156 | c.Assert(machines, gc.HasLen, 3) |
291 | @@ -2152,16 +2160,12 @@ | |||
292 | 2152 | } | 2160 | } |
293 | 2153 | 2161 | ||
294 | 2154 | func (s *clientSuite) TestClientEnsureAvailabilityConstraints(c *gc.C) { | 2162 | func (s *clientSuite) TestClientEnsureAvailabilityConstraints(c *gc.C) { |
305 | 2155 | apiParams := []params.EnsureAvailability{{ | 2163 | err := s.APIState.Client().EnsureAvailability(1, constraints.Value{}, "") |
306 | 2156 | NumStateServers: 1, | 2164 | c.Assert(err, gc.IsNil) |
307 | 2157 | }, { | 2165 | pinger := s.setAgentAlive(c, "0") |
308 | 2158 | NumStateServers: 3, | 2166 | defer pinger.Kill() |
309 | 2159 | Constraints: constraints.MustParse("mem=4G"), | 2167 | err = s.APIState.Client().EnsureAvailability(3, constraints.MustParse("mem=4G"), "") |
310 | 2160 | }} | 2168 | c.Assert(err, gc.IsNil) |
301 | 2161 | for _, p := range apiParams { | ||
302 | 2162 | err := s.APIState.Client().EnsureAvailability(p.NumStateServers, p.Constraints, p.Series) | ||
303 | 2163 | c.Assert(err, gc.IsNil) | ||
304 | 2164 | } | ||
311 | 2165 | machines, err := s.State.AllMachines() | 2169 | machines, err := s.State.AllMachines() |
312 | 2166 | c.Assert(err, gc.IsNil) | 2170 | c.Assert(err, gc.IsNil) |
313 | 2167 | c.Assert(machines, gc.HasLen, 3) | 2171 | c.Assert(machines, gc.HasLen, 3) |
314 | 2168 | 2172 | ||
315 | === modified file 'state/export_test.go' | |||
316 | --- state/export_test.go 2014-03-25 15:06:31 +0000 | |||
317 | +++ state/export_test.go 2014-04-14 03:42:21 +0000 | |||
318 | @@ -259,3 +259,5 @@ | |||
319 | 259 | func CheckUserExists(st *State, name string) (bool, error) { | 259 | func CheckUserExists(st *State, name string) (bool, error) { |
320 | 260 | return st.checkUserExists(name) | 260 | return st.checkUserExists(name) |
321 | 261 | } | 261 | } |
322 | 262 | |||
323 | 263 | var StateServerAvailable = &stateServerAvailable | ||
324 | 262 | 264 | ||
325 | === modified file 'state/state_test.go' | |||
326 | --- state/state_test.go 2014-04-10 14:40:34 +0000 | |||
327 | +++ state/state_test.go 2014-04-14 03:42:21 +0000 | |||
328 | @@ -6,7 +6,6 @@ | |||
329 | 6 | import ( | 6 | import ( |
330 | 7 | "fmt" | 7 | "fmt" |
331 | 8 | "net/url" | 8 | "net/url" |
332 | 9 | "sort" | ||
333 | 10 | "strconv" | 9 | "strconv" |
334 | 11 | "strings" | 10 | "strings" |
335 | 12 | "time" | 11 | "time" |
336 | @@ -1735,6 +1734,10 @@ | |||
337 | 1735 | VotingMachineIds: []string{"0"}, | 1734 | VotingMachineIds: []string{"0"}, |
338 | 1736 | }) | 1735 | }) |
339 | 1737 | 1736 | ||
340 | 1737 | s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) { | ||
341 | 1738 | return true, nil | ||
342 | 1739 | }) | ||
343 | 1740 | |||
344 | 1738 | err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | 1741 | err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal") |
345 | 1739 | c.Assert(err, gc.IsNil) | 1742 | c.Assert(err, gc.IsNil) |
346 | 1740 | 1743 | ||
347 | @@ -2889,6 +2892,11 @@ | |||
348 | 2889 | } | 2892 | } |
349 | 2890 | 2893 | ||
350 | 2891 | func (s *StateSuite) TestEnsureAvailabilityAddsNewMachines(c *gc.C) { | 2894 | func (s *StateSuite) TestEnsureAvailabilityAddsNewMachines(c *gc.C) { |
351 | 2895 | // Don't use agent presence to decide on machine availability. | ||
352 | 2896 | s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) { | ||
353 | 2897 | return true, nil | ||
354 | 2898 | }) | ||
355 | 2899 | |||
356 | 2892 | ids := make([]string, 3) | 2900 | ids := make([]string, 3) |
357 | 2893 | m0, err := s.State.AddMachine("quantal", state.JobHostUnits, state.JobManageEnviron) | 2901 | m0, err := s.State.AddMachine("quantal", state.JobHostUnits, state.JobManageEnviron) |
358 | 2894 | c.Assert(err, gc.IsNil) | 2902 | c.Assert(err, gc.IsNil) |
359 | @@ -2898,12 +2906,7 @@ | |||
360 | 2898 | _, err = s.State.AddMachine("quantal", state.JobHostUnits) | 2906 | _, err = s.State.AddMachine("quantal", state.JobHostUnits) |
361 | 2899 | c.Assert(err, gc.IsNil) | 2907 | c.Assert(err, gc.IsNil) |
362 | 2900 | 2908 | ||
369 | 2901 | info, err := s.State.StateServerInfo() | 2909 | s.assertStateServerInfo(c, []string{"0"}, []string{"0"}) |
364 | 2902 | c.Assert(err, gc.IsNil) | ||
365 | 2903 | c.Assert(info, gc.DeepEquals, &state.StateServerInfo{ | ||
366 | 2904 | MachineIds: []string{m0.Id()}, | ||
367 | 2905 | VotingMachineIds: []string{m0.Id()}, | ||
368 | 2906 | }) | ||
370 | 2907 | 2910 | ||
371 | 2908 | cons := constraints.Value{ | 2911 | cons := constraints.Value{ |
372 | 2909 | Mem: newUint64(100), | 2912 | Mem: newUint64(100), |
373 | @@ -2924,22 +2927,182 @@ | |||
374 | 2924 | c.Assert(m.WantsVote(), jc.IsTrue) | 2927 | c.Assert(m.WantsVote(), jc.IsTrue) |
375 | 2925 | ids[i] = m.Id() | 2928 | ids[i] = m.Id() |
376 | 2926 | } | 2929 | } |
387 | 2927 | sort.Strings(ids) | 2930 | s.assertStateServerInfo(c, ids, ids) |
378 | 2928 | |||
379 | 2929 | info, err = s.State.StateServerInfo() | ||
380 | 2930 | c.Assert(err, gc.IsNil) | ||
381 | 2931 | sort.Strings(info.MachineIds) | ||
382 | 2932 | sort.Strings(info.VotingMachineIds) | ||
383 | 2933 | c.Assert(info, gc.DeepEquals, &state.StateServerInfo{ | ||
384 | 2934 | MachineIds: ids, | ||
385 | 2935 | VotingMachineIds: ids, | ||
386 | 2936 | }) | ||
388 | 2937 | } | 2931 | } |
389 | 2938 | 2932 | ||
390 | 2939 | func newUint64(i uint64) *uint64 { | 2933 | func newUint64(i uint64) *uint64 { |
391 | 2940 | return &i | 2934 | return &i |
392 | 2941 | } | 2935 | } |
393 | 2942 | 2936 | ||
394 | 2937 | func (s *StateSuite) assertStateServerInfo(c *gc.C, machineIds []string, votingMachineIds []string) { | ||
395 | 2938 | info, err := s.State.StateServerInfo() | ||
396 | 2939 | c.Assert(err, gc.IsNil) | ||
397 | 2940 | c.Assert(info.MachineIds, jc.SameContents, machineIds) | ||
398 | 2941 | c.Assert(info.VotingMachineIds, jc.SameContents, votingMachineIds) | ||
399 | 2942 | } | ||
400 | 2943 | |||
401 | 2944 | func (s *StateSuite) TestEnsureAvailabilityDemotesUnavailableMachines(c *gc.C) { | ||
402 | 2945 | err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | ||
403 | 2946 | c.Assert(err, gc.IsNil) | ||
404 | 2947 | s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"}) | ||
405 | 2948 | s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) { | ||
406 | 2949 | return m.Id() != "0", nil | ||
407 | 2950 | }) | ||
408 | 2951 | err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | ||
409 | 2952 | c.Assert(err, gc.IsNil) | ||
410 | 2953 | |||
411 | 2954 | // New state server machine "3" is created; "0" still exists in MachineIds, | ||
412 | 2955 | // but no longer in VotingMachineIds. | ||
413 | 2956 | s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"}) | ||
414 | 2957 | m0, err := s.State.Machine("0") | ||
415 | 2958 | c.Assert(err, gc.IsNil) | ||
416 | 2959 | c.Assert(m0.WantsVote(), jc.IsFalse) | ||
417 | 2960 | c.Assert(m0.IsManager(), jc.IsTrue) // job still intact for now | ||
418 | 2961 | m3, err := s.State.Machine("3") | ||
419 | 2962 | c.Assert(err, gc.IsNil) | ||
420 | 2963 | c.Assert(m3.WantsVote(), jc.IsTrue) | ||
421 | 2964 | c.Assert(m3.IsManager(), jc.IsTrue) | ||
422 | 2965 | } | ||
423 | 2966 | |||
424 | 2967 | func (s *StateSuite) TestEnsureAvailabilityPromotesAvailableMachines(c *gc.C) { | ||
425 | 2968 | err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | ||
426 | 2969 | c.Assert(err, gc.IsNil) | ||
427 | 2970 | s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"}) | ||
428 | 2971 | s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) { | ||
429 | 2972 | return m.Id() != "0", nil | ||
430 | 2973 | }) | ||
431 | 2974 | err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | ||
432 | 2975 | c.Assert(err, gc.IsNil) | ||
433 | 2976 | |||
434 | 2977 | // New state server machine "3" is created; "0" still exists in MachineIds, | ||
435 | 2978 | // but no longer in VotingMachineIds. | ||
436 | 2979 | s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"}) | ||
437 | 2980 | m0, err := s.State.Machine("0") | ||
438 | 2981 | c.Assert(err, gc.IsNil) | ||
439 | 2982 | c.Assert(m0.WantsVote(), jc.IsFalse) | ||
440 | 2983 | |||
441 | 2984 | // Mark machine 0 as having a vote, so it doesn't get removed, and make it | ||
442 | 2985 | // available once more. | ||
443 | 2986 | err = m0.SetHasVote(true) | ||
444 | 2987 | c.Assert(err, gc.IsNil) | ||
445 | 2988 | s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) { | ||
446 | 2989 | return true, nil | ||
447 | 2990 | }) | ||
448 | 2991 | err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal") | ||
449 | 2992 | c.Assert(err, gc.IsNil) | ||
450 | 2993 | // No change; we've got as many voting machines as we need. | ||
451 | 2994 | s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"}) | ||
452 | 2995 | |||
	// Make machine 3 unavailable; machine 0 should be promoted, and two new
	// machines created.
	s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
		return m.Id() != "3", nil
	})
	err = s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	s.assertStateServerInfo(c, []string{"0", "1", "2", "3", "4", "5"}, []string{"0", "1", "2", "4", "5"})
	err = m0.Refresh()
	c.Assert(err, gc.IsNil)
	c.Assert(m0.WantsVote(), jc.IsTrue)
}

func (s *StateSuite) TestEnsureAvailabilityRemovesUnavailableMachines(c *gc.C) {
	err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	s.assertStateServerInfo(c, []string{"0", "1", "2"}, []string{"0", "1", "2"})
	s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
		return m.Id() != "0", nil
	})
	err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	s.assertStateServerInfo(c, []string{"0", "1", "2", "3"}, []string{"1", "2", "3"})
	// Machine 0 does not have a vote, so another call to EnsureAvailability
	// will remove machine 0's JobManageEnviron job.
	err = s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	s.assertStateServerInfo(c, []string{"1", "2", "3"}, []string{"1", "2", "3"})
	m0, err := s.State.Machine("0")
	c.Assert(err, gc.IsNil)
	c.Assert(m0.IsManager(), jc.IsFalse)
}

func (s *StateSuite) TestEnsureAvailabilityConcurrentSame(c *gc.C) {
	s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
		return true, nil
	})

	defer state.SetBeforeHooks(c, s.State, func() {
		err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
		c.Assert(err, gc.IsNil)
		// The outer EnsureAvailability call will allocate IDs 0..2,
		// and the inner one 3..5.
		expected := []string{"3", "4", "5"}
		s.assertStateServerInfo(c, expected, expected)
	}).Check()

	err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	s.assertStateServerInfo(c, []string{"3", "4", "5"}, []string{"3", "4", "5"})

	// Machine 0 should never have been created.
	_, err = s.State.Machine("0")
	c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
}

func (s *StateSuite) TestEnsureAvailabilityConcurrentLess(c *gc.C) {
	s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
		return true, nil
	})

	defer state.SetBeforeHooks(c, s.State, func() {
		err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
		c.Assert(err, gc.IsNil)
		// The outer EnsureAvailability call will initially allocate IDs 0..4,
		// and the inner one 5..7.
		expected := []string{"5", "6", "7"}
		s.assertStateServerInfo(c, expected, expected)
	}).Check()

	// This call to EnsureAvailability will initially attempt to allocate
	// machines 0..4, and fail due to the concurrent change. It will then
	// allocate machines 8..9 to make up the difference from the concurrent
	// EnsureAvailability call.
	err := s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
	c.Assert(err, gc.IsNil)
	expected := []string{"5", "6", "7", "8", "9"}
	s.assertStateServerInfo(c, expected, expected)

	// Machine 0 should never have been created.
	_, err = s.State.Machine("0")
	c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
}

func (s *StateSuite) TestEnsureAvailabilityConcurrentMore(c *gc.C) {
	s.PatchValue(state.StateServerAvailable, func(m *state.Machine) (bool, error) {
		return true, nil
	})

	defer state.SetBeforeHooks(c, s.State, func() {
		err := s.State.EnsureAvailability(5, constraints.Value{}, "quantal")
		c.Assert(err, gc.IsNil)
		// The outer EnsureAvailability call will allocate IDs 0..2,
		// and the inner one 3..7.
		expected := []string{"3", "4", "5", "6", "7"}
		s.assertStateServerInfo(c, expected, expected)
	}).Check()

	// This call to EnsureAvailability will initially attempt to allocate
	// machines 0..2, and fail due to the concurrent change. It will then
	// find that the number of voting machines in state is greater than
	// what we're attempting to ensure, and fail.
	err := s.State.EnsureAvailability(3, constraints.Value{}, "quantal")
	c.Assert(err, gc.ErrorMatches, "cannot reduce state server count")

	// Machine 0 should never have been created.
	_, err = s.State.Machine("0")
	c.Assert(err, jc.Satisfies, errors.IsNotFoundError)
}

func (s *StateSuite) TestStateServingInfo(c *gc.C) {
	info, err := s.State.StateServingInfo()
	c.Assert(info, jc.DeepEquals, params.StateServingInfo{})
Reviewers: mp+215172@code.launchpad.net,
Message:
Please take a look.
Description:
state: (demote)promote (un)available state servers
When EnsureAvailability is called, we now check the
availability of existing state servers. If a state server
is unavailable, then we either set WantsVote=false, or
remove it entirely if it doesn't want a vote.
If we need to create new state servers, we will first
promote existing, available, non-voting state servers
before creating new ones.
Availability is currently defined by the state.Machine's
agent presence. We may want to extend this later to also
consider the associated mongo's replicaset member health.
https://code.launchpad.net/~axwalk/juju-core/lp1271504-ensureavailability-agentalive/+merge/215172
(do not edit description out of merge proposal)
Please review this at https://codereview.appspot.com/86490043/
Affected files (+272, -61 lines):
A [revision details]
M cmd/juju/ensureavailability_test.go
M state/addmachine.go
M state/apiserver/client/client_test.go
M state/export_test.go
M state/state_test.go