Merge lp:~axwalk/juju-core/initiate-replicaset-retry into lp:~go-bot/juju-core/trunk

Proposed by Andrew Wilkins
Status: Merged
Approved by: Andrew Wilkins
Approved revision: no longer in the source branch.
Merged at revision: 2810
Proposed branch: lp:~axwalk/juju-core/initiate-replicaset-retry
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 109 lines (+44/-23)
2 files modified
replicaset/replicaset.go (+6/-3)
worker/peergrouper/initiate.go (+38/-20)
To merge this branch: bzr merge lp:~axwalk/juju-core/initiate-replicaset-retry
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+221475@code.launchpad.net

Commit message

Retry replicaset.Initiate

Another attempt at retrying replicaset
initiation. The change I made in
https://codereview.appspot.com/98670045/
does not work well, as the error from
replSetInitiate may be spurious; it is
possible to receive an error despite the
config being written.

https://codereview.appspot.com/102000044/

Description of the change

Retry replicaset.Initiate

Another attempt at retrying replicaset
initiation. The change I made in
https://codereview.appspot.com/98670045/
does not work well, as the error from
replSetInitiate may be spurious; it is
possible to receive an error despite the
config being written.

https://codereview.appspot.com/102000044/

To post a comment you must log in.
Revision history for this message
Andrew Wilkins (axwalk) wrote :
Download full text (4.4 KiB)

Reviewers: mp+221475_code.launchpad.net,

Message:
Please take a look.

Description:
Retry replicaset.Initiate

Another attempt at retrying replicaset
initiation. The change I made in
https://codereview.appspot.com/98670045/
does not work well, as the error from
replSetInitiate may be spurious; it is
possible to receive an error despite the
config being written.

https://code.launchpad.net/~axwalk/juju-core/initiate-replicaset-retry/+merge/221475

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/102000044/

Affected files (+46, -23 lines):
   A [revision details]
   M replicaset/replicaset.go
   M worker/peergrouper/initiate.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140529113014-rt0lph0bic0c117z
+New revision: <email address hidden>

Index: replicaset/replicaset.go
=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-05-29 10:26:02 +0000
+++ replicaset/replicaset.go 2014-05-30 00:31:44 +0000
@@ -22,6 +22,11 @@
   // initiateAttemptDelay is the amount of time to sleep between failed
   // attempts to replSetInitiate.
   initiateAttemptDelay = 100 * time.Millisecond
+
+ // rsMembersUnreachableError is the error message returned from mongo
+ // when it thinks that replicaset members are unreachable. This can
+ // occur if replSetInitiate is executed shortly after starting up mongo.
+ rsMembersUnreachableError = "all members and seeds must be reachable to
initiate set"
  )

  var logger = loggo.GetLogger("juju.replicaset")
@@ -52,10 +57,8 @@
   logger.Infof("Initiating replicaset with config %#v", cfg)
   var err error
   for i := 0; i < maxInitiateAttempts; i++ {
- monotonicSession.Refresh()
    err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
- if err != nil {
- logger.Debugf("replSetInitiate failed: %v", err)
+ if err != nil && err.Error() == rsMembersUnreachableError {
     time.Sleep(initiateAttemptDelay)
     continue
    }

Index: worker/peergrouper/initiate.go
=== modified file 'worker/peergrouper/initiate.go'
--- worker/peergrouper/initiate.go 2014-04-18 04:39:58 +0000
+++ worker/peergrouper/initiate.go 2014-05-30 00:31:44 +0000
@@ -5,14 +5,21 @@

  import (
   "fmt"
+ "time"

   "labix.org/v2/mgo"

   "launchpad.net/juju-core/agent"
   "launchpad.net/juju-core/agent/mongo"
   "launchpad.net/juju-core/replicaset"
+ "launchpad.net/juju-core/utils"
  )

+var initiateAttemptStrategy = utils.AttemptStrategy{
+ Total: 30 * time.Second,
+ Delay: 1 * time.Second,
+}
+
  // InitiateMongoParams holds parameters for the MaybeInitiateMongo call.
  type InitiateMongoParams struct {
   // DialInfo specifies how to connect to the mongo server.
@@ -53,24 +60,35 @@
   }
   defer session.Close()

- cfg, err := replicaset.CurrentConfig(session)
- if err == nil && len(cfg.Members) > 0 {
- logger.Infof("replica set configuration already found: %#v", cfg)
- return nil
- }
- if err != nil && err != mgo.ErrNotFound {
- return fmt.Errorf("cannot get replica ...

Read more...

Revision history for this message
Ian Booth (wallyworld) wrote :

I'd prefer if we didn't have to do a string equality check for the
error. Can we fix that before landing?

https://codereview.appspot.com/102000044/diff/1/replicaset/replicaset.go
File replicaset/replicaset.go (right):

https://codereview.appspot.com/102000044/diff/1/replicaset/replicaset.go#newcode29
replicaset/replicaset.go:29: rsMembersUnreachableError = "all members
and seeds must be reachable to initiate set"
I'd like to think there's a error code we can check. I think we can cast
the error to a mongo error struct and look at the error code? Like is
done for IsUnauthorized() in testing/mgo.go

https://codereview.appspot.com/102000044/

Revision history for this message
Andrew Wilkins (axwalk) wrote :

On 2014/05/30 00:43:43, wallyworld wrote:
> I'd prefer if we didn't have to do a string equality check for the
error. Can we
> fix that before landing?

https://codereview.appspot.com/102000044/diff/1/replicaset/replicaset.go
> File replicaset/replicaset.go (right):

https://codereview.appspot.com/102000044/diff/1/replicaset/replicaset.go#newcode29
> replicaset/replicaset.go:29: rsMembersUnreachableError = "all members
and seeds
> must be reachable to initiate set"
> I'd like to think there's a error code we can check. I think we can
cast the
> error to a mongo error struct and look at the error code? Like is done
for
> IsUnauthorized() in testing/mgo.go

AFAICT there is no corresponding error code for this error.
Even in testing/mgo.go, isUnauthorized has to resort to comparing error
strings.

https://codereview.appspot.com/102000044/

Revision history for this message
Ian Booth (wallyworld) wrote :

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'replicaset/replicaset.go'
2--- replicaset/replicaset.go 2014-05-29 10:26:02 +0000
3+++ replicaset/replicaset.go 2014-05-30 00:35:11 +0000
4@@ -22,6 +22,11 @@
5 // initiateAttemptDelay is the amount of time to sleep between failed
6 // attempts to replSetInitiate.
7 initiateAttemptDelay = 100 * time.Millisecond
8+
9+ // rsMembersUnreachableError is the error message returned from mongo
10+ // when it thinks that replicaset members are unreachable. This can
11+ // occur if replSetInitiate is executed shortly after starting up mongo.
12+ rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
13 )
14
15 var logger = loggo.GetLogger("juju.replicaset")
16@@ -52,10 +57,8 @@
17 logger.Infof("Initiating replicaset with config %#v", cfg)
18 var err error
19 for i := 0; i < maxInitiateAttempts; i++ {
20- monotonicSession.Refresh()
21 err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
22- if err != nil {
23- logger.Debugf("replSetInitiate failed: %v", err)
24+ if err != nil && err.Error() == rsMembersUnreachableError {
25 time.Sleep(initiateAttemptDelay)
26 continue
27 }
28
29=== modified file 'worker/peergrouper/initiate.go'
30--- worker/peergrouper/initiate.go 2014-04-18 04:39:58 +0000
31+++ worker/peergrouper/initiate.go 2014-05-30 00:35:11 +0000
32@@ -5,14 +5,21 @@
33
34 import (
35 "fmt"
36+ "time"
37
38 "labix.org/v2/mgo"
39
40 "launchpad.net/juju-core/agent"
41 "launchpad.net/juju-core/agent/mongo"
42 "launchpad.net/juju-core/replicaset"
43+ "launchpad.net/juju-core/utils"
44 )
45
46+var initiateAttemptStrategy = utils.AttemptStrategy{
47+ Total: 30 * time.Second,
48+ Delay: 1 * time.Second,
49+}
50+
51 // InitiateMongoParams holds parameters for the MaybeInitiateMongo call.
52 type InitiateMongoParams struct {
53 // DialInfo specifies how to connect to the mongo server.
54@@ -53,24 +60,35 @@
55 }
56 defer session.Close()
57
58- cfg, err := replicaset.CurrentConfig(session)
59- if err == nil && len(cfg.Members) > 0 {
60- logger.Infof("replica set configuration already found: %#v", cfg)
61- return nil
62- }
63- if err != nil && err != mgo.ErrNotFound {
64- return fmt.Errorf("cannot get replica set configuration: %v", err)
65- }
66- err = replicaset.Initiate(
67- session,
68- p.MemberHostPort,
69- mongo.ReplicaSetName,
70- map[string]string{
71- jujuMachineTag: agent.BootstrapMachineId,
72- },
73- )
74- if err != nil {
75- return fmt.Errorf("cannot initiate replica set: %v", err)
76- }
77- return nil
78+ // Initiate may fail while mongo is initialising, so we retry until
79+ // we succssfully populate the replicaset config.
80+ for attempt := initiateAttemptStrategy.Start(); attempt.Next(); {
81+ var cfg *replicaset.Config
82+ cfg, err = replicaset.CurrentConfig(session)
83+ if err == nil && len(cfg.Members) > 0 {
84+ logger.Infof("replica set configuration already found: %#v", cfg)
85+ return nil
86+ }
87+ if err != nil && err != mgo.ErrNotFound {
88+ return fmt.Errorf("cannot get replica set configuration: %v", err)
89+ }
90+ err = replicaset.Initiate(
91+ session,
92+ p.MemberHostPort,
93+ mongo.ReplicaSetName,
94+ map[string]string{
95+ jujuMachineTag: agent.BootstrapMachineId,
96+ },
97+ )
98+ if err == nil {
99+ logger.Infof("replica set initiated")
100+ return nil
101+ }
102+ if attempt.HasNext() {
103+ logger.Debugf("replica set initiation failed, will retry: %v", err)
104+ }
105+ // Release sockets, which may have been closed by mgo.
106+ session.Refresh()
107+ }
108+ return fmt.Errorf("cannot initiate replica set: %v", err)
109 }

Subscribers

People subscribed via source and target branches

to status/vote changes: