Code review comment for lp:~axwalk/juju-core/replicaset-retry-replsetinitiate

Andrew Wilkins (axwalk) wrote :

Reviewers: mp+219776_code.launchpad.net,

Message:
Please take a look.

Description:
replicaset: retry replSetInitiate

There appears to be a race in mongo:
shortly after starting up, it may think
that the one and only local replicaset
member is not reachable. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop in place, I saw no problems.

Fixes lp:1319617

https://code.launchpad.net/~axwalk/juju-core/replicaset-retry-replsetinitiate/+merge/219776

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/97520046/

Affected files (+29, -3 lines):
   A [revision details]
   M replicaset/replicaset.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140515024948-mdinmvuq3nkxrxxi
+New revision: <email address hidden>

Index: replicaset/replicaset.go
=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-04-15 16:37:08 +0000
+++ replicaset/replicaset.go 2014-05-16 04:09:11 +0000
@@ -11,8 +11,23 @@
   "labix.org/v2/mgo/bson"
  )

-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+ // MaxPeers defines the maximum number of peers that mongo supports.
+ MaxPeers = 7
+
+ // maxInitiateAttempts is the maximum number of times to attempt
+ // replSetInitiate for each call to Initiate.
+ maxInitiateAttempts = 10
+
+ // initiateAttemptDelay is the amount of time to sleep between failed
+ // attempts to replSetInitiate.
+ initiateAttemptDelay = 100 * time.Millisecond
+
+ // rsMembersUnreachableError is the error message returned from mongo
+ // when it thinks that replicaset members are unreachable. This can
+ // occur if replSetInitiate is executed shortly after starting up mongo.
+ rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)

  var logger = loggo.GetLogger("juju.replicaset")

@@ -39,7 +54,16 @@
    }},
   }
   logger.Infof("Initiating replicaset with config %#v", cfg)
- return monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+ var err error
+ for i := 0; i < maxInitiateAttempts; i++ {
+ err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+ if err != nil && err.Error() == rsMembersUnreachableError {
+ time.Sleep(initiateAttemptDelay)
+ continue
+ }
+ break
+ }
+ return err
  }

  // Member holds configuration information for a replica set member.
