It would seem there's a race in mongo,
where it may think that the one and only
local replicaset member is not reachable
shortly after starting up. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop, no problems.
-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+ // MaxPeers defines the maximum number of peers that mongo supports.
+ MaxPeers = 7
+
+ // maxInitiateAttempts is the maximum number of times to attempt
+ // replSetInitiate for each call to Initiate.
+ maxInitiateAttempts = 10
+
+ // initiateAttemptDelay is the amount of time to sleep between failed
+ // attempts to replSetInitiate.
+ initiateAttemptDelay = 100 * time.Millisecond
+
+ // rsMembersUnreachableError is the error message returned from mongo
+ // when it thinks that replicaset members are unreachable. This can
+ // occur if replSetInitiate is executed shortly after starting up mongo.
+ rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)
Reviewers: mp+219776_code.launchpad.net,
Message:
Please take a look.
Description:
replicaset: retry replSetInitiate
Fixes lp:1319617
https://code.launchpad.net/~axwalk/juju-core/replicaset-retry-replsetinitiate/+merge/219776
(do not edit description out of merge proposal)
Please review this at https://codereview.appspot.com/97520046/
Affected files (+29, -3 lines):
A [revision details]
M replicaset/replicaset.go
Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140515024948-mdinmvuq3nkxrxxi
+New revision: <email address hidden>
Index: replicaset/replicaset.go
=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-04-15 16:37:08 +0000
+++ replicaset/replicaset.go 2014-05-16 04:09:11 +0000
@@ -11,8 +11,23 @@
"labix.org/v2/mgo/bson"
)
-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+ // MaxPeers defines the maximum number of peers that mongo supports.
+ MaxPeers = 7
+
+ // maxInitiateAttempts is the maximum number of times to attempt
+ // replSetInitiate for each call to Initiate.
+ maxInitiateAttempts = 10
+
+ // initiateAttemptDelay is the amount of time to sleep between failed
+ // attempts to replSetInitiate.
+ initiateAttemptDelay = 100 * time.Millisecond
+
+ // rsMembersUnreachableError is the error message returned from mongo
+ // when it thinks that replicaset members are unreachable. This can
+ // occur if replSetInitiate is executed shortly after starting up mongo.
+ rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)
var logger = loggo.GetLogger("juju.replicaset")
@@ -39,7 +54,16 @@
}},
}
logger.Infof("Initiating replicaset with config %#v", cfg)
- return monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+ var err error
+ for i := 0; i < maxInitiateAttempts; i++ {
+     err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+     if err != nil && err.Error() == rsMembersUnreachableError {
+         time.Sleep(initiateAttemptDelay)
+         continue
+     }
+     break
+ }
+ return err
}
// Member holds configuration information for a replica set member.