Merge lp:~axwalk/juju-core/replicaset-retry-replsetinitiate into lp:~go-bot/juju-core/trunk
Status: | Merged |
---|---|
Approved by: | Andrew Wilkins |
Approved revision: | no longer in the source branch. |
Merged at revision: | 2735 |
Proposed branch: | lp:~axwalk/juju-core/replicaset-retry-replsetinitiate |
Merge into: | lp:~go-bot/juju-core/trunk |
Diff against target: |
47 lines (+27/-3) 1 file modified
replicaset/replicaset.go (+27/-3) |
To merge this branch: | bzr merge lp:~axwalk/juju-core/replicaset-retry-replsetinitiate |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Juju Engineering | Pending | ||
Review via email: mp+219776@code.launchpad.net |
Commit message
replicaset: retry replSetInitiate
It would seem there's a race in mongo,
where it may think that the one and only
local replicaset member is not reachable
shortly after starting up. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop, no problems.
Fixes lp:1319617
Reviewers: mp+219776_code.launchpad.net,
Message:
Please take a look.
Description:
replicaset: retry replSetInitiate
It would seem there's a race in mongo,
where it may think that the one and only
local replicaset member is not reachable
shortly after starting up. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop, no problems.
Fixes lp:1319617
https://code.launchpad.net/~axwalk/juju-core/replicaset-retry-replsetinitiate/+merge/219776
(do not edit description out of merge proposal)
Please review this at https://codereview.appspot.com/97520046/
Affected files (+29, -3 lines):
  A [revision details]
  M replicaset/replicaset.go
Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140515024948-mdinmvuq3nkxrxxi
+New revision: <email address hidden>
Index: replicaset/replicaset.go
=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-04-15 16:37:08 +0000
+++ replicaset/replicaset.go 2014-05-16 04:09:11 +0000
@@ -11,8 +11,23 @@
 	"labix.org/v2/mgo/bson"
 )
-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+	// MaxPeers defines the maximum number of peers that mongo supports.
+	MaxPeers = 7
+
+	// maxInitiateAttempts is the maximum number of times to attempt
+	// replSetInitiate for each call to Initiate.
+	maxInitiateAttempts = 10
+
+	// initiateAttemptDelay is the amount of time to sleep between failed
+	// attempts to replSetInitiate.
+	initiateAttemptDelay = 100 * time.Millisecond
+
+	// rsMembersUnreachableError is the error message returned from mongo
+	// when it thinks that replicaset members are unreachable. This can
+	// occur if replSetInitiate is executed shortly after starting up mongo.
+	rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)
 var logger = loggo.GetLogger("juju.replicaset")
@@ -39,7 +54,16 @@
 		}},
 	}
 	logger.Infof("Initiating replicaset with config %#v", cfg)
-	return monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+	var err error
+	for i := 0; i < maxInitiateAttempts; i++ {
+		err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+		if err != nil && err.Error() == rsMembersUnreachableError {
+			time.Sleep(initiateAttemptDelay)
+			continue
+		}
+		break
+	}
+	return err
 }
 // Member holds configuration information for a replica set member.