Merge lp:~axwalk/juju-core/replicaset-retry-replsetinitiate into lp:~go-bot/juju-core/trunk

Proposed by Andrew Wilkins
Status: Merged
Approved by: Andrew Wilkins
Approved revision: no longer in the source branch.
Merged at revision: 2735
Proposed branch: lp:~axwalk/juju-core/replicaset-retry-replsetinitiate
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 47 lines (+27/-3)
1 file modified
replicaset/replicaset.go (+27/-3)
To merge this branch: bzr merge lp:~axwalk/juju-core/replicaset-retry-replsetinitiate
Reviewer            Review Type    Date Requested    Status
Juju Engineering                                     Pending
Review via email: mp+219776@code.launchpad.net

Commit message

replicaset: retry replSetInitiate

It would seem there's a race in mongo,
where it may think that the one and only
local replicaset member is not reachable
shortly after starting up. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop, no problems.

Fixes lp:1319617

https://codereview.appspot.com/97520046/

Andrew Wilkins (axwalk) wrote :

Reviewers: mp+219776_code.launchpad.net,

Message:
Please take a look.

Description:
replicaset: retry replSetInitiate

It would seem there's a race in mongo,
where it may think that the one and only
local replicaset member is not reachable
shortly after starting up. I was able to
reliably reproduce this error on the bot
machine by running TestInitiateReplica
in worker/peergrouper in a loop. With the
retry loop, no problems.

Fixes lp:1319617

https://code.launchpad.net/~axwalk/juju-core/replicaset-retry-replsetinitiate/+merge/219776

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/97520046/

Affected files (+29, -3 lines):
   A [revision details]
   M replicaset/replicaset.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140515024948-mdinmvuq3nkxrxxi
+New revision: <email address hidden>

Index: replicaset/replicaset.go
=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-04-15 16:37:08 +0000
+++ replicaset/replicaset.go 2014-05-16 04:09:11 +0000
@@ -11,8 +11,23 @@
   "labix.org/v2/mgo/bson"
  )

-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+ // MaxPeers defines the maximum number of peers that mongo supports.
+ MaxPeers = 7
+
+ // maxInitiateAttempts is the maximum number of times to attempt
+ // replSetInitiate for each call to Initiate.
+ maxInitiateAttempts = 10
+
+ // initiateAttemptDelay is the amount of time to sleep between failed
+ // attempts to replSetInitiate.
+ initiateAttemptDelay = 100 * time.Millisecond
+
+ // rsMembersUnreachableError is the error message returned from mongo
+ // when it thinks that replicaset members are unreachable. This can
+ // occur if replSetInitiate is executed shortly after starting up mongo.
+ rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)

  var logger = loggo.GetLogger("juju.replicaset")

@@ -39,7 +54,16 @@
    }},
   }
   logger.Infof("Initiating replicaset with config %#v", cfg)
- return monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+ var err error
+ for i := 0; i < maxInitiateAttempts; i++ {
+ err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+ if err != nil && err.Error() == rsMembersUnreachableError {
+ time.Sleep(initiateAttemptDelay)
+ continue
+ }
+ break
+ }
+ return err
  }

  // Member holds configuration information for a replica set member.

Ian Booth (wallyworld) wrote :

Preview Diff

=== modified file 'replicaset/replicaset.go'
--- replicaset/replicaset.go 2014-04-15 16:37:08 +0000
+++ replicaset/replicaset.go 2014-05-16 04:14:34 +0000
@@ -11,8 +11,23 @@
 	"labix.org/v2/mgo/bson"
 )

-// MaxPeers defines the maximum number of peers that mongo supports.
-const MaxPeers = 7
+const (
+	// MaxPeers defines the maximum number of peers that mongo supports.
+	MaxPeers = 7
+
+	// maxInitiateAttempts is the maximum number of times to attempt
+	// replSetInitiate for each call to Initiate.
+	maxInitiateAttempts = 10
+
+	// initiateAttemptDelay is the amount of time to sleep between failed
+	// attempts to replSetInitiate.
+	initiateAttemptDelay = 100 * time.Millisecond
+
+	// rsMembersUnreachableError is the error message returned from mongo
+	// when it thinks that replicaset members are unreachable. This can
+	// occur if replSetInitiate is executed shortly after starting up mongo.
+	rsMembersUnreachableError = "all members and seeds must be reachable to initiate set"
+)

 var logger = loggo.GetLogger("juju.replicaset")

@@ -39,7 +54,16 @@
 		}},
 	}
 	logger.Infof("Initiating replicaset with config %#v", cfg)
-	return monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+	var err error
+	for i := 0; i < maxInitiateAttempts; i++ {
+		err = monotonicSession.Run(bson.D{{"replSetInitiate", cfg}}, nil)
+		if err != nil && err.Error() == rsMembersUnreachableError {
+			time.Sleep(initiateAttemptDelay)
+			continue
+		}
+		break
+	}
+	return err
 }

 // Member holds configuration information for a replica set member.
