mgo

Merge lp:~jameinel/mgo/bug-1191487-close-race into lp:mgo/v2

Proposed by John A Meinel
Status: Merged
Merged at revision: 227
Proposed branch: lp:~jameinel/mgo/bug-1191487-close-race
Merge into: lp:mgo/v2
Diff against target: 58 lines (+14/-6)
1 file modified
server.go (+14/-6)
To merge this branch: bzr merge lp:~jameinel/mgo/bug-1191487-close-race
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Pending
Review via email: mp+169999@code.launchpad.net

Description of the change

Fix race in AcquireSocket

Fix AcquireSocket so it doesn't put sockets in the live list after
the server has been closed. Fix by John Meinel.

https://codereview.appspot.com/10392043/

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

(I wasn't able to find lbox configuration for this project, which is why I'm just doing a Launchpad MP)

Revision history for this message
John A Meinel (jameinel) wrote :

The test which was passing reliably yesterday is failing occasionally today, so I obviously need to think it through again.

Revision history for this message
John A Meinel (jameinel) wrote :

Please take a look.

Revision history for this message
John A Meinel (jameinel) wrote :

I found the race condition. It turns out that newServer pings one time, and that creates a socket. That socket *might* fail during conn.Write() which means it returns immediately. And then at a future time (40us or so), the readLoop notices the read failure and removes the socket from unusedSockets.

So instead we just check that we are actually going to acquire a new socket. If I add a "c.Error" inside the for{} loop, I can see that it would indeed fail about 1 time in 100, and now it hasn't failed after several hundred runs.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.8 KiB)

Reviewers: mp+169999_code.launchpad.net,

Message:
Please take a look.

Description:
server.go: handle a race between Close and Connect

If one routine (such as the pinger) starts a new connection at the same
time a Close is called (say by test case tear down), it is possible for
that connection to return and be marked alive even though all other
connections are closed.

This checks if closed has happened since the connect was requested, and
immediately closes the new connection.

See bug #1191487 for more details about how this was triggered in the
juju-core test suite.

https://code.launchpad.net/~jameinel/mgo/bug-1191487-close-race/+merge/169999

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/10392043/

Affected files:
   A [revision details]
   M server.go
   A server_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: <email address hidden>
+New revision: <email address hidden>

Index: server.go
=== modified file 'server.go'
--- server.go 2013-06-17 13:37:37 +0000
+++ server.go 2013-06-17 14:44:10 +0000
@@ -73,6 +73,7 @@
  }

  var errSocketLimit = errors.New("per-server connection limit reached")
+var errServerClosed = errors.New("server was closed")

  // AcquireSocket returns a socket for communicating with the server.
  // This will attempt to reuse an old connection, if one is available.
Otherwise,
@@ -104,6 +105,10 @@
     socket, err = server.Connect()
     if err == nil {
      server.Lock()
+ if server.closed {
+ socket.Close()
+ return nil, false, errServerClosed
+ }
      server.liveSockets = append(server.liveSockets, socket)
      server.Unlock()
     }
@@ -256,6 +261,8 @@
     server.pingValue = max
     server.Unlock()
     logf("Ping for %s is %d ms", server.Addr, max/time.Millisecond)
+ } else if err == errServerClosed {
+ return
    }
    if !loop {
     return

Index: server_test.go
=== added file 'server_test.go'
--- server_test.go 1970-01-01 00:00:00 +0000
+++ server_test.go 2013-06-18 13:05:51 +0000
@@ -0,0 +1,58 @@
+package mgo
+
+import (
+ . "launchpad.net/gocheck"
+ "net"
+)
+
+type ServerSuite struct {
+}
+
+var _ = Suite(&ServerSuite{})
+
+func (s *ServerSuite) TestCloseDuringConnect(c *C) {
+ var server *mongoServer
+ closeDial := func(addr net.Addr) (net.Conn, error) {
+ // The first call to this will be during newServer, and
+ // server will be nil. Once it returns, the second call
+ // will be with a valid server, and we will Close it.
+ if server != nil {
+ server.Close()
+ }
+ return net.DialTCP("tcp", nil, addr.(*net.TCPAddr))
+ }
+ localhostServer, err := net.Listen("tcp", "localhost:0")
+ c.Assert(err, IsNil)
+ defer localhostServer.Close()
+ go func() {
+ // Accept a connection but don't do anything with it
+ for {
+ conn, err := localhostServer.Accept()
+ logf("got connection: %v", conn)
+ if err != nil {
+ return
+ }
+ conn.Close()
+ }
+ }()
+ tcpaddr := localhostServer.Addr().(*net.TCPAddr)
+...

Read more...

231. By John A Meinel

Always unlock the server

Revision history for this message
John A Meinel (jameinel) wrote :
Revision history for this message
Roger Peppe (rogpeppe) wrote :

LGTM with a few thoughts and suggestions regarding the test.

https://codereview.appspot.com/10392043/diff/1/server.go
File server.go (right):

https://codereview.appspot.com/10392043/diff/1/server.go#newcode109
server.go:109: socket.Close()
server.Unlock()
(probably before calling socket.Close, but worth investigating)

https://codereview.appspot.com/10392043/diff/1/server_test.go
File server_test.go (right):

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode19
server_test.go:19: if server != nil {
is the dialler guaranteed not to be called concurrently?

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode22
server_test.go:22: return net.DialTCP("tcp", nil, addr.(*net.TCPAddr))
return net.Dial("tcp", addr.String()) ?

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode41
server_test.go:41: // It is possible for newServer to ping the remote
host, and the
s/possible/not possible/
?

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode52
server_test.go:52: }
i'm not sure this is guaranteed to work unless you have a sleep in here
(i think you're relying on logf yielding, which doesn't seem quite
right).

https://codereview.appspot.com/10392043/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

https://codereview.appspot.com/10392043/diff/1/server.go
File server.go (right):

https://codereview.appspot.com/10392043/diff/1/server.go#newcode109
server.go:109: socket.Close()
On 2013/06/18 13:29:04, rog wrote:
> server.Unlock()
> (probably before calling socket.Close, but worth investigating)

Yeah, definitely.

https://codereview.appspot.com/10392043/diff/1/server.go#newcode110
server.go:110: return nil, false, errServerClosed
If we have a closed error (which is a good idea, thanks), it should also
be exercised before the connection is attempted.

https://codereview.appspot.com/10392043/diff/1/server.go#newcode264
server.go:264: } else if err == errServerClosed {
There are now two different ways to check for server closed in this
loop. Can we have a single one now that we have errServerClosed?

https://codereview.appspot.com/10392043/diff/1/server_test.go
File server_test.go (right):

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode16
server_test.go:16: // The first call to this will be during newServer,
and
I appreciate this test as a good reproducer for the issue you're fixing,
but I'm afraid it's quite fragile and extremely attached to
implementation details. If there's no good way to test it in a more
reliable way, I'm fine without an additional test (note that you arrived
at the fix from breakage in the test suite itself).

https://codereview.appspot.com/10392043/

232. By John A Meinel

Change the pinger loop so that AcquireSocket always checks for closed first.

233. By John A Meinel

Remove the whitebox test of concurrency handling.

Revision history for this message
John A Meinel (jameinel) wrote :

Please take a look.

https://codereview.appspot.com/10392043/diff/1/server_test.go
File server_test.go (right):

https://codereview.appspot.com/10392043/diff/1/server_test.go#newcode41
server_test.go:41: // It is possible for newServer to ping the remote
host, and the
On 2013/06/18 13:29:04, rog wrote:
> s/possible/not possible/
> ?

It is possible for (newServer to ping the remote host and have the write
fail).

https://codereview.appspot.com/10392043/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

*** Submitted:

Fix race in AcquireSocket

Fix AcquireSocket so it doesn't put sockets in the live list after
the server has been closed. Fix by John Meinel.

R=rog, niemeyer
CC=
https://codereview.appspot.com/10392043

https://codereview.appspot.com/10392043/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'server.go'
2--- server.go 2013-06-17 13:37:37 +0000
3+++ server.go 2013-06-18 14:35:44 +0000
4@@ -73,6 +73,7 @@
5 }
6
7 var errSocketLimit = errors.New("per-server connection limit reached")
8+var errServerClosed = errors.New("server was closed")
9
10 // AcquireSocket returns a socket for communicating with the server.
11 // This will attempt to reuse an old connection, if one is available. Otherwise,
12@@ -85,6 +86,10 @@
13 for {
14 server.Lock()
15 abended = server.abended
16+ if server.closed {
17+ server.Unlock()
18+ return nil, abended, errServerClosed
19+ }
20 n := len(server.unusedSockets)
21 if limit > 0 && len(server.liveSockets)-n >= limit {
22 server.Unlock()
23@@ -104,6 +109,13 @@
24 socket, err = server.Connect()
25 if err == nil {
26 server.Lock()
27+ // We've waited for the Connect, see if we got
28+ // closed in the meantime
29+ if server.closed {
30+ server.Unlock()
31+ socket.Close()
32+ return nil, abended, errServerClosed
33+ }
34 server.liveSockets = append(server.liveSockets, socket)
35 server.Unlock()
36 }
37@@ -225,12 +237,6 @@
38 for {
39 if loop {
40 time.Sleep(pingDelay)
41- server.RLock()
42- closed := server.closed
43- server.RUnlock()
44- if closed {
45- return
46- }
47 }
48 op := op
49 socket, _, err := server.AcquireSocket(0)
50@@ -256,6 +262,8 @@
51 server.pingValue = max
52 server.Unlock()
53 logf("Ping for %s is %d ms", server.Addr, max/time.Millisecond)
54+ } else if err == errServerClosed {
55+ return
56 }
57 if !loop {
58 return

Subscribers

People subscribed via source and target branches