mgo sockets in a dirty state
Bug #1191487 reported by John A Meinel
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Invalid | Critical | John A Meinel | |
| mgo | Fix Released | Undecided | John A Meinel | |
Bug Description
/home/tarmac/
c.Fatal("Test left sockets in a dirty state")
... Error: Test left sockets in a dirty state
Occasionally when running the test suite (especially on Tarmac) I get the above failure, which resolves itself if I just run it again.
I don't know *what* is leaving things in a dirty state, but this might be a hint for why things don't always work smoothly with the packaged Mongo (we might actually not be cleaning up properly).
This was run on tarmac with the 2.2.0 tarball build of mongo.
https:/
Related branches
lp:~jameinel/mgo/bug-1191487-close-race
- Gustavo Niemeyer: Pending requested
Diff: 58 lines (+14/-6), 1 file modified: server.go (+14/-6)
Changed in mgo:
- status: Confirmed → Fix Committed
- assignee: nobody → John A Meinel (jameinel)

Changed in mgo:
- status: Fix Committed → Fix Released
Turns out there is a race condition in mgo.
The pinger can be looping: it sleeps, and when it wakes it sees that the server is not closed.
It then calls AcquireSocket to start the next ping, and AcquireSocket gets to the Connect logic, where it blocks waiting for the connection to actually connect to Mongo.
During this time, another goroutine calls server.Close(), which loops over all liveConnections, calls Close on each of them, and then sets the liveConnection slice to nil.
Then Connect returns, and this new connection is immediately added to the liveConnection list. After doing the ping, the pinger will notice the server is closed, set loop = false, and return.
I believe during this time, the connection is in readLoop (where it sits endlessly blocked on fill because there isn't a conversation going on at this point).
Regardless, nothing will ever call Close on that connection, and it will just sit in liveConnections.
I believe the fix is that after Connect returns, once it acquires the server lock, it needs to check whether server.closed is true before adding the new connection to liveConnections. Something like:
socket, err = server.Connect()
if err == nil {
    server.Lock()
    server.liveSockets = append(server.liveSockets, socket)
    server.Unlock()
}
=== modified file 'server.go'
--- server.go 2013-05-31 21:44:07 +0000
+++ server.go 2013-06-16 17:04:01 +0000
@@ -104,6 +104,12 @@
+ if server.closed {
+ // While we were waiting for Connect to finish, we were asked to close.
+ // What error should be here?
+ socket.Close()
+ return
+ }
However, I don't know what error to set (if any).
Partly because if AcquireSocket returns an error while 'loop' is still true, the pinger just goes back to sleep (err is checked for being nil, but nothing is done when it is non-nil).
So there should probably be an additional check for an error type that indicates we should stop looping (presumably errors are normally suppressed so that we ping whenever we can, and simply skip the ping when it fails).