I should also note what it took to trigger this failure.
cd cmd/jujud
go test -gocheck.v -gocheck.f ManageEnviron
A key requirement is that ManageEnviron has to run for longer than 10s (the default pingDelay), because the faulty logic is only reached in the loop=true case, which always sleeps for pingDelay before it runs the code in question.
On my hardware, ManageEnviron often takes only 5-8s, which will never trigger the failure.
In addition, a Connect call has to take long enough that server.Close is called before the Connect returns.
I was able to make the test fail fairly reliably (>50% of the time) with this patch:
=== modified file 'server.go'
--- server.go 2013-05-31 21:44:07 +0000
+++ server.go 2013-06-17 06:48:07 +0000
@@ -213,7 +213,7 @@
 	return result
 }
 
-var pingDelay = 10 * time.Second
+var pingDelay = 5 * time.Second // 10 * time.Second
 
 func (server *mongoServer) pinger(loop bool) {
 	op := queryOp{

=== modified file 'socket.go'
--- socket.go 2013-04-11 05:27:36 +0000
+++ socket.go 2013-06-17 06:48:41 +0000
@@ -31,6 +31,7 @@
 	"labix.org/v2/mgo/bson"
 	"net"
 	"sync"
+	"time"
 )
 
 type replyFunc func(err error, reply *replyOp, docNum int, docData []byte)
@@ -112,6 +113,7 @@
 	stats.socketsAlive(+1)
 	debugf("Socket %p to %s: initialized", socket, socket.addr)
 	socket.resetNonce()
+	time.Sleep(10*time.Millisecond)
 	go socket.readLoop()
 	return socket
 }
And then, of course, with both patches it runs smoothly again.