Comment 3 for bug 1191487

John A Meinel (jameinel) wrote :

I should also note what it took to trigger this failure.

cd cmd/jujud
go test -gocheck.v -gocheck.f ManageEnviron

A key requirement is that ManageEnviron has to take longer than 10s (the default pingDelay), because the logic failure is in the loop == true case of pinger, which always sleeps for pingDelay before it reaches the failing code.

On my hardware, ManageEnviron often takes only 5-8s, which is too short to ever trigger the failure.

Also, a Connect has to take long enough that the server.Close call happens before the Connect has had time to return.

I was able to make the test fail fairly reliably (>50% of the time) with this patch:
=== modified file 'server.go'
--- server.go 2013-05-31 21:44:07 +0000
+++ server.go 2013-06-17 06:48:07 +0000
@@ -213,7 +213,7 @@
        return result
 }

-var pingDelay = 10 * time.Second
+var pingDelay = 5 * time.Second // 10 * time.Second

 func (server *mongoServer) pinger(loop bool) {
        op := queryOp{

=== modified file 'socket.go'
--- socket.go 2013-04-11 05:27:36 +0000
+++ socket.go 2013-06-17 06:48:41 +0000
@@ -31,6 +31,7 @@
        "labix.org/v2/mgo/bson"
        "net"
        "sync"
+       "time"
 )

 type replyFunc func(err error, reply *replyOp, docNum int, docData []byte)
@@ -112,6 +113,7 @@
        stats.socketsAlive(+1)
        debugf("Socket %p to %s: initialized", socket, socket.addr)
        socket.resetNonce()
+       time.Sleep(10*time.Millisecond)
        go socket.readLoop()
        return socket
 }

And then, of course, with both patches it runs smoothly again.