Comment 4 for bug 1191487

Dave Cheney (dave-cheney) wrote: Re: [Bug 1191487] Re: mgo sockets in a dirty state

Thanks for working on this, John. While I don't have much of a say in
this, we've already spent a lot of effort working around philosophical
differences with the mgo driver. Would it be more effective to present
this problem upstream and have a proper fix incorporated there? It
sounds like, for scaling and other requirements, we effectively want
all the automagic reconnection logic in the driver disabled, since our
agents already cope with dropped connections via their own retry logic.
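
For illustration, the sort of agent-side retry I have in mind is
roughly the sketch below. It is only a sketch to make the point
concrete; runWithRetry and the other names are invented here and are
not actual juju-core code.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // runWithRetry retries op until it succeeds or the attempts run out,
    // sleeping between tries. Invented names, for illustration only.
    func runWithRetry(attempts int, delay time.Duration, op func() error) error {
        var err error
        for i := 0; i < attempts; i++ {
            if err = op(); err == nil {
                return nil
            }
            // Transient failure (e.g. the driver dropped the connection):
            // back off and try again, typically with a fresh session.
            time.Sleep(delay)
        }
        return err
    }

    func main() {
        calls := 0
        err := runWithRetry(3, 100*time.Millisecond, func() error {
            calls++
            if calls < 3 {
                return errors.New("transient failure")
            }
            return nil
        })
        fmt.Println("calls:", calls, "err:", err)
    }

With retries at that level, reconnection handled silently inside the
driver mostly just hides failures that we would rather see and handle
ourselves.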

On Mon, Jun 17, 2013 at 5:01 PM, John A Meinel <email address hidden> wrote:

> I should also note what it took to trigger this failure.
>
> cd cmd/jujud
> go test -gocheck.v -gocheck.f ManageEnviron
>
> A key requirement is that ManageEnviron has to take longer than 10s (the
> default pingDelay), because the logic failure is in the loop=true case
> (which always starts with a sleep before it actually triggers the code).
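
(To make the timing argument concrete: the shape described above is a
loop that sleeps for a full pingDelay before doing any work, roughly
as sketched below, which is why a test that finishes inside that first
delay never reaches the ping at all. The sketch and its names are
mine, not the real mgo code.)

    package main

    import (
        "fmt"
        "time"
    )

    // pinger sleeps for a full pingDelay before each ping, so nothing
    // happens until that first delay has elapsed. Sketch only; this is
    // not the actual mgo pinger.
    func pinger(stop <-chan struct{}, pingDelay time.Duration, ping func() error) {
        for {
            select {
            case <-stop:
                return
            case <-time.After(pingDelay):
                if err := ping(); err != nil {
                    return
                }
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go pinger(stop, 50*time.Millisecond, func() error {
            fmt.Println("ping")
            return nil
        })
        time.Sleep(120 * time.Millisecond) // long enough for a couple of pings
        close(stop)
    }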
>
> On my hardware, ManageEnviron often takes 5-8s and that will never
> trigger a failure.
>
> Also, a Connect has to take long enough that the server.Close call
> triggers before the Connect has had time to return.
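
(If I read this right, the race is a slow Connect completing after
Close has already torn the server down, so the new socket is never
tracked or cleaned up. The guard below, which re-checks a closed flag
after the dial, is the general shape of such a fix; the names and
types are invented for illustration and are not the actual mgo code.)

    package sketch

    import (
        "errors"
        "net"
        "sync"
    )

    // server tracks live sockets so that Close can shut them all down.
    // Invented names, for illustration only.
    type server struct {
        mu      sync.Mutex
        closed  bool
        sockets []net.Conn
    }

    // Connect dials the address; if Close ran while the dial was still
    // in flight, the fresh connection is closed rather than leaked dirty.
    func (s *server) Connect(addr string) (net.Conn, error) {
        conn, err := net.Dial("tcp", addr) // this dial can be slow
        if err != nil {
            return nil, err
        }
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.closed {
            conn.Close()
            return nil, errors.New("server closed during connect")
        }
        s.sockets = append(s.sockets, conn)
        return conn, nil
    }

    // Close marks the server closed and shuts every tracked socket.
    func (s *server) Close() {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.closed = true
        for _, c := range s.sockets {
            c.Close()
        }
        s.sockets = nil
    }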
>
> I was able to make the test fail fairly reliably (>50% of the time) with
> this patch:
> === modified file 'server.go'
> --- server.go 2013-05-31 21:44:07 +0000
> +++ server.go 2013-06-17 06:48:07 +0000
> @@ -213,7 +213,7 @@
> return result
> }
>
> -var pingDelay = 10 * time.Second
> +var pingDelay = 5 * time.Second // 10 * time.Second
>
> func (server *mongoServer) pinger(loop bool) {
> op := queryOp{
>
> === modified file 'socket.go'
> --- socket.go 2013-04-11 05:27:36 +0000
> +++ socket.go 2013-06-17 06:48:41 +0000
> @@ -31,6 +31,7 @@
> "labix.org/v2/mgo/bson"
> "net"
> "sync"
> + "time"
> )
>
> type replyFunc func(err error, reply *replyOp, docNum int, docData []byte)
> @@ -112,6 +113,7 @@
> stats.socketsAlive(+1)
> debugf("Socket %p to %s: initialized", socket, socket.addr)
> socket.resetNonce()
> + time.Sleep(10*time.Millisecond)
> go socket.readLoop()
> return socket
> }
>
> And then, of course, with both patches it runs smoothly again.
>
> Title:
> mgo sockets in a dirty state
>
> Status in juju-core:
> Triaged
> Status in mgo:
> Confirmed
>
> Bug description:
> /home/tarmac/trees/src/launchpad.net/juju-core/testing/mgo.go:240:
> c.Fatal("Test left sockets in a dirty state")
> ... Error: Test left sockets in a dirty state
>
> Occasionally when running the test suite (especially on Tarmac) I get
> the above failure, which resolves itself if I just run it again.
> I don't know *what* is leaving things in a dirty state, but this might
> be a hint for why things don't always work smoothly with the packaged Mongo
> (we might actually not be cleaning up properly).
>
> This was run on tarmac with the 2.2.0 tarball build of mongo.
>
> https://code.launchpad.net/~danilo/juju-core/lbox-check-emacs/+merge/169680/comments/377328
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju-core/+bug/1191487/+subscriptions
>