juju-core

Bug #1191487
Comment #2

Comment 2 for bug 1191487

Revision history for this message

John A Meinel (jameinel) wrote on 2013-06-17:

Local testing with lots of sleeps and traps to make sure it triggers have shown it to get the test suite to pass reliably. (I can add logic that notes when this logic is triggered, and can see that the test would have failed if I didn't have it.)

I was concerned about the pinger code getting a closed socket, but
a) If you return an error from AcquireSocket, the pinger code just goes back to sleep if loop is true, so it will take 10s for it to wake up again to notice that the server is actually closed.
b) The code that acts on the socket just makes a 'no-op' request and ignores whether that request succeeds or not (it suppresses the result and the error), and just cares how long it would have taken.

The other callers of AcquireSocket could probably do something with an error here, but the very next call on the socket will fail and they handle that error. (They have to, given the socket could actually fail at any point).

So a different patch that called socket.Close() and then set an error, and then updated the pinger code to look for a ErrServerClosed or something like that would also be reasonable, but in the end is a more invasive change.

I'm running with this patch on the go-bot
=== modified file 'server.go'
--- server.go   2013-05-31 21:44:07 +0000
+++ server.go   2013-06-17 06:38:48 +0000
@@ -104,7 +104,16 @@
                        socket, err = server.Connect()
                        if err == nil {
                                server.Lock()
-                               server.liveSockets = append(server.liveSockets, socket)
+                               if server.closed {
+                                       // We don't add this to liveSockets
+                                       // because someone has already called
+                                       // server.Close() which closed and
+                                       // removed all active connections
+                                       log("server closed after Connect returned.")
+                                       socket.Close()
+                               } else {
+                                       server.liveSockets = append(server.liveSockets, socket)
+                               }
                                server.Unlock()
                        }
                }

I was concerned about the pinger code getting a closed socket, but 
a) If you return an error from AcquireSocket, the pinger code just goes back to sleep if loop is true, so it will take 10s for it to wake up again to notice that the server is actually closed.
b) The code that acts on the socket just makes a 'no-op' request and ignores whether that request succeeds or not (it suppresses the result and the error), and just cares how long it would have taken.

So a  different patch that called socket.Close() and then set an error, and then updated the pinger code to look for a ErrServerClosed or something like that would also be reasonable, but in the end is a more invasive change.