Comment 14 for bug 795025

Martin Pool (mbp) wrote :

Looking into this a bit more with hloeung:

We are already using start-stop-daemon to gracefully stop the server. We can change the option to --retry TERM/7200/KILL/5, which sends SIGKILL to the front-end process if it has not exited after a timeout. So, we can ask it nicely to exit, at which point it stops accepting new connections, and then a couple of hours later (7200 seconds), if it is still running, we can kill it, at which point all the existing connections will definitely drop. We tested this, and the back-end processes do indeed stop.
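
As a rough illustration of what a TERM/7200/KILL/5 schedule means (this is not how start-stop-daemon is implemented, just the same idea), here is a minimal Python sketch: send SIGTERM, wait up to a timeout, and send SIGKILL only if the process is still running. The function name stop_with_retry and its parameters are made up for this example.

```python
import signal
import subprocess
import sys


def stop_with_retry(proc, term_timeout, kill_timeout):
    """Ask a child process to exit, then force it if it does not.

    Hypothetical helper mirroring a TERM/<term_timeout>/KILL/<kill_timeout>
    schedule. Returns the name of the last signal that had to be sent.
    """
    # Step 1: ask nicely. A well-behaved server stops accepting new
    # connections and exits once existing ones are finished.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=term_timeout)
        return "TERM"
    except subprocess.TimeoutExpired:
        pass  # still running after the grace period

    # Step 2: force it. SIGKILL cannot be caught or ignored, so any
    # remaining connections are dropped abruptly.
    proc.send_signal(signal.SIGKILL)
    proc.wait(timeout=kill_timeout)
    return "KILL"
```

With the schedule above, the equivalent call would be stop_with_retry(proc, 7200, 5), where proc is a subprocess.Popen object for the front-end process.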

So:

 * clients that finish up their business in less than a couple of hours will see no interruption
 * clients that run longer than that will see the server abruptly disconnect
 * new bzr clients will reconnect and retry
 * old clients will probably abort, but if they are holding connections open indefinitely, they will probably see some network drops anyhow
 * once we are confident that the reconnect code is safe, we can backport it to previous bzr releases

Haw says that killing the server after a timeout will fix the LOSA-affecting issue, i.e. that rollouts take a lot of manual intervention, so for now I'm going to close the lp side of it.