Code review comment for lp:~jimbaker/pyjuju/robust-zk-connect

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

It works :-) I'm a little concerned by the content and verbosity of
the output both when run normally or in verbose mode.

verbose mode -> http://paste.ubuntu.com/762451/
normal mode -> http://paste.ubuntu.com/762452/

[0]

The retry is a bit aggressive, i see 3-4 calls per second to try and
resolve the host ip address, over the course of avg ~30s. Some backoff
might be beneficial here, like 1s between retries on a machine without
an address.

[1]
2011-11-29 21:06:27,053:13891(0xb6500b70):ZOO_ERROR@handle_socket_error_msg@1579: Socket [127.0.0.1:35354] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
Unhandled error in Deferred:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
Failure: txzookeeper.client.ConnectionTimeoutException: could not connect before timeout

it would be nice if this could be resolved as well in this branch, i
tracked it down to the called_aware_deferred_chain returning
failures if the parent deferred is called.

The more i look at this branch, i find the sshclient/forward code more
mystifying. Its not germane to this branch, but i've got a refactoring
branch that will followup this branch to make this code a bit cleaner.
The gist of the cleanup http://pastebin.ubuntu.com/761938/

[2] The zookeeper c library doesn't have a notion of timeouts, its an
internal thing the txzk library manages using a twisted
reactor.callLater. Effectively it means everytime we attempt a
connect, there will be a background attempt continously running in the
zk io thread.

An effective usage change would be to just increase the timeout on connect, the tunnel is already active, so its just a matter of waiting. I just commited r45 on txzookeeper to handle this issue there, namely tackle closing the conn should terminate the background connect activity, but i don't see why we wouldn't just go for the longer delay since its a continual reconnection attempt.

[3]

2011-12-05 17:54:46,490 ERROR Connection refused

Normal stdout log for the command has a few of these error
messages. Since this is normal activity it would be good to ellide
them unless in verbose mode.

review: Needs Fixing

« Back to merge proposal