tests.test_transport.TestSSHConnections.test_bzr_connect_to_bzr_ssh considered harmful

Bug #626876 reported by Vincent Ladeuil
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Critical
John A Meinel

Bug Description

This test:
- has failed intermittently for months on babune across almost all platforms,
- contains several XXX and one FIXME,
- is huge
- is blocking PQM as of today

My gut feeling is that it tries to test too much at once and should be split.

It may also be related to bug #579530 fixed by spiv in a private ppa
- fails on hardy/py2.4/paramiko-1.6.4 with:
Traceback (most recent call last):
  File "/home/babune/lib/python/testtools/runtest.py", line 144, in _run_user
    return fn(*args)
  File "/home/babune/lib/python/testtools/testcase.py", line 465, in _run_test_method
    testMethod()
  File "/home/babune/src/bzr/5396/bzrlib/tests/test_transport.py", line 981, in test_bzr_connect_to_bzr_ssh
    t.mkdir('foo')
  File "/home/babune/src/bzr/5396/bzrlib/transport/remote.py", line 242, in mkdir
    self._serialise_optional_mode(mode))
  File "/home/babune/src/bzr/5396/bzrlib/transport/remote.py", line 182, in _call2
    return self._client.call(method, *args)
  File "/home/babune/src/bzr/5396/bzrlib/smart/client.py", line 132, in call
    result, protocol = self.call_expecting_body(method, *args)
  File "/home/babune/src/bzr/5396/bzrlib/smart/client.py", line 144, in call_expecting_body
    return self._call_and_read_response(
  File "/home/babune/src/bzr/5396/bzrlib/smart/client.py", line 92, in _call_and_read_response
    response_tuple = response_handler.read_response_tuple(
  File "/home/babune/src/bzr/5396/bzrlib/smart/message.py", line 299, in read_response_tuple
    self._wait_for_response_args()
  File "/home/babune/src/bzr/5396/bzrlib/smart/message.py", line 264, in _wait_for_response_args
    self._read_more()
  File "/home/babune/src/bzr/5396/bzrlib/smart/message.py", line 285, in _read_more
    raise errors.ConnectionReset(
ConnectionReset: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.
------------

- succeds on hardy/py2.5/paramiko-1.6.4 with a traceback from a thread that can be ignored
(this is a paramiko private thread):
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.5/threading.py", line 446, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/babune/src/bzr/5396/bzrlib/tests/test_transport.py", line 940, in ferry_bytes
    write(bytes)
  File "/home/babune/lib/python/paramiko/channel.py", line 681, in sendall_stderr
    raise socket.error('Socket is closed')
error: Socket is closed

- succeeds on karmic/py2.4/ haha, skipped, no paramiko here :(

- fails on karmic/py25/paramiko-1.7.4 with the same traceback than hardy/py24

- succeeds on karmic/py26/paramiko-1.7.4 with the same traceback than hardy/py25
 (note the switch between 2.4 and 2.5 when going from hardy to karmic)

Related branches

Vincent Ladeuil (vila)
Changed in bzr:
status: New → Confirmed
Vincent Ladeuil (vila)
Changed in bzr:
importance: Undecided → Critical
Revision history for this message
Vincent Ladeuil (vila) wrote :

fails on freebsd/py2.5/paramiko-1.7.6 with traceback:
Traceback (most recent call last):
  File "/home/babune/lib/python/testtools/runtest.py", line 144, in _run_user
    return fn(*args)
  File "/home/babune/lib/python/testtools/testcase.py", line 465, in _run_test_method
    testMethod()
  File "/usr/home/babune/src/bzr/trunk/bzrlib/tests/test_transport.py", line 981, in test_bzr_connect_to_bzr_ssh
    t.mkdir('foo')
  File "/usr/home/babune/src/bzr/trunk/bzrlib/transport/remote.py", line 242, in mkdir
    self._serialise_optional_mode(mode))
  File "/usr/home/babune/src/bzr/trunk/bzrlib/transport/remote.py", line 182, in _call2
    return self._client.call(method, *args)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/client.py", line 132, in call
    result, protocol = self.call_expecting_body(method, *args)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/client.py", line 145, in call_expecting_body
    method, args, expect_response_body=True)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/client.py", line 90, in _call_and_read_response
    readv_body=readv_body, body_stream=body_stream)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/client.py", line 63, in _send_request
    encoder.call(method, *args)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/protocol.py", line 1309, in call
    self._write_end()
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/protocol.py", line 1133, in _write_end
    self.flush()
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/protocol.py", line 1099, in flush
    self._real_write_func(''.join(self._buf))
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/medium.py", line 396, in accept_bytes
    self._accept_bytes(bytes)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/medium.py", line 978, in _accept_bytes
    self._medium._accept_bytes(bytes)
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/medium.py", line 794, in _accept_bytes
    self._ensure_connection()
  File "/usr/home/babune/src/bzr/trunk/bzrlib/smart/medium.py", line 818, in _ensure_connection
    '--directory=/', '--allow-writes'])
  File "/usr/home/babune/src/bzr/trunk/bzrlib/transport/ssh.py", line 333, in connect_ssh
    channel.exec_command(cmdline)
  File "/usr/local/lib/python2.5/site-packages/paramiko/channel.py", line 213, in exec_command
    self._wait_for_event()
  File "/usr/local/lib/python2.5/site-packages/paramiko/channel.py", line 1084, in _wait_for_event
    raise e
EOFError
------------

Note that there is also a warning there:

/usr/local/lib/python2.5/site-packages/Crypto/Util/randpool.py:40: RandomPool_DeprecationWarning: This application uses RandomPool, which is BROKEN in older releases. See http://www.pycrypto.org/randpool-broken
  RandomPool_DeprecationWarning)

Revision history for this message
Vincent Ladeuil (vila) wrote :

succeeds on gentoo/py26/paramiko-1.7.6

same warning as freebsd about paramiko
same traceback as hardy/py25 that can be ignored

Revision history for this message
Vincent Ladeuil (vila) wrote :

I think that's enough evidence that this test should be rewritten in a more granular way to better
capture the possible failures.

Revision history for this message
John A Meinel (jameinel) wrote :

So far, the only 'granularity' I've seen is to remove the 'mkdir' call. Basically, I feel that it *is* important to have a real connection to an ssh server in the test suite if we can. The test is a little bit long, but only because the ssh server code is inline in the test.

As such, there doesn't seem a lot that could be done. If it is an interaction with threads and sockets, etc, then we could arguably start an ssh server in a subprocess, connect to *that* and then shut it down afterwards. That might provide better *isolation* but I don't see it as a *granularity* thing.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> If it is an interaction with threads and sockets

Let's look at what is involved in that test then:

        self.start_server(ssh_server)

This will start a thread for the server and since it's threading server, one for each connection. Each of them have a socket.
Since we're using paramiko, that also means one thread per socket as per paramiko implementation.

Then we have:

        t.mkdir('foo')

This will also create a socket and its paramiko thread.

When the server receives the connection, it will start a subprocess and 3 threads for stdin, stdout and stderr each of them
associated with a pipe, so 3 file handles in the parent process and 3 file handles in the server process.

Paramiko use a 0.1 timeout for handling his sockets adding a lot of possible failures.

I can see a lot a *granularity* to add to this design to better track where the failures occur.

Or, come to think of it, I think the design is just to brittle to continue with and we'd better start again from scratch :)

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (9.3 KiB)

I'm getting this on Windows w/ python 2.6 and paramiko 1.7.5. Sorry if there are formatting issues.

Note that this is ConnectionReset, but paramiko isn't handling the Windows failure code the same way as it seems to do on Linux. (Or *we* aren't translating it, or something).

======================================================================
ERROR: bzrlib.tests.test_transport.TestSSHConnections.test_bzr_connect_to_bzr_ssh
----------------------------------------------------------------------
_StringException: Text attachment: log
------------
1.862 failed to load system host keys: [Errno 2] No such file or directory: 'C:/users/jameinel/appd
ata/local/temp/testbzr-lcsb0y.tmp/ns.test_bzr_connect_to_bzr_ssh/home/.ssh/known_hosts'
1.862 failed to load bzr host keys: [Errno 2] No such file or directory: 'C:/users/jameinel/appdata
/local/temp/testbzr-lcsb0y.tmp/ns.test_bzr_connect_to_bzr_ssh/home/bazaar/2.0/ssh_host_keys'
1.862 creating config parent directory: 'C:/users/jameinel/appdata/local/temp/testbzr-lcsb0y.tmp/ns
.test_bzr_connect_to_bzr_ssh/home/bazaar'
1.862 creating config directory: 'C:/users/jameinel/appdata/local/temp/testbzr-lcsb0y.tmp/ns.test_b
zr_connect_to_bzr_ssh/home/bazaar/2.0'
   DEBUG starting thread (client mode): 0x279ca70L
    INFO Connected (version 2.0, client paramiko_1.7.5)
   DEBUG kex algos:['diffie-hellman-group1-sha1'] server key:['ssh-rsa'] client encrypt:['aes128-ct
r', 'aes256-ctr', 'aes128-cbc', 'blowfish-cbc', 'aes256-cbc', '3des-cbc', 'arcfour128', 'arcfour256'
] server encrypt:['aes128-ctr', 'aes256-ctr', 'aes128-cbc', 'blowfish-cbc', 'aes256-cbc', '3des-cbc'
, 'arcfour128', 'arcfour256'] client mac:['hmac-sha1', 'hmac-md5', 'hmac-sha1-96', 'hmac-md5-96'] se
rver mac:['hmac-sha1', 'hmac-md5', 'hmac-sha1-96', 'hmac-md5-96'] client compress:['none'] server co
mpress:['none'] client lang:[''] server lang:[''] kex follows?False
   DEBUG Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
   DEBUG using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, r
emote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
   DEBUG Switch to new keys ...
 WARNING Adding ssh-rsa host key for localhost: 60733844CB5186657FDEDAA22B5A57D5
2.131 Trying SSH agent key 9EB9CEAB13B9027CD34F0A72C7A9BA0F
   DEBUG userauth is OK
   ERROR Socket exception: An existing connection was forcibly closed by the remote host (10054)
2.340 opening working tree 'C:/users/jameinel/appdata/local/temp/testbzr-lcsb0y.tmp'
------------
Text attachment: traceback
------------
Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\testtools-0.9.3-py2.6.egg\testtools\runtest.py", line 144, in
_run_user
    return fn(*args)
  File "C:\Python26\lib\site-packages\testtools-0.9.3-py2.6.egg\testtools\testcase.py", line 413, in
 _run_test_method
    testMethod()
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\tests\test_transport.py", line 981, in test_bzr_con
nect_to_bzr_ssh
    t.mkdir('foo')
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\transport\remote.py", line 242, in mkdir
    self._serialise_optional_mode(mode))
  File "C:\Users\jameinel\dev\bzr...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (10.7 KiB)

And one more failure on xp/py2.6/paramiko-1.7.6:

./bzr selftest -s bt.test_transport.TestSSHConnections
bzr selftest: C:/cygwin/home/babune/bzr/trunk/bzr
   C:\cygwin\home\babune\bzr\trunk\bzrlib
   bzr-2.3.0dev1 python-2.5.4 Windows-XP-5.1.2600-SP3

ERROR: test_transport.TestSSHConnections.test_bzr_connect_to_bzr_ssh
    Text attachment: log
------------
0.821 failed to load compiled extension: No module named _annotator_pyx
0.821 failed to load compiled extension: No module named _knit_load_data_pyx
0.851 failed to load compiled extension: No module named _known_graph_pyx
0.871 failed to load compiled extension: No module named _groupcompress_pyx
0.871 creating repository in file:///C:/windows/temp/testbzr-rtcmkm.tmp/.bzr/.
0.871 creating branch <bzrlib.branch.BzrBranchFormat7 object at 0x013246B0> in file:///C:/windows/temp/testbzr-rtcmkm.tmp/
0.881 opening working tree 'C:/windows/temp/testbzr-rtcmkm.tmp'
1.041 failed to load system host keys: [Errno 2] No such file or directory: 'C:/windows/temp/testbzr-rtcmkm.tmp/ns.test_bzr_connect_to_bzr_ssh/home/.ssh/known_hosts'
1.041 failed to load bzr host keys: [Errno 2] No such file or directory: 'C:/windows/temp/testbzr-rtcmkm.tmp/ns.test_bzr_connect_to_bzr_ssh/home/bazaar/2.0/ssh_host_keys'
1.041 creating config parent directory: 'C:/windows/temp/testbzr-rtcmkm.tmp/ns.test_bzr_connect_to_bzr_ssh/home/bazaar'
1.041 creating config directory: 'C:/windows/temp/testbzr-rtcmkm.tmp/ns.test_bzr_connect_to_bzr_ssh/home/bazaar/2.0'
   DEBUG starting thread (client mode): 0x155f390L
    INFO Connected (version 2.0, client paramiko_1.7.6)
   DEBUG kex algos:['diffie-hellman-group1-sha1'] server key:['ssh-rsa'] client encrypt:['aes128-ctr', 'aes256-ctr', 'aes128-cbc', 'blowfish-cbc', 'aes256-cbc', '3des-cbc', 'arcfour128', 'arcfour256'] server encrypt:['aes128-ctr', 'aes256-ctr', 'aes128-cbc', 'blowfish-cbc', 'aes256-cbc', '3des-cbc', 'arcfour128', 'arcfour256'] client mac:['hmac-sha1', 'hmac-md5', 'hmac-sha1-96', 'hmac-md5-96'] server mac:['hmac-sha1', 'hmac-md5', 'hmac-sha1-96', 'hmac-md5-96'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
   DEBUG Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
   DEBUG using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
   DEBUG Switch to new keys ...
 WARNING Adding ssh-rsa host key for localhost: 60733844CB5186657FDEDAA22B5A57D5
   DEBUG userauth is OK
    INFO Authentication (password) successful!
   DEBUG [chan 1] Max packet in: 34816 bytes
   DEBUG [chan 1] Max packet out: 34816 bytes
    INFO Secsh channel 1 opened.
   DEBUG [chan 1] Sesch channel 1 request ok
   ERROR Socket exception: Connection reset by peer (10054)
1.662 opening working tree 'C:/windows/temp/testbzr-rtcmkm.tmp'
------------
Text attachment: traceback
------------
Traceback (most recent call last):
  File "\\VBOXSRV\babune\lib\python\testtools\runtest.py", line 144, in _run_user
    return fn(*args)
  File "\\VBOXSRV\babune\lib\python\testtools\testcase.py", line 465, in _run_t...

Revision history for this message
Vincent Ladeuil (vila) wrote :

This currently skipped and should be re-enabled as fast as we can, hence the 'Critical' importance.

John A Meinel (jameinel)
Changed in bzr:
status: Confirmed → In Progress
assignee: nobody → John A Meinel (jameinel)
status: In Progress → Confirmed
status: Confirmed → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

The final fix is that we just weren't waiting for the paramiko.Transport to finish before we closed the server socket when the Handler returned.

Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.3b1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.