memcache._ConnectionDeadError in test runs

Bug #974632 reported by Gary Poster
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Gary Poster
Python Memcached
In Progress
High
Brad Crittenden

Bug Description

In parallel test runs I often see complaints like the below. For some reason, these sometimes are not consumed by testr as test errors. William Grant, in a launchpad-dev email titled "[Launchpad-dev] memcache errors in ec2 and buildbot -- newer python-memcached to blame?" pointed to a similar error in the non-parallel build: http://paste.ubuntu.com/896153/ . He writes:

> It's behind most of the recent buildbot failures, and is possibly
> related to system load or disk space (it went away for a couple of runs
> after buildbot was aggressively cleaned yesterday).

> python-memcached was upgraded a week ago, so it's tempting to revert it
> to see if it makes things more stable. What do people think?

Deryck and Aaron replied that the upgrade happened to fix bug 954232, and that they'd prefer not to revert it.

Here's the traceback I saw.

Traceback (most recent call last):
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/bin/test", line 278, in <module>
    testrunner.run([])
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 32, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 45, in run_internal
    runner.run()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 139, in run
    self.run_tests()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 220, in run_tests
    setup_layers, self.failures, self.errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 401, in run_layer
    return run_tests(options, tests, layer_name, failures, errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 326, in run_tests
    test(result)
  File "/usr/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testcase.py", line 517, in run
    return self.__RunTest(self, self.exception_handlers).run(result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 74, in run
    return self._run_one(actual_result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 88, in _run_one
    return self._run_prepared_result(ExtendedToOriginalDecorator(result))
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 96, in _run_prepared_result
    result.startTest(self.case)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testresult/real.py", line 591, in startTest
    return self.decorated.startTest(test)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 731, in startTest
    self.testSetUp()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 716, in testSetUp
    layer.testSetUp()
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/services/testing/profiled.py", line 30, in profiled_func
    return func(cls, *args, **kw)
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/testing/layers.py", line 650, in testSetUp
    MemcachedLayer.client.flush_all()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 304, in flush_all
    s.expect("OK")
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1136, in expect
    line = self.readline()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1129, in readline
    raise _ConnectionDeadError()
memcache._ConnectionDeadError
Traceback (most recent call last):
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/bin/test", line 278, in <module>
    testrunner.run([])
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 32, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 45, in run_internal
    runner.run()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 139, in run
    self.run_tests()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 220, in run_tests
    setup_layers, self.failures, self.errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 401, in run_layer
    return run_tests(options, tests, layer_name, failures, errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 326, in run_tests
    test(result)
  File "/usr/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testcase.py", line 517, in run
    return self.__RunTest(self, self.exception_handlers).run(result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 74, in run
    return self._run_one(actual_result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 88, in _run_one
    return self._run_prepared_result(ExtendedToOriginalDecorator(result))
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 96, in _run_prepared_result
    result.startTest(self.case)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testresult/real.py", line 591, in startTest
    return self.decorated.startTest(test)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 731, in startTest
    self.testSetUp()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 716, in testSetUp
    layer.testSetUp()
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/services/testing/profiled.py", line 30, in profiled_func
    return func(cls, *args, **kw)
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/testing/layers.py", line 650, in testSetUp
    MemcachedLayer.client.flush_all()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 304, in flush_all
    s.expect("OK")
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1136, in expect
    line = self.readline()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1129, in readline
    raise _ConnectionDeadError()
memcache._ConnectionDeadError
Traceback (most recent call last):
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/bin/test", line 278, in <module>
    testrunner.run([])
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 32, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 45, in run_internal
    runner.run()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 139, in run
    self.run_tests()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 220, in run_tests
    setup_layers, self.failures, self.errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 401, in run_layer
    return run_tests(options, tests, layer_name, failures, errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 326, in run_tests
    test(result)
  File "/usr/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testcase.py", line 517, in run
    return self.__RunTest(self, self.exception_handlers).run(result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 74, in run
    return self._run_one(actual_result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 88, in _run_one
    return self._run_prepared_result(ExtendedToOriginalDecorator(result))
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 96, in _run_prepared_result
    result.startTest(self.case)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testresult/real.py", line 591, in startTest
    return self.decorated.startTest(test)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 731, in startTest
    self.testSetUp()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 716, in testSetUp
    layer.testSetUp()
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/services/testing/profiled.py", line 30, in profiled_func
    return func(cls, *args, **kw)
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/testing/layers.py", line 650, in testSetUp
    MemcachedLayer.client.flush_all()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 304, in flush_all
    s.expect("OK")
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1136, in expect
    line = self.readline()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1129, in readline
    raise _ConnectionDeadError()
memcache._ConnectionDeadError
Traceback (most recent call last):
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/bin/test", line 278, in <module>
    testrunner.run([])
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 32, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/__init__.py", line 45, in run_internal
    runner.run()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 139, in run
    self.run_tests()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 220, in run_tests
    setup_layers, self.failures, self.errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 401, in run_layer
    return run_tests(options, tests, layer_name, failures, errors)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 326, in run_tests
    test(result)
  File "/usr/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testcase.py", line 517, in run
    return self.__RunTest(self, self.exception_handlers).run(result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 74, in run
    return self._run_one(actual_result)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 88, in _run_one
    return self._run_prepared_result(ExtendedToOriginalDecorator(result))
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/runtest.py", line 96, in _run_prepared_result
    result.startTest(self.case)
  File "/var/lib/buildbot/dependencies/eggs/testtools-0.9.14-py2.6.egg/testtools/testresult/real.py", line 591, in startTest
    return self.decorated.startTest(test)
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 731, in startTest
    self.testSetUp()
  File "/var/lib/buildbot/dependencies/eggs/zope.testing-3.9.4_p6-py2.6.egg/zope/testing/testrunner/runner.py", line 716, in testSetUp
    layer.testSetUp()
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/services/testing/profiled.py", line 30, in profiled_func
    return func(cls, *args, **kw)
  File "/var/lib/buildbot/slaves/slave/lucid-devel/build/lib/lp/testing/layers.py", line 650, in testSetUp
    MemcachedLayer.client.flush_all()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 304, in flush_all
    s.expect("OK")
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1136, in expect
    line = self.readline()
  File "/var/lib/buildbot/dependencies/eggs/python_memcached-1.48-py2.6.egg/memcache.py", line 1129, in readline
    raise _ConnectionDeadError()
memcache._ConnectionDeadError

Related branches

Revision history for this message
Brad Crittenden (bac) wrote :

We recently upgraded python-memcached from 1.45 to 1.48 (it looks like 1.49 was available at the time but not used).

The fix for bug 728359 directly affected the readline method and introduced the _ConnectionDeadError, which is now raised in readline() when the server socket is seen to be closed.

Previously, under those conditions readline() simply returned None. Now it raises an exception. In two places (_get() and _set()) that exception is caught and a retry is performed. Unfortunately, many other places in memcache.py call readline() but do not handle the exception.

The path that we're experiencing is the expect() method. In 1.45 if the socket were closed, readline() would return None, the expected values wouldn't match, and a message was written to the debug log but there were no other ill effects.

Now we get this exception propagated.

It appears to me the fix for bug 728359 was not implemented thoroughly and has left python-memcached in state where exceptions can escape that were not intended, as indicated by the name _ConnectionDeadError.

Brad Crittenden (bac)
Changed in python-memcached:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Brad Crittenden (bac)
Revision history for this message
Deryck Hodge (deryck) wrote :

FWIW, 1.49 isn't on PyPI, at least not that I can find. So 1.48 is the latest there, and that's why it was chosen for Launchpad.

Revision history for this message
Sean Reifschneider (jafo) wrote :

I've pushed a patch to trunk that includes my proposed fix for this, based on the discussion in Brad's patch. I found a few other cases where the _unsafe_* functions are calling readline(), so I added an argument to readline to request the exception, and returned the default behavior, then made these paths that are within the exception catching request the exception.

I'd appreciate a review and testing against your test suite.

Thanks everyone!

Sean

Revision history for this message
Sean Reifschneider (jafo) wrote :

Here is the diff of what I applied.

Revision history for this message
Brad Crittenden (bac) wrote :

Hi Sean, thanks for this patch. It looks good to me.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Gary Poster (gary)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Gary Poster (gary)
tags: added: qa-untestable
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.