Merge lp:~stub/launchpad/bug-504291-disconnection-errors into lp:launchpad

Proposed by Stuart Bishop on 2010-01-14
Status: Merged
Approved by: Stuart Bishop on 2010-01-19
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~stub/launchpad/bug-504291-disconnection-errors
Merge into: lp:launchpad
Diff against target: 194 lines (+78/-20)
4 files modified
lib/canonical/launchpad/webapp/dbpolicy.py (+4/-1)
lib/canonical/launchpad/webapp/publication.py (+20/-4)
lib/canonical/launchpad/webapp/tests/test_publication.py (+52/-15)
lib/lp/scripts/utilities/importfascist.py (+2/-0)
To merge this branch: bzr merge lp:~stub/launchpad/bug-504291-disconnection-errors
Reviewer Review Type Date Requested Status
Henning Eggers (community) code 2010-01-14 Approve on 2010-01-18
Review via email: mp+17371@code.launchpad.net

Commit Message

Detect, report and repair Storm Stores in an invalid state after handling a request.

To post a comment you must log in.
Stuart Bishop (stub) wrote :

Bug #504291

It seems that somehow our database connections can be left in a disconnected state after a request has been handled. This stuffs up the next request handled by that thread.

This branch detects this case, logs a soft oops to help us track down the cause, and repairs the database connection to a usable state.

Henning Eggers (henninge) wrote :
Download full text (3.6 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks for fixing this annoying bug!

Since I cannot comment much in the actual fix, I found some things to
complain about in the test ... ;-)

> === modified file 'lib/canonical/launchpad/webapp/tests/test_publication.py'
> --- lib/canonical/launchpad/webapp/tests/test_publication.py 2009-10-07 18:41:52 +0000
> +++ lib/canonical/launchpad/webapp/tests/test_publication.py 2010-01-14 12:28:17 +0000
> @@ -121,6 +121,43 @@
> # Ensure that it is different to the last logged OOPS.
> self.assertNotEqual(repr(last_oops), repr(next_oops))
>
> + def test_bug_504291_logs_oops(self):

I am not too fond of bug ids in method names. Could you find a different
name that fits for the situation when the oops is logged? I think it is
enough to refer to the bug number in the comment.

> + # Bug #504291 was that a Store was being left in a disconnected
> + # state after a request, causing subsequent requests handled by that
> + # thread to fail. We detect this state in endRequest and log an
> + # OOPS to help track down the trigger.
> + error_reporting_utility = getUtility(IErrorReportingUtility)
> + last_oops = error_reporting_utility.getLastOopsReport()
> +
> + request = LaunchpadTestRequest()
> + publication = WebServicePublication(None)
> + da.set_request_started()

I had to get the branch in order to understand what "da" is. How about
doing the import "as dbadapter"? That would be much clearer, if I
understood the abbreviation correctly. ;)

> +
> + # Disconnect a store
> + from canonical.launchpad.database.emailaddress import EmailAddress
> + from canonical.launchpad.interfaces.lpstorm import IMasterStore
> + from storm.database import STATE_DISCONNECTED, STATE_RECONNECT

Why do you put these imports in here and not at the top of the file?
Please move them there (unless there is a circular import problem that I
am not aware of).

> + store = IMasterStore(EmailAddress)
> + store._connection._state = STATE_DISCONNECTED
> +
> + # Invoke the endRequest hook.
> + publication.endRequest(request, None)
> +
> + next_oops = error_reporting_utility.getLastOopsReport()
> +
> + # Ensure that it is different to the last logged OOPS.
> + self.assertNotEqual(repr(last_oops), repr(next_oops))
> +
> + # Ensure the OOPS mentions the correct exception
> + self.assertNotEqual(repr(next_oops).find("Bug #504291"), -1)

Hm, returning -1 is a speciality of the find method but you are not
really comparing values here. This would read more natural as

    self.assertTrue(repr(next_oops).find("Bug #504291") == -1)

I think but I cannot be too sure about it.

> +
> + # Ensure the OOPS is correctly marked as informational only.
> + self.assertEqual(next_oops.informational, 'True')
> +
> + # Ensure the store has been rolled back and in a usable state.
> + self.assertEqual(store._connection._state, STATE_RECONNECT)
> + store.find(EmailAddress).first()

Err, what is that last line about? Are you testing for an OOPS? Please
add a com...

Read more...

review: Approve (code)
Stuart Bishop (stub) wrote :
Download full text (3.8 KiB)

On Mon, Jan 18, 2010 at 11:54 PM, Henning Eggers <email address hidden> wrote:

>> +    def test_bug_504291_logs_oops(self):
>
> I am not too fond of bug ids in method names. Could you find a different
> name that fits for the situation when the oops is logged? I think it is
> enough to refer to the bug number in the comment.

Its now 'store_disconnected_after_request_handled_logs_oops', which was the sort of mouthful I was trying to avoid by using the bug number. I'm not sure if it is any more meaningful.

>> +        # Bug #504291 was that a Store was being left in a disconnected
>> +        # state after a request, causing subsequent requests handled by that
>> +        # thread to fail. We detect this state in endRequest and log an
>> +        # OOPS to help track down the trigger.
>> +        error_reporting_utility = getUtility(IErrorReportingUtility)
>> +        last_oops = error_reporting_utility.getLastOopsReport()
>> +
>> +        request = LaunchpadTestRequest()
>> +        publication = WebServicePublication(None)
>> +        da.set_request_started()
>
> I had to get the branch in order to understand what "da" is. How about
> doing the import "as dbadapter"? That would be much clearer, if I
> understood the abbreviation correctly. ;)

Done.

>> +        # Disconnect a store
>> +        from canonical.launchpad.database.emailaddress import EmailAddress
>> +        from canonical.launchpad.interfaces.lpstorm import IMasterStore
>> +        from storm.database import STATE_DISCONNECTED, STATE_RECONNECT
>
> Why do you put these imports in here and not at the top of the file?
> Please move them there (unless there is a circular import problem that I
> am not aware of).

I actually prefer it now from a readability POV as someone reading the test doesn't need to jump to the top of the file to see this information. I suspect I got into the habit dealing with circular imports though - stick them inline and you have one less thing to worry about breaking your tests.

I've moved them to the top per standard practice.

>> +        store = IMasterStore(EmailAddress)
>> +        store._connection._state = STATE_DISCONNECTED
>> +
>> +        # Invoke the endRequest hook.
>> +        publication.endRequest(request, None)
>> +
>> +        next_oops = error_reporting_utility.getLastOopsReport()
>> +
>> +        # Ensure that it is different to the last logged OOPS.
>> +        self.assertNotEqual(repr(last_oops), repr(next_oops))
>> +
>> +        # Ensure the OOPS mentions the correct exception
>> +        self.assertNotEqual(repr(next_oops).find("Bug #504291"), -1)
>
> Hm, returning -1 is a speciality of the find method but you are not
> really comparing values here. This would read more natural as
>
>    self.assertTrue(repr(next_oops).find("Bug #504291") == -1)
>
> I think but I cannot be too sure about it.

You mean self.assertTrue(repr(next_oops).find("...") != -1)

I've changed it. Both seem equally readable to me.

>> +        # Ensure the OOPS is correctly marked as informational only.
>> +        self.assertEqual(next_oops.informational, 'True')
>> +
>> +...

Read more...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/webapp/dbpolicy.py'
2--- lib/canonical/launchpad/webapp/dbpolicy.py 2010-01-14 13:25:34 +0000
3+++ lib/canonical/launchpad/webapp/dbpolicy.py 2010-01-19 04:28:14 +0000
4@@ -6,9 +6,12 @@
5 __metaclass__ = type
6 __all__ = [
7 'BaseDatabasePolicy',
8+ 'DatabaseBlockedPolicy',
9 'LaunchpadDatabasePolicy',
10+ 'MasterDatabasePolicy',
11+ 'ReadOnlyLaunchpadDatabasePolicy',
12 'SlaveDatabasePolicy',
13- 'MasterDatabasePolicy',
14+ 'SlaveOnlyDatabasePolicy',
15 ]
16
17 from datetime import datetime, timedelta
18
19=== modified file 'lib/canonical/launchpad/webapp/publication.py'
20--- lib/canonical/launchpad/webapp/publication.py 2010-01-14 13:25:34 +0000
21+++ lib/canonical/launchpad/webapp/publication.py 2010-01-19 04:28:14 +0000
22@@ -10,6 +10,7 @@
23 import gc
24 import os
25 import re
26+import sys
27 import thread
28 import threading
29 import traceback
30@@ -23,6 +24,7 @@
31 import transaction
32
33 from psycopg2.extensions import TransactionRollbackError
34+from storm.database import STATE_DISCONNECTED
35 from storm.exceptions import DisconnectionError, IntegrityError
36 from storm.zope.interfaces import IZStorm
37
38@@ -631,12 +633,26 @@
39 if is_browser(request) and status_group == '5XXs':
40 OpStats.stats['5XXs_b'] += 1
41
42- # Reset all Storm stores when not running the test suite. We could
43- # reset them when running the test suite but that'd make writing tests
44- # a much more painful task. We still reset the slave stores though
45- # to minimize stale cache issues.
46+ # Make sure our databases are in a sane state for the next request.
47 thread_name = threading.currentThread().getName()
48 for name, store in getUtility(IZStorm).iterstores():
49+ try:
50+ assert store._connection._state != STATE_DISCONNECTED, (
51+ "Bug #504291: Store left in a disconnected state.")
52+ except AssertionError:
53+ # The Store is in a disconnected state. This should
54+ # not happen, as store.rollback() should have been called
55+ # by now. Log an OOPS so we know about this. This
56+ # is Bug #504291 happening.
57+ getUtility(IErrorReportingUtility).handling(
58+ sys.exc_info(), request)
59+ # Repair things so the server can remain operational.
60+ store.rollback()
61+ # Reset all Storm stores when not running the test suite.
62+ # We could reset them when running the test suite but
63+ # that'd make writing tests a much more painful task. We
64+ # still reset the slave stores though to minimize stale
65+ # cache issues.
66 if thread_name != 'MainThread' or name.endswith('-slave'):
67 store.reset()
68
69
70=== modified file 'lib/canonical/launchpad/webapp/tests/test_publication.py'
71--- lib/canonical/launchpad/webapp/tests/test_publication.py 2010-01-14 13:28:18 +0000
72+++ lib/canonical/launchpad/webapp/tests/test_publication.py 2010-01-19 04:28:14 +0000
73@@ -10,6 +10,7 @@
74
75 from contrib.oauth import OAuthRequest, OAuthSignatureMethod_PLAINTEXT
76
77+from storm.database import STATE_DISCONNECTED, STATE_RECONNECT
78 from storm.exceptions import DisconnectionError
79 from storm.zope.interfaces import IZStorm
80
81@@ -18,19 +19,21 @@
82 from zope.publisher.interfaces import Retry
83
84 from canonical.config import dbconfig
85-from canonical.testing import DatabaseFunctionalLayer
86+from canonical.launchpad.database.emailaddress import EmailAddress
87+from canonical.launchpad.interfaces.lpstorm import IMasterStore
88 from canonical.launchpad.interfaces.oauth import IOAuthConsumerSet
89 from canonical.launchpad.ftests import ANONYMOUS, login
90 from canonical.launchpad.tests.readonly import (
91 remove_read_only_file, touch_read_only_file)
92-from lp.testing import TestCase, TestCaseWithFactory
93-import canonical.launchpad.webapp.adapter as da
94+import canonical.launchpad.webapp.adapter as dbadapter
95 from canonical.launchpad.webapp.interfaces import (
96 IStoreSelector, MAIN_STORE, MASTER_FLAVOR, OAuthPermission, SLAVE_FLAVOR)
97 from canonical.launchpad.webapp.publication import (
98 is_browser, LaunchpadBrowserPublication)
99 from canonical.launchpad.webapp.servers import (
100 LaunchpadTestRequest, WebServicePublication)
101+from canonical.testing import DatabaseFunctionalLayer
102+from lp.testing import TestCase, TestCaseWithFactory
103
104
105 class TestLaunchpadBrowserPublication(TestCase):
106@@ -190,7 +193,7 @@
107 # disconnections, as per Bug #373837.
108 request = LaunchpadTestRequest()
109 publication = WebServicePublication(None)
110- da.set_request_started()
111+ dbadapter.set_request_started()
112 try:
113 raise DisconnectionError('Fake')
114 except DisconnectionError:
115@@ -198,17 +201,51 @@
116 Retry,
117 publication.handleException,
118 None, request, sys.exc_info(), True)
119- da.clear_request_started()
120- next_oops = error_reporting_utility.getLastOopsReport()
121-
122- # Ensure the OOPS mentions the correct exception
123- self.assertNotEqual(repr(next_oops).find("DisconnectionError"), -1)
124-
125- # Ensure the OOPS is correctly marked as informational only.
126- self.assertEqual(next_oops.informational, 'True')
127-
128- # Ensure that it is different to the last logged OOPS.
129- self.assertNotEqual(repr(last_oops), repr(next_oops))
130+ dbadapter.clear_request_started()
131+ next_oops = error_reporting_utility.getLastOopsReport()
132+
133+ # Ensure the OOPS mentions the correct exception
134+ self.assertTrue(repr(next_oops).find("DisconnectionError") != -1)
135+
136+ # Ensure the OOPS is correctly marked as informational only.
137+ self.assertEqual(next_oops.informational, 'True')
138+
139+ # Ensure that it is different to the last logged OOPS.
140+ self.assertNotEqual(repr(last_oops), repr(next_oops))
141+
142+ def test_store_disconnected_after_request_handled_logs_oops(self):
143+ # Bug #504291 was that a Store was being left in a disconnected
144+ # state after a request, causing subsequent requests handled by that
145+ # thread to fail. We detect this state in endRequest and log an
146+ # OOPS to help track down the trigger.
147+ error_reporting_utility = getUtility(IErrorReportingUtility)
148+ last_oops = error_reporting_utility.getLastOopsReport()
149+
150+ request = LaunchpadTestRequest()
151+ publication = WebServicePublication(None)
152+ dbadapter.set_request_started()
153+
154+ # Disconnect a store
155+ store = IMasterStore(EmailAddress)
156+ store._connection._state = STATE_DISCONNECTED
157+
158+ # Invoke the endRequest hook.
159+ publication.endRequest(request, None)
160+
161+ next_oops = error_reporting_utility.getLastOopsReport()
162+
163+ # Ensure that it is different to the last logged OOPS.
164+ self.assertNotEqual(repr(last_oops), repr(next_oops))
165+
166+ # Ensure the OOPS mentions the correct exception
167+ self.assertNotEqual(repr(next_oops).find("Bug #504291"), -1)
168+
169+ # Ensure the OOPS is correctly marked as informational only.
170+ self.assertEqual(next_oops.informational, 'True')
171+
172+ # Ensure the store has been rolled back and in a usable state.
173+ self.assertEqual(store._connection._state, STATE_RECONNECT)
174+ store.find(EmailAddress).first() # Confirms Store is working.
175
176 def test_is_browser(self):
177 # No User-Agent: header.
178
179=== modified file 'lib/lp/scripts/utilities/importfascist.py'
180--- lib/lp/scripts/utilities/importfascist.py 2010-01-06 11:27:28 +0000
181+++ lib/lp/scripts/utilities/importfascist.py 2010-01-19 04:28:14 +0000
182@@ -59,10 +59,12 @@
183 valid_imports_not_in_all = {
184 'cookielib': set(['domain_match']),
185 'email.Utils': set(['mktime_tz']),
186+ 'storm.database': set(['STATE_DISCONNECTED']),
187 'textwrap': set(['dedent']),
188 'zope.component': set(
189 ['adapter',
190 'ComponentLookupError',
191+ 'provideAdapter',
192 'provideHandler',
193 ]),
194 }