Code review comment for lp:~stub/launchpad/retry-404

Revision history for this message
Stuart Bishop (stub) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jan 9, 2009 at 10:12 PM, Francis J. Lacoste wrote:
> Review: Needs Fixing
>> === modified file
'lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt'
>> +
>> +
>> +A 404 error page is shown when code raises a LookupError. If a slave
>> +database is being used, this might have been caused by replication lag
>> +if the missing data was only recently created. To fix this surprising
>> +error, requests are always retried using the master database before
>> +returning a 404 error to the user.
>> +
>> + >>> anon_browser.handleErrors = True
>> + >>> anon_browser.raiseHttpErrors = False
>> +
>> + # Confirm requests are going to the SLAVE
>> + >>> anon_browser.open('http://launchpad.dev/+whichdb')
>> + >>> print whichdb(anon_browser)
>> + SLAVE
>> +
>> + # The slave database contains no data, but we don't get
>> + # a 404 page - the request is retried against the MASTER.
>> + >>> anon_browser.open('http://launchpad.dev/~stub')
>> + >>> anon_browser.headers['Status']
>> + '200 Ok'
>
> I'm not sure I understand why would fail? In our tests, the SLAVE and the
> MASTER are the same database, so how would a LookupError be raised in one
> case, and not in the other?

At the top of this page test, we change this so the SLAVE database is
an empty database. I've also added some code up there that confirms
the database actually is empty (the other tests didn't care - they
just used the database name as reported by PostgreSQL).

>> === modified file 'lib/canonical/launchpad/webapp/publication.py'
>> --- lib/canonical/launchpad/webapp/publication.py 2008-11-04
16:25:36 +0000
>> +++ lib/canonical/launchpad/webapp/publication.py 2009-01-09
09:12:16 +0000
>> @@ -43,7 +43,8 @@
>> import canonical.launchpad.webapp.adapter as da
>> from canonical.launchpad.webapp.interfaces import (
>> IDatabasePolicy, IPlacelessAuthUtility, IPrimaryContext,
>> - ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPostError)
>> + ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPostError,
>> + IStoreSelector, MAIN_STORE, DEFAULT_FLAVOR, MASTER_FLAVOR)
>> from canonical.launchpad.webapp.opstats import OpStats
>> from canonical.launchpad.webapp.uri import URI, InvalidURIError
>> from canonical.launchpad.webapp.vhosts import allvhosts
>> @@ -431,6 +432,18 @@
>> # the publication, so there's nothing we need to do here.
>> pass
>>
>> + # If we get a LookupError and the default database being used is
>> + # a replica, raise a Retry exception instead of returning
>> + # the 404 error page. We do this in case the LookupError is
>> + # caused by replication lag. Our database policy forces the
>> + # use of the master database for retries.
>> + if retry_allowed and isinstance(exc_info[1], LookupError):
>> + store_selector = getUtility(IStoreSelector)
>> + default_store = store_selector.get(MAIN_STORE, DEFAULT_FLAVOR)
>> + master_store = store_selector.get(MAIN_STORE, MASTER_FLAVOR)
>> + if default_store is not master_store:
>> + raise Retry(exc_info)
>> +
>> # Reraise Retry exceptions rather than log.
>> if retry_allowed and isinstance(
>> exc_info[1], (Retry, DisconnectionError, IntegrityError,
>
> If you look at the block at the context boundary, you'll see that it removes
> the variables related to tracking the number of ticks in the request.
>
> So you should either do the same thing, or move your check into a
> shouldRetryLookup() method and it to the list of condition on which that block
> is triggered.

I've refactored the handleException method so the tickcounts are
handled correctly. I created a should_retry() helper in the
handleException method to encapsulate the logic.

- --
Stuart Bishop
http://www.stuartbishop.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFJay/jAfqZj7rGN0oRAmFrAJ9jadxhwFI4FkOH+LixckOO6wcG7QCcCDL5
6Aj4SeUZACZzgZF15sVn/CY=
=WIhw
-----END PGP SIGNATURE-----

1=== modified file 'lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt'
2--- lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt 2009-01-09 09:12:16 +0000
3+++ lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt 2009-01-12 11:29:50 +0000
4@@ -17,6 +17,24 @@
5 ... """)
6 >>> config.push('empty_slave', config_overlay)
7
8+We should confirm that the empty database is as empty as we hope it is.
9+
10+ >>> from zope.component import getUtility
11+ >>> from canonical.launchpad.webapp.interfaces import (
12+ ... IStoreSelector, MAIN_STORE, MASTER_FLAVOR, SLAVE_FLAVOR)
13+ >>> from canonical.launchpad.database.person import Person
14+ >>> slave_store = getUtility(IStoreSelector).get(
15+ ... MAIN_STORE, SLAVE_FLAVOR)
16+ >>> master_store = getUtility(IStoreSelector).get(
17+ ... MAIN_STORE, MASTER_FLAVOR)
18+ >>> slave_store.find(Person).count()
19+ 0
20+ >>> master_store.find(Person).count() > 0
21+ True
22+
23+This helper parses the output of the +whichdb view (which unfortunately
24+needs to be created externally to this pagetest).
25+
26 >>> def whichdb(browser):
27 ... dbname = extract_text(find_tag_by_id(browser.contents, 'dbname'))
28 ... if dbname == 'launchpad_ftest':
29
30=== modified file 'lib/canonical/launchpad/webapp/publication.py'
31--- lib/canonical/launchpad/webapp/publication.py 2009-01-09 09:12:16 +0000
32+++ lib/canonical/launchpad/webapp/publication.py 2009-01-12 11:53:42 +0000
33@@ -431,23 +431,48 @@
34 # The exception wasn't raised in the middle of the traversal nor
35 # the publication, so there's nothing we need to do here.
36 pass
37-
38- # If we get a LookupError and the default database being used is
39- # a replica, raise a Retry exception instead of returning
40- # the 404 error page. We do this in case the LookupError is
41- # caused by replication lag. Our database policy forces the
42- # use of the master database for retries.
43- if retry_allowed and isinstance(exc_info[1], LookupError):
44- store_selector = getUtility(IStoreSelector)
45- default_store = store_selector.get(MAIN_STORE, DEFAULT_FLAVOR)
46- master_store = store_selector.get(MAIN_STORE, MASTER_FLAVOR)
47- if default_store is not master_store:
48- raise Retry(exc_info)
49-
50- # Reraise Retry exceptions rather than log.
51- if retry_allowed and isinstance(
52- exc_info[1], (Retry, DisconnectionError, IntegrityError,
53- TransactionRollbackError)):
54+
55+ def should_retry(exc_info):
56+ if not retry_allowed:
57+ return False
58+
59+ # If we get a LookupError and the default database being
60+ # used is a replica, raise a Retry exception instead of
61+ # returning the 404 error page. We do this in case the
62+ # LookupError is caused by replication lag. Our database
63+ # policy forces the use of the master database for retries.
64+ if isinstance(exc_info[1], LookupError):
65+ store_selector = getUtility(IStoreSelector)
66+ default_store = store_selector.get(MAIN_STORE, DEFAULT_FLAVOR)
67+ master_store = store_selector.get(MAIN_STORE, MASTER_FLAVOR)
68+ if default_store is master_store:
69+ return False
70+ else:
71+ return True
72+
73+ # Retry exceptions need to be propagated so they are
74+ # retried. Retry exceptions occur when an optimistic
75+ # transaction failed, such as we detected two transactions
76+ # attempting to modify the same resource.
77+ # DisconnectionError and TransactionRollbackError indicate
78+ # a database transaction failure, and should be retried
79+ # The appserver detects the error state, and a new database
80+ # connection is opened allowing the appserver to cope with
81+ # database or network outages.
82+ # An IntegrityError may be caused when we insert a row
83+ # into the database that already exists, such as two requests
84+ # doing an insert-or-update. It may succeed if we try again.
85+ if isinstance(exc_info[1], (Retry, DisconnectionError,
86+ IntegrityError, TransactionRollbackError)):
87+ return True
88+
89+ return False
90+
91+ # Reraise Retry exceptions ourselves rather than invoke
92+ # our superclass handleException method, as it will log OOPS
93+ # reports etc. This would be incorrect, as transaction retry
94+ # is a normal part of operation.
95+ if should_retry(exc_info):
96 if request.supportsRetry():
97 # Remove variables used for counting ticks as this request is
98 # going to be retried.
99@@ -456,9 +481,11 @@
100 if isinstance(exc_info[1], Retry):
101 raise
102 raise Retry(exc_info)
103+
104 superclass = zope.app.publication.browser.BrowserPublication
105- superclass.handleException(self, object, request, exc_info,
106- retry_allowed)
107+ superclass.handleException(
108+ self, object, request, exc_info, retry_allowed)
109+
110 # If it's a HEAD request, we don't care about the body, regardless of
111 # exception.
112 # UPSTREAM: Should this be part of zope,

« Back to merge proposal