Merge lp:~stub/launchpad/retry-404 into lp:launchpad
- retry-404
- Merge into devel
Status: | Merged |
---|---|
Approved by: | Данило Шеган |
Approved revision: | no longer in the source branch. |
Merged at revision: | not available |
Proposed branch: | lp:~stub/launchpad/retry-404 |
Merge into: | lp:launchpad |
To merge this branch: | bzr merge lp:~stub/launchpad/retry-404 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Данило Шеган (community) | Approve | ||
Review via email: mp+2805@code.launchpad.net |
This proposal supersedes a proposal from 2009-01-09.
Commit message
Description of the change
Stuart Bishop (stub) wrote : Posted in a previous version of this proposal | # |
Francis J. Lacoste (flacoste) wrote : Posted in a previous version of this proposal | # |
> === modified file 'lib/canonical/
> +
> +
> +A 404 error page is shown when code raises a LookupError. If a slave
> +database is being used, this might have been caused by replication lag
> +if the missing data was only recently created. To fix this surprising
> +error, requests are always retried using the master database before
> +returning a 404 error to the user.
> +
> + >>> anon_browser.
> + >>> anon_browser.
> +
> + # Confirm requests are going to the SLAVE
> + >>> anon_browser.open('http://
> + >>> print whichdb(
> + SLAVE
> +
> + # The slave database contains no data, but we don't get
> + # a 404 page - the request is retried against the MASTER.
> + >>> anon_browser.open('http://
> + >>> anon_browser.
> + '200 Ok'
I'm not sure I understand why would fail? In our tests, the SLAVE and the
MASTER are the same database, so how would a LookupError be raised in one
case, and not in the other?
> +
> + # 404s are still returned though if the data doesn't exist in the
> + # MASTER database either.
> + >>> anon_browser.open('http://
> + >>> anon_browser.
> + '404 Not Found'
> +
> + # This session is still using the SLAVE though by default.
> + >>> anon_browser.open('http://
> + >>> print whichdb(
> + SLAVE
> === modified file 'lib/canonical/
> --- lib/canonical/
> +++ lib/canonical/
> @@ -43,7 +43,8 @@
> import canonical.
> from canonical.
> IDatabasePolicy, IPlacelessAuthU
> - ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPost
> + ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPost
> + IStoreSelector, MAIN_STORE, DEFAULT_FLAVOR, MASTER_FLAVOR)
> from canonical.
> from canonical.
> from canonical.
> @@ -431,6 +432,18 @@
> # the publication, so there's nothing we need to do here.
> pass
>
> + # If we get a LookupError and the default database being used is
> + # a replica, raise a Retry exception instead of returning
> + # the 404 error page. We do this in case the LookupError is
> + # caused by replication lag. Our database policy forces the
> + # use of the master database for retries.
> + if retry_allowed and isinstance(
> + store_selector = getUtility(
> + default_store = store_selector.
> + master_store = store_selector.
> + if default_store is not master_store:
> + raise Retry(exc_info)
> +
> # Reraise Re...
Stuart Bishop (stub) wrote : Posted in a previous version of this proposal | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Fri, Jan 9, 2009 at 10:12 PM, Francis J. Lacoste wrote:
> Review: Needs Fixing
>> === modified file
'lib/canonical/
>> +
>> +
>> +A 404 error page is shown when code raises a LookupError. If a slave
>> +database is being used, this might have been caused by replication lag
>> +if the missing data was only recently created. To fix this surprising
>> +error, requests are always retried using the master database before
>> +returning a 404 error to the user.
>> +
>> + >>> anon_browser.
>> + >>> anon_browser.
>> +
>> + # Confirm requests are going to the SLAVE
>> + >>> anon_browser.open('http://
>> + >>> print whichdb(
>> + SLAVE
>> +
>> + # The slave database contains no data, but we don't get
>> + # a 404 page - the request is retried against the MASTER.
>> + >>> anon_browser.open('http://
>> + >>> anon_browser.
>> + '200 Ok'
>
> I'm not sure I understand why would fail? In our tests, the SLAVE and the
> MASTER are the same database, so how would a LookupError be raised in one
> case, and not in the other?
At the top of this page test, we change this so the SLAVE database is
an empty database. I've also added some code up there that confirms
the database actually is empty (the other tests didn't care - they
just used the database name as reported by PostgreSQL).
>> === modified file 'lib/canonical/
>> --- lib/canonical/
16:25:36 +0000
>> +++ lib/canonical/
09:12:16 +0000
>> @@ -43,7 +43,8 @@
>> import canonical.
>> from canonical.
>> IDatabasePolicy, IPlacelessAuthU
>> - ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPost
>> + ILaunchpadRoot, IOpenLaunchBag, OffsiteFormPost
>> + IStoreSelector, MAIN_STORE, DEFAULT_FLAVOR, MASTER_FLAVOR)
>> from canonical.
>> from canonical.
>> from canonical.
>> @@ -431,6 +432,18 @@
>> # the publication, so there's nothing we need to do here.
>> pass
>>
>> + # If we get a LookupError and the default database being used is
>> + # a replica, raise a Retry exception instead of returning
>> + # the 404 error page. We do this in case the LookupError is
>> + # caused by replication lag. Our database policy forces the
>> + # use of the master database for retries.
>> + if retry_allowed and isinstance(
>> + store_selector = getUtility(
>> + default_store = store_selector.
>> + master_store = store_selector.
>> + if default_store is not master_store:
>> + raise Retry(exc_info)
>> +
>...
1 | === modified file 'lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt' |
2 | --- lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt 2009-01-09 09:12:16 +0000 |
3 | +++ lib/canonical/launchpad/pagetests/standalone/xx-dbpolicy.txt 2009-01-12 11:29:50 +0000 |
4 | @@ -17,6 +17,24 @@ |
5 | ... """) |
6 | >>> config.push('empty_slave', config_overlay) |
7 | |
8 | +We should confirm that the empty database is as empty as we hope it is. |
9 | + |
10 | + >>> from zope.component import getUtility |
11 | + >>> from canonical.launchpad.webapp.interfaces import ( |
12 | + ... IStoreSelector, MAIN_STORE, MASTER_FLAVOR, SLAVE_FLAVOR) |
13 | + >>> from canonical.launchpad.database.person import Person |
14 | + >>> slave_store = getUtility(IStoreSelector).get( |
15 | + ... MAIN_STORE, SLAVE_FLAVOR) |
16 | + >>> master_store = getUtility(IStoreSelector).get( |
17 | + ... MAIN_STORE, MASTER_FLAVOR) |
18 | + >>> slave_store.find(Person).count() |
19 | + 0 |
20 | + >>> master_store.find(Person).count() > 0 |
21 | + True |
22 | + |
23 | +This helper parses the output of the +whichdb view (which unfortunately |
24 | +needs to be created externally to this pagetest). |
25 | + |
26 | >>> def whichdb(browser): |
27 | ... dbname = extract_text(find_tag_by_id(browser.contents, 'dbname')) |
28 | ... if dbname == 'launchpad_ftest': |
29 | |
30 | === modified file 'lib/canonical/launchpad/webapp/publication.py' |
31 | --- lib/canonical/launchpad/webapp/publication.py 2009-01-09 09:12:16 +0000 |
32 | +++ lib/canonical/launchpad/webapp/publication.py 2009-01-12 11:53:42 +0000 |
33 | @@ -431,23 +431,48 @@ |
34 | # The exception wasn't raised in the middle of the traversal nor |
35 | # the publication, so there's nothing we need to do here. |
36 | pass |
37 | - |
38 | - # If we get a LookupError and the default database being used is |
39 | - # a replica, raise a Retry exception instead of returning |
40 | - # the 404 error page. We do this in case the LookupError is |
41 | - # caused by replication lag. Our database policy forces the |
42 | - # use of the master database for retries. |
43 | - if retry_allowed and isinstance(exc_info[1], LookupError): |
44 | - store_selector = getUtility(IStoreSelector) |
45 | - default_store = store_selector.get(MAIN_STORE, DEFAULT_FLAVOR) |
46 | - master_store = store_selector.get(MAIN_STORE, MASTER_FLAVOR) |
47 | - if default_store is not master_store: |
48 | - raise Retry(exc_info) |
49 | - |
50 | - # Reraise Retry exceptions rather than log. |
51 | - if retry_allowed and isinstance( |
52 | - exc_info[1], (Retry, DisconnectionError, IntegrityError, |
53 | - TransactionRollbackError)): |
54 | + |
55 | + def should_retry(exc_info): |
56 | + if not retry_allowed: |
57 | + return False |
58 | + |
59 | + # If we get a LookupError and the default database being |
60 | + # used is a replica, raise a Retry exception instead of |
61 | + # returning the 404 error page. We do this in case the |
62 | + # LookupError is caused by replication lag. Our database |
63 | + # policy forces the use of the master database for retries. |
64 | + if isinstance(exc_info[1], LookupError): |
65 | + store_selector = getUtility(IStoreSelector) |
66 | + default_store = store_selector.get(MAIN_STORE, DEFAULT_FLAVOR) |
67 | + master_store = store_selector.get(MAIN_STORE, MASTER_FLAVOR) |
68 | + if default_store is master_store: |
69 | + return False |
70 | + else: |
71 | + return True |
72 | + |
73 | + # Retry exceptions need to be propagated so they are |
74 | + # retried. Retry exceptions occur when an optimistic |
75 | + # transaction failed, such as we detected two transactions |
76 | + # attempting to modify the same resource. |
77 | + # DisconnectionError and TransactionRollbackError indicate |
78 | + # a database transaction failure, and should be retried |
79 | + # The appserver detects the error state, and a new database |
80 | + # connection is opened allowing the appserver to cope with |
81 | + # database or network outages. |
82 | + # An IntegrityError may be caused when we insert a row |
83 | + # into the database that already exists, such as two requests |
84 | + # doing an insert-or-update. It may succeed if we try again. |
85 | + if isinstance(exc_info[1], (Retry, DisconnectionError, |
86 | + IntegrityError, TransactionRollbackError)): |
87 | + return True |
88 | + |
89 | + return False |
90 | + |
91 | + # Reraise Retry exceptions ourselves rather than invoke |
92 | + # our superclass handleException method, as it will log OOPS |
93 | + # reports etc. This would be incorrect, as transaction retry |
94 | + # is a normal part of operation. |
95 | + if should_retry(exc_info): |
96 | if request.supportsRetry(): |
97 | # Remove variables used for counting ticks as this request is |
98 | # going to be retried. |
99 | @@ -456,9 +481,11 @@ |
100 | if isinstance(exc_info[1], Retry): |
101 | raise |
102 | raise Retry(exc_info) |
103 | + |
104 | superclass = zope.app.publication.browser.BrowserPublication |
105 | - superclass.handleException(self, object, request, exc_info, |
106 | - retry_allowed) |
107 | + superclass.handleException( |
108 | + self, object, request, exc_info, retry_allowed) |
109 | + |
110 | # If it's a HEAD request, we don't care about the body, regardless of |
111 | # exception. |
112 | # UPSTREAM: Should this be part of zope, |
Francis J. Lacoste (flacoste) wrote : Posted in a previous version of this proposal | # |
Looks good.
status approved
--
Francis J. Lacoste
<email address hidden>
Stuart Bishop (stub) wrote : | # |
Diff of unreviewed changes is at https:/
Migrating the 'should this exception cause a Retry' logic to the IDatabasePolicy doesn't seem to fit very well, so I have left it the way it is.
Addresses Bug #311690.
During normal operation, the slave databases lag behind the master
by 30-60 seconds. If a browser attempts to access a newly created,
and the database policy selects a slave database as the data source,
a 404 error page will be returned to the user. This is a surprising
result, given the same user might have just created the missing
information. This is most noticible when multiple tools are
being used to access Launchpad, such as Apport and a web browser
as the database selection policy cannot tell that the same person
is the operator.
To solve this, we cause the publisher to retry requests using the
master database as the default datasource instead of returning
the 404 error page if a slave was the default datasource. This is
similar to our existing deadlock and serialization exception handling,
and uses the same mechanism (the conflict handling for Zope's optimistic
transactions).