Merge lp:~gary/launchpad/bug683115 into lp:launchpad

Proposed by Gary Poster on 2010-11-30
Status: Merged
Approved by: Gary Poster on 2010-12-01
Approved revision: no longer in the source branch.
Merged at revision: 12013
Proposed branch: lp:~gary/launchpad/bug683115
Merge into: lp:launchpad
Diff against target: 251 lines (+95/-25)
3 files modified
lib/canonical/launchpad/doc/google-searchservice.txt (+48/-13)
lib/canonical/launchpad/ftests/googlesearches/googlesearchservice-negative-total.xml (+35/-0)
lib/canonical/launchpad/utilities/searchservice.py (+12/-12)
To merge this branch: bzr merge lp:~gary/launchpad/bug683115
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve on 2010-12-01
Steve Kowalik (community) code* 2010-11-30 Approve on 2010-12-01
Review via email: mp+42322@code.launchpad.net

Commit Message

[r=jtv,stevenk][ui=none][bug=683115] Raise an error at the point that I believe it is entering our code base for bug 683115.

Description of the Change

This branch raises an error at the point that I believe it is entering our code base for bug 683115. If I am right, then we will see the new error in the oops reports with the value that is causing the problem. We can then decided what to do about it.

An alternate approach would be to set the total to 0 and make an informative OOPS. I decided against that approach because I don't know if the Google results are actually causing the problem, and I did not want to introduce unnecessary band-aids until it was proven that we needed them. I don't feel very strongly about it.

This is a quick diagnostic branch, so I did not change the doctest (which is actually a decent one, I think).

On the other hand, I did correct all of what ``make lint`` complained about because it was quick. The downside there is that it adds a bunch of noise to the diff. The most questionable thing I did for lint's sake is to make searchservice.py not support old spelling of the elementtree import. I think it's fine.

Thank you

Gary

To post a comment you must log in.
Steve Kowalik (stevenk) wrote :

This looks like excellent work, thanks Gary!

review: Approve (code*)
Jeroen T. Vermeulen (jtv) wrote :

I concur. And Gary, thanks for cleaning up that lint along the way. (Though wow, do I hate ReST section headings!)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/doc/google-searchservice.txt'
2--- lib/canonical/launchpad/doc/google-searchservice.txt 2010-10-25 13:16:10 +0000
3+++ lib/canonical/launchpad/doc/google-searchservice.txt 2010-12-01 19:33:19 +0000
4@@ -1,17 +1,21 @@
5-= Google Search Service =
6+=====================
7+Google Search Service
8+=====================
9
10 The GoogleSearchService is a Google Custom Service Business Edition
11 (cs-be) client. Given one or more terms, it will retrieve an XML
12 summary of the matching launchpad.net pages.
13
14
15-== GoogleSearchService ==
16+GoogleSearchService
17+===================
18
19 The GoogleSearchService implements the ISearchService interface.
20
21 >>> from zope.component import getUtility
22 >>> from zope.interface.verify import verifyObject
23- >>> from canonical.launchpad.interfaces.searchservice import ISearchService
24+ >>> from canonical.launchpad.interfaces.searchservice import (
25+ ... ISearchService)
26
27 >>> google_search = getUtility(ISearchService)
28 >>> verifyObject(ISearchService, google_search)
29@@ -20,13 +24,16 @@
30 <...GoogleSearchService ...>
31
32
33-=== GoogleSearchService search() ===
34+----------------------------
35+GoogleSearchService search()
36+----------------------------
37
38 The search method accepts a string argument of terms and an optional int
39 argument of start. The terms are the same as the text that would be
40 entered in Google search form; the terms should not be escaped.
41
42- >>> from canonical.launchpad.interfaces.searchservice import ISearchResults
43+ >>> from canonical.launchpad.interfaces.searchservice import (
44+ ... ISearchResults)
45
46 >>> first_page_matches = google_search.search(terms='bug')
47 >>> first_page_matches
48@@ -41,7 +48,8 @@
49 <...PageMatches ...>
50
51
52-== PageMatches ==
53+PageMatches
54+===========
55
56 The PageMatches object returned by GoogleSearchService.search()
57 implements ISearchResults.
58@@ -106,7 +114,8 @@
59 0
60
61
62-== PageMatch ==
63+PageMatch
64+=========
65
66 The PageMatch object represents a single result from a search result
67 set. It is created by passing a title, url, and a summary. It is
68@@ -138,7 +147,8 @@
69 'http://launchpad.dev/unicode-titles'
70
71
72-== Search configuration ==
73+Search configuration
74+====================
75
76 The google search service is configured by the google section in
77 lazr.config. All requests are made to Google's site, but the
78@@ -183,7 +193,8 @@
79 start : 0
80
81
82-== create_search_url() ==
83+create_search_url()
84+===================
85
86 The search url used inside the search() method is created by
87 create_search_url(). It accepts two optional arguments: terms and start.
88@@ -229,7 +240,8 @@
89 'http://launchpad.dev:.../...q=svg+%2Bbugs&start=20'
90
91
92-== Google Search Protocol parsing ==
93+Google Search Protocol parsing
94+==============================
95
96 The GoogleSearchService's _parse_google_search_protocol() requires a
97 subset of the GSP 3.2 markup to create the PageMatch and PageMatches
98@@ -283,6 +295,25 @@
99 GoogleWrongGSPVersion: Could not get the 'total' from the
100 GSP XML response.
101
102+Similarly, if the total were ever less than zero (which we are suspicious
103+of because of bug 683115), we would see an error.
104+
105+ >>> gsp_xml_file_name = path.join(
106+ ... base_path, 'googlesearchservice-negative-total.xml')
107+ >>> gsp_xml_file = open(gsp_xml_file_name, 'r')
108+ >>> gsp_xml = gsp_xml_file.read()
109+ >>> gsp_xml_file.close()
110+ >>> print gsp_xml
111+ <...
112+ <RES SN="1" EN="1">
113+ <M>-1</M>...
114+
115+ >>> google_search._parse_google_search_protocol(gsp_xml)
116+ Traceback (most recent call last):
117+ ...
118+ GoogleResponseError: The reported total (-1, from '-1') was less than
119+ zero. See bug 683115.
120+
121 A PageMatch requires a title, url, and a summary. If those elements
122 ('<T>', '<U>', '<S>') cannot be found nested in an '<R>' a PageMatch
123 cannot be made. A missing title (<T>) indicates a bad page on Launchpad,
124@@ -343,7 +374,7 @@
125 ...
126 <R N="1">
127 <U>https://blueprints.edge.launchpad.net/ubuntu/+spec/gobuntu-hardy</U>
128- <UE>https://blueprints.edge.launchpad.net/ubuntu/%2Bspec/gobuntu-hardy</UE>
129+ <UE>https://blueprints.edge.launchpad.net.../%2Bspec/gobuntu-hardy</UE>
130 <T>Blueprint: &lt;b&gt;gobuntu&lt;/b&gt; hardy</T>
131 <RK>0</RK>
132 <S></S>
133@@ -504,7 +535,9 @@
134 tracker allows collaboration'
135
136
137-=== URL rewriting ===
138+-------------
139+URL rewriting
140+-------------
141
142 The URL scheme used in the rewritten URL is configured in
143 config.google.url_rewrite_scheme. The hostname is set in the shared
144@@ -557,7 +590,9 @@
145 'https://help.launchpad.net/OpenID'
146
147
148-=== Graceful handling of timeouts ===
149+-----------------------------
150+Graceful handling of timeouts
151+-----------------------------
152
153 The external service (Google Search Engine) may not be available, or
154 is not responding quickly because there are network issues. In these
155
156=== added file 'lib/canonical/launchpad/ftests/googlesearches/googlesearchservice-negative-total.xml'
157--- lib/canonical/launchpad/ftests/googlesearches/googlesearchservice-negative-total.xml 1970-01-01 00:00:00 +0000
158+++ lib/canonical/launchpad/ftests/googlesearches/googlesearchservice-negative-total.xml 2010-12-01 19:33:19 +0000
159@@ -0,0 +1,35 @@
160+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
161+<GSP VER="3.3">
162+<TM>0.416786</TM><Q>bugs site:launchpad.net</Q>
163+<PARAM name="as_sitesearch" value="launchpad.net" original_value="launchpad.net"/>
164+<PARAM name="client" value="google-csbe" original_value="google-csbe"/>
165+<PARAM name="cx" value="011948274177882366108:ixpjdn21yfg" original_value="011948274177882366108%3Aixpjdn21yfg"/>
166+<PARAM name="ie" value="utf8" original_value="utf8"/>
167+<PARAM name="num" value="20" original_value="20"/>
168+<PARAM name="oe" value="utf8" original_value="utf8"/>
169+<PARAM name="output" value="xml_no_dtd" original_value="xml_no_dtd"/>
170+<PARAM name="q" value="bugs" original_value="bugs"/>
171+<PARAM name="start" value="0" original_value="0"/>
172+<PARAM name="adkw" value="AELymgURIr4plE6V5qUaesxj1S8kUSFxCVrVLNU_OeCogh9Q7W5be6lEGMcbb0q6WTDgLL7zsnlnYGLvVrsdxgx3AamFm0M6ARaxerSLvSf-1JQHrOLuhsQ" original_value="AELymgURIr4plE6V5qUaesxj1S8kUSFxCVrVLNU_OeCogh9Q7W5be6lEGMcbb0q6WTDgLL7zsnlnYGLvVrsdxgx3AamFm0M6ARaxerSLvSf-1JQHrOLuhsQ"/>
173+<PARAM name="hl" value="en" original_value="en"/>
174+<Context><title>Launchpad Search Engine</title></Context>
175+
176+<RES SN="1" EN="1">
177+<M>-1</M>
178+<FI/><NB><NU>/custom?q=bugs+site:launchpad.net&amp;num=20&amp;hl=en&amp;output=xml_no_dtd&amp;client=google-csbe&amp;cx=011948274177882366108:ixpjdn21yfg&amp;ie=UTF-8&amp;oe=UTF-8&amp;start=0&amp;sa=N</NU>
179+</NB>
180+
181+<R N="1">
182+ <U>https://bugs.launchpad.net/</U>
183+ <UE>https://bugs.launchpad.net/</UE>
184+ <T>A Title</T>
185+ <RK>0</RK>
186+ <CRAWLDATE>17 hours ago</CRAWLDATE>
187+ <S>A 'R'esult.</S>
188+ <LANG>en</LANG>
189+ <Label>_cse_ixpjdn21yfg</Label>
190+ <HAS><L/><RT/></HAS>
191+</R>
192+
193+</RES>
194+</GSP>
195
196=== modified file 'lib/canonical/launchpad/utilities/searchservice.py'
197--- lib/canonical/launchpad/utilities/searchservice.py 2010-09-09 21:49:53 +0000
198+++ lib/canonical/launchpad/utilities/searchservice.py 2010-12-01 19:33:19 +0000
199@@ -13,10 +13,7 @@
200 'PageMatches',
201 ]
202
203-try:
204- import xml.etree.cElementTree as ET
205-except ImportError:
206- import cElementTree as ET
207+import xml.etree.cElementTree as ET
208 import urllib
209 from urlparse import urlunparse
210
211@@ -70,7 +67,6 @@
212 """
213 return config.vhost.mainsite.hostname
214
215-
216 def __init__(self, title, url, summary):
217 """initialize a PageMatch.
218
219@@ -155,14 +151,14 @@
220 implements(ISearchService)
221
222 _default_values = {
223- 'client' : 'google-csbe',
224- 'cx' : None,
225- 'ie' : 'utf8',
226- 'num' : 20,
227- 'oe' : 'utf8',
228- 'output' : 'xml_no_dtd',
229+ 'client': 'google-csbe',
230+ 'cx': None,
231+ 'ie': 'utf8',
232+ 'num': 20,
233+ 'oe': 'utf8',
234+ 'output': 'xml_no_dtd',
235 'start': 0,
236- 'q' : None,
237+ 'q': None,
238 }
239
240 @property
241@@ -293,6 +289,10 @@
242 # The datatype is not what PageMatches requires.
243 raise GoogleWrongGSPVersion(
244 "Could not get the 'total' from the GSP XML response.")
245+ if total < 0:
246+ raise GoogleResponseError(
247+ ("The reported total (%d, from %r) was less than zero. "
248+ "See bug 683115.") % (total, results.find('M').text))
249 for result in results.findall('R'):
250 url_tag = result.find('U')
251 title_tag = result.find('T')