Merge lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes into lp:launchpad

Proposed by Abel Deuring on 2011-09-13
Status: Merged
Approved by: Abel Deuring on 2011-09-13
Approved revision: no longer in the source branch.
Merged at revision: 13946
Proposed branch: lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes
Merge into: lp:launchpad
Diff against target: 341 lines (+174/-26)
3 files modified
cronscripts/reprocess-hwdb-submissions.py (+38/-18)
lib/lp/hardwaredb/scripts/hwdbsubmissions.py (+6/-7)
lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py (+130/-1)
To merge this branch: bzr merge lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes
Reviewer Review Type Date Requested Status
Graham Binns (community) code 2011-09-13 Approve on 2011-09-13
Review via email: mp+75214@code.launchpad.net

Commit message

[r=gmb][bug=849159] convert scripts/reprocess-hwdb-submissions.py into cronscripts/reprocess-hwdb-submissions.py

Description of the change

This branch converts scripts/reprocess-hwdb-submissions.py into cronscripts/reprocess-hwdb-submissions.py.

We need to reprocess a larger number of reports to the hardware database, coming from checkbox in Lucid, Marvierick and Natty, which could not be processed due to, well... a number of reasons. The core fixes which allow to process these submissions are already merged, but we want to include data from those reports that are marked as bad.

The number of these reports is in the order of 10**5 -- more than should be done in a single script run. The idea of the script simple: Run a query like

  SELECT ... FROM HWSubmission WHERE id > (variable number) AND status=(bad) LIMIT whatever;

in a TunableLoop: The first run should begin with the first bad report that could come from Lucid; it tries to process a number of reports and records the ID of the last touched report. In the next run, it continues with higher IDs.

(variable number) above is stored in a file; when the script starts, it reads the value from the file; when it finishes, it writes a new value into the same file.

The test test_run_reprocessing_script_two_batches() turned out to be quite useful. It reveealed that the lines removed below:

@@ -3113,8 +3112,6 @@
         submissions = removeSecurityProxy(submissions).find(
             HWSubmission.id >= self.start)
         submissions = list(submissions[:chunk_size])
- if len(submissions) > 0:
- self.start = submissions[-1].id + 1
         return submissions

were nonsense: The method __call__() can terminate before all submissions from a chunk have been processed: When the number of already processed submissions goes beyond self.max_submissions:

             if self.max_submissions is not None:
                 if self.max_submissions <= (
                     self.valid_submissions + self.invalid_submissions):
                     self.finished = True
                     break

The new tests look a bit odd: All except one are disabled. I described the reason in bug 849056...

test: ./bin/test hardwaredb -vvt lp.hardwaredb.scripts.tests.test_hwdbsubmissions

I removed a bit of lint from lib/lp/hardwaredb/scripts/hwdbsubmissions.py but some more remains:

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  cronscripts/reprocess-hwdb-submissions.py
  lib/lp/hardwaredb/scripts/hwdbsubmissions.py
  lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py

./cronscripts/reprocess-hwdb-submissions.py
      27: '_pythonpath' imported but unused
./lib/lp/hardwaredb/scripts/hwdbsubmissions.py
      26: redefinition of unused 'etree' from line 24
    3070: local variable 'error' is assigned to but never used
    1696: E261 at least two spaces before inline comment
./lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py
     159: Line exceeds 78 characters.

To post a comment you must log in.
Graham Binns (gmb) wrote :

[1]

210 + # Disabled due to bug 849056.
220 + # Disabled due to bug 849056.
220 + # Disabled due to bug 849056.
229 + # Disabled due to bug 849056.
240 + # Disabled due to bug 849056.
255 + # Disabled due to bug 849056.
270 + # Disabled due to bug 849056.

These should be XXXs, or one big XXX should come before all these defs, with a closing marker so we know where the bad tests begin and end.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== renamed file 'scripts/reprocess-hwdb-submissions.py' => 'cronscripts/reprocess-hwdb-submissions.py'
2--- scripts/reprocess-hwdb-submissions.py 2011-09-01 15:10:31 +0000
3+++ cronscripts/reprocess-hwdb-submissions.py 2011-09-13 16:43:51 +0000
4@@ -14,21 +14,24 @@
5 which will be processed.
6
7 This script iterates over the HWDB submissions with the status
8-SUBMITTED, beginning with the oldest submissions, populate the
9-HWDB tables with the data from these submissions.
10+INVALID. It processes only submissions with an ID greater or equal
11+than the number specified by the file given a option -s.
12+
13+When the script terminates, it writes the ID of the last processed
14+submission into this file.
15
16 Properly processed submissions are set to the status PROCESSED;
17-submissions that cannot be processed are set to the status INVALID.
18+submissions that cannot be processed retain the status INVALID.
19 """
20
21 import _pythonpath
22
23-from lp.services.scripts.base import LaunchpadScript
24+from lp.services.scripts.base import LaunchpadCronScript
25 from lp.hardwaredb.scripts.hwdbsubmissions import (
26 reprocess_invalid_submissions)
27
28
29-class HWDBSubmissionProcessor(LaunchpadScript):
30+class HWDBSubmissionProcessor(LaunchpadCronScript):
31
32 def add_my_options(self):
33 """See `LaunchpadScript`."""
34@@ -39,9 +42,10 @@
35 '-w', '--warnings', action="store_true", default=False,
36 help='Include warnings.')
37 self.parser.add_option(
38- '-s', '--start',
39- help=('Process HWSubmission records having an id greater or '
40- 'equal than this value.'))
41+ '-s', '--start-file', default=None,
42+ help=('The name of a file storing the smallest ID of a\n'
43+ 'hardware database submission that should be processed.\n'
44+ 'This script must have read and write access to the file.'))
45
46 def main(self):
47 max_submissions = self.options.max_submissions
48@@ -57,22 +61,38 @@
49 self.logger.error(
50 '--max_submissions must be a positive integer.')
51 return
52- if self.options.start is None:
53- self.logger.error('Option --start not specified.')
54- return
55- try:
56- start = int(self.options.start)
57+
58+ if self.options.start_file is None:
59+ self.logger.error('Option --start-file not specified.')
60+ return
61+ try:
62+ start_file = open(self.options.start_file, 'r+')
63+ start_id = start_file.read().strip()
64+ except IOError, error:
65+ self.logger.error(
66+ 'Cannot access file %s: %s' % (
67+ self.options.start_file, error))
68+ return
69+ try:
70+ start_id = int(start_id)
71 except ValueError:
72- self.logger.error('Option --start must have an integer value.')
73+ self.logger.error(
74+ '%s must contain only an integer' % self.options.start_file)
75 return
76- if start < 0:
77- self.logger.error('--start must be a positive integer.')
78+ if start_id < 0:
79+ self.logger.error(
80+ '%s must contain a positive integer'
81+ % self.options.start_file)
82 return
83
84- reprocess_invalid_submissions(
85- start, self.txn, self.logger,
86+ next_start = reprocess_invalid_submissions(
87+ start_id, self.txn, self.logger,
88 max_submissions, self.options.warnings)
89
90+ start_file.seek(0)
91+ start_file.write('%i' % next_start)
92+ start_file.close()
93+
94 if __name__ == '__main__':
95 script = HWDBSubmissionProcessor(
96 'hwdbsubmissions', dbuser='hwdb-submission-processor')
97
98=== modified file 'lib/lp/hardwaredb/scripts/hwdbsubmissions.py'
99--- lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-09-05 12:07:35 +0000
100+++ lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-09-13 16:43:51 +0000
101@@ -129,6 +129,7 @@
102 UDEV_USB_TYPE_RE = re.compile('^[0-9]{1,3}/[0-9]{1,3}/[0-9]{1,3}$')
103 SYSFS_SCSI_DEVICE_ATTRIBUTES = set(('vendor', 'model', 'type'))
104
105+
106 class SubmissionParser(object):
107 """A Parser for the submissions to the hardware database."""
108
109@@ -405,7 +406,6 @@
110
111 :return: (name, (value, type)) of a property.
112 """
113- property_name = property_node.get('name')
114 return (property_node.get('name'),
115 self._getValueAndType(property_node))
116
117@@ -1004,7 +1004,7 @@
118 the content.
119 """
120 self.submission_key = submission_key
121- submission_doc = self._getValidatedEtree(submission, submission_key)
122+ submission_doc = self._getValidatedEtree(submission, submission_key)
123 if submission_doc is None:
124 return None
125
126@@ -1565,7 +1565,7 @@
127 'Invalid device path name: %r' % path_name,
128 self.submission_key)
129 return False
130- for parent_path in path_names[path_index+1:]:
131+ for parent_path in path_names[path_index + 1:]:
132 if path_name.startswith(parent_path):
133 self.devices[parent_path].addChild(
134 self.devices[path_name])
135@@ -2822,7 +2822,6 @@
136 # SubmissionParser.checkUdevScsiProperties() ensures that
137 # each SCSI device has a record in self.sysfs and that
138 # the attribute 'vendor' exists.
139- path = self.udev['P']
140 return self.sysfs['vendor']
141 else:
142 return None
143@@ -2834,7 +2833,6 @@
144 # SubmissionParser.checkUdevScsiProperties() ensures that
145 # each SCSI device has a record in self.sysfs and that
146 # the attribute 'model' exists.
147- path = self.udev['P']
148 return self.sysfs['model']
149 else:
150 return None
151@@ -3080,6 +3078,7 @@
152 # further submissions in this batch raise an exception.
153 self.transaction.commit()
154
155+ self.start = submission.id + 1
156 if self.max_submissions is not None:
157 if self.max_submissions <= (
158 self.valid_submissions + self.invalid_submissions):
159@@ -3113,8 +3112,6 @@
160 submissions = removeSecurityProxy(submissions).find(
161 HWSubmission.id >= self.start)
162 submissions = list(submissions[:chunk_size])
163- if len(submissions) > 0:
164- self.start = submissions[-1].id + 1
165 return submissions
166
167
168@@ -3139,6 +3136,7 @@
169 'Processed %i valid and %i invalid HWDB submissions'
170 % (loop.valid_submissions, loop.invalid_submissions))
171
172+
173 def reprocess_invalid_submissions(start, transaction, logger,
174 max_submissions=None, record_warnings=True):
175 """Reprocess invalid submissions.
176@@ -3160,3 +3158,4 @@
177 'Processed %i valid and %i invalid HWDB submissions'
178 % (loop.valid_submissions, loop.invalid_submissions))
179 logger.info('last processed: %i' % loop.start)
180+ return loop.start
181
182=== modified file 'lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py'
183--- lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py 2011-09-01 16:43:49 +0000
184+++ lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py 2011-09-13 16:43:51 +0000
185@@ -5,7 +5,10 @@
186
187 __metaclass__ = type
188
189+from storm.store import Store
190+from tempfile import mktemp
191
192+from canonical.launchpad.ftests.script import run_script
193 from canonical.testing.layers import LaunchpadScriptLayer
194 from lp.hardwaredb.interfaces.hwdb import HWSubmissionProcessingStatus
195 from lp.hardwaredb.scripts.hwdbsubmissions import (
196@@ -13,6 +16,8 @@
197 ProcessingLoopForReprocessingBadSubmissions,
198 )
199 from lp.testing import TestCaseWithFactory
200+from lp.testing.matchers import Contains
201+import transaction
202
203
204 class TestProcessingLoops(TestCaseWithFactory):
205@@ -102,7 +107,8 @@
206 submissions = loop.getUnprocessedSubmissions(1)
207 self.assertEqual(1, len(submissions))
208
209- def test_BadSubmissions_respects_start(self):
210+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
211+ def xxx_test_BadSubmissions_respects_start(self):
212 # It is possible to request a start id. Previous entries are ignored.
213 submission1 = self.factory.makeHWSubmission(
214 status=HWSubmissionProcessingStatus.INVALID)
215@@ -113,3 +119,126 @@
216 # The sample data already contains one submission.
217 submissions = loop.getUnprocessedSubmissions(2)
218 self.assertEqual([submission2], submissions)
219+
220+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
221+ def xxx_test_run_reprocessing_script_no_params(self):
222+ # cronscripts/reprocess-hwdb-submissions.py needs at least the
223+ # parameter --start-file
224+ retcode, stdout, stderr = run_script(
225+ 'cronscripts/reprocess-hwdb-submissions.py', [])
226+ self.assertThat(
227+ stderr, Contains('Option --start-file not specified.'))
228+
229+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
230+ def xxx_test_run_reprocessing_script_startfile_does_not_exist(self):
231+ # If the specified start file does not exist,
232+ # cronscripts/reprocess-hwdb-submissions.py reports an error.
233+ does_not_exist = mktemp()
234+ retcode, stdout, stderr = run_script(
235+ 'cronscripts/reprocess-hwdb-submissions.py',
236+ ['--start-file', does_not_exist])
237+ self.assertThat(
238+ stderr, Contains('Cannot access file %s' % does_not_exist))
239+
240+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
241+ def xxx_test_run_reprocessing_script_startfile_without_integer(self):
242+ # If the specified start file contains any non-integer string,
243+ # cronscripts/reprocess-hwdb-submissions.py reports an error.
244+ start_file_name = mktemp()
245+ start_file = open(start_file_name, 'w')
246+ start_file.write('nonsense')
247+ start_file.close()
248+ retcode, stdout, stderr = run_script(
249+ 'cronscripts/reprocess-hwdb-submissions.py',
250+ ['--start-file', start_file_name])
251+ self.assertThat(
252+ stderr,
253+ Contains('%s must contain only an integer' % start_file_name))
254+
255+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
256+ def xxx_test_run_reprocessing_script_startfile_with_negative_integer(self):
257+ # If the specified start file contains any non-integer string,
258+ # cronscripts/reprocess-hwdb-submissions.py reports an error.
259+ start_file_name = mktemp()
260+ start_file = open(start_file_name, 'w')
261+ start_file.write('-1')
262+ start_file.close()
263+ retcode, stdout, stderr = run_script(
264+ 'cronscripts/reprocess-hwdb-submissions.py',
265+ ['--start-file', start_file_name])
266+ self.assertThat(
267+ stderr,
268+ Contains('%s must contain a positive integer' % start_file_name))
269+
270+ # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
271+ def xxx_test_run_reprocessing_script_max_submission_not_integer(self):
272+ # If the parameter --max-submissions is not an integer,
273+ # cronscripts/reprocess-hwdb-submissions.py reports an error.
274+ retcode, stdout, stderr = run_script(
275+ 'cronscripts/reprocess-hwdb-submissions.py',
276+ ['--max-submissions', 'nonsense'])
277+ expected = "Invalid value for --max_submissions specified: 'nonsense'"
278+ self.assertThat(stderr, Contains(expected))
279+
280+ def test_run_reprocessing_script_two_batches(self):
281+ # cronscripts/reprocess-hwdb-submissions.py begings to process
282+ # submissions with IDs starting at the value stored in the
283+ # file given as the parameter --start-file. When is has
284+ # finished processing the number of submissions specified by
285+ # --max-submissions, it stores the ID of the last prcessed
286+ # submission in start-file.
287+ new_submissions = []
288+ for count in range(5):
289+ new_submissions.append(
290+ self.factory.makeHWSubmission(
291+ status=HWSubmissionProcessingStatus.INVALID))
292+
293+ start_file_name = mktemp()
294+ start_file = open(start_file_name, 'w')
295+ start_file.write('%i' % new_submissions[1].id)
296+ start_file.close()
297+ transaction.commit()
298+ Store.of(new_submissions[0]).invalidate()
299+
300+ retcode, stdout, stderr = run_script(
301+ 'cronscripts/reprocess-hwdb-submissions.py',
302+ ['--max-submissions', '2', '--start-file', start_file_name])
303+
304+ # We started with the ID of the second submission created abvoe,
305+ # so the first submission still has the status INVALID.
306+ self.assertEqual(
307+ HWSubmissionProcessingStatus.INVALID,
308+ new_submissions[0].status)
309+ # We processed two submissions, they now have the status
310+ # PROCESSED.
311+ self.assertEqual(
312+ HWSubmissionProcessingStatus.PROCESSED,
313+ new_submissions[1].status)
314+ self.assertEqual(
315+ HWSubmissionProcessingStatus.PROCESSED,
316+ new_submissions[2].status)
317+ # The following submissions were not yet touched,
318+ self.assertEqual(
319+ HWSubmissionProcessingStatus.INVALID,
320+ new_submissions[3].status)
321+ self.assertEqual(
322+ HWSubmissionProcessingStatus.INVALID,
323+ new_submissions[4].status)
324+
325+ # The start file now contains the ID of the 4th submission.
326+ new_start = int(open(start_file_name).read())
327+ self.assertEqual(new_submissions[3].id, new_start)
328+
329+ # When we run the script again, for only one submission,
330+ # the 4th submission is processed.
331+ transaction.abort()
332+ Store.of(new_submissions[0]).invalidate()
333+ retcode, stdout, stderr = run_script(
334+ 'cronscripts/reprocess-hwdb-submissions.py',
335+ ['--max-submissions', '1', '--start-file', start_file_name])
336+ self.assertEqual(
337+ HWSubmissionProcessingStatus.PROCESSED,
338+ new_submissions[3].status)
339+ self.assertEqual(
340+ HWSubmissionProcessingStatus.INVALID,
341+ new_submissions[4].status)