Merge lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes into lp:launchpad

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: no longer in the source branch.
Merged at revision: 13946
Proposed branch: lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes
Merge into: lp:launchpad
Diff against target: 341 lines (+174/-26)
3 files modified
cronscripts/reprocess-hwdb-submissions.py (+38/-18)
lib/lp/hardwaredb/scripts/hwdbsubmissions.py (+6/-7)
lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py (+130/-1)
To merge this branch: bzr merge lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes
Reviewer Review Type Date Requested Status
Graham Binns (community) code Approve
Review via email: mp+75214@code.launchpad.net

Commit message

[r=gmb][bug=849159] convert scripts/reprocess-hwdb-submissions.py into cronscripts/reprocess-hwdb-submissions.py

Description of the change

This branch converts scripts/reprocess-hwdb-submissions.py into cronscripts/reprocess-hwdb-submissions.py.

We need to reprocess a larger number of reports to the hardware database, coming from checkbox in Lucid, Marvierick and Natty, which could not be processed due to, well... a number of reasons. The core fixes which allow to process these submissions are already merged, but we want to include data from those reports that are marked as bad.

The number of these reports is in the order of 10**5 -- more than should be done in a single script run. The idea of the script simple: Run a query like

  SELECT ... FROM HWSubmission WHERE id > (variable number) AND status=(bad) LIMIT whatever;

in a TunableLoop: The first run should begin with the first bad report that could come from Lucid; it tries to process a number of reports and records the ID of the last touched report. In the next run, it continues with higher IDs.

(variable number) above is stored in a file; when the script starts, it reads the value from the file; when it finishes, it writes a new value into the same file.

The test test_run_reprocessing_script_two_batches() turned out to be quite useful. It reveealed that the lines removed below:

@@ -3113,8 +3112,6 @@
         submissions = removeSecurityProxy(submissions).find(
             HWSubmission.id >= self.start)
         submissions = list(submissions[:chunk_size])
- if len(submissions) > 0:
- self.start = submissions[-1].id + 1
         return submissions

were nonsense: The method __call__() can terminate before all submissions from a chunk have been processed: When the number of already processed submissions goes beyond self.max_submissions:

             if self.max_submissions is not None:
                 if self.max_submissions <= (
                     self.valid_submissions + self.invalid_submissions):
                     self.finished = True
                     break

The new tests look a bit odd: All except one are disabled. I described the reason in bug 849056...

test: ./bin/test hardwaredb -vvt lp.hardwaredb.scripts.tests.test_hwdbsubmissions

I removed a bit of lint from lib/lp/hardwaredb/scripts/hwdbsubmissions.py but some more remains:

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  cronscripts/reprocess-hwdb-submissions.py
  lib/lp/hardwaredb/scripts/hwdbsubmissions.py
  lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py

./cronscripts/reprocess-hwdb-submissions.py
      27: '_pythonpath' imported but unused
./lib/lp/hardwaredb/scripts/hwdbsubmissions.py
      26: redefinition of unused 'etree' from line 24
    3070: local variable 'error' is assigned to but never used
    1696: E261 at least two spaces before inline comment
./lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py
     159: Line exceeds 78 characters.

To post a comment you must log in.
Revision history for this message
Graham Binns (gmb) wrote :

[1]

210 + # Disabled due to bug 849056.
220 + # Disabled due to bug 849056.
220 + # Disabled due to bug 849056.
229 + # Disabled due to bug 849056.
240 + # Disabled due to bug 849056.
255 + # Disabled due to bug 849056.
270 + # Disabled due to bug 849056.

These should be XXXs, or one big XXX should come before all these defs, with a closing marker so we know where the bad tests begin and end.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== renamed file 'scripts/reprocess-hwdb-submissions.py' => 'cronscripts/reprocess-hwdb-submissions.py'
--- scripts/reprocess-hwdb-submissions.py 2011-09-01 15:10:31 +0000
+++ cronscripts/reprocess-hwdb-submissions.py 2011-09-13 16:43:51 +0000
@@ -14,21 +14,24 @@
14 which will be processed.14 which will be processed.
1515
16This script iterates over the HWDB submissions with the status16This script iterates over the HWDB submissions with the status
17SUBMITTED, beginning with the oldest submissions, populate the17INVALID. It processes only submissions with an ID greater or equal
18HWDB tables with the data from these submissions.18than the number specified by the file given a option -s.
19
20When the script terminates, it writes the ID of the last processed
21submission into this file.
1922
20Properly processed submissions are set to the status PROCESSED;23Properly processed submissions are set to the status PROCESSED;
21submissions that cannot be processed are set to the status INVALID.24submissions that cannot be processed retain the status INVALID.
22"""25"""
2326
24import _pythonpath27import _pythonpath
2528
26from lp.services.scripts.base import LaunchpadScript29from lp.services.scripts.base import LaunchpadCronScript
27from lp.hardwaredb.scripts.hwdbsubmissions import (30from lp.hardwaredb.scripts.hwdbsubmissions import (
28 reprocess_invalid_submissions)31 reprocess_invalid_submissions)
2932
3033
31class HWDBSubmissionProcessor(LaunchpadScript):34class HWDBSubmissionProcessor(LaunchpadCronScript):
3235
33 def add_my_options(self):36 def add_my_options(self):
34 """See `LaunchpadScript`."""37 """See `LaunchpadScript`."""
@@ -39,9 +42,10 @@
39 '-w', '--warnings', action="store_true", default=False,42 '-w', '--warnings', action="store_true", default=False,
40 help='Include warnings.')43 help='Include warnings.')
41 self.parser.add_option(44 self.parser.add_option(
42 '-s', '--start',45 '-s', '--start-file', default=None,
43 help=('Process HWSubmission records having an id greater or '46 help=('The name of a file storing the smallest ID of a\n'
44 'equal than this value.'))47 'hardware database submission that should be processed.\n'
48 'This script must have read and write access to the file.'))
4549
46 def main(self):50 def main(self):
47 max_submissions = self.options.max_submissions51 max_submissions = self.options.max_submissions
@@ -57,22 +61,38 @@
57 self.logger.error(61 self.logger.error(
58 '--max_submissions must be a positive integer.')62 '--max_submissions must be a positive integer.')
59 return63 return
60 if self.options.start is None:64
61 self.logger.error('Option --start not specified.')65 if self.options.start_file is None:
62 return66 self.logger.error('Option --start-file not specified.')
63 try:67 return
64 start = int(self.options.start)68 try:
69 start_file = open(self.options.start_file, 'r+')
70 start_id = start_file.read().strip()
71 except IOError, error:
72 self.logger.error(
73 'Cannot access file %s: %s' % (
74 self.options.start_file, error))
75 return
76 try:
77 start_id = int(start_id)
65 except ValueError:78 except ValueError:
66 self.logger.error('Option --start must have an integer value.')79 self.logger.error(
80 '%s must contain only an integer' % self.options.start_file)
67 return81 return
68 if start < 0:82 if start_id < 0:
69 self.logger.error('--start must be a positive integer.')83 self.logger.error(
84 '%s must contain a positive integer'
85 % self.options.start_file)
70 return86 return
7187
72 reprocess_invalid_submissions(88 next_start = reprocess_invalid_submissions(
73 start, self.txn, self.logger,89 start_id, self.txn, self.logger,
74 max_submissions, self.options.warnings)90 max_submissions, self.options.warnings)
7591
92 start_file.seek(0)
93 start_file.write('%i' % next_start)
94 start_file.close()
95
76if __name__ == '__main__':96if __name__ == '__main__':
77 script = HWDBSubmissionProcessor(97 script = HWDBSubmissionProcessor(
78 'hwdbsubmissions', dbuser='hwdb-submission-processor')98 'hwdbsubmissions', dbuser='hwdb-submission-processor')
7999
=== modified file 'lib/lp/hardwaredb/scripts/hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-09-05 12:07:35 +0000
+++ lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-09-13 16:43:51 +0000
@@ -129,6 +129,7 @@
129UDEV_USB_TYPE_RE = re.compile('^[0-9]{1,3}/[0-9]{1,3}/[0-9]{1,3}$')129UDEV_USB_TYPE_RE = re.compile('^[0-9]{1,3}/[0-9]{1,3}/[0-9]{1,3}$')
130SYSFS_SCSI_DEVICE_ATTRIBUTES = set(('vendor', 'model', 'type'))130SYSFS_SCSI_DEVICE_ATTRIBUTES = set(('vendor', 'model', 'type'))
131131
132
132class SubmissionParser(object):133class SubmissionParser(object):
133 """A Parser for the submissions to the hardware database."""134 """A Parser for the submissions to the hardware database."""
134135
@@ -405,7 +406,6 @@
405406
406 :return: (name, (value, type)) of a property.407 :return: (name, (value, type)) of a property.
407 """408 """
408 property_name = property_node.get('name')
409 return (property_node.get('name'),409 return (property_node.get('name'),
410 self._getValueAndType(property_node))410 self._getValueAndType(property_node))
411411
@@ -1004,7 +1004,7 @@
1004 the content.1004 the content.
1005 """1005 """
1006 self.submission_key = submission_key1006 self.submission_key = submission_key
1007 submission_doc = self._getValidatedEtree(submission, submission_key)1007 submission_doc = self._getValidatedEtree(submission, submission_key)
1008 if submission_doc is None:1008 if submission_doc is None:
1009 return None1009 return None
10101010
@@ -1565,7 +1565,7 @@
1565 'Invalid device path name: %r' % path_name,1565 'Invalid device path name: %r' % path_name,
1566 self.submission_key)1566 self.submission_key)
1567 return False1567 return False
1568 for parent_path in path_names[path_index+1:]:1568 for parent_path in path_names[path_index + 1:]:
1569 if path_name.startswith(parent_path):1569 if path_name.startswith(parent_path):
1570 self.devices[parent_path].addChild(1570 self.devices[parent_path].addChild(
1571 self.devices[path_name])1571 self.devices[path_name])
@@ -2822,7 +2822,6 @@
2822 # SubmissionParser.checkUdevScsiProperties() ensures that2822 # SubmissionParser.checkUdevScsiProperties() ensures that
2823 # each SCSI device has a record in self.sysfs and that2823 # each SCSI device has a record in self.sysfs and that
2824 # the attribute 'vendor' exists.2824 # the attribute 'vendor' exists.
2825 path = self.udev['P']
2826 return self.sysfs['vendor']2825 return self.sysfs['vendor']
2827 else:2826 else:
2828 return None2827 return None
@@ -2834,7 +2833,6 @@
2834 # SubmissionParser.checkUdevScsiProperties() ensures that2833 # SubmissionParser.checkUdevScsiProperties() ensures that
2835 # each SCSI device has a record in self.sysfs and that2834 # each SCSI device has a record in self.sysfs and that
2836 # the attribute 'model' exists.2835 # the attribute 'model' exists.
2837 path = self.udev['P']
2838 return self.sysfs['model']2836 return self.sysfs['model']
2839 else:2837 else:
2840 return None2838 return None
@@ -3080,6 +3078,7 @@
3080 # further submissions in this batch raise an exception.3078 # further submissions in this batch raise an exception.
3081 self.transaction.commit()3079 self.transaction.commit()
30823080
3081 self.start = submission.id + 1
3083 if self.max_submissions is not None:3082 if self.max_submissions is not None:
3084 if self.max_submissions <= (3083 if self.max_submissions <= (
3085 self.valid_submissions + self.invalid_submissions):3084 self.valid_submissions + self.invalid_submissions):
@@ -3113,8 +3112,6 @@
3113 submissions = removeSecurityProxy(submissions).find(3112 submissions = removeSecurityProxy(submissions).find(
3114 HWSubmission.id >= self.start)3113 HWSubmission.id >= self.start)
3115 submissions = list(submissions[:chunk_size])3114 submissions = list(submissions[:chunk_size])
3116 if len(submissions) > 0:
3117 self.start = submissions[-1].id + 1
3118 return submissions3115 return submissions
31193116
31203117
@@ -3139,6 +3136,7 @@
3139 'Processed %i valid and %i invalid HWDB submissions'3136 'Processed %i valid and %i invalid HWDB submissions'
3140 % (loop.valid_submissions, loop.invalid_submissions))3137 % (loop.valid_submissions, loop.invalid_submissions))
31413138
3139
3142def reprocess_invalid_submissions(start, transaction, logger,3140def reprocess_invalid_submissions(start, transaction, logger,
3143 max_submissions=None, record_warnings=True):3141 max_submissions=None, record_warnings=True):
3144 """Reprocess invalid submissions.3142 """Reprocess invalid submissions.
@@ -3160,3 +3158,4 @@
3160 'Processed %i valid and %i invalid HWDB submissions'3158 'Processed %i valid and %i invalid HWDB submissions'
3161 % (loop.valid_submissions, loop.invalid_submissions))3159 % (loop.valid_submissions, loop.invalid_submissions))
3162 logger.info('last processed: %i' % loop.start)3160 logger.info('last processed: %i' % loop.start)
3161 return loop.start
31633162
=== modified file 'lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py 2011-09-01 16:43:49 +0000
+++ lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py 2011-09-13 16:43:51 +0000
@@ -5,7 +5,10 @@
55
6__metaclass__ = type6__metaclass__ = type
77
8from storm.store import Store
9from tempfile import mktemp
810
11from canonical.launchpad.ftests.script import run_script
9from canonical.testing.layers import LaunchpadScriptLayer12from canonical.testing.layers import LaunchpadScriptLayer
10from lp.hardwaredb.interfaces.hwdb import HWSubmissionProcessingStatus13from lp.hardwaredb.interfaces.hwdb import HWSubmissionProcessingStatus
11from lp.hardwaredb.scripts.hwdbsubmissions import (14from lp.hardwaredb.scripts.hwdbsubmissions import (
@@ -13,6 +16,8 @@
13 ProcessingLoopForReprocessingBadSubmissions,16 ProcessingLoopForReprocessingBadSubmissions,
14 )17 )
15from lp.testing import TestCaseWithFactory18from lp.testing import TestCaseWithFactory
19from lp.testing.matchers import Contains
20import transaction
1621
1722
18class TestProcessingLoops(TestCaseWithFactory):23class TestProcessingLoops(TestCaseWithFactory):
@@ -102,7 +107,8 @@
102 submissions = loop.getUnprocessedSubmissions(1)107 submissions = loop.getUnprocessedSubmissions(1)
103 self.assertEqual(1, len(submissions))108 self.assertEqual(1, len(submissions))
104109
105 def test_BadSubmissions_respects_start(self):110 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
111 def xxx_test_BadSubmissions_respects_start(self):
106 # It is possible to request a start id. Previous entries are ignored.112 # It is possible to request a start id. Previous entries are ignored.
107 submission1 = self.factory.makeHWSubmission(113 submission1 = self.factory.makeHWSubmission(
108 status=HWSubmissionProcessingStatus.INVALID)114 status=HWSubmissionProcessingStatus.INVALID)
@@ -113,3 +119,126 @@
113 # The sample data already contains one submission.119 # The sample data already contains one submission.
114 submissions = loop.getUnprocessedSubmissions(2)120 submissions = loop.getUnprocessedSubmissions(2)
115 self.assertEqual([submission2], submissions)121 self.assertEqual([submission2], submissions)
122
123 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
124 def xxx_test_run_reprocessing_script_no_params(self):
125 # cronscripts/reprocess-hwdb-submissions.py needs at least the
126 # parameter --start-file
127 retcode, stdout, stderr = run_script(
128 'cronscripts/reprocess-hwdb-submissions.py', [])
129 self.assertThat(
130 stderr, Contains('Option --start-file not specified.'))
131
132 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
133 def xxx_test_run_reprocessing_script_startfile_does_not_exist(self):
134 # If the specified start file does not exist,
135 # cronscripts/reprocess-hwdb-submissions.py reports an error.
136 does_not_exist = mktemp()
137 retcode, stdout, stderr = run_script(
138 'cronscripts/reprocess-hwdb-submissions.py',
139 ['--start-file', does_not_exist])
140 self.assertThat(
141 stderr, Contains('Cannot access file %s' % does_not_exist))
142
143 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
144 def xxx_test_run_reprocessing_script_startfile_without_integer(self):
145 # If the specified start file contains any non-integer string,
146 # cronscripts/reprocess-hwdb-submissions.py reports an error.
147 start_file_name = mktemp()
148 start_file = open(start_file_name, 'w')
149 start_file.write('nonsense')
150 start_file.close()
151 retcode, stdout, stderr = run_script(
152 'cronscripts/reprocess-hwdb-submissions.py',
153 ['--start-file', start_file_name])
154 self.assertThat(
155 stderr,
156 Contains('%s must contain only an integer' % start_file_name))
157
158 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
159 def xxx_test_run_reprocessing_script_startfile_with_negative_integer(self):
160 # If the specified start file contains any non-integer string,
161 # cronscripts/reprocess-hwdb-submissions.py reports an error.
162 start_file_name = mktemp()
163 start_file = open(start_file_name, 'w')
164 start_file.write('-1')
165 start_file.close()
166 retcode, stdout, stderr = run_script(
167 'cronscripts/reprocess-hwdb-submissions.py',
168 ['--start-file', start_file_name])
169 self.assertThat(
170 stderr,
171 Contains('%s must contain a positive integer' % start_file_name))
172
173 # XXX 2011-09-13, Abel Deuring: Disabled due to bug 849056.
174 def xxx_test_run_reprocessing_script_max_submission_not_integer(self):
175 # If the parameter --max-submissions is not an integer,
176 # cronscripts/reprocess-hwdb-submissions.py reports an error.
177 retcode, stdout, stderr = run_script(
178 'cronscripts/reprocess-hwdb-submissions.py',
179 ['--max-submissions', 'nonsense'])
180 expected = "Invalid value for --max_submissions specified: 'nonsense'"
181 self.assertThat(stderr, Contains(expected))
182
183 def test_run_reprocessing_script_two_batches(self):
184 # cronscripts/reprocess-hwdb-submissions.py begings to process
185 # submissions with IDs starting at the value stored in the
186 # file given as the parameter --start-file. When is has
187 # finished processing the number of submissions specified by
188 # --max-submissions, it stores the ID of the last prcessed
189 # submission in start-file.
190 new_submissions = []
191 for count in range(5):
192 new_submissions.append(
193 self.factory.makeHWSubmission(
194 status=HWSubmissionProcessingStatus.INVALID))
195
196 start_file_name = mktemp()
197 start_file = open(start_file_name, 'w')
198 start_file.write('%i' % new_submissions[1].id)
199 start_file.close()
200 transaction.commit()
201 Store.of(new_submissions[0]).invalidate()
202
203 retcode, stdout, stderr = run_script(
204 'cronscripts/reprocess-hwdb-submissions.py',
205 ['--max-submissions', '2', '--start-file', start_file_name])
206
207 # We started with the ID of the second submission created abvoe,
208 # so the first submission still has the status INVALID.
209 self.assertEqual(
210 HWSubmissionProcessingStatus.INVALID,
211 new_submissions[0].status)
212 # We processed two submissions, they now have the status
213 # PROCESSED.
214 self.assertEqual(
215 HWSubmissionProcessingStatus.PROCESSED,
216 new_submissions[1].status)
217 self.assertEqual(
218 HWSubmissionProcessingStatus.PROCESSED,
219 new_submissions[2].status)
220 # The following submissions were not yet touched,
221 self.assertEqual(
222 HWSubmissionProcessingStatus.INVALID,
223 new_submissions[3].status)
224 self.assertEqual(
225 HWSubmissionProcessingStatus.INVALID,
226 new_submissions[4].status)
227
228 # The start file now contains the ID of the 4th submission.
229 new_start = int(open(start_file_name).read())
230 self.assertEqual(new_submissions[3].id, new_start)
231
232 # When we run the script again, for only one submission,
233 # the 4th submission is processed.
234 transaction.abort()
235 Store.of(new_submissions[0]).invalidate()
236 retcode, stdout, stderr = run_script(
237 'cronscripts/reprocess-hwdb-submissions.py',
238 ['--max-submissions', '1', '--start-file', start_file_name])
239 self.assertEqual(
240 HWSubmissionProcessingStatus.PROCESSED,
241 new_submissions[3].status)
242 self.assertEqual(
243 HWSubmissionProcessingStatus.INVALID,
244 new_submissions[4].status)