Merge lp:~doanac/utah/rsyslog-config into lp:utah
- rsyslog-config
- Merge into dev
Status: | Merged |
---|---|
Merged at revision: | 940 |
Proposed branch: | lp:~doanac/utah/rsyslog-config |
Merge into: | lp:utah |
Diff against target: |
744 lines (+165/-214) 7 files modified
tests/test_rsyslog.py (+65/-58) utah/config.py (+1/-95) utah/provisioning/baremetal/bamboofeeder.py (+2/-2) utah/provisioning/baremetal/cobbler.py (+2/-3) utah/provisioning/rsyslog.py (+92/-53) utah/provisioning/vm.py (+2/-2) utah/run.py (+1/-1) |
To merge this branch: | bzr merge lp:~doanac/utah/rsyslog-config |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Max Brustkern (community) | Approve | ||
Andy Doan (community) | Needs Resubmitting | ||
Review via email:
|
Commit message
Description of the change
This helps fix our current configuration issues we are having and should help get us ready for the next release
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Max Brustkern (nuclearbob) wrote : | # |
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Max Brustkern (nuclearbob) wrote : | # |
For 432-435, if we run this branch on a machine that still has None in the config file, the whole config file will be ignored. If the config files is all defaults anyway, it'll be fine, but that seems like it could have the potential to cause problems. Maybe if we do want to move to int for that, we can introduce that as a default now and then remove null from the schema in another version or two. I'm still concerned about the 4 hour timeout, since part of the job of the utah client is to ensure that things don't run forever.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Max Brustkern (nuclearbob) wrote : | # |
I got good tests of this on virtual and physical machines. I think once we determine how to address the configuration changes around timeouts, we should be good to go.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Andy Doan (doanac) wrote : | # |
On 06/13/2013 07:14 PM, Max Brustkern wrote:
> How does this line (303 in the merge) work:
> self.assertTrue
> documentation on assertTrue, it looks like 2 should be picked up as
> the mesg argument?
typo, was supposed to assertLessThan
> Also, I'm a little worried about the regression potential of 432-435.
> Are we sure we don't have any test cases that currently take longer
> than 4 hours?
I was just making an educated guess. You also said in your next thread:
| I'm still concerned about the 4 hour timeout, since part of the job
| of the utah client is to ensure that things don't run forever.
I think we should backout this change to config.py. The wait_for_run
already checks if the value is None and will default to a 2 day timeout.
> On line 531 of the merge, I get removing the steps. Was the previous
> reason for the seealso because the other method had steps, or because
> it was similar? Maybe the seealso still makes sense?
>
> Looks like that also happened on 589, so maybe there's a consistent
> reason for it.
I think it was just because of the steps and how they were documented
> It looks like the timeout in wait_for_steps is now the timeout for
> the whole process rather than each step. That seems reasonable, and
> I'm assuming that since the previous default values were
> whole-process timeouts, we don't need to worry about updating them.
that was my hope :)
> The rest of it looks reasonable to me, assuming it all tests out
> okay.
Sounds like I have two things to fix. I'll probably do that tomorrow and
re-submit.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Andy Doan (doanac) wrote : | # |
screw it. i couldn't sleep knowing it should be easy to fix your concerns.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Max Brustkern (nuclearbob) wrote : | # |
This seems good to me. If nobody objects, I'm in favor of merging it today.
Preview Diff
1 | === modified file 'tests/test_rsyslog.py' |
2 | --- tests/test_rsyslog.py 2013-06-04 20:46:41 +0000 |
3 | +++ tests/test_rsyslog.py 2013-06-14 03:44:27 +0000 |
4 | @@ -19,6 +19,8 @@ |
5 | import itertools |
6 | import logging |
7 | import mock |
8 | +import os |
9 | +import shutil |
10 | import socket |
11 | import time |
12 | import tempfile |
13 | @@ -67,14 +69,21 @@ |
14 | time.sleep(1) |
15 | truncate = False |
16 | |
17 | + def setUp(self): |
18 | + """Create a temporary directory for logging.""" |
19 | + self.logdir = tempfile.mkdtemp() |
20 | + self.logfile = os.path.join(self.logdir, 'tmp.log') |
21 | + |
22 | + def tearDown(self): |
23 | + """Cleanup artifacts from setUp.""" |
24 | + shutil.rmtree(self.logdir) |
25 | + |
26 | def test_easy(self): |
27 | """Make sure we can match a typical first message.""" |
28 | steps = [ |
29 | { |
30 | "message": "test_easy", |
31 | "pattern": ".*log source = /proc/kmsg started", |
32 | - "timeout": 120, |
33 | - "booted": False, |
34 | }, |
35 | ] |
36 | |
37 | @@ -82,9 +91,9 @@ |
38 | 'test 1', |
39 | '<13>Feb 20 13:38:55 host user: log source = /proc/kmsg started', |
40 | ] |
41 | - r = RSyslog("utah-test", '/tmp') |
42 | + r = RSyslog("utah-test", self.logdir) |
43 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
44 | - r.wait_for_install(steps) |
45 | + r._wait_for_steps(steps, 5, self.logfile, {}) |
46 | |
47 | def test_fail_pattern(self): |
48 | """Exception is raised on fail_pattern match.""" |
49 | @@ -93,7 +102,6 @@ |
50 | "message": "test_late_command_failure", |
51 | "pattern": "finished", |
52 | "fail_pattern": "failure", |
53 | - "timeout": 120, |
54 | }, |
55 | ] |
56 | |
57 | @@ -103,9 +111,10 @@ |
58 | 'finished', |
59 | ] |
60 | |
61 | - r = RSyslog('utah-test', '/tmp') |
62 | + r = RSyslog("utah-test", self.logdir) |
63 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
64 | - self.assertRaises(UTAHException, r.wait_for_install, steps) |
65 | + with self.assertRaises(UTAHException): |
66 | + r._wait_for_steps(steps, 5, self.logfile, {}) |
67 | |
68 | def test_future(self): |
69 | """Test handling a missing message.""" |
70 | @@ -115,18 +124,15 @@ |
71 | "pattern": ".*log source = /proc/kmsg started", |
72 | # set a really small timeout to prove we properly calculate |
73 | # future timeout logic |
74 | - "timeout": 0.01, |
75 | "booted": True, |
76 | }, |
77 | { |
78 | "message": "hit skipped", |
79 | "pattern": "hit skipped", |
80 | - "timeout": 120, |
81 | }, |
82 | { |
83 | "message": "message 3", |
84 | "pattern": "message 3", |
85 | - "timeout": 120, |
86 | }, |
87 | ] |
88 | |
89 | @@ -136,7 +142,7 @@ |
90 | 'message 3', |
91 | ] |
92 | |
93 | - r = RSyslog("utah-test", '/tmp') |
94 | + r = RSyslog("utah-test", self.logdir) |
95 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
96 | |
97 | def booted_cb(match): |
98 | @@ -145,7 +151,7 @@ |
99 | |
100 | self.test_future_booted = False |
101 | with patch.object(r.logger, 'warn') as mock_method: |
102 | - r.wait_for_install(steps, booted_cb) |
103 | + r._wait_for_steps(steps, 5, self.logfile, {'booted': booted_cb}) |
104 | mock_method.assert_called_with( |
105 | 'Expected pattern missed, matched future pattern: %s', |
106 | 'hit skipped') |
107 | @@ -158,18 +164,15 @@ |
108 | { |
109 | "message": "test callbacks", |
110 | "pattern": ".*log source = /proc/kmsg started", |
111 | - "timeout": 120, |
112 | "booted": True, |
113 | }, |
114 | { |
115 | "message": "hit skipped", |
116 | "pattern": "hit skipped", |
117 | - "timeout": 120, |
118 | }, |
119 | { |
120 | "message": "message 3", |
121 | "pattern": "message 3", |
122 | - "timeout": 120, |
123 | "blah": True, |
124 | }, |
125 | ] |
126 | @@ -180,7 +183,7 @@ |
127 | 'message 3', |
128 | ] |
129 | |
130 | - r = RSyslog("utah-test", '/tmp') |
131 | + r = RSyslog("utah-test", self.logdir) |
132 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
133 | |
134 | def booted_cb(match): |
135 | @@ -196,61 +199,66 @@ |
136 | 'booted': booted_cb, |
137 | 'blah': blah_cb, |
138 | } |
139 | - r._wait_for_steps(steps, "/dev/null", callbacks) |
140 | + r._wait_for_steps(steps, 5, "/dev/null", callbacks) |
141 | self.assertTrue(self.test_callbacks_booted, 'booted callback not made') |
142 | self.assertTrue(self.test_callbacks_blah, 'blah callback not made') |
143 | |
144 | + def test_install(self): |
145 | + """Make sure wait_for_install works.""" |
146 | + messages = [ |
147 | + 'blah', |
148 | + 'ubiquity[11]: Ubiquity 1', |
149 | + 'finish-install: umount', |
150 | + ] |
151 | + |
152 | + def cb(match): |
153 | + self.test_install_cb = True |
154 | + |
155 | + self.test_install_cb = False |
156 | + r = RSyslog("utah-test", self.logdir) |
157 | + r.logger = mock.Mock() |
158 | + threading.Thread(target=self.producer, args=(r.port, messages)).start() |
159 | + r.wait_for_install(cb) |
160 | + self.assertFalse(r.logger.error.called) |
161 | + self.assertTrue(self.test_install_cb) |
162 | + |
163 | + r = RSyslog("utah-test", self.logdir) |
164 | + messages[-1] = 'utah: Late command failure detected' |
165 | + threading.Thread(target=self.producer, args=(r.port, messages)).start() |
166 | + with self.assertRaises(UTAHException): |
167 | + r.wait_for_install() |
168 | + |
169 | def test_booted(self): |
170 | """Make sure wait_for_booted works.""" |
171 | tst_uuid = str(uuid.uuid4()) |
172 | - steps = [ |
173 | - { |
174 | - "message": "test booted", |
175 | - "pattern": ".*log source = /proc/kmsg started", |
176 | - "timeout": 120, |
177 | - }, |
178 | - { |
179 | - "message": "test uuid", |
180 | - "uuid": True, |
181 | - "pattern": '.*Machine UUID:\s+(.*)', |
182 | - "timeout": 2, |
183 | - }, |
184 | - ] |
185 | - |
186 | messages = [ |
187 | 'test 1', |
188 | '<13>Feb 20 13:38:55 host user: log source = /proc/kmsg started', |
189 | '<13>Feb 21 13:40:55 host user: Machine UUID: {}'.format(tst_uuid), |
190 | ] |
191 | |
192 | - r = RSyslog("utah-test", '/tmp') |
193 | + r = RSyslog("utah-test", self.logdir) |
194 | r.logger = mock.Mock() |
195 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
196 | - r.wait_for_booted(steps, tst_uuid) |
197 | + r.wait_for_booted(tst_uuid) |
198 | self.assertFalse(r.logger.error.called) |
199 | |
200 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
201 | with self.assertRaises(UTAHProvisioningException): |
202 | - r.wait_for_booted(steps, 'bad') |
203 | + r.wait_for_booted('bad') |
204 | |
205 | def test_booted_workaround(self): |
206 | """Test that wait_for_booted doesn't raise an exception on timeout.""" |
207 | - steps = [ |
208 | - { |
209 | - "message": "test booted", |
210 | - "pattern": "Wait forever for a line that never comes", |
211 | - "timeout": 5, |
212 | - }, |
213 | - ] |
214 | - |
215 | - r = RSyslog("utah-test", '/tmp') |
216 | - |
217 | - r.wait_for_booted(steps, None) |
218 | + orig = config.boottimeout |
219 | + try: |
220 | + config.boottimeout = 0.5 |
221 | + r = RSyslog("utah-test", self.logdir) |
222 | + r.wait_for_booted(None) |
223 | + finally: |
224 | + config.boottimeout = orig |
225 | |
226 | def test_wait_for_run(self): |
227 | """Test that wait_for_utah_run works.""" |
228 | - steps = config.autorun_steps |
229 | - |
230 | messages = [ |
231 | 'test 1', |
232 | 'utah: completed /etc/utah/autorun/01_run-utah rc=7', |
233 | @@ -261,7 +269,7 @@ |
234 | |
235 | r = RSyslog("utah-test", '/tmp') |
236 | threading.Thread(target=self.producer, args=(r.port, messages)).start() |
237 | - r.wait_for_utah_run(steps, cb) |
238 | + r.wait_for_utah_run(cb) |
239 | self.assertEqual(7, self.utah_run_rc) |
240 | |
241 | def usefile(self, truncate): |
242 | @@ -270,7 +278,6 @@ |
243 | { |
244 | "message": "truncate test", |
245 | "pattern": ".*log source = /proc/kmsg started", |
246 | - "timeout": 60, |
247 | }, |
248 | ] |
249 | |
250 | @@ -282,12 +289,12 @@ |
251 | ] |
252 | |
253 | with tempfile.NamedTemporaryFile() as f: |
254 | - r = RSyslog("utah-test", '/tmp', f.name) |
255 | + r = RSyslog("utah-test", self.logdir, f.name) |
256 | t = threading.Thread( |
257 | target=self.file_producer, |
258 | args=(f, messages, truncate)) |
259 | t.start() |
260 | - r.wait_for_booted(steps, None) |
261 | + r._wait_for_steps(steps, 5, self.logfile, {}) |
262 | t.join() |
263 | |
264 | def test_usefile(self): |
265 | @@ -304,31 +311,31 @@ |
266 | { |
267 | "message": "message_1", |
268 | "pattern": "finished", |
269 | - "timeout": 1, |
270 | }, |
271 | { |
272 | "message": "message_2", |
273 | "pattern": "pattern", |
274 | - "timeout": 1, |
275 | }, |
276 | { |
277 | "message": "message_3", |
278 | "pattern": "pattern", |
279 | - "timeout": 1, |
280 | }, |
281 | ] |
282 | |
283 | - messages = itertools.repeat('log example') |
284 | + messages = ['message_1', 'message_2', 'message_4'] |
285 | pattern = (r'Timeout \(\d+\) occurred for {} message.\n' |
286 | r'Remaining messages: {}' |
287 | .format(steps[0]['message'], |
288 | ', '.join([step['message'] |
289 | for step in steps[1:]]))) |
290 | |
291 | - r = RSyslog('utah-test', '/tmp') |
292 | + r = RSyslog('utah-test', self.logdir) |
293 | thread = threading.Thread(target=self.producer, |
294 | args=(r.port, messages)) |
295 | thread.daemon = True |
296 | thread.start() |
297 | - self.assertRaisesRegexp(UTAHTimeout, pattern, |
298 | - r.wait_for_install, steps) |
299 | + start = time.time() |
300 | + with self.assertRaisesRegexp(UTAHTimeout, pattern): |
301 | + r._wait_for_steps(steps, 5, self.logfile, {}) |
302 | + # assert the timeout was within 2 seconds of what we expect |
303 | + self.assertLess(abs(time.time() - start - 5), 2) |
304 | |
305 | === modified file 'utah/config.py' |
306 | --- utah/config.py 2013-06-05 15:56:26 +0000 |
307 | +++ utah/config.py 2013-06-14 03:44:27 +0000 |
308 | @@ -46,67 +46,17 @@ |
309 | "$schema": "http://json-schema.org/draft-04/schema#", |
310 | 'title': 'UTAH configuration', |
311 | 'type': 'object', |
312 | - 'additionalProperties': False, |
313 | + 'additionalProperties': True, |
314 | 'properties': { |
315 | 'arch': { |
316 | 'description': 'Default architecture', |
317 | 'type': 'string', |
318 | }, |
319 | - 'autorun_steps': { |
320 | - 'description': 'rsyslog patterns matched during autorun', |
321 | - 'type': 'array', |
322 | - 'items': {'type': 'object'}, |
323 | - 'default': [ |
324 | - { |
325 | - 'message': 'UTAH setup', |
326 | - 'pattern': ('.*utah: completed ' |
327 | - '/etc/utah/autorun/\d+_utah-setup'), |
328 | - 'timeout': 180, |
329 | - }, |
330 | - { |
331 | - 'message': 'UTAH client install', |
332 | - 'pattern': ('.*utah: completed ' |
333 | - '/etc/utah/autorun/\d+_install-client'), |
334 | - 'timeout': 180, |
335 | - }, |
336 | - { |
337 | - 'message': 'UTAH runlist', |
338 | - 'pattern': ('.*utah: completed ' |
339 | - '/etc/utah/autorun/\d+_run-utah rc=(\d+)'), |
340 | - 'completed': True, |
341 | - 'timeout': 7200, # 2 hours |
342 | - }, |
343 | - ], |
344 | - }, |
345 | 'boot': { |
346 | 'description': 'Default kernel parameters', |
347 | 'type': ['string', 'null'], |
348 | 'default': None, |
349 | }, |
350 | - 'boot_steps': { |
351 | - 'description': 'rsyslog patterns matched during boot', |
352 | - 'type': 'array', |
353 | - 'items': {'type': 'object'}, |
354 | - 'default': [ |
355 | - { |
356 | - 'message': 'system booted', |
357 | - 'pattern': '.*Linux version \\d', |
358 | - 'timeout': 600, |
359 | - }, |
360 | - { |
361 | - 'message': 'disk mounted', |
362 | - 'pattern': '.*mounted filesystem with ordered data', |
363 | - 'timeout': 600, |
364 | - }, |
365 | - { |
366 | - 'message': 'UTAH uuid received', |
367 | - # binds this to the callback for wait_for_booted |
368 | - 'uuid': True, |
369 | - 'pattern': '.*Machine UUID:\s+(.*)', |
370 | - 'timeout': 600, |
371 | - }, |
372 | - ], |
373 | - }, |
374 | 'boottimeout': { |
375 | 'description': 'Time to wait for installed system to boot', |
376 | 'type': 'integer', |
377 | @@ -223,50 +173,6 @@ |
378 | 'format': 'path', |
379 | 'default': None, |
380 | }, |
381 | - 'install_steps': { |
382 | - 'description': 'rsyslog patterns matched during installation', |
383 | - 'type': 'array', |
384 | - 'items': {'type': 'object'}, |
385 | - 'default': [ |
386 | - { |
387 | - 'message': 'system started', |
388 | - 'pattern': '.*', |
389 | - 'booted': True, |
390 | - 'timeout': 360, |
391 | - }, |
392 | - { |
393 | - 'message': 'install started', |
394 | - 'pattern': [ |
395 | - '.*ubiquity\\[\\d+\\]: Ubiquity \\d', |
396 | - # needed for non-ubiquity installs |
397 | - '.*anna-install', |
398 | - ], |
399 | - 'timeout': 120, |
400 | - }, |
401 | - { |
402 | - 'message': 'install complete', |
403 | - 'pattern': [ |
404 | - '.*ubiquity reboot', |
405 | - '.*log-output -t ubiquity umount /target$', |
406 | - # for server bootspeed |
407 | - '.*finish-install.d/94save-logs', |
408 | - '.*finish-install: umount', |
409 | - # a catch-all |
410 | - '.*rsyslogd:.*exiting on signal 15.', |
411 | - ], |
412 | - 'fail_pattern': [ |
413 | - # ubiquity/failure_command |
414 | - '.*utah: Installation failure detected', |
415 | - # ubiquity/success_command |
416 | - # and d-i/late_command failures |
417 | - '.*utah: Late command failure detected', |
418 | - # d-i installation failure |
419 | - '.*exited with status [^0]', |
420 | - ], |
421 | - 'timeout': 3600, |
422 | - }, |
423 | - ], |
424 | - }, |
425 | 'installpackages': { |
426 | 'description': 'Packages to install on system', |
427 | 'type': 'array', |
428 | |
429 | === modified file 'utah/provisioning/baremetal/bamboofeeder.py' |
430 | --- utah/provisioning/baremetal/bamboofeeder.py 2013-06-10 05:48:52 +0000 |
431 | +++ utah/provisioning/baremetal/bamboofeeder.py 2013-06-14 03:44:27 +0000 |
432 | @@ -271,8 +271,8 @@ |
433 | self.restart() |
434 | self.logger.info('System installing') |
435 | |
436 | - self.rsyslog.wait_for_install(config.install_steps) |
437 | - self.rsyslog.wait_for_booted(config.boot_steps, self.uuid) |
438 | + self.rsyslog.wait_for_install() |
439 | + self.rsyslog.wait_for_booted(self.uuid) |
440 | retry(self.sshcheck, logmethod=self.logger.info, |
441 | retry_timeout=config.checktimeout) |
442 | |
443 | |
444 | === modified file 'utah/provisioning/baremetal/cobbler.py' |
445 | --- utah/provisioning/baremetal/cobbler.py 2013-06-10 05:57:49 +0000 |
446 | +++ utah/provisioning/baremetal/cobbler.py 2013-06-14 03:44:27 +0000 |
447 | @@ -202,8 +202,7 @@ |
448 | for arg in self.machineinfo]) |
449 | |
450 | self.restart() |
451 | - self.rsyslog.wait_for_install(config.install_steps, |
452 | - self._disable_netboot) |
453 | + self.rsyslog.wait_for_install(self._disable_netboot) |
454 | |
455 | # TODO: wait_for_booted isn't working on d-i installs and |
456 | # autopilot jobs like: |
457 | @@ -211,7 +210,7 @@ |
458 | # ps-unity-autopilot-release-testing/140/label=autopilot-nvidia/ |
459 | # For now, it seems to be a problem on all installs, so we're going to |
460 | # handle it in rsyslog.py |
461 | - self.rsyslog.wait_for_booted(config.boot_steps, self.uuid) |
462 | + self.rsyslog.wait_for_booted(self.uuid) |
463 | |
464 | retry(self.sshcheck, logmethod=self.logger.info, |
465 | retry_timeout=config.checktimeout) |
466 | |
467 | === modified file 'utah/provisioning/rsyslog.py' |
468 | --- utah/provisioning/rsyslog.py 2013-05-22 16:56:32 +0000 |
469 | +++ utah/provisioning/rsyslog.py 2013-06-14 03:44:27 +0000 |
470 | @@ -23,6 +23,7 @@ |
471 | import sys |
472 | import time |
473 | |
474 | +from utah.config import config |
475 | from utah.exceptions import UTAHException |
476 | from utah.provisioning.exceptions import UTAHProvisioningException |
477 | from utah.timeout import UTAHTimeout |
478 | @@ -70,65 +71,85 @@ |
479 | """Return UDP port number used to listen for syslog messages.""" |
480 | return self._port |
481 | |
482 | - def wait_for_install(self, steps, booted_callback=None): |
483 | + def wait_for_install(self, booted_callback=None): |
484 | """Monitor rsyslog messages during the installation. |
485 | |
486 | Works through each step in the steps array to find messages in the |
487 | syslog indicating what part of the install we are in. |
488 | |
489 | - :param steps: Set of steps that the installation will go through. |
490 | - :type steps: list(dict) |
491 | :param booted_callback: |
492 | function to be called once the system has been booted |
493 | :type booted_callback: callable | None |
494 | |
495 | - An example of a valid steps argument would be as follows:: |
496 | - |
497 | - { |
498 | - "message": "system started", |
499 | - "pattern": ".*log source = /proc/kmsg started", |
500 | - "fail_pattern": ".*exited with status [^0]", |
501 | - "timeout": 1000, |
502 | - "booted": false |
503 | - }, |
504 | - { |
505 | - "message": "system started", |
506 | - "pattern": [ |
507 | - ".*log source = /proc/kmsg started" |
508 | - ], |
509 | - "timeout": 1000, |
510 | - "booted": false |
511 | - } |
512 | - |
513 | - where: |
514 | - |
515 | - - ``pattern`` can an array or string. |
516 | - - ``booted`` is optional, but can be used to indicate that the |
517 | - system is fully operational. This can be used in conjunction with |
518 | - the ``booted_callback`` function. |
519 | - |
520 | - .. seealso:: :meth:`wait_for_booted` |
521 | - |
522 | """ |
523 | + steps = [ |
524 | + { |
525 | + 'message': 'system started', |
526 | + 'pattern': ['.*'], |
527 | + 'booted': True, |
528 | + }, |
529 | + { |
530 | + 'message': 'install started', |
531 | + 'pattern': [ |
532 | + '.*ubiquity\\[\\d+\\]: Ubiquity \\d', |
533 | + # needed for non-ubiquity installs |
534 | + '.*anna-install', |
535 | + ], |
536 | + }, |
537 | + { |
538 | + 'message': 'install complete', |
539 | + 'pattern': [ |
540 | + '.*ubiquity reboot', |
541 | + '.*log-output -t ubiquity umount /target$', |
542 | + # for server bootspeed |
543 | + '.*finish-install.d/94save-logs', |
544 | + '.*finish-install: umount', |
545 | + # a catch-all |
546 | + '.*rsyslogd:.*exiting on signal 15.', |
547 | + ], |
548 | + 'fail_pattern': [ |
549 | + # ubiquity/failure_command |
550 | + '.*utah: Installation failure detected', |
551 | + # ubiquity/success_command |
552 | + # and d-i/late_command failures |
553 | + '.*utah: Late command failure detected', |
554 | + # d-i installation failure |
555 | + '.*exited with status [^0]', |
556 | + ], |
557 | + }, |
558 | + ] |
559 | + |
560 | logfile = '{}/{}-install.log'.format(self._logpath, self._host) |
561 | callbacks = {'booted': booted_callback} |
562 | - self._wait_for_steps(steps, logfile, callbacks) |
563 | + self._wait_for_steps(steps, config.installtimeout, logfile, callbacks) |
564 | |
565 | - def wait_for_booted(self, steps, uuid): |
566 | + def wait_for_booted(self, uuid): |
567 | """Monitor rsyslog during boot up. |
568 | |
569 | Works the same as the :meth:`wait_for_install` method but takes in |
570 | steps that determine a system has booted after the install has |
571 | completed. |
572 | |
573 | - :param steps: Set of steps that the boot up will go through. |
574 | - :type steps: list(dict) |
575 | :param uuid: The UUID generated by provisioning code. |
576 | :type uuid: string |
577 | |
578 | - .. seealso:: :meth:`wait_for_install` |
579 | - |
580 | """ |
581 | + steps = [ |
582 | + { |
583 | + 'message': 'system booted', |
584 | + 'pattern': '.*Linux version \\d', |
585 | + }, |
586 | + { |
587 | + 'message': 'disk mounted', |
588 | + 'pattern': '.*mounted filesystem with ordered data', |
589 | + }, |
590 | + { |
591 | + 'message': 'UTAH uuid received', |
592 | + # binds this to the callback for wait_for_booted |
593 | + 'uuid': True, |
594 | + 'pattern': '.*Machine UUID:\s+(.*)', |
595 | + }, |
596 | + ] |
597 | logfile = '{}/{}-boot.log'.format(self._logpath, self._host) |
598 | |
599 | def _cb(match): |
600 | @@ -140,7 +161,7 @@ |
601 | # TODO: Resolve the issues around this and get it back to normal |
602 | # i.e. LP#1100386 |
603 | try: |
604 | - self._wait_for_steps(steps, logfile, callbacks) |
605 | + self._wait_for_steps(steps, config.boottimeout, logfile, callbacks) |
606 | if not _cb.uuid: |
607 | self.logger.error('UUID not sent from target') |
608 | elif uuid and uuid != _cb.uuid: |
609 | @@ -149,39 +170,60 @@ |
610 | except UTAHTimeout: |
611 | self.logger.warning('Timed out waiting for boot, but continuing') |
612 | |
613 | - def wait_for_utah_run(self, steps, completed_cb=None): |
614 | + def wait_for_utah_run(self, completed_cb=None): |
615 | """Monitor rsyslog while waiting for utah-client to finish autorun. |
616 | |
617 | - :param steps: Set of steps that the boot up will go through. |
618 | - :type steps: list(dict) |
619 | :param completed_cb: A callback function that will include the utah |
620 | exit code. |
621 | :type completed_cb: function |
622 | |
623 | """ |
624 | + steps = [ |
625 | + { |
626 | + 'message': 'UTAH setup', |
627 | + 'pattern': ('.*utah: completed ' |
628 | + '/etc/utah/autorun/\d+_utah-setup'), |
629 | + }, |
630 | + { |
631 | + 'message': 'UTAH client install', |
632 | + 'pattern': ('.*utah: completed ' |
633 | + '/etc/utah/autorun/\d+_install-client'), |
634 | + }, |
635 | + { |
636 | + 'message': 'UTAH runlist', |
637 | + 'pattern': ('.*utah: completed ' |
638 | + '/etc/utah/autorun/\d+_run-utah rc=(\d+)'), |
639 | + 'completed': True, |
640 | + }, |
641 | + ] |
642 | logfile = '{}/{}-utah-run.log'.format(self._logpath, self._host) |
643 | callbacks = {'completed': completed_cb} |
644 | - self._wait_for_steps(steps, logfile, callbacks) |
645 | + timeout = config.runtimeout |
646 | + if not timeout: |
647 | + # this means forever, but lets give something sensible to help |
648 | + # prevent jobs from hanging forever |
649 | + timeout = 172800 # 2 days |
650 | + self._wait_for_steps(steps, timeout, logfile, callbacks) |
651 | |
652 | - def _wait_for_steps(self, steps, logfile, callbacks): |
653 | + def _wait_for_steps(self, steps, timeout, logfile, callbacks): |
654 | with open(logfile, 'w') as f: |
655 | x = 0 |
656 | + deadline = time.time() + timeout |
657 | while x < len(steps): |
658 | message = steps[x]['message'] |
659 | pattern = steps[x]['pattern'] |
660 | fail_pattern = steps[x].get('fail_pattern', []) |
661 | - timeout = steps[x]['timeout'] |
662 | |
663 | if not isinstance(pattern, list): |
664 | pattern = [pattern] |
665 | if not isinstance(fail_pattern, list): |
666 | fail_pattern = [fail_pattern] |
667 | pattern.extend(fail_pattern) |
668 | - future_pats, future_timeout = self._future_patterns(steps, x) |
669 | + future_pats = self._future_patterns(steps, x) |
670 | pattern.extend(future_pats) |
671 | - self.logger.info('Waiting %ds for: %s', timeout, message) |
672 | - timeout += future_timeout |
673 | - pattern, match = self._wait_for(f, pattern, message, timeout) |
674 | + self.logger.info( |
675 | + 'Waiting %ds for: %s', deadline - time.time(), message) |
676 | + pattern, match = self._wait_for(f, pattern, message, deadline) |
677 | if match is None: |
678 | remaining_messages = [step['message'] |
679 | for step in steps[x + 1:]] |
680 | @@ -217,16 +259,14 @@ |
681 | @staticmethod |
682 | def _future_patterns(steps, index): |
683 | patterns = [] |
684 | - timeout = 0 |
685 | index += 1 |
686 | for step in steps[index:]: |
687 | pattern = step['pattern'] |
688 | - timeout += step['timeout'] |
689 | if not isinstance(pattern, list): |
690 | patterns.append(pattern) |
691 | else: |
692 | patterns.extend(pattern) |
693 | - return patterns, timeout |
694 | + return patterns |
695 | |
696 | @staticmethod |
697 | def _fast_forward(steps, pattern, match, callbacks): |
698 | @@ -252,11 +292,10 @@ |
699 | |
700 | x += 1 |
701 | |
702 | - def _wait_for(self, writer, patterns, message, timeout): |
703 | + def _wait_for(self, writer, patterns, message, deadline): |
704 | pats = [re.compile(x) for x in patterns] |
705 | |
706 | - endtime = time.time() + timeout |
707 | - while(time.time() < endtime): |
708 | + while(time.time() < deadline): |
709 | data = self._read_line() |
710 | if data: |
711 | if self.logger.level <= logging.DEBUG: |
712 | |
713 | === modified file 'utah/provisioning/vm.py' |
714 | --- utah/provisioning/vm.py 2013-06-11 04:23:34 +0000 |
715 | +++ utah/provisioning/vm.py 2013-06-14 03:44:27 +0000 |
716 | @@ -415,7 +415,7 @@ |
717 | '{}.syslog.log'.format(self.name)) |
718 | self.logger.info('Logs will be written to %s', log_filename) |
719 | |
720 | - self.rsyslog.wait_for_install(config.install_steps) |
721 | + self.rsyslog.wait_for_install() |
722 | while vm.isActive() is not 0: |
723 | pass |
724 | |
725 | @@ -550,7 +550,7 @@ |
726 | else: |
727 | raise UTAHVMProvisioningException('Failed to provision VM') |
728 | |
729 | - self.rsyslog.wait_for_booted(config.boot_steps, self.uuid) |
730 | + self.rsyslog.wait_for_booted(self.uuid) |
731 | try: |
732 | self.logger.info( |
733 | 'Waiting %d seconds for ping response', config.boottimeout) |
734 | |
735 | === modified file 'utah/run.py' |
736 | --- utah/run.py 2013-06-10 04:54:50 +0000 |
737 | +++ utah/run.py 2013-06-14 03:44:27 +0000 |
738 | @@ -234,7 +234,7 @@ |
739 | raise UTAHException('Failed to get return code from UTAH runlist') |
740 | _provision_and_run.rc = int(match.group(1)) |
741 | |
742 | - machine.rsyslog.wait_for_utah_run(config.autorun_steps, _complete_cb) |
743 | + machine.rsyslog.wait_for_utah_run(_complete_cb) |
744 | return _provision_and_run.rc, tgt_report |
745 | |
746 |
How does this line (303 in the merge) work: (abs(time. time() - start - 5), 2)
self.assertTrue
? In the documentation on assertTrue, it looks like 2 should be picked up as the mesg argument?
Also, I'm a little worried about the regression potential of 432-435. Are we sure we don't have any test cases that currently take longer than 4 hours?
On line 531 of the merge, I get removing the steps. Was the previous reason for the seealso because the other method had steps, or because it was similar? Maybe the seealso still makes sense?
Looks like that also happened on 589, so maybe there's a consistent reason for it.
It looks like the timeout in wait_for_steps is now the timeout for the whole process rather than each step. That seems reasonable, and I'm assuming that since the previous default values were whole-process timeouts, we don't need to worry about updating them.
The rest of it looks reasonable to me, assuming it all tests out okay.