Merge lp:~javier.collado/utah/bug1161855 into lp:utah

Proposed by Javier Collado
Status: Merged
Approved by: Javier Collado
Approved revision: 851
Merged at revision: 848
Proposed branch: lp:~javier.collado/utah/bug1161855
Merge into: lp:utah
Diff against target: 250 lines (+103/-23)
5 files modified
debian/changelog (+1/-0)
docs/source/reference.rst (+6/-3)
tests/test_rsyslog.py (+22/-0)
utah/config.py (+9/-1)
utah/provisioning/rsyslog.py (+65/-19)
To merge this branch: bzr merge lp:~javier.collado/utah/bug1161855
Reviewer                     Review Type  Date Requested  Status
Andy Doan (community)                                     Approve
Javier Collado (community)                                Needs Resubmitting
Review via email: mp+156139@code.launchpad.net

Description of the change

This branch adds the ability to set failure patterns in the syslog
implementation so that the server stops if one of those patterns is found (not
only on timeout).
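
As a rough illustration of the idea (not the code in this branch), the check can be sketched as below. The names InstallFailure and check_line are hypothetical, and checking failure patterns first is an assumption of this sketch rather than something the branch guarantees.

```python
import re


class InstallFailure(Exception):
    """Hypothetical stand-in for the exception raised by the server."""


def check_line(line, patterns, fail_patterns):
    """Return the pattern matched by line, or None if nothing matches.

    Raise InstallFailure when the match is one of the failure patterns.
    """
    # Failure patterns are tried first so that a line matching both a
    # failure pattern and an expected pattern is treated as a failure.
    for candidate in list(fail_patterns) + list(patterns):
        if re.match(candidate, line):
            if candidate in fail_patterns:
                raise InstallFailure(
                    'Failure pattern found: {}'.format(candidate))
            return candidate
    return None
```

With this shape, a line such as "main-menu: exited with status 1" raises as soon as it is seen, so the caller does not need to wait for the step timeout to expire.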

This is good not only to stop as soon as possible, but also to make sure that
an error code is returned by the server process when a problem happens.
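
A minimal sketch of how a raised failure could be turned into a non-zero exit status is shown below. InstallFailure and run are hypothetical names used only for illustration; the actual server entry point is not part of this diff.

```python
import sys


class InstallFailure(Exception):
    """Hypothetical stand-in for the exception raised on a failure pattern."""


def run(provision):
    """Call provision() and translate the outcome into an exit status."""
    try:
        provision()
    except InstallFailure as error:
        # Report the problem and signal it to the calling process.
        sys.stderr.write('Installation failed: {}\n'.format(error))
        return 1
    return 0
```

A script would typically end with something like sys.exit(run(some_provisioning_function)) so that callers such as a CI job can see the failure.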

I've tested this with both desktop and server images, and also by introducing a
problem in the late/success_command to make sure the error was detected.

Max Brustkern (nuclearbob) wrote :

Looks good to me, but I haven't tested it yet. I guess maybe a latecommand designed to fail would be the best thing for that?

Andy Doan (doanac) wrote :

well done. just add a test for test_rsyslog.py and I'm +1.

I'm a little jealous - I'd started a similar fix but it was uglier than this :)

lp:~javier.collado/utah/bug1161855 updated
846. By Javier Collado

Merged changes to remove temporary files downloaded based on URL (LP: #1101186)

These changes entail moving the cleanup and the process running code outside of
the Machine class to their own classes.

Javier Collado (javier.collado) wrote :

A test case to make sure the fail_pattern works has been added and the
documentation has been updated.

Regarding testing, what I've done is test the pattern that is used to detect
problems in the late/success_command. However, I haven't been able to test the
ones for installation failures (either in ubiquity or in d-i). I could send a
message that matches those patterns to syslog, but that won't test that those
messages are really printed for a broken image (when I stumble upon one I'll
take a look at the syslog to check how the error message is written).
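
For reference, sending such a synthetic message could look roughly like the sketch below. It assumes the listener accepts plain UDP syslog datagrams on the port it reports; send_syslog_message is a hypothetical helper and the "utah:" tag is taken from the patterns in utah/config.py.

```python
import socket


def send_syslog_message(port, text, host='127.0.0.1'):
    """Send one syslog-style datagram over UDP and return the payload.

    <13> is the PRI value for facility "user" (1) and severity
    "notice" (5), that is 1 * 8 + 5.
    """
    payload = '<13>utah: {}'.format(text).encode('utf-8')
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload
```

Calling send_syslog_message(port, 'Late command failure detected') would exercise the matching code, but, as noted above, it says nothing about what a broken image really logs.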

review: Needs Resubmitting
lp:~javier.collado/utah/bug1161855 updated
850. By Javier Collado

Added test case to verify fail_pattern

851. By Javier Collado

Updated documentation

Javier Collado (javier.collado) wrote :

Rebased changes to fix conflict in debian/changelog

Andy Doan (doanac) :
review: Approve

Preview Diff

=== modified file 'debian/changelog'
--- debian/changelog 2013-03-29 15:55:46 +0000
+++ debian/changelog 2013-03-29 17:11:22 +0000
@@ -2,6 +2,7 @@
 
   * Return error code on unhandled error (LP: #1160857)
   * Remove temporary files downloaded based on URL (LP: #1101186)
+  * Stop server on installation failure (LP: #1161855)
 
  -- Javier Collado <javier.collado@canonical.com>  Wed, 27 Mar 2013 13:00:47 +0100
 

=== modified file 'docs/source/reference.rst'
--- docs/source/reference.rst 2013-03-28 17:51:50 +0000
+++ docs/source/reference.rst 2013-03-29 17:11:22 +0000
@@ -97,15 +97,18 @@
 
 .. automodule:: utah.provisioning
 
+.. automodule:: utah.provisioning.exceptions
+   :members:
+
 .. automodule:: utah.provisioning.provisioning
    :members:
 
+.. automodule:: utah.provisioning.rsyslog
+   :members:
+
 .. automodule:: utah.provisioning.ssh
    :members:
 
-.. automodule:: utah.provisioning.exceptions
-   :members:
-
 ``utah.provisioning.baremetal``
 -------------------------------
 

=== modified file 'tests/test_rsyslog.py'
--- tests/test_rsyslog.py 2013-03-14 20:23:09 +0000
+++ tests/test_rsyslog.py 2013-03-29 17:11:22 +0000
@@ -20,6 +20,7 @@
 import threading
 import unittest
 
+from utah.exceptions import UTAHException
 from utah.provisioning.rsyslog import RSyslog
 
 
@@ -71,6 +72,27 @@
         threading.Thread(target=self.producer, args=(r.port, messages)).start()
         r.wait_for_install(steps)
 
+    def test_fail_pattern(self):
+        """Exception is raised on fail_pattern match."""
+        steps = [
+            {
+                "message": "test_late_command_failure",
+                "pattern": "finished",
+                "fail_pattern": "failure",
+                "timeout": 120,
+            },
+        ]
+
+        messages = [
+            'started',
+            'failure',
+            'finished',
+        ]
+
+        r = RSyslog('utah-test', '/tmp')
+        threading.Thread(target=self.producer, args=(r.port, messages)).start()
+        self.assertRaises(UTAHException, r.wait_for_install, steps)
+
     def test_future(self):
         """
         test to make sure we can handle missing a message and understanding

=== modified file 'utah/config.py'
--- utah/config.py 2013-03-18 16:44:36 +0000
+++ utah/config.py 2013-03-29 17:11:22 +0000
@@ -174,7 +174,15 @@
                 '.*finish-install: umount',
                 '.*rsyslogd:.*exiting on signal 15.',  # a catch-all
             ],
-            "timeout": 3600
+            'fail_pattern': [
+                # ubiquity/failure_command
+                '.*utah: Installation failure detected',
+                # ubiquity/success_command and d-i/late_command failures
+                '.*utah: Late command failure detected',
+                # d-i installation failure
+                '.*exited with status [^0]',
+            ],
+            'timeout': 3600,
         },
     ],
     boot_steps=[

=== modified file 'utah/provisioning/rsyslog.py'
--- utah/provisioning/rsyslog.py 2013-03-05 21:02:47 +0000
+++ utah/provisioning/rsyslog.py 2013-03-29 17:11:22 +0000
@@ -13,6 +13,8 @@
 # You should have received a copy of the GNU General Public License along
 # with this program. If not, see <http://www.gnu.org/licenses/>.
 
+"""rsyslog processing and monitoring."""
+
 import logging
 import os
 import re
@@ -25,11 +27,20 @@
 
 
 class RSyslog(object):
+
+    """Listen to rsyslog messages and process them.
+
+    :param hostname: Host where the syslog is coming from.
+    :type hostname: str
+    :param logpath: Base directory where log files are created.
+    :type logpath: str
+    :param usefile:
+        allows class to ``tail`` a file rather than act as an rsyslogd server
+    :type usefile: str | None
+
+    """
+
     def __init__(self, hostname, logpath, usefile=None):
-        """
-        :param usefile: allows class to "tail" a file rather than act as an
-                        rsyslogd server
-        """
         self._host = hostname
         self._logpath = logpath
 
@@ -54,16 +65,27 @@
 
     @property
     def port(self):
+        """Return UDP port number used to listen for syslog messages."""
         return self._port
 
     def wait_for_install(self, steps, booted_callback=None):
-        """
+        """Monitor rsyslog messages during the installation.
+
         Works through each step in the steps array to find messages in the
-        syslog indicating what part of the install we are in. Steps is an
-        array of:
+        syslog indicating what part of the install we are in.
+
+        :param steps: Set of steps that the installation will go through.
+        :type steps: list
+        :param booted_callback:
+            function to be called once the system has been booted
+        :type booted_callback: callable | None
+
+        An example of a valid steps argument would be as follows::
+
             {
                 "message": "system started",
                 "pattern": ".*log source = /proc/kmsg started",
+                "fail_pattern": ".*exited with status [^0]",
                 "timeout": 1000,
                 "booted": false
             },
@@ -75,19 +97,33 @@
                 "timeout": 1000,
                 "booted": false
             }
-        "pattern" can an array or string
-        "booted" is optional, but can be used to indicate that the system is
-        fully operational. This can be used in conjunction with the
-        "booted_callback" function.
+
+        where:
+
+        - ``pattern`` can an array or string.
+        - ``booted`` is optional, but can be used to indicate that the
+          system is fully operational. This can be used in conjunction with
+          the ``booted_callback`` function.
+
+        .. seealso:: :meth:`wait_for_booted`
+
         """
         logfile = '{}/{}-install.log'.format(self._logpath, self._host)
         callbacks = {'booted': booted_callback}
         self._wait_for_steps(steps, logfile, callbacks)
 
     def wait_for_booted(self, steps):
-        """
-        Works the same as the wait_for_install function but takes in steps
-        that determine a system has booted after the install has completed.
+        """Monitor rsyslog during boot up.
+
+        Works the same as the :meth:`wait_for_install` method but takes in
+        steps that determine a system has booted after the install has
+        completed.
+
+        :param steps: Set of steps that the boot up will go through.
+        :type steps: list
+
+        .. seealso:: :meth:`wait_for_install`
+
         """
         logfile = '{}/{}-boot.log'.format(self._logpath, self._host)
         callbacks = {}
@@ -99,14 +135,21 @@
             while x < len(steps):
                 message = steps[x]['message']
                 pattern = steps[x]['pattern']
+                fail_pattern = steps[x].get('fail_pattern', [])
                 timeout = steps[x]['timeout']
 
                 if not isinstance(pattern, list):
                     pattern = [pattern]
+                if not isinstance(fail_pattern, list):
+                    fail_pattern = [fail_pattern]
+                pattern.extend(fail_pattern)
                 future_pats = self._future_patterns(steps, x)
                 pattern.extend(future_pats)
                 self.logger.info('Waiting %ds for: %s', timeout, message)
                 match = self._wait_for(f, pattern, message, timeout)
+                if match in fail_pattern:
+                    raise UTAHException('Failure pattern found: {}'
+                                        .format(match))
                 if match in future_pats:
                     msg = 'Expected pattern missed, matched future pattern: %s'
                     self.logger.warn(msg, match)
@@ -135,11 +178,14 @@
 
     @staticmethod
     def _fast_forward(steps, pattern, callbacks):
-        """
-        Looks through each item in the steps array to find the index of the
-        given pattern. It will return that index so that the wait_for code
-        knows where to continue from. It will also alert the booted_callback
-        function if that was one of the steps that was missed.
+        """Figure out what should be the next step.
+
+        Look through each item in the steps array to find the index of the
+        given pattern. It will return that index so that the ``wait_for`` code
+        knows where to continue from. It will also alert the
+        ``booted_callback`` function if that was one of the steps that was
+        missed.
+
         """
         x = 0
         while x < len(steps):
