Merge lp:~fginther/jenkins-launchpad-plugin/wait-for-node into lp:~private-ps-quality-team/jenkins-launchpad-plugin/trunk

Proposed by Francis Ginther
Status: Needs review
Proposed branch: lp:~fginther/jenkins-launchpad-plugin/wait-for-node
Merge into: lp:~private-ps-quality-team/jenkins-launchpad-plugin/trunk
Diff against target: 419 lines (+333/-23)
6 files modified
jenkinsutils.py (+73/-1)
jsonjenkins.py (+0/-19)
launchpad.py (+3/-3)
tests/test_jenkinsutils.py (+134/-0)
tests/test_waitForNodeIdle.py (+60/-0)
waitForNodeIdle.py (+63/-0)
To merge this branch: bzr merge lp:~fginther/jenkins-launchpad-plugin/wait-for-node
Reviewer Review Type Date Requested Status
Martin Mrazik (community) Needs Fixing
PS Jenkins bot continuous-integration Needs Fixing
Review via email: mp+148956@code.launchpad.net

Commit message

Adds a script, waitForNodeIdle.py to wait for a jenkins slave node to become idle. This can be used to automate slave node maintenance tasks which require the node to be offline and idle.

Description of the change

Adds a script to wait for a jenkins slave node to become idle. This can be used to automate slave node maintenance tasks which require the node to be offline and idle. See ps-qa-pbuilder-clean for an example.

Testing:
 - Includes unit tests for wait_for_node_idle()
 - Tested with jenkins job ps-qa-pbuilder-clean to perform a pbuilder --clean.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Martin Mrazik (mrazik) wrote :

Few things:

1. Can we make the timeout arg optional with some reasonable default value? It will make the use of this easier as I won't need to think of what "reasonable timeout" is. Not quite sure what the default should be but I would expect something like 2 hours.

2. I would prefer an "--mark-offline" option (maybe on by default) which would put the node to maintenance so no new jobs will be scheduled. You will need to do this anyway and if we do it directly in this script we will shorten the window where a race condition can occur. After the node is in maintenance I would do the check for idle again just to make sure nothing was scheduled while we were doing the lock. Once --mark-offline is there it would be nice to have --mark-online which would put the node back online again without any additional checks. That way all I need is this python script and the rest can be done in shell as a separate build step.

3. I wonder if "def wait_for_node_idle" shouldn't be in jenkinsutils.py (as well as any helpers for making the node online/offline).

review: Needs Fixing
Revision history for this message
Martin Mrazik (mrazik) wrote :

btw. I just tried to put a node offline while a test was running and the test continues to run. Maybe with the --mark-offline option the script can first mark the node offline and then wait for being idle.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Francis Ginther (fginther) wrote :

This doesn't work right due to some authorization issue:
Traceback (most recent call last):
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 95, in <module>
    sys.exit(main())
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 87, in main
    mark_offline(json_jenkins, args.url, args.node)
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 46, in mark_offline
    data = json_jenkins.get_json_data(offline_url, False)
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/jsonjenkins.py", line 19, in get_json_data
    data = urllib2.urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

It's the call to load http://{server}/computer/{node}/toggleOffline which works fine when triggered from a browser. Yes, I know that get_json_data isn't the right call for this, but it's failing on the urlopen method, not the json.loads method.

The alternative solution is to call jenkins-cli from within the python script. I don't want to do that as it's mixing two APIs when a pure web based solution should work.

Revision history for this message
Martin Mrazik (mrazik) wrote :

using PreemptiveBasicAuthHandler class from[1] instead of urllib2.HTTPBasicAuthHandler seems to fix the issue (well, then it fails on the json.load call but that seems to be straightforward).

What about adding PreemptiveBasicAuthHandler and renaming JSONJenkins to something else (JenkinsURLlib?) providing two different methods -- get_json_data (with the json.load as its right now) and get_data (returning raw data)?

[1]
http://pythonhosted.org/jenkinsapi/_modules/jenkinsapi/utils/urlopener.html

Revision history for this message
Francis Ginther (fginther) wrote :

Address the earlier comments. This version was tested on a local jenkins server.

mark-offline will mark the node offline, then wait for the timeout (currently 2 hours) for the node to become idle. When a node is offline, jenkins will not schedule new jobs, but does wait for any running jobs to complete. This allows for a graceful shutdown of a node without impacting any running jobs.

Revision history for this message
Francis Ginther (fginther) wrote :

Now ready for review.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Martin Mrazik (mrazik) wrote :

Tests are failing.

review: Needs Fixing

Unmerged revisions

85. By Francis Ginther

Added logging messages.

84. By Francis Ginther

Merge to trunk

83. By Francis Ginther

Updated waitForNodeIdle test cases.

82. By Francis Ginther

Refactored and renamed JSONJenkins to JenkinsUrl and fixed authorization to allow setting nodes offline. Moved code to manipulate nodes into jenkinsutils.

81. By Francis Ginther

Add options to mark the node offline and online.

80. By Francis Ginther

Added test_waitForNodeIdle.py for unit tests.

79. By Francis Ginther

Fixed timeout issue found in testing

78. By Francis Ginther

Add wait_for_node_idle.py to determine when a jenkins node is idle and ready for clean-up actions.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'jenkinsutils.py'
2--- jenkinsutils.py 2013-02-23 13:27:39 +0000
3+++ jenkinsutils.py 2013-03-01 03:45:24 +0000
4@@ -1,11 +1,34 @@
5+import datetime
6+import jenkins
7+import json
8 import re
9+import time
10+import urllib2
11+from jenkinsapi.utils.urlopener import get_jenkins_auth_handler
12 from xml.dom.minidom import parseString
13-import jenkins
14+
15 from settings import (JENKINS_URL,
16 JENKINS_PASSWORD, JENKINS_USER,
17 PUBLIC_JENKINS_URL, URLS_TO_HIDE, logger)
18
19
20+class JenkinsUrl():
21+ urllib_opener = None
22+
23+ def __init__(self, jenkins_url, username, password):
24+ handlers = get_jenkins_auth_handler(username, password, jenkins_url)
25+ self.urllib_opener = urllib2.build_opener(*handlers)
26+ urllib2.install_opener(self.urllib_opener)
27+
28+ def get_json_data(self, url, append_api=True):
29+ if append_api:
30+ url = url + '/api/json'
31+ return json.load(self.get_data(url))
32+
33+ def get_data(self, url):
34+ return urllib2.urlopen(url)
35+
36+
37 def is_job_published(job):
38 if not job:
39 logger.debug('is_job_published: no job specified')
40@@ -217,3 +240,52 @@
41 raise KeyError("Multiple merge jobs found: %s" %
42 (', '.join(merge_jobs)))
43 return merge_jobs[0]
44+
45+
46+def is_node_offline(jenkins, url, node):
47+ '''Returns the offline state of the node'''
48+ node_url = '/'.join([url, 'computer', node])
49+ data = jenkins.get_json_data(node_url)
50+ return data['temporarilyOffline']
51+
52+
53+def is_node_idle(jenkins, url, node):
54+ '''Returns the idle state of the node'''
55+ node_url = '/'.join([url, 'computer', node])
56+ data = jenkins.get_json_data(node_url)
57+ return data['idle']
58+
59+
60+def set_node_offline(jenkins, url, node, message):
61+ '''Sets the node offline'''
62+ node_url = '/'.join([url, 'computer', node])
63+ if is_node_offline(jenkins, url, node):
64+ return
65+ offline_url = node_url + "/toggleOffline?offlineMessage=" + message
66+ jenkins.get_data(offline_url)
67+ if not is_node_offline(jenkins, url, node):
68+ raise EnvironmentError("Failed to set node offline")
69+
70+
71+def set_node_online(jenkins, url, node):
72+ '''Sets the node online'''
73+ node_url = '/'.join([url, 'computer', node])
74+ if not is_node_offline(jenkins, url, node):
75+ return
76+ online_url = node_url + "/toggleOffline"
77+ jenkins.get_data(online_url)
78+ if is_node_offline(jenkins, url, node):
79+ raise EnvironmentError("Failed to set node online")
80+
81+
82+def wait_for_node_idle(jenkins, url, node, timeout):
83+ '''Waits timeout seconds for a jenkins slave node to become idle'''
84+ node_url = '/'.join([url, 'computer', node])
85+ start_time = current_time = datetime.datetime.now()
86+ end_time = start_time + datetime.timedelta(seconds=timeout)
87+ while current_time < end_time:
88+ if is_node_idle(jenkins, url, node):
89+ return 0
90+ current_time = datetime.datetime.now()
91+ time.sleep(10)
92+ raise EnvironmentError("Node failed to become idle")
93
94=== removed file 'jsonjenkins.py'
95--- jsonjenkins.py 2013-02-06 10:01:05 +0000
96+++ jsonjenkins.py 1970-01-01 00:00:00 +0000
97@@ -1,19 +0,0 @@
98-import json
99-import urllib2
100-
101-
102-class JSONJenkins():
103- urllib_opener = None
104-
105- def __init__(self, jenkins_url, username, password):
106- passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
107- passman.add_password(None, jenkins_url, username, password)
108- authhandler = urllib2.HTTPBasicAuthHandler(passman)
109- self.urllib_opener = urllib2.build_opener(authhandler)
110- urllib2.install_opener(self.urllib_opener)
111-
112- def get_json_data(self, url, append_api=True):
113- if append_api:
114- url = url + '/api/json'
115- data = urllib2.urlopen(url)
116- return json.load(data)
117
118=== modified file 'launchpad.py'
119--- launchpad.py 2013-02-23 17:25:38 +0000
120+++ launchpad.py 2013-03-01 03:45:24 +0000
121@@ -8,7 +8,7 @@
122 LP_ENV, LP_APP, logger)
123 from lazr.restfulclient.errors import Unauthorized
124 import re
125-from jsonjenkins import JSONJenkins
126+from jenkinsutils import JenkinsUrl
127 from textwrap import dedent
128 import jenkinsutils
129 import jenkins
130@@ -189,8 +189,8 @@
131
132 def _get_json_jenkins(self, url):
133 if not self._json_jenkins:
134- self._json_jenkins = JSONJenkins(url, JENKINS_USER,
135- JENKINS_PASSWORD)
136+ self._json_jenkins = JenkinsUrl(url, JENKINS_USER,
137+ JENKINS_PASSWORD)
138 return self._json_jenkins
139
140 def format_message_for_mp_update(self, build_url, message=None,
141
142=== modified file 'tests/test_jenkinsutils.py'
143--- tests/test_jenkinsutils.py 2013-02-23 19:57:22 +0000
144+++ tests/test_jenkinsutils.py 2013-03-01 03:45:24 +0000
145@@ -1,3 +1,4 @@
146+import datetime
147 from mock import MagicMock, patch
148 import jenkinsutils
149 import unittest
150@@ -306,3 +307,136 @@
151 'missing_branch')
152 self.assertIn("jenkins job parameter [missing_branch]: not found",
153 "%s" % (err.exception))
154+
155+
156+class NodeTestCase(unittest.TestCase):
157+
158+ def test_is_node_idle(self):
159+ jenkins_mock = MagicMock()
160+ jenkins_mock.get_json_data = MagicMock(return_value={'idle': True})
161+ ret = jenkinsutils.is_node_idle(jenkins_mock, 'http://ip',
162+ 'pbuilder-node')
163+ self.assertTrue(ret)
164+
165+ def test_is_node_busy(self):
166+ jenkins_mock = MagicMock()
167+ jenkins_mock.get_json_data = MagicMock(return_value={'idle': False})
168+ ret = jenkinsutils.is_node_idle(jenkins_mock, 'http://ip',
169+ 'pbuilder-node')
170+ self.assertFalse(ret)
171+
172+ def test_is_node_offline(self):
173+ jenkins_mock = MagicMock()
174+ jenkins_mock.get_json_data = MagicMock(return_value={'temporarilyOffline': True})
175+ ret = jenkinsutils.is_node_offline(jenkins_mock, 'http://ip',
176+ 'pbuilder-node')
177+ self.assertTrue(ret)
178+
179+ def test_is_node_online(self):
180+ jenkins_mock = MagicMock()
181+ jenkins_mock.get_json_data = MagicMock(return_value={'temporarilyOffline': False})
182+ ret = jenkinsutils.is_node_offline(jenkins_mock, 'http://ip',
183+ 'pbuilder-node')
184+ self.assertFalse(ret)
185+
186+ def test_set_node_online(self):
187+ jenkins_mock = MagicMock()
188+ node_data = [{'temporarilyOffline': False},
189+ {'temporarilyOffline': True}]
190+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
191+ jenkins_mock.get_data = MagicMock()
192+ jenkinsutils.set_node_offline(jenkins_mock, 'http://ip',
193+ 'pbuilder-node', 'message')
194+ jenkins_mock.get_data.assert_called_with('http://ip/computer/pbuilder-node/toggleOffline?offlineMessage=message')
195+
196+ def test_set_node_online_already_online(self):
197+ jenkins_mock = MagicMock()
198+ node_data = [{'temporarilyOffline': True},
199+ {'temporarilyOffline': True}]
200+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
201+ jenkins_mock.get_data = MagicMock()
202+ jenkinsutils.set_node_offline(jenkins_mock, 'http://ip',
203+ 'pbuilder-node', 'message')
204+ jenkins_mock.get_data.assert_not_called()
205+
206+ def test_set_node_online_fail(self):
207+ jenkins_mock = MagicMock()
208+ node_data = [{'temporarilyOffline': False},
209+ {'temporarilyOffline': False}]
210+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
211+ jenkins_mock.get_data = MagicMock()
212+ with self.assertRaises(EnvironmentError) as err:
213+ jenkinsutils.set_node_offline(jenkins_mock, 'http://ip',
214+ 'pbuilder-node', 'message')
215+ self.assertEqual('Failed to set node offline', '%s' % err.exception)
216+
217+ def test_set_node_offline(self):
218+ jenkins_mock = MagicMock()
219+ node_data = [{'temporarilyOffline': True},
220+ {'temporarilyOffline': False}]
221+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
222+ jenkins_mock.get_data = MagicMock()
223+ jenkinsutils.set_node_online(jenkins_mock, 'http://ip',
224+ 'pbuilder-node')
225+ jenkins_mock.get_data.assert_called_with('http://ip/computer/pbuilder-node/toggleOffline')
226+
227+ def test_set_node_offline_already_offline(self):
228+ jenkins_mock = MagicMock()
229+ node_data = [{'temporarilyOffline': False},
230+ {'temporarilyOffline': False}]
231+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
232+ jenkins_mock.get_data = MagicMock()
233+ jenkinsutils.set_node_online(jenkins_mock, 'http://ip',
234+ 'pbuilder-node')
235+ jenkins_mock.get_data.assert_not_called()
236+
237+ def test_set_node_offline_fail(self):
238+ jenkins_mock = MagicMock()
239+ node_data = [{'temporarilyOffline': True},
240+ {'temporarilyOffline': True}]
241+ jenkins_mock.get_json_data = MagicMock(side_effect=node_data)
242+ jenkins_mock.get_data = MagicMock()
243+ with self.assertRaises(EnvironmentError) as err:
244+ jenkinsutils.set_node_online(jenkins_mock, 'http://ip',
245+ 'pbuilder-node')
246+ self.assertEqual('Failed to set node online', '%s' % err.exception)
247+
248+
249+class WaitForNodeIdle(unittest.TestCase):
250+
251+ def setUp(self):
252+ self.times = [datetime.datetime.now(),
253+ datetime.datetime.now() + datetime.timedelta(seconds=10),
254+ datetime.datetime.now() + datetime.timedelta(seconds=20),
255+ datetime.datetime.now() + datetime.timedelta(seconds=30)]
256+ self.datetime_mock = MagicMock()
257+ self.datetime_mock.now = MagicMock(side_effect=self.times)
258+ self.datetime_patch = patch('datetime.datetime', self.datetime_mock)
259+ self.datetime_patch.start()
260+ self.sleep_patch = patch('time.sleep', MagicMock())
261+ self.sleep_patch.start()
262+
263+ def tearDown(self):
264+ self.datetime_patch.stop()
265+ self.sleep_patch.stop()
266+
267+ def test_wait_for_node_idle(self):
268+ '''Verifies that an idle node returns 0'''
269+ jenkins_mock = MagicMock()
270+ jenkins_mock.get_json_data = MagicMock(return_value={'idle': True})
271+ ret = jenkinsutils.wait_for_node_idle(jenkins_mock,
272+ 'http://ip',
273+ 'pbuilder-node',
274+ 20)
275+ self.assertEqual(ret, 0)
276+
277+ def test_wait_for_node_idle_timeout(self):
278+ '''Verifies that a busy node returns 1 (timeout hit)'''
279+ jenkins_mock = MagicMock()
280+ jenkins_mock.get_json_data = MagicMock(return_value={'idle': False})
281+ with self.assertRaises(EnvironmentError) as err:
282+ jenkinsutils.wait_for_node_idle(jenkins_mock,
283+ 'http://ip',
284+ 'pbuilder-node',
285+ 24)
286+ self.assertEqual('Node failed to become idle', '%s' % err.exception)
287
288=== added file 'tests/test_waitForNodeIdle.py'
289--- tests/test_waitForNodeIdle.py 1970-01-01 00:00:00 +0000
290+++ tests/test_waitForNodeIdle.py 2013-03-01 03:45:24 +0000
291@@ -0,0 +1,60 @@
292+import datetime
293+from mock import patch, MagicMock
294+import unittest
295+
296+import jenkinsutils
297+import waitForNodeIdle
298+
299+
300+class TestWaitForNodeIdle(unittest.TestCase):
301+
302+ def setUp(self):
303+ self.args_mock = MagicMock()
304+ self.args_mock.url = 'http://ip'
305+ self.args_mock.node = 'pbuilder-node'
306+ self.args_patch = patch('waitForNodeIdle.parse_arguments',
307+ MagicMock(return_value=self.args_mock))
308+ self.args_patch.start()
309+
310+ def tearDown(self):
311+ self.args_patch.stop()
312+
313+ def test_mark_offline(self):
314+ '''Verifies that a successful node offline request returns 0'''
315+ set_offline_mock = MagicMock()
316+ wait_mock = MagicMock()
317+ self.args_mock.mark_offline = True
318+ self.args_mock.mark_online = False
319+ with patch('jenkinsutils.set_node_offline', set_offline_mock):
320+ with patch('jenkinsutils.wait_for_node_idle', wait_mock):
321+ ret = waitForNodeIdle.main()
322+ self.assertEqual(ret, 0)
323+
324+ def test_mark_offline_fail(self):
325+ '''Verifies that a failed node offline request returns 1'''
326+ set_offline_mock = MagicMock()
327+ wait_mock = MagicMock(side_effect=EnvironmentError())
328+ self.args_mock.mark_offline = True
329+ self.args_mock.mark_online = False
330+ with patch('jenkinsutils.set_node_offline', set_offline_mock):
331+ with patch('jenkinsutils.wait_for_node_idle', wait_mock):
332+ ret = waitForNodeIdle.main()
333+ self.assertEqual(ret, 1)
334+
335+ def test_mark_online(self):
336+ '''Verifies that a successful node online request returns 0'''
337+ self.args_mock.mark_offline = False
338+ self.args_mock.mark_online = True
339+ set_online_mock = MagicMock()
340+ with patch('jenkinsutils.set_node_online', set_online_mock):
341+ ret = waitForNodeIdle.main()
342+ self.assertEqual(ret, 0)
343+
344+ def test_mark_online_fail(self):
345+ '''Verifies that a failed node online request returns 1'''
346+ self.args_mock.mark_offline = False
347+ self.args_mock.mark_online = True
348+ set_online_mock = MagicMock(side_effect=EnvironmentError())
349+ with patch('jenkinsutils.set_node_online', set_online_mock):
350+ ret = waitForNodeIdle.main()
351+ self.assertEqual(ret, 1)
352
353=== added file 'waitForNodeIdle.py'
354--- waitForNodeIdle.py 1970-01-01 00:00:00 +0000
355+++ waitForNodeIdle.py 2013-03-01 03:45:24 +0000
356@@ -0,0 +1,63 @@
357+#!/usr/bin/env python
358+import argparse
359+import sys
360+import time
361+
362+import jenkinsutils
363+from settings import (JENKINS_USER, JENKINS_PASSWORD, logger)
364+
365+
366+def parse_arguments():
367+ parser = argparse.ArgumentParser()
368+ parser.add_argument('-u', '--url',
369+ required=True,
370+ help='URL of the jenkins server')
371+ parser.add_argument('-n', '--node',
372+ required=True,
373+ help='Node to query')
374+ parser.add_argument('-t', '--timeout',
375+ required=False,
376+ default=7200,
377+ type=int,
378+ help='Length of time in seconds to wait')
379+ parser.add_argument('--mark-offline',
380+ default=False,
381+ action='store_true',
382+ help='Mark the node offline and wait')
383+ parser.add_argument('--mark-online',
384+ default=False,
385+ action='store_true',
386+ help='Mark the node online')
387+ args = parser.parse_args()
388+ if not (args.mark_online or args.mark_offline):
389+ parser.error('Must specify --mark-offline or --mark-online')
390+ return args
391+
392+
393+def main():
394+ args = parse_arguments()
395+ jenkins = jenkinsutils.JenkinsUrl(args.url, JENKINS_USER, JENKINS_PASSWORD)
396+ if args.mark_offline:
397+ try:
398+ jenkinsutils.set_node_offline(jenkins, args.url, args.node,
399+ 'waitForNodeIdle')
400+ jenkinsutils.wait_for_node_idle(jenkins, args.url, args.node,
401+ args.timeout)
402+ logger.debug('Node is offline')
403+ except EnvironmentError:
404+ logger.debug('Failed to fully offline node')
405+ if jenkinsutils.is_node_offline(jenkins, args.url, args.node):
406+ logger.debug('Node is offline, but may be busy')
407+ return 1
408+ elif args.mark_online:
409+ try:
410+ jenkinsutils.set_node_online(jenkins, args.url, args.node)
411+ logger.debug('Node is online')
412+ except EnvironmentError:
413+ logger.debug('Failed to online node')
414+ return 1
415+ return 0
416+
417+
418+if __name__ == "__main__":
419+ sys.exit(main())

Subscribers

People subscribed via source and target branches

to all changes: