jenkins master sometimes fails to add slave node

Bug #1376318 reported by Ryan Beisner
This bug affects 1 person
Affects: jenkins (Juju Charms Collection)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The addnode code often executes before the jenkins master service is fully started, resulting in:
    hook failed: "master-relation-changed"

The addnode hook needs check-and-wait logic so that it only talks to the master once the service is actually ready.
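
As a rough illustration of the kind of check-and-wait logic meant here (not the code from any linked branch), a generic polling helper could look like the sketch below. The name `wait_until_ready` and its parameters are hypothetical; `probe` stands for any cheap call against the master that raises or returns a false value while the service is still starting.

```python
import time


def wait_until_ready(probe, timeout=120.0, interval=2.0,
                     sleep=time.sleep, clock=time.time):
    """Poll probe() until it returns a true value or `timeout` seconds pass.

    `probe` is a hypothetical zero-argument callable. Any exception it
    raises is treated as "not ready yet". Returns True once the probe
    succeeds, False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Master is not reachable or not answering yet; keep polling.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the hook, `probe` might wrap a lightweight request such as `l_jenkins.get_info()`, assuming the installed python-jenkins version provides it. The hook would then exit with a clear error when `wait_until_ready` returns False instead of failing inside `create_node`.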

#### deployer fail
2014-10-01 14:44:47 [DEBUG] deployer.env: Connected to environment
2014-10-01 14:44:47 [INFO] deployer.import: Adding relations...
2014-10-01 14:44:47 [INFO] deployer.import: Adding relation osci:master <-> osci-slave:slave
2014-10-01 14:44:48 [DEBUG] deployer.import: Waiting for relation convergence 60s
2014-10-01 14:45:48 [ERROR] deployer.env: The following units had errors:
   unit: osci/0: machine: 1 agent-state: error details: hook failed: "master-relation-changed"
2014-10-01 14:45:48 [INFO] deployer.cli: Deployment stopped. run time: 539.98
Traceback (most recent call last):
  File "./deploy.py", line 632, in <module>
    main()
  File "./deploy.py", line 541, in main
    '-c', deployer_bundle])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)

#### juju log
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed Adding node to Jenkins master
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed Traceback (most recent call last):
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed File "/var/lib/juju/agents/unit-osci-0/charm/hooks/addnode", line 18, in <module>
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed l_jenkins.create_node(host, int(executors) * 2, host , labels=labels)
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed File "/usr/lib/python2.7/dist-packages/jenkins/__init__.py", line 415, in create_node
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed raise JenkinsException('create[%s] failed'%(name))
unit-osci-0: 2014-10-01 14:44:53 INFO master-relation-changed jenkins.JenkinsException: create[osci-slave-0] failed
unit-osci-0: 2014-10-01 14:44:53 ERROR juju.worker.uniter uniter.go:486 hook failed: exit status 1


Digging a little deeper, I reproduced the failure by restarting the master jenkins service and immediately running the commands below, which expose 2 race scenarios:

#### jenkins master service is not started, no socket
ubuntu@juju-beis0-machine-1:~/tmp/charm/hooks$ ./addnode everywhere 2 qwerty ubuntu ubuntu
<jenkins.Jenkins object at 0x7fa96719c7d0>
Traceback (most recent call last):
  File "./addnode", line 16, in <module>
    if l_jenkins.node_exists(host):
  File "/usr/lib/python2.7/dist-packages/jenkins/__init__.py", line 355, in node_exists
    self.get_node_info(name)
  File "/usr/lib/python2.7/dist-packages/jenkins/__init__.py", line 338, in get_node_info
    response = self.jenkins_open(urllib2.Request(self.server + NODE_INFO%locals()))
  File "/usr/lib/python2.7/dist-packages/jenkins/__init__.py", line 174, in jenkins_open
    return urllib2.urlopen(req).read()
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer

#### jenkins master service is started, but not ready
ubuntu@juju-beis0-machine-1:~/tmp/charm/hooks$ ./addnode everywhere 2 qwerty ubuntu ubuntu
<jenkins.Jenkins object at 0x7f7b640587d0>
Adding node to Jenkins master
Traceback (most recent call last):
  File "./addnode", line 20, in <module>
    l_jenkins.create_node(host, int(executors) * 2, host , labels=labels)
  File "/usr/lib/python2.7/dist-packages/jenkins/__init__.py", line 415, in create_node
    raise JenkinsException('create[%s] failed'%(name))
jenkins.JenkinsException: create[everywhere] failed
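
The two scenarios fail with different exceptions (`socket.error` in the first, `jenkins.JenkinsException` in the second), so waiting for the port to open is not enough by itself. One possible approach, sketched below under that assumption, is to retry the failing call on both exception types. The helper name `retry_call` and its parameters are hypothetical and not part of the charm or of python-jenkins.

```python
import time


def retry_call(func, retry_on=(Exception,), attempts=10, delay=3.0,
               sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up.

    Only exceptions listed in `retry_on` trigger another try; anything
    else propagates immediately. The last failure is re-raised so the
    hook still exits non-zero if the master never becomes ready.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
```

Inside addnode this might be used as `retry_call(lambda: l_jenkins.create_node(host, int(executors) * 2, host, labels=labels), retry_on=(socket.error, jenkins.JenkinsException))`, with the `node_exists` check wrapped the same way. The attempt count and delay are placeholders and would need tuning against how long the master really takes to start.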

Tags: uosci
Ryan Beisner (1chb1n) wrote:

The linked branch adds wait/check loop logic.
