Merge lp:~stylesen/lava-scheduler/fix-worker-multinode-error into lp:lava-scheduler

Proposed by Senthil Kumaran S
Status: Merged
Approved by: Neil Williams
Approved revision: 257
Merged at revision: 256
Proposed branch: lp:~stylesen/lava-scheduler/fix-worker-multinode-error
Merge into: lp:lava-scheduler
Diff against target: 32 lines (+10/-4)
1 file modified
lava_scheduler_daemon/service.py (+10/-4)
To merge this branch: bzr merge lp:~stylesen/lava-scheduler/fix-worker-multinode-error
Reviewer           Review Type   Date Requested   Status
Neil Williams                                     Approve
Antonio Terceiro                                  Pending
Review via email: mp+183301@code.launchpad.net

This proposal supersedes a proposal from 2013-08-30.

Description of the change

Fix scheduling of jobs in a setup with multiple workers.

Antonio Terceiro (terceiro) wrote : Posted in a previous version of this proposal

> === modified file 'lava_scheduler_daemon/dbjobsource.py'
> --- lava_scheduler_daemon/dbjobsource.py 2013-08-28 15:13:07 +0000
> +++ lava_scheduler_daemon/dbjobsource.py 2013-08-30 18:09:51 +0000
> @@ -192,7 +192,6 @@
>          if d.hostname in configured_boards:
>              if job:
>                  job = self._fix_device(d, job)
> -            if job:
>                  job_list.add(job)

This conditional has to stay because _fix_device (which should be called
_allocate_device instead BTW) might return None, in the case where multiple
schedulers try to allocate the same job to different boards, but the local
scheduler lost the race.

Otherwise +1

 review needs-fixing

review: Needs Fixing
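
The race described in this comment can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in (the Job class, the claimed_by field, fix_device and collect_jobs are not the scheduler's real code); only the two-guard control flow mirrors the quoted hunk. It assumes the allocation step returns None when another scheduler has already claimed the job.

```python
# Hypothetical stand-ins; only the control flow mirrors the quoted hunk.
class Job(object):
    def __init__(self, job_id):
        self.id = job_id
        self.claimed_by = None  # hostname of the board that won the job


def fix_device(board, job):
    """Return the job if this board claims it, or None if another board did."""
    if job.claimed_by is not None and job.claimed_by != board:
        return None  # another scheduler won the race
    job.claimed_by = board
    return job


def collect_jobs(candidates):
    """candidates: iterable of (board hostname, job or None) pairs."""
    job_list = set()
    for board, job in candidates:
        if job:
            job = fix_device(board, job)
        if job:  # second guard: fix_device may have returned None
            job_list.add(job)
    return job_list
```

Without the second guard, a job lost to another scheduler would put None into job_list, and later code that reads attributes of each entry would fail.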
Neil Williams (codehelp) wrote :

Tested on playgroundmaster & playgroundworker01 - the update to revert the change in dbjobsource.py works fine and the patch fixes the problem of running a single MultiNode group across multiple dispatchers. Approved. Thanks Senthil.

review: Approve
Neil Williams (codehelp) wrote :

A further fix is needed - the new check can fail when a job has no actual_device:

2013-09-02 17:08:13,620 [ERROR] [lava_scheduler_daemon.service.JobQueue] AttributeError: 'NoneType' object has no attribute 'hostname'
Traceback (most recent call last):
  File "/srv/lava/.cache/eggs/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/srv/lava/.cache/eggs/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/base.py", line 773, in runUntilCurrent
    f(*a, **kw)
  File "/srv/lava/.cache/eggs/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 368, in callback
    self._startRunCallbacks(result)
  File "/srv/lava/.cache/eggs/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 464, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/srv/lava/.cache/eggs/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/instance-manager/lava-scheduler/lava_scheduler_daemon/service.py", line 50, in _cbCheckJobs
    if job.actual_device.hostname in configured_boards:
exceptions.AttributeError: 'NoneType' object has no attribute 'hostname'

Testing:
if job.actual_device and job.actual_device.hostname in configured_boards:
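
A minimal sketch of why the guarded condition avoids the AttributeError above, assuming simplified stand-ins for the job and device objects. The Device and Job namedtuples, the helper jobs_for_local_boards and the board names are hypothetical and are not part of lava-scheduler. Python's "and" short-circuits, so .hostname is only read when actual_device is set.

```python
from collections import namedtuple

# Hypothetical stand-ins for the scheduler's device and job objects.
Device = namedtuple("Device", ["hostname"])
Job = namedtuple("Job", ["id", "actual_device"])


def jobs_for_local_boards(job_list, configured_boards):
    """Keep only jobs whose assigned device is configured on this worker."""
    local = []
    for job in job_list:
        # Short-circuit: .hostname is only read when actual_device is set.
        if job.actual_device and job.actual_device.hostname in configured_boards:
            local.append(job)
    return local


jobs = [
    Job(1, Device("panda01")),    # board configured on this worker
    Job(2, Device("arndale01")),  # board configured on another worker
    Job(3, None),                 # no device assigned yet
]
local_ids = [j.id for j in jobs_for_local_boards(jobs, {"panda01"})]
```

In this sketch a job without a device is skipped rather than raising, which would leave it for a later scheduler pass.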

Neil Williams (codehelp) wrote :

The fix "if job.actual_device and job.actual_device.hostname in configured_boards:" has been added to the lp:~codehelp/lava-scheduler/reserved-boards merge proposal, for which this merge is a prerequisite (I don't have push access to the branch for this proposal).

I'm happy for this branch to be merged once lp:~codehelp/lava-scheduler/reserved-boards is also approved. The dependency is now two-way: each branch requires the other, so apply both on the scheduler and the worker(s) before restarting lava on the server and the worker(s).

review: Approve

Preview Diff

=== modified file 'lava_scheduler_daemon/service.py'
--- lava_scheduler_daemon/service.py 2013-08-28 13:13:46 +0000
+++ lava_scheduler_daemon/service.py 2013-08-31 01:41:17 +0000
@@ -17,6 +17,7 @@
 # along with LAVA Scheduler. If not, see <http://www.gnu.org/licenses/>.
 
 import logging
+import lava_dispatcher.config as dispatcher_config
 
 from twisted.application.service import Service
 from twisted.internet import defer
@@ -42,11 +43,16 @@
             self._cbCheckJobs).addErrback(catchall_errback(self.logger))
 
     def _cbCheckJobs(self, job_list):
+        configured_boards = [
+            x.hostname for x in dispatcher_config.get_devices()]
+
         for job in job_list:
-            new_job = JobRunner(self.source, job, self.dispatcher,
-                                self.reactor, self.daemon_options)
-            self.logger.info("Starting Job: %d " % job.id)
-            new_job.start()
+            if job.actual_device.hostname in configured_boards:
+                new_job = JobRunner(self.source, job, self.dispatcher,
+                                    self.reactor, self.daemon_options)
+                self.logger.info("Starting Job: %d " % job.id)
+
+                new_job.start()
 
     def startService(self):
         self._check_job_call.start(20)
