When external commands, issued by MAAS, fail, the log output does not give any information about the failure.

Bug #1184589 reported by Raphaël Badin
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
High
Julian Edwards

Bug Description

MAAS uses external commands to handle a few things: powering up machines, DHCP, DNS, etc. When these fail, the celery logs are not giving any detail about the failure and this is seriously hindering our ability to debug stuff. This is an example of a failing rndc command http://paste.ubuntu.com/5706575/.

Related branches

Revision history for this message
Raphaël Badin (rvb) wrote :

When a command fails, the logs should contain:
- the exact command that was run (because, very often, one will be able to re-launch that same command plus debugging enabled to understand what the problem is).
- the error returned by the command

Note that this was done for the power commands (but only for them) in src/provisioningserver/power/poweraction.py:PowerActionFail.

I think this should be generalized and applied to every single call we make to subprocess.check_output and the likes.

tags: added: tech-debt
Changed in maas:
milestone: none → 13.10
Raphaël Badin (rvb)
tags: added: trivial
Raphaël Badin (rvb)
tags: added: log
Graham Binns (gmb)
Changed in maas:
assignee: nobody → Graham Binns (gmb)
Graham Binns (gmb)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
milestone: 13.10 → 14.04
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Graham Binns (gmb) wrote :

Setting this back to "Triaged::High" since it's not actually been fixed for the celery case, which was kind of the point of it.

Changed in maas:
status: Fix Committed → Triaged
assignee: Graham Binns (gmb) → nobody
Revision history for this message
Raphaël Badin (rvb) wrote :

I can confirm this bug is still alive and kicking: I manually broke the restart_dhcp_server task by changing the name of the restarted service (in /usr/lib/python2.7/dist-packages/provisioningserver/tasks.py) and this is what gets written to /var/log/maas/celery.log when the task runs (and fails) http://paste.ubuntu.com/6919457/

Raphaël Badin (rvb)
tags: removed: trivial
Tycho Andersen (tycho-s)
tags: added: cloud-installer
Revision history for this message
Tycho Andersen (tycho-s) wrote :

I've been doing this:

CloudFive:/usr/lib/python2.7/dist-packages/provisioningserver/power$ diff -U5 poweraction.py.old poweraction.py
--- poweraction.py.old 2014-02-26 17:33:18.318719677 +0200
+++ poweraction.py 2014-02-26 17:26:50.478738789 +0200
@@ -116,11 +116,14 @@
         # to the individual scripts.
         try:
             output = subprocess.check_output(
                 commands, shell=True, stderr=subprocess.STDOUT, close_fds=True)
         except subprocess.CalledProcessError as e:
- raise PowerActionFail(self, e)
+ paf = PowerActionFail(self, e)
+ with open('/tmp/paf', 'w') as out:
+ out.write(str(paf))
+ raise paf
         # This output is only examined in tests, execute just ignores it
         return output

     def execute(self, **kwargs):
         """Execute the template.

followed by a:

 sudo service maas-cluster-celery restart

which will at least let you see the script and its output.

Revision history for this message
Tycho Andersen (tycho-s) wrote :

Of course launchpad seems to eat diffs, but you get the idea :-)

Changed in maas:
assignee: nobody → Julian Edwards (julian-edwards)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Yay! The rndc failure happened again, but this time I'm seeing the output we need—

«warning: /tmp/tmp7Xz17g/named.conf.rndc.maas:7: couldn't add command channel 127.0.0.1#56314: address in use»

Filing separate bug about that.

Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.