maas-region-celeryd connects to the wrong queue.

Bug #1067929 reported by Raphaël Badin
This bug affects 5 people
Affects         Status         Importance   Assigned to        Milestone
maas (Ubuntu)   Fix Released   Critical     Andres Rodriguez
Precise         Fix Released   Undecided    Unassigned

Bug Description

Symptoms
========

DNS configuration is broken because of this problem.

In /var/log/maas/celery-region.log search for

[2012-10-23 18:02:19,037: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-23 18:02:19,052: INFO/MainProcess] Got task from broker: provisioningserver.tasks.report_boot_images
[2012-10-23 18:02:19,087: INFO/MainProcess] Task provisioningserver.tasks.report_boot_images

Root Cause
==========

maas-region-celeryd connects to two queues: ' celery' and 'master'. The problem is the leading space in ' celery': the queue that is actually in use is named 'celery', without the space.

start_celery() should use something like this instead:
    command = [
        'celeryd',
        '--logfile=%s' % args.logfile,
        '--schedule=%s' % args.schedule,
        '--loglevel=INFO',
        '--beat',
        '--queues=celery,master',
        ]
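
As a rough illustration of the suggestion above, here is a minimal sketch that builds the same command list and flags any element that bundles an option and its value with whitespace. The helper names (build_celeryd_command, find_bundled_options) and the schedule path are assumptions made for this example, not part of the actual maas-region-celeryd script.

```python
import argparse


def build_celeryd_command(args):
    """Build the celeryd argv as proposed in the bug description."""
    return [
        'celeryd',
        '--logfile=%s' % args.logfile,
        '--schedule=%s' % args.schedule,
        '--loglevel=INFO',
        '--beat',
        '--queues=celery,master',
    ]


def find_bundled_options(command):
    """Return argv elements whose option part contains whitespace.

    An element such as '-Q celery,master' is handed to the program as a
    single argument, so the space becomes part of the option's value.
    Only the text before the first '=' is checked, so values that
    legitimately contain spaces are not flagged.
    """
    suspicious = []
    for element in command:
        option_part = element.split('=', 1)[0]
        if element.startswith('-') and any(c.isspace() for c in option_part):
            suspicious.append(element)
    return suspicious


# Placeholder paths, used only for illustration.
example_args = argparse.Namespace(
    logfile='/var/log/maas/celery-region.log',
    schedule='/tmp/celerybeat-schedule',
)
command = build_celeryd_command(example_args)
print(find_bundled_options(command))
print(find_bundled_options(['celeryd', '-Q celery,master']))
```

A check like this is only a guard against the specific mistake discussed here; it does not validate the queue names themselves.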

== TEST ==
1. Install maas
2. Run maas-import-pxe-files.
3. In /var/log/maas/celery-region.log, search for:

[2012-10-23 18:02:19,037: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-23 18:02:19,052: INFO/MainProcess] Got task from broker: provisioningserver.tasks.report_boot_images[045a83ac-64a2-448b-ada3-f82131b56561]
[2012-10-23 18:02:19,087: INFO/MainProcess] Task provisioningserver.tasks.report_boot_images[045a83ac-64a2-448b-ada3-f82131b56561] succeeded in 0.00621294975281s: None


Raphaël Badin (rvb)
affects: maas → maas (Ubuntu)
Changed in maas (Ubuntu):
status: New → Incomplete
status: Incomplete → In Progress
importance: Undecided → Critical
assignee: nobody → Andres Rodriguez (andreserl)
Julian Edwards (julian-edwards) wrote :

Andres says it was a missing "=". It used to say:

--queues celery,master

and he's fixing it to say:

--queues=celery,master

description: updated
Raphaël Badin (rvb) wrote :

> Andres says it was a missing "=". It used to say:
> --queues celery,master
> and he's fixing it to say:
> --queues=celery,master

More precisely, the script was using os.execv('celeryd',… , '-Q celery,master',…) and the space before 'celery' got "escaped". I suggested using the alternative syntax ("--queues=celery,master") to avoid any problems.
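
The effect described above can be reproduced with a small sketch. It assumes an optparse-style option parser on the receiving side (an assumption for illustration; the thread does not show celeryd's own parsing code), and parse_queues is a hypothetical helper. Because exec-style calls pass each list element as one argument with no shell splitting, '-Q celery,master' arrives as a single string and everything after '-Q', including the space, becomes the option value.

```python
from optparse import OptionParser


def parse_queues(argv):
    """Parse a -Q/--queues option and split its value on commas."""
    parser = OptionParser()
    parser.add_option('-Q', '--queues', dest='queues', default='')
    options, _ = parser.parse_args(argv)
    return options.queues.split(',')


# One argv element containing a space: the space ends up in the value.
print(parse_queues(['-Q celery,master']))        # [' celery', 'master']

# The long form with '=' carries no stray whitespace.
print(parse_queues(['--queues=celery,master']))  # ['celery', 'master']

# Passing the option and its value as two argv elements also works.
print(parse_queues(['-Q', 'celery,master']))     # ['celery', 'master']
```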

Changed in maas (Ubuntu):
status: In Progress → Fix Committed
Raphaël Badin (rvb) wrote :

A workaround is to edit the maas-region-celeryd script (sudo vim /usr/sbin/maas-region-celeryd) and change '-Q celery,master' to '--queues=celery,master'.

Then restart the service:
$ sudo service maas-region-celery restart
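
For anyone who prefers to script the edit, the substitution can be sketched as a plain text replacement. The function name and the sample line are hypothetical; the real script's surrounding code is not shown in this report, so this operates on a string rather than on /usr/sbin/maas-region-celeryd directly. The service still needs to be restarted afterwards.

```python
OLD_ARGUMENT = "'-Q celery,master'"
NEW_ARGUMENT = "'--queues=celery,master'"


def patch_queue_argument(script_text):
    """Return the patched text and the number of replacements made."""
    count = script_text.count(OLD_ARGUMENT)
    return script_text.replace(OLD_ARGUMENT, NEW_ARGUMENT), count


# Hypothetical line standing in for the relevant part of the script.
sample = "command = ['celeryd', '--beat', '-Q celery,master']\n"
patched, replaced = patch_queue_argument(sample)
print(replaced)
print(patched)
```

A replacement count of zero would suggest the script has already been patched or uses a different spelling of the option.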

description: updated
Thiago Martins (martinx)
Changed in maas (Ubuntu):
status: Fix Committed → Fix Released
Julian Edwards (julian-edwards) wrote :

This is not fix-released yet, but I can't set it back.

Thiago Martins (martinx) wrote :

Sorry guys, status changed by accident.

Thiago Martins (martinx) wrote :

Guys, please, this is annoying... Any news?!

I tried this change: '-Q celery,master' into '--queues=celery,master' and it does not work... Even after 2 hours...

Afterwards, I re-installed MaaS from scratch and edited that file during the MaaS Server installation, before the GRUB installation, by switching to TTY2 and editing it with: "nano /target/usr/sbin/maas-region-celeryd"

...and it works!! The "5 minutes message" disappears...

After that, I found some DNS problems... There are too many bugs on every single step I made with MaaS...

So, I re-installed it again from scratch, this time using two network interfaces for my MaaS. I edited maas-region-celeryd again at installation time but, this time, I also tried "dpkg-reconfigure maas-region-controller" to change my MaaS IP, and this problem (bug 1067929) appeared again...

I'm sorry but, I'm sure that MaaS is not ready for production yet.

I'll join maas-devel mailing list to try to help... Because I liked this idea very much and I'll use it a lot within my company. So, it MUST work smoothly.

Cheers!
Thiago

Alex Wauck (awauck) wrote :

Yeah...that workaround doesn't seem to work.

Thiago Martins (martinx) wrote :

Installation from scratch, file changed to: "--queues=celery,master", message still here even after 2 hours.

MaaS Virtual Machine (KVM test) configuration:

---
2 CPU
1024M
20G of HD
2 ETH (eth0 not used / eth1 default)
---

maas-import-pxe-files executed two times during this test...

MaaS seems to be working, since I can create the nodes and ssh into them. 1 node up and running, DHCP and DNS okay and in "sync"...

I'll try it again from scratch, again, using only one ETH this time...

Tks!
Thiago

Thiago Martins (martinx) wrote :

Guys,

 I created a simple procedure to install MaaS (Ubuntu 12.10) and not hit this problem... As follows:

 1- Create a Virtual Machine with:

 * 1024M of RAM
 * 1 CPU
 * 20G HD
 * Ubuntu 12.10 64 bits
 * Only ONE Ethernet (most important)

 2- Run the Ubuntu MaaS installation (via server CD);

 3- During the GRUB installation, do not hit <ENTER> immediately;

 4- Change to Linux second console TTY2 (alt + F2);

 5- Run:

 * nano /target/usr/sbin/maas-region-celeryd

 Change '-Q celery,master' into '--queues=celery,master'

 * Save and exit nano;

 6- Go back to TTY1 (alt + F1);

 7- Finish the Ubuntu installation.

 Now, follow the steps from here:

 http://evilnick.org/MAAS/install.html

 NOTE: Do not install the maas-dns package, it is very unstable (BUG 1069535 and BUG 1069570).

 The "5 minutes" message disappears again.

 When I try this same procedure using 2 eth, the "5 minutes" message does not disappear. So, using 1 eth is a requirement. And I have no idea why.

Cheers!
Thiago

Andres Rodriguez (andreserl) wrote :

Hi Thiago,

Maybe, you also need to dpkg-reconfigure maas-cluster-controller (besides sudo dpkg-reconfigure maas-region-controller) and set the correct address to communicate to.

Cheers.

Changed in maas (Ubuntu):
status: Fix Released → Fix Committed
Raphaël Badin (rvb) wrote :

Hi Thiago,

There is a background task that should be responsible for cleaning up the message if no new images are detected. Can you please have a look in /var/log/maas/celery-region.log and confirm that you're seeing the task being processed all right?

If it is the case, the log file will contain statements like:
[2012-10-22 10:24:52,309: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 10:24:52,314: INFO/MainProcess] Got task from broker: provisioningserver.tasks.report_boot_images[f9b441f1-14a8-4a87-a94d-10b197361d20]
[2012-10-22 10:24:52,365: INFO/MainProcess] Task provisioningserver.tasks.report_boot_images[f9b441f1-14a8-4a87-a94d-10b197361d20] succeeded in 0.00126314163208s: None

If something is wrong, you'll see things like:
[2012-10-22 10:09:52,006: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 10:14:52,111: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 10:19:52,214: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
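
The two log patterns above can be told apart mechanically. The following is a minimal sketch, not an official MAAS tool: it counts how often the task was sent by the scheduler, received from the broker, and completed, based only on the message formats quoted in this report. The function names and regular expressions are assumptions made for this example.

```python
import re

TASK = 'provisioningserver.tasks.report_boot_images'

# Patterns derived from the log excerpts quoted in this bug report.
SENT = re.compile(r'Scheduler: Sending due task ([\w.]+)')
RECEIVED = re.compile(r'Got task from broker: ([\w.]+)')
SUCCEEDED = re.compile(r'Task ([\w.]+)\[[0-9a-f-]+\] succeeded')


def summarize(lines, task=TASK):
    """Count sent/received/succeeded log entries for one task name."""
    counts = {'sent': 0, 'received': 0, 'succeeded': 0}
    patterns = (('sent', SENT), ('received', RECEIVED), ('succeeded', SUCCEEDED))
    for line in lines:
        for key, pattern in patterns:
            match = pattern.search(line)
            if match and match.group(1) == task:
                counts[key] += 1
    return counts


def task_is_processed(lines, task=TASK):
    """True if the task is both scheduled and picked up by a worker."""
    counts = summarize(lines, task)
    return counts['sent'] > 0 and counts['received'] > 0


broken_log = [
    '[2012-10-22 10:09:52,006: INFO/Beat] Scheduler: Sending due task '
    'provisioningserver.tasks.report_boot_images',
    '[2012-10-22 10:14:52,111: INFO/Beat] Scheduler: Sending due task '
    'provisioningserver.tasks.report_boot_images',
]
print(summarize(broken_log))
print(task_is_processed(broken_log))
```

To run it against a real log, pass an open file object, for example `summarize(open('/var/log/maas/celery-region.log'))`. A non-zero 'sent' count with zero 'received' matches the "something is wrong" pattern above.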

Andres Rodriguez (andreserl) wrote :

Note that the FIX is in the queue... we are waiting for it to be released!

StuartIanNaylor (stuartiannaylor) wrote :

I have the same.

[2012-10-22 22:23:47,911: INFO/Beat] child process calling self.run()
[2012-10-22 22:23:47,911: INFO/Beat] Celerybeat: Starting...
[2012-10-22 22:23:47,959: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 22:28:48,067: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 22:33:48,176: INFO/Beat] Scheduler: Sending due task provisioningserver.tasks.report_boot_images
[2012-10-22 22:34:07,741: WARNING/MainProcess] celeryd: Warm shutdown (MainProcess)
[2012-10-22 22:34:07,741: INFO/MainProcess] Celerybeat: Shutting down...
[2012-10-22 22:34:37,400: WARNING/MainProcess] -------------- celery@maas1 v2.5.3

I ran dpkg-reconfigure maas-cluster-controller and dpkg-reconfigure maas-region-controller, and changed '-Q celery,master' into '--queues=celery,master'. Not whilst installing, though.

Dave Walker (davewalker) wrote : Please test proposed package

Hello Raphaël, or anyone else affected,

Accepted into quantal-proposed. The package will build now and be available in a few hours in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
description: updated
tags: added: verification-done
removed: verification-needed
tags: added: verification-needed
removed: verification-done
dann frazier (dannf) wrote :

I have a quantal/maas system that was displaying this message. I upgraded to the version in -proposed (0.1+bzr1269+dfsg-0ubuntu1) and waited > 5 minutes. Unfortunately, the issue persists.

Andres Rodriguez (andreserl) wrote :

I investigated this and consulted with bigjools and rvba, and this fix is not related to the message about missing images.

description: updated
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 0.1+bzr1269+dfsg-0ubuntu1

---------------
maas (0.1+bzr1269+dfsg-0ubuntu1) quantal-proposed; urgency=low

  * New upstream bugfix release
    - Fixes commissioning failing to set memory attribute. (LP: #1064638)
    - Fixes node listing by adding pagination (LP: #1064672)
    - Changes default bind rndc key which breaks initscripts (LP: #1066938)
    - Fixes invalid DNS config once node is enlisted (LP: #1066958)
    - Reference documentation link to correct URL (LP: #1067261)

  [ Andres Rodriguez ]
  * debian/rules: Change upstream branch.

  [ Gavin Panella ]
  * debian/maas-dns.postinst: Remove MAAS-related include lines from named's
    config before adding a new one (LP: #1066929)

  [ Raphael Badin ]
  * debian/extras/maas-region-celeryd: Remove whitespace that affects DNS
    rabbitmq queue. (LP: #1067929)
 -- Andres Rodriguez <email address hidden> Tue, 16 Oct 2012 10:31:37 -0400

Changed in maas (Ubuntu):
status: Fix Committed → Fix Released
Michael Casadevall (mcasadevall) wrote :

The uploaded fix appears to be incorrect, as I installed the proposed update and the issue was not resolved. Marking verification-failed.

tags: added: verification-failed
removed: verification-done
Michael Casadevall (mcasadevall) wrote :

(I wasn't specific, the SRU upload does indeed correct the issues in the queues line, but celery does not properly connect which is the point of this SRU.)

Andres Rodriguez (andreserl) wrote :

Marking this verification-done, as upstream (the bug reporter) thought that fixing this would also fix the UI message being displayed; however, that's related to bug #1068843. I've updated the bug description to reflect the real issue. That way we can simply mark this bug as verification-done.

description: updated
tags: added: verification-done
removed: verification-failed
gadLinux (gad-aguilardelgado) wrote :

I can say that for me it does not disappear.

And the PXE clients boot to a provisioning image and then shut down automatically.

Raphaël Badin (rvb) wrote :

Ok, let me clarify the situation here: this bug needed to be fixed in order for the message to disappear, and the fix landed a few days ago. But there is another bug (bug 1070318) which *also* prevents the message from being properly removed.

Bug 1068843 is currently being worked on and it should also fix bug 1070318 as a side effect. So the whole 'the warning does not disappear' issue should be fixed once the fix for bug 1068843 lands.

Dave Walker (davewalker) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Dave Walker (davewalker)
Changed in maas (Ubuntu Precise):
status: New → Fix Released