Can't start a node with its associated cluster interface configured as Unmanaged

Bug #1382108 reported by Jason Hobbs
This bug affects 1 person
Affects  Status        Importance  Assigned to     Milestone
MAAS     Fix Released  Critical    Julian Edwards
1.7      Fix Released  Critical    Graham Binns

Bug Description

I have my cluster interface configured as "Unmanaged" - no DNS or DHCP should be configured. When I go to start a node via the UI, I get an error:

"Multiple failures encountered. See /var/log/maas/maas-django.log on the region server for more information."

The cluster daemon also crashes and in the UI I see:
"One or more clusters are currently disconnected. Visit the clusters page for more information."

In maas.log I see:
Oct 16 23:42:11 trusty-maas7 maas.dhcp: [ERROR] Could not create host map for 74:d4:35:89:ba:b5 with address 192.168.10.102: Command `omshell` returned non-zero exit status 0:#012> > > > not connected.#012> no open object.#012> no open object.#012> no open object.#012> no open object.#012> not connected.#012>

In pserv.log:
2014-10-16 23:45:18+0800 [-] Unhandled Error
 Traceback (most recent call last):
   File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
     self.runUntilCurrent()
   File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 797, in runUntilCurrent
     f(*a, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 423, in errback
     self._startRunCallbacks(fail)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 1020, in checkKnownErrors
     desc = str(error.value)
 exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u2192' in position 18: ordinal not in range(128)

This is with 1.7.0~beta6+bzr3260-0ubuntu1~trusty1 - this used to work.

summary: - Can't start a node with its cluster configured as Unmanaged
+ Can't start a node with its associated cluster interface configured as
+ Unmanaged
Gavin Panella (allenap) wrote :

This comes from p.rpc.dhcp.create_host_maps: the CannotCreateHostMap
exception contains the Unicode character for an arrow. It's more than a
bit rubbish that Twisted's AMP implementation doesn't cope with
non-ASCII errors, but we can work around it at least.
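
A minimal sketch of the failure and one possible workaround, written for Python 3 rather than the Python 2.7 shown in the traceback. The helper name `ascii_safe` and the sample message are assumptions made for illustration; this is not MAAS or Twisted code, and the actual fix may look different.

```python
def ascii_safe(message):
    """Return `message` with non-ASCII characters replaced by escapes.

    Hypothetical helper: shows one way to make an error description
    safe to hand to a layer that only copes with ASCII.
    """
    return message.encode("ascii", "backslashreplace").decode("ascii")


# Sample text containing U+2192 (rightwards arrow), the character named
# in the traceback. The wording of the message itself is made up.
message = "74:d4:35:89:ba:b5 \u2192 192.168.10.102"

try:
    message.encode("ascii")  # strict encoding fails on the arrow
except UnicodeEncodeError as error:
    print("strict encoding failed:", error.reason)

# The escaped form contains only ASCII characters.
print(ascii_safe(message))
```

Escaping keeps the information that an arrow was present, whereas the `"replace"` or `"ignore"` error handlers would discard it.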

Fwiw:

    Command `omshell` returned non-zero exit status 0:#012> > > > not
    connected.#012> no open object.#012> no open object.#012> no open
    object.#012> no open object.#012> not connected.#012>

is an example of the error message that comes out of __str__() and
__unicode__() on ExternalProcessError. The #012 bit is probably meant to
be octal for newline, but I'm not sure what's introducing it (syslog
maybe?).
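
For reading such log lines, here is a small sketch that turns `#NNN` octal escapes back into the characters they stand for. It assumes the escapes were added by the syslog daemon (rsyslog, for example, escapes control characters this way by default), which the comment above only guesses at. `unescape_syslog` is a hypothetical name.

```python
import re

# Matches "#" followed by exactly three octal digits, e.g. "#012".
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def unescape_syslog(line):
    """Replace #NNN octal escapes with the characters they encode."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


logged = "exit status 0:#012> > > > not connected.#012> no open object."
print(unescape_syslog(logged))
```

Note that a literal `#` followed by three octal digits in the original message would be decoded too, so this is only suitable for eyeballing logs, not for lossless round-tripping.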

tags: added: trivial
Changed in maas:
status: New → Triaged
importance: Undecided → High
Jason Hobbs (jason-hobbs) wrote :

Why is DHCP/omshell being invoked when the cluster interface is unmanaged?
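
One way to picture the guard this question implies is sketched below. Everything here is hypothetical: the `Management` enum, the `ClusterInterface` class, and `start_node` are simplified stand-ins, not the MAAS models or the code that eventually landed.

```python
from dataclasses import dataclass
from enum import Enum


class Management(Enum):
    """Hypothetical stand-in for a cluster interface's management mode."""
    UNMANAGED = 0
    DHCP = 1
    DHCP_AND_DNS = 2


@dataclass
class ClusterInterface:
    """Hypothetical, simplified model of a cluster interface."""
    name: str
    management: Management


def needs_host_map(interface):
    """Only interfaces with managed DHCP should get host maps."""
    return interface.management is not Management.UNMANAGED


def start_node(mac, ip, interface, create_host_map):
    """Create a host map for managed interfaces only.

    `create_host_map` is an injected callable standing in for whatever
    talks to omshell; it is never called for an unmanaged interface.
    Returns True if a host map was requested, False otherwise.
    """
    if needs_host_map(interface):
        create_host_map(mac, ip)
        return True
    return False


calls = []
unmanaged = ClusterInterface("eth0", Management.UNMANAGED)
start_node("74:d4:35:89:ba:b5", "192.168.10.102", unmanaged,
           lambda mac, ip: calls.append((mac, ip)))
print(calls)  # nothing was requested for the unmanaged interface
```

Passing the host-map function in as a parameter keeps the sketch testable without a DHCP server.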

Gavin Panella (allenap) wrote :

Good point! That's the important bug.

I was fixated on the UnicodeEncodeError, which I've now filed separately as bug 1382237.

> The cluster daemon also crashes ...

It disconnects because Twisted's AMP implementation does that after passing an unrecognised exception over the wire. However, it should reconnect within 30 seconds at most.

Gavin Panella (allenap)
tags: added: dhcp
removed: trivial
Julian Edwards (julian-edwards) wrote :

Making it critical because it's a crash.

Changed in maas:
milestone: none → 1.7.0
importance: High → Critical
Christian Reis (kiko) wrote :

Oh, this is interesting; look at the change that Newell did for bug 1376888:

   https://code.launchpad.net/~newell-jensen/maas/bug-1376888/+merge/237861

I wonder if we missed taking unmanaged interfaces into account there. Can you check?

Newell Jensen (newell-jensen) wrote :

Kiko,

My branch was only for deleting nodes. However, the error here is very similar to the one we were getting when deleting nodes that had been created on a managed interface and were then deleted after the interface had been changed to unmanaged.

Changed in maas:
assignee: nobody → Julian Edwards (julian-edwards)
status: Triaged → In Progress
Changed in maas:
milestone: 1.7.0 → none
Changed in maas:
status: In Progress → Fix Committed
Julian Edwards (julian-edwards) wrote :

I started backporting to 1.7 and was amazed to see the code divergence already. Graham, are you planning on backporting your start_nodes change to 1.7?

Christian Reis (kiko) wrote :

I'm pretty sure no, he isn't, because it would not be accepted through at this point.

Julian Edwards (julian-edwards) wrote :

I honestly think it's worthwhile because it adds a set of very valuable cleanups, error catching and robustness. If I were in charge, I'd approve it as a necessary change ...

Andres Rodriguez (andreserl) wrote :

I agree here. If the fixes above plus gmb's branch fix crashes that affect the integrity of the release, these should be backported to 1.7.

Raphael, any comments as one of the gatekeepers?

Graham Binns (gmb) wrote : Re: [Bug 1382108] Re: Can't start a node with its associated cluster interface configured as Unmanaged

On 21 October 2014 03:04, Julian Edwards <email address hidden> wrote:
> I started backporting to 1.7 and was amazed to see the code divergence
> already. Graham, are you planning on backporting your start_nodes change
> to 1.7?

Argh, we missed talking about this at the meeting yesterday. I wasn't
going to push hard for it to be backported *but* landing it would also
fix some of the problems seen in bug 1375942 (e.g. old static IPs
being left allocated after a failed deployment). So yes, I say let's
do it.

Julian Edwards (julian-edwards) wrote :

On Wednesday 22 Oct 2014 07:31:23 you wrote:
> On 21 October 2014 03:04, Julian Edwards <email address hidden> wrote:
> > I started backporting to 1.7 and was amazed to see the code divergence
> > already. Graham, are you planning on backporting your start_nodes change
> > to 1.7?
>
> Argh, we missed talking about this at the meeting yesterday. I wasn't
> going to push hard for it to be backported *but* landing it would also
> fix some of the problems seen in bug 1375942 (e.g. old static IPs
> being left allocated after a failed deployment). So yes, I say let's
> do it.

There's some chat on the bug, basically everyone is for backporting apart from
Kiko. We need to pressga^Wconvince him.

Christian Reis (kiko) wrote :

The patch in this bug is totally acceptable, fwiw, if it does fix the problem.

Graham Binns (gmb) wrote :

We've agreed to backport just this patch (pulling it out of Node.start() in trunk and putting it in NodeManager.start_nodes() in 1.7). I'm working on that now.

Changed in maas:
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Jason, or anyone else affected,

Accepted maas into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/maas/1.7.5+bzr3369-0ubuntu1~14.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Andres Rodriguez (andreserl) wrote :

This issue has been verified to work both on upgrade and fresh install, and has been QA'd. Marking verification-done.

tags: added: verification-done
removed: verification-needed