Commission is not available because of the current state of the node.

Bug #1807991 reported by Jason Hobbs
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Lee Trager

Bug Description

When using FCE, we wait for nodes to enlist and show up in NEW state, rename the machine, update some tags, then we issue a command to start commissioning.

That failed for one node in 2.5.0 rc2 with this error:

2018-12-11-08:55:49 root DEBUG maas root machines read
2018-12-11-08:55:53 root DEBUG maas root machine power-parameters nprfpk
2018-12-11-08:55:55 foundationcloudengine.layers.maaslayer DEBUG Commissioning geodude
2018-12-11-08:55:55 root DEBUG maas root machine update nprfpk hostname=geodude zone=default
2018-12-11-08:55:58 root DEBUG maas root tag read foundation-nodes
2018-12-11-08:56:00 root DEBUG maas root tag update-nodes foundation-nodes add=nprfpk
2018-12-11-08:56:02 root DEBUG maas root machine commission nprfpk
2018-12-11-08:56:05 root ERROR Command failed: machine commission nprfpk
2018-12-11-08:56:05 root ERROR {"__all__": ["Commission is not available because of the current state of the node."]}

At 2018-12-11-08:55:49 the machine was in NEW state. We didn't issue any commission command until 2018-12-11-08:56:02, and it failed at 2018-12-11-08:56:05.

According to the maas logs, it looks like after we read NEW state and before we started commissioning, the machine transitioned to COMMISSIONING state on its own without us telling it to:

10.244.40.32/var/log/maas/maas.log:2018-12-11T08:55:51+00:00 leafeon maas.node: [info] whole-mule: Status transition from NEW to COMMISSIONING

This is with 2.5.0~rc2-7433-gea48d302e-0ubuntu1~18.04.1

Revision history for this message
Andres Rodriguez (andreserl) wrote :

MAAS runs commissioning during enlistment. This means that the machine will:

1. Register itself with MAAS.
2. After registering, start the commissioning process.
3. Once commissioned, be set to 'New'.

To move from 'New' to 'Ready' you have two options:

1. Commission again
2. Test hardware.

Changed in maas:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Ok, so how can we tell from the API that we can issue a "commission" command without getting an error back?

How can we distinguish the state where it's "new" and about to automatically transition to "commissioning" from the state where it's "new" and we can issue a "commission" command to transition it to ready?
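
One client-side way to tolerate this race, sketched under assumptions: treat this specific error as retryable and issue the command again once the machine reports "New" a second time. The callables `read_status` and `commission` are hypothetical stand-ins for whatever wrapper the client puts around `maas <profile> machine read` and `maas <profile> machine commission`; they are not part of MAAS. The error text is copied from the log in the bug description.

```python
import time

# Error text as it appears in the log in the bug description.
RACE_ERROR = "Commission is not available because of the current state of the node."


class CommissionError(Exception):
    """Raised by the hypothetical commission() wrapper when the command fails."""


def commission_when_possible(system_id, read_status, commission,
                             attempts=30, delay=10.0, sleep=time.sleep):
    """Issue 'commission', retrying if auto-commissioning won the race.

    read_status(system_id) -> str and commission(system_id) are hypothetical
    wrappers supplied by the caller (assumptions, not a MAAS API).
    Returns True once the command is accepted, False if attempts run out.
    """
    for _ in range(attempts):
        if read_status(system_id).upper() == "NEW":
            try:
                commission(system_id)
                return True
            except CommissionError as exc:
                # Swallow only the state race; re-raise anything else.
                if RACE_ERROR not in str(exc):
                    raise
        # Either the machine is not in NEW, or we lost the race: wait and retry.
        sleep(delay)
    return False
```

This only works around the symptom: the client still cannot tell the two "New" states apart, it just recovers when it guesses wrong.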

Changed in maas:
status: Incomplete → New
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

It seems to me the machine should not be in the 'new' state until enlistment is complete and I can safely issue a 'commission' command without worrying about hitting an error. Otherwise, you're breaking an existing API.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

It is actually not breaking API compatibility because it is intended that the machine is 'newly' registered (sets status to 'NEW'), the machine is automatically commissioned (sets status to 'Commissioning'), and once it is finished, it will be set back to 'New'. Hence, the machine will remain in new.

We can explore whether it would be possible to always go Commissioning -> New, instead of New -> Commissioning -> New, and fix that for 2.5.1.

Thoughts?

Changed in maas:
milestone: none → 2.5.1
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Lee Trager (ltrager)
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

In previous releases, machines that auto-enlisted went to 'New' state and
stayed there. A client could see a newly auto-enlisted node in 'New' state
and safely issue a 'commission' command without worrying about getting an
error back. That means a client that correctly used the API in 2.4 may get
an error back in 2.5, using the exact same sequence of reads/writes. That's
what makes it an API breaking change.

I think 'Commissioning -> New' is much better than 'New -> Commissioning ->
New'. In the latter, 'New' is used for two different states and creates a
race with clients issuing the commissioning operation. When we see a node
in 'Commissioning' we know we can't do anything with it, except wait for it
to finish 'Commissioning'.

Even that is a bit weird though, because usually after 'Commissioning' we
go to 'Testing' or 'Ready'. This feels like a different state to me -
'Enlisting' maybe. However, I'm not sure how a client would react
differently to the two states.
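
The difference between the two sequences can be made concrete with a small
sketch. The status names come from this thread; the helper name
`unsafe_new_observations` is hypothetical and only illustrates the argument.

```python
# Status sequences an enlisting machine passes through, as described in this thread.
CURRENT_SEQUENCE = ["NEW", "COMMISSIONING", "NEW"]  # behaviour seen in 2.5.0 rc2
PROPOSED_SEQUENCE = ["COMMISSIONING", "NEW"]        # proposed alternative


def unsafe_new_observations(sequence):
    """Return the positions where a client would see NEW before enlistment is done.

    Only the final status is safe to act on; any earlier NEW is a point
    where a 'commission' command can race with auto-commissioning.
    """
    last = len(sequence) - 1
    return [i for i, status in enumerate(sequence) if status == "NEW" and i != last]


print(unsafe_new_observations(CURRENT_SEQUENCE))   # [0]
print(unsafe_new_observations(PROPOSED_SEQUENCE))  # []
```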

Revision history for this message
Andres Rodriguez (andreserl) wrote :

I don’t think there should be a new state, because the reality is that the
machine has enlisted and registered itself (hence it’s set as “New”), then
the “auto-commissioning” happens, during which the machine is in fact
commissioning (hence set as “Commissioning”), and since that is automatic, the
machine is set back to “New”.

That said, I think then that the best solution is to have a global option
that allows you to disable “auto-commission” right after enlistment.

Revision history for this message
Lee Trager (ltrager) wrote :

The related branch modifies the API to allow enlistment to request that a new Machine be created in COMMISSIONING. Now when machines are enlisted they will go COMMISSIONING -> NEW. The branch also adds a global configuration option, enlist_commissioning, which allows users to disable commissioning during enlistment, restoring the pre-2.5 MAAS behavior.
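
A minimal sketch of how a client might turn the option off from a script, assuming the option is set through the CLI's `maas <profile> maas set-config name=... value=...` form and that `false` is the accepted value; neither is confirmed in this report, so check them against the installed MAAS version. The profile name `root` matches the log in the bug description, and both helper function names are hypothetical.

```python
import subprocess


def set_config_command(profile, name, value):
    """Build the argv for a MAAS CLI set-config call (command form is an assumption)."""
    return ["maas", profile, "maas", "set-config", f"name={name}", f"value={value}"]


def disable_enlist_commissioning(profile, run=subprocess.run):
    """Disable commissioning during enlistment via the enlist_commissioning option.

    'run' is injectable so the call can be exercised without a MAAS install.
    """
    cmd = set_config_command(profile, "enlist_commissioning", "false")
    return run(cmd, check=True)
```

With the option disabled, enlisted machines should stay in NEW until the client issues `commission` itself, which is the sequence the FCE client in this report expects.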

Changed in maas:
status: Triaged → Fix Committed
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Subscribed to field-high. We need this fixed in 2.5 ASAP as it's causing a high number of failures in CI. We cannot use the new IPMI enlistment method because of bug 1707562. Fixing this would allow us to work around bug 1707562, or fixing bug 1707562 would allow us to use the new IPMI method to work around this bug, but having both broken is a bad combination.

Changed in maas:
status: Fix Committed → Fix Released