Commissioning now requires an IP address to be available for each network interface on a network MAAS manages

Bug #1383384 reported by Jason Hobbs
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   Critical     Unassigned
1.7       Fix Released   Critical     Jason Hobbs

Bug Description

Merges for https://bugs.launchpad.net/maas/+bug/1375360 changed commissioning to run 'dhclient' on all interfaces of the node it commissions. This means any interfaces that are on the network MAAS is managing will get leases created for them with IPs assigned out of MAAS's dynamic pool.

This means that where systems previously only required 1 IP address to commission, they now can require two or more. This breaks existing deployments with IP address ranges already allocated to match the previous 1 IP per node requirement, causing their dynamic DHCP pools to become exhausted.

This bug was reported by an end user and I verified it on my own system.

This is with 1.7.0~beta7+bzr3266-0ubuntu1~trusty2

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I think the right fix for this is to not take out a full lease to do discovery on the nodes. DHCP does this to establish a lease:

Client: Request
Server: Response <details>
Client: Request <details>

To discover, we only need this, which won't establish a lease:
Client: Request
Server: Response <details>
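
As a rough illustration of the first message in that shorter exchange, here is a minimal sketch that builds the bytes of a DHCPDISCOVER packet using the fixed BOOTP layout from RFC 2131. It assumes the "Request" above corresponds to DHCPDISCOVER; `build_discover` and its parameters are hypothetical names for this example and are not part of MAAS. It only constructs the payload and does not send anything.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the options field
DHCPDISCOVER = 1                         # value of option 53 (message type)


def build_discover(mac: bytes, xid: int) -> bytes:
    """Return the payload of a DHCPDISCOVER for the given MAC address.

    Hypothetical helper for illustration only.
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC address length
        0,            # hops
        xid,          # transaction ID, echoed back by servers
        0,            # secs
        0x8000,       # flags: ask servers to reply by broadcast
        b"\0" * 4,    # ciaddr: client has no address yet
        b"\0" * 4,    # yiaddr: filled in by the server
        b"\0" * 4,    # siaddr
        b"\0" * 4,    # giaddr
        mac,          # chaddr: struct pads this to 16 bytes
        b"",          # sname: padded to 64 zero bytes
        b"",          # file: padded to 128 zero bytes
    )
    # Option 53 (message type, length 1), then the end option (255).
    options = bytes([53, 1, DHCPDISCOVER, 255])
    return header + DHCP_MAGIC_COOKIE + options


packet = build_discover(bytes.fromhex("525400123456"), xid=0x12345678)
print(len(packet))
```

A prober built this way would stop after reading the server's reply instead of sending the follow-up request that completes the lease.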

Revision history for this message
Christian Reis (kiko) wrote :

There is a tool called dhcp-probe which we incidentally have in Ubuntu which we could use for that task.

HOWEVER, I really do think that sites running small dynamic DHCP pools are broken. Could we release note this by saying that the dynamic DHCP pool should be 3x the number of physical nodes one has?

Revision history for this message
Blake Rouse (blake-rouse) wrote :

According to the DHCP specification, after the very first request the DHCP server has already reserved an address for the client. The client can accept the offer from that DHCP server or from another DHCP server. If the offer from a given DHCP server is not accepted, the address is released, but a response still has to be sent: the other DHCP server notices that the client accepted another server's address, so it releases its own.

I think the best thing to do is not get too complicated and just do a "dhclient -r" after the interface has an IP address, so it is released.

The only way to get DHCP information without making a lease is to listen passively and hope a DHCP response comes in, using something like dhcpdump. I believe that is not an option, as we need to know what network the interface belongs to.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Which DHCP specification are you seeing that in?

From the RFC: Each server may respond with a DHCPOFFER message that includes an available network address in the 'yiaddr' field (and other configuration parameters in DHCP options). Servers need not reserve the offered network address, although the protocol will work more efficiently if the server avoids allocating the offered network address to another client.

https://www.ietf.org/rfc/rfc2131.txt

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

The RFC makes it sound like it's up to the DHCP server - we would need to test isc-dhcpd to see how it behaves.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I tested how isc-dhcpd behaves by running two dhcpd servers on the same network. I watched the leases file for both and took a packet capture.

Here are the packets seen in a DHCP transaction on this setup:
client: DHCPDISCOVER
server 1: DHCPOFFER <details 1>
server 2: DHCPOFFER <details 2>
client: DHCPREQUEST <details 1>
server 1: DHCPACK <details 1>

isc-dhcpd doesn't write a new lease unless the client responds to the DHCPOFFER with a DHCPREQUEST. So, server 1 writes a new lease in this case but not server 2.
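
The observed behaviour can be restated as a toy model. This is only a sketch of what the test above showed, not isc-dhcpd's implementation: `ToyDhcpServer`, its methods, and the addresses are invented for illustration, and the `leases` dictionary stands in for the leases file.

```python
class ToyDhcpServer:
    """Toy model: a lease is written only when a DHCPREQUEST picks this server."""

    def __init__(self, name, pool):
        self.name = name
        self.free = list(pool)  # addresses not yet leased
        self.offers = {}        # mac -> offered address, kept in memory only
        self.leases = {}        # mac -> leased address (the "leases file")

    def discover(self, mac):
        """Answer a DHCPDISCOVER with an offer; no lease is written."""
        if mac in self.leases:
            return self.leases[mac]
        if not self.free:
            return None
        self.offers[mac] = self.free[0]
        return self.offers[mac]

    def request(self, mac, chosen_server):
        """Handle a broadcast DHCPREQUEST naming the server the client chose."""
        offered = self.offers.pop(mac, None)
        if chosen_server != self.name or offered not in self.free:
            return None  # offer dropped, nothing written
        self.free.remove(offered)
        self.leases[mac] = offered
        return offered


server1 = ToyDhcpServer("server 1", ["10.0.0.10", "10.0.0.11"])
server2 = ToyDhcpServer("server 2", ["10.0.1.10", "10.0.1.11"])
mac = "52:54:00:12:34:56"

# DHCPDISCOVER reaches both servers; the DHCPREQUEST names server 1.
for server in (server1, server2):
    server.discover(mac)
for server in (server1, server2):
    server.request(mac, chosen_server="server 1")

print(server1.leases)
print(server2.leases)
```

In this model a client that stops after the offer leaves both lease stores empty, which is the property a lease-free discovery step would rely on.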

Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 1383384] Re: Commissioning now requires an IP address to be available for each network interface on a network MAAS manages

On Monday 20 Oct 2014 18:03:02 you wrote:
> There is a tool called dhcp-probe which we incidentally have in Ubuntu
> which we could use for that task.
>
> HOWEVER, I really do think that sites running small dynamic DHCP pools
> are broken. Could we release note this by saying that the dynamic DHCP
> pool should be 3x the number of physical nodes one has?

The correct number is the number of NICs in the nodes, factored by how many
you intend to commission/enlist at the same time. So if you want to
commission half of them at once, it's numnics x 0.5, ish.

I agree that having small pools is broken, it's very easy to increase in
private networks.
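
As a worked version of that rule of thumb, the sketch below computes a minimum dynamic pool size from the NIC count and the fraction of nodes commissioned at once. The function name and parameters are hypothetical, and it assumes `total_nics` counts only NICs attached to networks MAAS manages, since those are the ones that take leases.

```python
import math


def dynamic_pool_size(total_nics: int, concurrent_fraction: float) -> int:
    """Estimate the addresses needed while commissioning.

    total_nics: NICs across all nodes on MAAS-managed networks (assumption).
    concurrent_fraction: share of nodes commissioned at the same time, 0..1.
    """
    if total_nics < 0:
        raise ValueError("total_nics must not be negative")
    if not 0 <= concurrent_fraction <= 1:
        raise ValueError("concurrent_fraction must be between 0 and 1")
    # Round up: a fractional address still consumes a whole lease.
    return math.ceil(total_nics * concurrent_fraction)


# 20 nodes with 2 NICs each, half commissioned at once.
print(dynamic_pool_size(20 * 2, 0.5))  # 20
```

This ignores any addresses the pool also needs for other purposes, so it is a lower bound rather than a recommended size.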

Revision history for this message
Julian Edwards (julian-edwards) wrote :

If you remember, the requirement of a DHCP request on each NIC is so that MAAS can determine on which cluster interface each NIC resides.

For this to work, the lease has to be present and *valid* at the point that the lease scanner runs, which is once a minute.

Provided that we ensure that the dhclient request is at the start of the commissioning run, and the run takes at least a minute, we can release the provided IP(s) at the end of the run.

However if people want to commission all the nodes at the same time, there is no avoiding increasing the size of the pool. So I don't think this is a bug, it's a requirement of a feature.

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

On 10/20/2014 07:45 PM, Julian Edwards wrote:
> If you remember, the requirement of a DHCP request on each NIC is so
> that MAAS can determine on which cluster interface each NIC resides.
>
> For this to work, the lease has to be present and *valid* at the point
> that the lease scanner runs, which is once a minute.

No, it doesn't have to be that way, that's just how we do things today.

Instead, we could capture the DHCPOFFER on the node and include it in
the commissioning script output to be processed in a hook on the
server side. We could get the same information there as would be
available in the DHCP leases file, without ever taking up a lease.
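
One way to picture the parsing side of that idea is the sketch below, which pulls the offered address, server identifier, and subnet out of a captured DHCPOFFER payload. Field offsets and option codes follow RFC 2131 and RFC 2132; `parse_offer` and the shape of its result are assumptions for illustration, and nothing here reflects how MAAS hooks actually process commissioning output.

```python
import ipaddress

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPT_PAD, OPT_SUBNET_MASK, OPT_MSG_TYPE, OPT_SERVER_ID, OPT_END = 0, 1, 53, 54, 255
DHCPOFFER = 2


def parse_offer(packet: bytes) -> dict:
    """Extract network details from a raw DHCPOFFER payload (hypothetical helper)."""
    if len(packet) < 240 or packet[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")
    yiaddr = ipaddress.IPv4Address(packet[16:20])  # "your" (offered) address

    # Walk the options field; option overload (option 52) is ignored here.
    options = {}
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(packet):
            raise ValueError("truncated option")
        length = packet[i + 1]
        value = packet[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option")
        options[code] = value
        i += 2 + length

    if options.get(OPT_MSG_TYPE) != bytes([DHCPOFFER]):
        raise ValueError("not a DHCPOFFER")

    result = {"offered_ip": str(yiaddr)}
    server_id = options.get(OPT_SERVER_ID)
    if server_id is not None and len(server_id) == 4:
        result["server_id"] = str(ipaddress.IPv4Address(server_id))
    mask = options.get(OPT_SUBNET_MASK)
    if mask is not None and len(mask) == 4:
        netmask = ipaddress.IPv4Address(mask)
        network = ipaddress.IPv4Network(f"{yiaddr}/{netmask}", strict=False)
        result["network"] = str(network)
    return result
```

The server identifier and network are the pieces that would let the server side match a NIC to a cluster interface without consulting the leases file.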

> Provided that we ensure that the dhclient request is at the start of the
> commissioning run, and the run takes at least a minute, we can release
> the provided IP(s) at the end of the run.
>
> However if people want to commission all the nodes at the same time,
> there is no avoiding increasing the size of the pool. So I don't think
> this is a bug, it's a requirement of a feature.

It's a bug because it breaks existing deployments without giving people
a workaround, other than reconfiguring their network, which may not be
easy or possible in some cases.

At the very least we need to provide a way for people to work around
this. For now, I'm working on a branch to provide an option to disable
this feature.

Jason

Changed in maas:
status: Incomplete → In Progress
assignee: nobody → Jason Hobbs (jason-hobbs)
Revision history for this message
Blake Rouse (blake-rouse) wrote :

I think there might be a bug here.

We need to verify that commissioning does release the allocated IP address from the DHCP server.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 21 Oct 2014 00:57:23 you wrote:
> On 10/20/2014 07:45 PM, Julian Edwards wrote:
> > If you remember, the requirement of a DHCP request on each NIC is so
> > that MAAS can determine on which cluster interface each NIC resides.
> >
> > For this to work, the lease has to be present and *valid* at the point
> > that the lease scanner runs, which is once a minute.
>
> No, it doesn't have to be that way, that's just how we do things today.

Right, that's what I was describing.

> Instead, we could capture the DHCPOFFER on the node and include it in
> the commissioning script output to be processed in a hook on the
> server side. We could get the same information there as would be
> available in the DHCP leases file, without ever taking up a lease.

This is perfectly valid too and sounds like a good solution.

Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Christian Reis (kiko) wrote :

The 1.7 option has landed and I'm marking that fixed. For trunk, let's explore building our own DHCP prober that doesn't leak addresses.

Changed in maas:
status: Fix Committed → Triaged
milestone: 1.7.0 → next
Christian Reis (kiko)
Changed in maas:
milestone: next → 1.7.1
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Ok we need to fix it by ensuring the lease is expired after commissioning is done.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Wednesday 12 Nov 2014 14:49:49 you wrote:
> Ok we need to fix it by ensuring the lease is expired after
> commissioning is done.

r3360 and r3354 in trunk.

Christian Reis (kiko)
Changed in maas:
milestone: 1.7.1 → next
Changed in maas:
assignee: Jason Hobbs (jason-hobbs) → nobody
Revision history for this message
Christian Reis (kiko) wrote :

This is the status of trunk:

  - The option to disable dhclient on all interfaces is NOT present.
  - We release leases at the end of commissioning and when static DHCP entries are removed (r3354 and r3360 on trunk)

This means that on trunk, we don't leak IP addresses at the end of commissioning like 1.7.0 did. However, if multiple machines are commissioned simultaneously, we require N*M (number of machines * number of interfaces on each machine, assuming identical machines) IP addresses during commissioning.

We could work around that issue by providing a custom DHCP "tester" client that doesn't do the full DHCP request handshake, which IIRC doesn't cause the ISC DHCP server to commit the allocation of the entry.

Raphael argues that if the "burst exhaustion" scenario is really an issue, then forward-porting the option is a better solution.

Changed in maas:
status: Triaged → Fix Released
milestone: next → none
Revision history for this message
Andres Rodriguez (andreserl) wrote :

This bug was filed in upstream MAAS and not in Ubuntu. It was fixed as part of 1.7. This was fixed and verified to be working in all Ubuntu releases. MAAS 1.7 is being SRU'd. Marking this as verification-done, as it seems to be blocking the SRU.

tags: added: verification-done