[2.3] Ephemeral boot environment does not renew DHCP leases

Bug #1732522 reported by Andres Rodriguez
This bug affects 1 person
Affects                          Status        Importance  Assigned to    Milestone
MAAS                             Fix Released  High        Mike Pontillo
cloud-init                       Invalid       Undecided   Unassigned
cloud-initramfs-tools (Ubuntu)   Confirmed     Medium      Unassigned
systemd (Ubuntu)                 Invalid       Medium      Unassigned

Bug Description

I started commissioning+hardware testing on a machine, and while the machine was testing (for more than 2 hours) I noticed that its IP address had disappeared from MAAS. The machine has the MAC 00:25:90:4c:e7:9e and the IP 192.168.9.211 from the dynamic range.

Checking the MAAS server, I noticed that the IP/MAC was in the ARP table:

andreserl@maas:/var/lib/maas/dhcp$ arp -a | grep 211
192-168-9-211.maas (192.168.9.211) at 00:25:90:4c:e7:9e [ether] on bond-lan

The leases file contains the following: http://pastebin.ubuntu.com/25969442/
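One way to confirm whether the lease behind this address really lapsed is to look up the machine's MAC in the leases file. The sketch below assumes the standard ISC dhcpd leases format (`lease <ip> { ... }` blocks with `ends`, `binding state` and `hardware ethernet` statements); `find_leases` is a hypothetical helper, not part of MAAS.

```python
import re
from datetime import datetime

# One "lease <ip> { ... }" declaration from an ISC dhcpd leases file.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)


def find_leases(text, mac):
    """Return (ip, ends, binding_state) for each lease block whose
    'hardware ethernet' matches `mac` (case-insensitive)."""
    results = []
    for ip, body in LEASE_RE.findall(text):
        hw = re.search(r"hardware ethernet\s+([0-9A-Fa-f:]+);", body)
        if not hw or hw.group(1).lower() != mac.lower():
            continue
        # "ends <weekday> YYYY/MM/DD HH:MM:SS;" (dhcpd writes UTC by default).
        # "ends never;" does not match and yields None.
        ends = re.search(r"^\s*ends\s+\d\s+(\S+\s+\S+);", body, re.MULTILINE)
        # Anchor at line start so "next binding state" is not picked up.
        state = re.search(r"^\s*binding state\s+(\w+);", body, re.MULTILINE)
        results.append((
            ip,
            datetime.strptime(ends.group(1), "%Y/%m/%d %H:%M:%S") if ends else None,
            state.group(1) if state else None,
        ))
    return results
```

On the MAAS server this might be called as `find_leases(open(path).read(), "00:25:90:4c:e7:9e")`, where `path` is the leases file under /var/lib/maas/dhcp (the exact file name is an assumption here). dhcpd appends a new declaration whenever a lease changes, and the last one in the file is the current one, so the last matching tuple is the one to check.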

Then I checked a couple of areas of MAAS:
 - Device discovery: the machine wasn't there.
 - Subnet details page: the machine wasn't there either (e.g. as an observed IP).

Andres Rodriguez (andreserl) wrote :
Changed in maas:
milestone: none → 2.3.0rc2
milestone: 2.3.0rc2 → 2.3.0
Andres Rodriguez (andreserl) wrote :

So this is what I noticed: after about 10 minutes of running hardware testing, the IP disappeared from the PXE interface. See the screenshots attached below.

Andres Rodriguez (andreserl) wrote :

Argh, the IP disappeared from MAAS, but the machine still holds that IP on the *PXE* interface.

Andres Rodriguez (andreserl) wrote :

Ok, I'm being unclear again. This is the behavior:

1. MAAS PXE boots to commissioning+testing, gets the IP address on the PXE interface.
2. During commissioning, the second interface, which is connected to the same VLAN, gets another IP as part of the network discovery process.
3. The machine then transitions to testing. After about 10 minutes of testing, I noticed that the machine no longer shows the IP for the PXE interface in MAAS.
4. After a few more minutes, the same happens for the second interface.
5. That said, the machine still holds the IP on the PXE interface, and I can connect to it just fine.

Andres Rodriguez (andreserl) wrote :
Changed in maas:
importance: Undecided → High
status: New → Confirmed
Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
Changed in maas:
status: Confirmed → Triaged
Mike Pontillo (mpontillo) wrote :

After triaging this issue, I think we need to discuss the proper fix with the cloud-init team.

When cloud-init configures the interfaces[1], I see that "bringup=True" is set. I'm guessing this means cloud-init should have run the equivalent of "ifup" on each interface after it has been configured. However, looking at "ifquery --state", this is not happening; only the loopback interface is configured.

# ifquery --state
lo=lo
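The `ifquery --state` output above lists one `interface=logical-name` pair for each interface that ifupdown has brought up, so an interface absent from that list was never `ifup`'d. A minimal sketch of that check, using hypothetical helper names:

```python
def configured_interfaces(ifquery_state_output):
    """Map each interface in `ifquery --state` output to its logical name."""
    state = {}
    for line in ifquery_state_output.splitlines():
        iface, sep, logical = line.strip().partition("=")
        if sep:
            state[iface] = logical
    return state


def missing_interfaces(ifquery_state_output, expected):
    """Return the expected interfaces that ifupdown has not brought up."""
    up = configured_interfaces(ifquery_state_output)
    return sorted(set(expected) - set(up))
```

On the affected host the input could come from `subprocess.run(["ifquery", "--state"], capture_output=True, text=True).stdout`; with the output shown above, `missing_interfaces(out, ["lo", "eno1"])` returns `["eno1"]`.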

I also notice that no DHCP client is running. I guess this means that whatever IP address is currently assigned to eno1 (172.16.100.153) came from a different run of a DHCP client; possibly from the PXE process (but I don't see it passed up through /proc/cmdline).
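Whether a DHCP client is still running can be checked by scanning `/proc/<pid>/comm`. The sketch assumes the ISC `dhclient` seen in the `ifup` output later in this comment is the only client of interest; other clients would need their process names added to `names`. `running_dhcp_clients` is a hypothetical helper.

```python
import os


def running_dhcp_clients(proc_root="/proc", names=("dhclient",)):
    """Return (pid, name) for each process whose comm is in `names`."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return found  # no Linux-style /proc on this system
    for entry in entries:
        if not entry.isdigit():
            continue  # skip "self", "net", and other non-PID entries
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is unreadable
        if comm in names:
            found.append((int(entry), comm))
    return found
```

An empty list on the ephemeral environment would match the observation above: nothing is left to renew the lease on eno1.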

Since no DHCP client is (any longer?) running, the host is holding onto an IP address from the DHCP pool without releasing or renewing it. The lease therefore eventually expires, and the address is no longer shown in MAAS.
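The expiry follows from the DHCP client timers: per RFC 2131, a client starts renewing at T1 (by default 0.5 times the lease time) and rebinding at T2 (by default 0.875 times the lease time), and the address is no longer valid once the full lease time has passed. The sketch below models that timeline; assuming every renewal at T1 succeeds is a simplification, and the function names are hypothetical.

```python
def lease_phase(elapsed, lease_time, t1_ratio=0.5, t2_ratio=0.875):
    """Classify a lease `elapsed` seconds after the last DHCPACK using the
    RFC 2131 default T1/T2 ratios."""
    if elapsed < t1_ratio * lease_time:
        return "BOUND"
    if elapsed < t2_ratio * lease_time:
        return "RENEWING"
    if elapsed < lease_time:
        return "REBINDING"
    return "EXPIRED"


def phase_at(now, lease_time, client_running):
    """Lease phase `now` seconds after the first DHCPACK.

    Simplification: a running client is assumed to renew successfully at
    every T1, which restarts the lease clock; with no client, nothing does."""
    if client_running:
        now %= 0.5 * lease_time
    return lease_phase(now, lease_time)
```

For a 600-second lease (an illustrative value, not the lease time MAAS actually configured), `phase_at(7200, 600, client_running=False)` returns `"EXPIRED"`, matching a machine that tested for two hours with no client, while `phase_at(7200, 600, client_running=True)` stays `"BOUND"`.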

If I bring the interface up manually, all becomes well and I see the IP address in MAAS as expected (although DHCP handed out a different address, .160, so the host now holds both the expired IP and a new, legitimate IP address from DHCP):

# ifup eno1
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eno1/ec:a8:6b:fd:aa:24
Sending on LPF/eno1/ec:a8:6b:fd:aa:24
Sending on Socket/fallback
DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 3 (xid=0xc83a8b40)
DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 6 (xid=0xc83a8b40)
DHCPREQUEST of 172.16.100.160 on eno1 to 255.255.255.255 port 67 (xid=0x408b3ac8)
DHCPOFFER of 172.16.100.160 from 172.16.100.2
DHCPACK of 172.16.100.160 from 172.16.100.2
Restarting ntp (via systemctl): ntp.service.
bound to 172.16.100.160 -- renewal in 269 seconds.

# ifquery --state
eno1=eno1
lo=lo

# ip addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ec:a8:6b:fd:aa:24 brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.153/24 brd 172.16.100.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet 172.16.100.160/24 brd 172.16.100.255 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::eea8:6bff:fefd:aa24/64 scope link
       valid_lft forever preferred_lft forever
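The `ip addr` output above shows both addresses with `valid_lft forever`, so the kernel will not drop 172.16.100.153 on its own even though its lease has lapsed; something such as a DHCP client or an administrator would have to remove it. A small, hypothetical parser that extracts each IPv4 address, whether it is flagged `secondary`, and its `valid_lft` makes such stale addresses easy to spot:

```python
def ipv4_addresses(ip_addr_output):
    """Return [(cidr, is_secondary, valid_lft)] parsed from `ip addr show`."""
    addrs = []
    current = None  # the inet entry awaiting its valid_lft line
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "inet":
            current = [fields[1], "secondary" in fields, None]
            addrs.append(current)
        elif fields[0] == "valid_lft" and current is not None:
            current[2] = fields[1]
            current = None
        else:
            current = None  # link/ether, inet6, or interface header lines
    return [tuple(a) for a in addrs]
```

An interface that reports more than one `forever` address on the same DHCP-managed subnet, as eno1 does here, is likely holding a stale lease.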

Mike Pontillo (mpontillo) wrote :

Forgot to include the relevant portion of the cloud-init logs in my comment above.

[1]: http://paste.ubuntu.com/25984019/

summary: - [2.3, UI] IP address not listed while hardware testing
+ [2.3] Ephemeral boot environment does not renew DHCP leases
Mike Pontillo (mpontillo) wrote :

I'm landing some debug logging in MAAS to help identify whether this issue is occurring; regiond.log will contain lines like "Lease update: ..." when MAAS receives notifications about lease changes.

Note that this is NOT a fix for this issue, but I don't think there is anything more I can do about it in MAAS itself.
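Once the debug logging lands, the relevant lines can be pulled out of regiond.log by filtering on the `Lease update:` prefix. Only that prefix comes from this comment; nothing else about the message format is assumed, and `lease_updates` is a hypothetical helper.

```python
def lease_updates(lines, match=None, needle="Lease update:"):
    """Yield log lines containing `needle`, optionally only those that also
    mention `match` (for example a MAC or IP address, case-insensitive)."""
    for line in lines:
        if needle in line and (match is None or match.lower() in line.lower()):
            yield line.rstrip("\n")
```

For example, `with open(path) as log: print(*lease_updates(log, "00:25:90:4c:e7:9e"), sep="\n")`, where `path` is assumed to be /var/log/maas/regiond.log on a typical package install.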

Changed in maas:
status: Triaged → Fix Committed
Scott Moser (smoser) wrote :

This issue is discussed in a document at
 https://docs.google.com/document/d/14xH2Q3VH_7ArXzRPhqogfACeOI0rmEinm_Q98imNWlc/

It's all about the "transition" of networking information from the initramfs environment, which is configured by the kernel command line, over to the "real root".

The Ubuntu Foundations team expects DHCP to transition correctly in 18.04.
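The gap Scott describes is that the initramfs configures networking from the kernel command line (for example `ip=dhcp`), then hands over to the real root, where nothing keeps renewing that lease. Assuming the initramfs uses klibc's `ipconfig`, which records its result in files such as `/run/net-<device>.conf` as shell-style `KEY='value'` lines (`DEVICE`, `PROTO`, `IPV4ADDR`, ...), a hypothetical check for a missed handover might look like this; `running_clients` (the set of devices a real-root DHCP client manages) is likewise an assumption.

```python
import shlex


def parse_net_conf(text):
    """Parse KEY='value' lines (as written by klibc ipconfig) into a dict."""
    conf = {}
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep:
            conf[key] = value
    return conf


def needs_dhcp_handover(conf, running_clients):
    """True if the initramfs configured this device via DHCP but no client
    in the real root is managing it (hypothetical check)."""
    return conf.get("PROTO") == "dhcp" and conf.get("DEVICE") not in running_clients
```

A host in the state described in this bug would report `True` for eno1: the address came from an earlier DHCP exchange, and no client took over its renewal.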

Changed in maas:
status: Fix Committed → Fix Released
Scott Moser (smoser)
Changed in cloud-initramfs-tools (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-initramfs-tools (Ubuntu):
importance: Undecided → Medium
Dan Watkins (oddbloke) wrote :

Is this still an issue that needs cloud-init work?

Changed in cloud-init:
status: New → Incomplete
Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Chad Smith (chad.smith)
Changed in cloud-init:
status: Incomplete → Invalid
James Falcon (falcojr) wrote :