networking comes up before hostname is set

Bug #1739516 reported by Michael Hudson-Doyle
This bug affects 4 people
Affects: cloud-init
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

When I boot, with libvirt, a disk image that was installed with subiquity and has the workaround for bug 1737630 applied (i.e. networkd starts automatically), I cannot ping the VM by hostname from the host.

I think this is because the networking has come up before the hostname is set, so the hostname is not sent along with the DHCP request to libvirt's dnsmasq and so that dnsmasq cannot answer lookups for the hostname. If I run "netplan apply" on the vm, enough things are apparently restarted that DHCP happens again and I can ping the vm by hostname from the host.

I'm not completely sure I have diagnosed this correctly and certainly don't know how to fix it.

Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

It is true that the hostname is not set before networking comes up.
I would like to fix this, but there are a couple of things to consider:
a.) network datasources
   Currently, some datasources run only after networking comes up. As things stand, by that point it is "too late" to read the hostname from the network metadata service and update the system hostname before DHCP runs.

b.) systemd-networkd's dhcp client seems to actually be listening for hostname getting set and updating its lease information on that event. we saw this in azure when we were removing the old 'bounce the network' code that served the purpose of publishing the

c.) relying on the guest to populate dns information via dhcp is kind of garbage anyway. as a "cloud" solution anyway.

d.) cloud-init allows setting hostname in user-data (in addition to meta-data).
the user-data provided by the user could be in a '#include' url, which might not be available until all networking is up. Thus, even if we moved network datasources to pull their information 'pre-network' (the way that the digital ocean md service does) we can't consume all the user-data at that point.

'd' might be a reasonable limitation. The other things are achievable.

Michael Hudson-Doyle (mwhudson) wrote :

For a and d, sure if finding out what the hostname needs to be involves having the network up, there's nothing that can be done to avoid this.

For c, yes, this is kind of garbage. Utah depends on this though :/ Maybe I can get it to edit the libvirt network config to map the MAC address to a particular IP address instead, that would definitely be less fragile...

And finally for b, it would make sense that a hostname change triggers a refresh of the DHCP lease but I see nothing in the code to do this and my experiments don't seem to indicate it happening either.
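The MAC-to-IP mapping suggested for (c) could look roughly like the sketch below. It only builds the `<host>` element that libvirt expects inside a network's DHCP section; the MAC address, name, IP, and the network name `default` are placeholders rather than values from this report.

```shell
# Build a libvirt <host> element that pins a MAC address to a name and an IP.
# All three arguments are illustrative placeholders.
dhcp_host_xml() {
    local mac="$1" name="$2" ip="$3"
    printf "<host mac='%s' name='%s' ip='%s'/>" "$mac" "$name" "$ip"
}

# Example (needs libvirt; "default" is assumed to be the network name):
#   virsh net-update default add ip-dhcp-host \
#       "$(dhcp_host_xml 52:54:00:12:34:56 testvm 192.168.122.50)" \
#       --live --config
```

With a static entry like this, the host's dnsmasq should be able to answer lookups for the name without relying on the hostname the guest sends in its DHCP request.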

Birger Schmidt (bs-ubo) wrote :

I just stumbled over this bug as well.

Reading all the cases (a,b,c...) I do not see the downside in just setting the hostname in the init-local stage as well.

This can be done as an additional step only if the info is already there (i.e. mounted via iso). To check that would not take long and neither would setting the hostname take long.

Please consider adding this functionality and in case you decide against it please tell us what you think the downside of this would be.

As a side note: A similar request can be solved at the same time. See here https://bugs.launchpad.net/cloud-init/+bug/1643688.

Jesse R (scronkfinkle) wrote :

I am also running into this issue. We run DNSMasq and build out our cloud-init images with terraform. We're getting some pretty nasty networking issues because when we roll out any new batches of machines, they all request an IP with the hostname "Ubuntu", and then set their hostname afterwards.

Noticing the age of this ticket, has a better workaround for this kind of behavior been implemented that I missed? It's a pretty big blocker for us, and it seems reasonable to just be able to set the hostname in the local stage
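One way to spot the symptom described here is to look for hostnames that appear on more than one lease. The sketch below assumes the plain dnsmasq lease file format (expiry, MAC, IP, hostname, client ID, with `*` for an unknown hostname); the function name is hypothetical, and the lease file path in the example is the usual Debian/Ubuntu default, which may differ on other setups.

```shell
# Print hostnames that appear on more than one lease.
# Assumes five whitespace-separated fields per line:
#   expiry MAC IP hostname client-id
# where "*" stands in for an unknown hostname.
duplicate_lease_hostnames() {
    awk '$4 != "*" { count[tolower($4)]++ }
         END { for (h in count) if (count[h] > 1) print h }' "$1" | sort
}

# Example (path is an assumption):
#   duplicate_lease_hostnames /var/lib/misc/dnsmasq.leases
```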

Chad Smith (chad.smith) wrote :

@Jesse thanks for the bump and the notes on this bug. Since this bug was originally filed, we have added a related feature which allows init-local based datasources to set the hostname before the network is brought online[1]. From my recollection of the feature, it requires the datasource to provide "local-hostname" in its meta-data (meta-data.local-hostname[2]), not user-data.hostname.
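For a local datasource such as NoCloud, that would mean putting `local-hostname` into the seed's meta-data. The following is a minimal sketch under that assumption; the helper name, directory, and instance-id scheme are made up for illustration, and whether this is sufficient for a given deployment is exactly what the logs requested below would help confirm.

```shell
# Write a minimal NoCloud seed whose meta-data carries local-hostname.
# The directory, instance-id, and hostname values are placeholders.
write_nocloud_seed() {
    local dir="$1" hostname="$2"
    mkdir -p "$dir"
    printf 'instance-id: iid-%s\nlocal-hostname: %s\n' "$hostname" "$hostname" \
        > "$dir/meta-data"
    printf '#cloud-config\n' > "$dir/user-data"
}

# Example: build a seed ISO with the "cidata" volume label that NoCloud looks for.
#   write_nocloud_seed ./seed attis
#   genisoimage -output seed.iso -volid cidata -joliet -rock \
#       seed/user-data seed/meta-data
```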

If you get a chance would you be able to:
 1. provide the steps used in terraform to reproduce this issue
 2. attach the tar.gz produced by `sudo cloud-init collect-logs -u`. Note that with `-u` the collected logs will include user-data, so please double check that the user-data/meta-data provided during launch does not contain sensitive information (passwords/credentials).

Thank you, the attached logs will help confirm suspicions on why this feature isn't quite enough for terraform type deployments.

References:

[1] https://github.com/canonical/cloud-init/commit/133ad2cb327ad17b7b81319fac8f9f14577c04df
[2] https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L754

Jesse R (scronkfinkle) wrote :

@Chad thanks for writing back! Attached is the collect-logs output.

For terraform, we're using a provider to hook into our proxmox infrastructure. Under the hood, proxmox is calling QEMU to manage the virtual machines. I installed `qemu-guest-agent` to the cloud-init image using `virt-customize` from the `libguestfs-tools` package.

On first boot, the hostname is successfully set, but it doesn't appear to be fast enough before networking is brought up.

To build an image identical to the one I'm using:
Download the cloud image
```
wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
```

Use `virt-customize` to install `qemu-guest-agent`
```
sudo virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent
```

From there, we upload it to proxmox and have it clone the image for VMs. I would imagine the behavior would be the same with plain QEMU or another Terraform provider.

Here's the output of `terraform apply`
```
module.greeks["attis"].proxmox_vm_qemu.basic_admin: Refreshing state... [id=aramis5/qemu/111]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.greeks["attis"].proxmox_vm_qemu.basic_admin will be created
  + resource "proxmox_vm_qemu" "basic_admin" {
      + additional_wait = 0
      + agent = 1
      + automatic_reboot = true
      + balloon = 0
      + bios = "seabios"
      + boot = "c"
      + bootdisk = "scsi0"
      + ciuser = "awx"
      + clone = "ubuntu-2004-cloudinit-terraform"
      + clone_wait = 0
      + cores = 4
      + cpu = "host"
      + default_ipv4_address = (known after apply)
      + define_connection_info = true
      + force_create = false
      + full_clone = true
      + guest_agent_ready_timeout = 100
      + hotplug = "network,disk,usb"
      + id = (known after apply)
      + ipconfig0 = "ip=dhcp"
      + kvm = true
      + memory = 8192
      + name = "attis"
      + nameserver = (known after apply)
      + numa = false
      + onboot = false
      + oncreate = true
      + os_type = "cloud-init"
      + preprovision = true
      + reboot_required = (known after apply)
      + scsihw = "virtio-scsi-pci"
      + searchdomain = (known after apply)
      + sockets = 1
      + ssh_host = (known after apply)
      + ssh_port = (known after apply)
      + sshkeys = "<trimmed>"
      + tablet = true
      + target_node = "arami...
```

Jesse R (scronkfinkle) wrote :

I wanted to give an update with a fix for anyone else who runs into my particular issue. The first problem was that using `virt-customize` to install `qemu-guest-agent` was setting `/etc/machine-id`. This caused dnsmasq to assign the same CLID to each VM. I assume that means it thought all the VMs were the same machine, requesting an IP on different interfaces. The way to fix that was to truncate the file after installation with
```
sudo virt-customize -a focal-server-cloudimg-amd64.img --truncate /etc/machine-id
```
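To confirm that an image no longer carries a baked-in machine ID before cloning it, a check along these lines may help. This is a sketch, not part of the original workflow: the function name is hypothetical, and it only treats a missing, empty, or whitespace-only file as "unset". The shared ID matters because systemd-networkd, by default, derives its DHCP client identifier from the machine ID, which would fit the duplicate CLID observation above.

```shell
# Succeed when a machine-id file is absent, empty, or whitespace-only,
# i.e. when a fresh ID is expected to be generated on first boot.
machine_id_is_unset() {
    local file="$1" content
    [ -s "$file" ] || return 0
    content=$(tr -d '[:space:]' < "$file")
    [ -z "$content" ]
}

# Example against an image (needs libguestfs-tools):
#   virt-cat -a focal-server-cloudimg-amd64.img /etc/machine-id > /tmp/machine-id
#   machine_id_is_unset /tmp/machine-id && echo "machine-id will be regenerated"
```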

With that sorted out, I was also able to use nocloud to set the hostname properly on boot. I used the method of setting the SMBIOS serial. In terraform I was able to specify this as QEMU args like so
```
args = "-smbios type=1,serial=ds=nocloud-net;h=${var.name}"
```
where `var.name` was the hostname.
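To check from inside a guest that the serial arrived as intended, the `key=value;key=value` string can be split and the `h` field compared against the expected hostname. The helper below is a hypothetical sketch of that parsing; it is not cloud-init's own implementation, and the sysfs path in the example is an assumption about where the type 1 serial is exposed on Linux.

```shell
# Extract one key from a NoCloud-style SMBIOS serial such as
# "ds=nocloud-net;h=attis". Exits non-zero if the key is absent.
smbios_serial_field() {
    printf '%s\n' "$1" | awk -v key="$2" -F';' '
        {
            for (i = 1; i <= NF; i++) {
                eq = index($i, "=")
                if (eq > 0 && substr($i, 1, eq - 1) == key) {
                    print substr($i, eq + 1)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }'
}

# Example on a running guest (reading the serial needs root):
#   smbios_serial_field "$(sudo cat /sys/class/dmi/id/product_serial)" h
```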

James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Confirmed → Expired