cloud-init sets wrong netmask causing broken network config on Oracle Cloud

Bug #1989686 reported by Terje Røsten
Affects              Status        Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Fix Released  High        Unassigned
  Bionic             Fix Released  Undecided   Unassigned
  Focal              Fix Released  Undecided   Unassigned
  Jammy              Fix Released  Undecided   Unassigned

Bug Description

[ Impact ]

 * On Oracle Cloud, 22.04 and 22.10 instances may not contain `/run/net-ens3.conf` network definitions when launched from custom cloud images imported from the Oracle object store. In these cases cloud-init 22.3.3 attempts to render static network configuration, relying solely on Oracle IMDS for network configuration as published at http://169.254.169.254/opc/v2/vnics.

The rendered network configuration is incomplete and lacks proper default routes and/or DNS configuration,
resulting in broken network egress routes and hostname lookup errors.

Because IMDS data is insufficient to fully configure DNS, cloud-init 22.3.4 configures DHCP on the primary NIC, matched by the MAC address given in the Oracle IMDS network config, and only defines static network config to set up secondary NICs and secondary routes.
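
A minimal sketch of the 22.3.4 behaviour described above, not cloud-init's actual implementation. It assumes the first entry returned by the vnics endpoint is the primary NIC; the function name `build_network_config`, the `nicN` interface keys and the second VNIC record are hypothetical. The field names (`macAddr`, `privateIp`, `subnetCidrBlock`) are the ones in the IMDS output quoted in the comments of this report.

```python
import ipaddress


def build_network_config(vnics):
    """Build a netplan-v2-style dict from a list of IMDS VNIC records."""
    ethernets = {}
    for index, vnic in enumerate(vnics):
        entry = {"match": {"macaddress": vnic["macAddr"].lower()}}
        if index == 0:
            # Assumption: the first VNIC is the primary one. DHCP then
            # supplies its address, default route and DNS settings.
            entry["dhcp4"] = True
        else:
            # Secondary VNICs get a static address whose prefix length is
            # taken from the subnet CIDR rather than guessed.
            prefix = ipaddress.ip_network(vnic["subnetCidrBlock"]).prefixlen
            entry["addresses"] = [f"{vnic['privateIp']}/{prefix}"]
        ethernets[f"nic{index}"] = entry
    return {"version": 2, "ethernets": ethernets}


vnics = [
    {"macAddr": "02:00:17:0F:17:EB", "privateIp": "100.103.26.6",
     "subnetCidrBlock": "100.103.24.0/21"},
    # Hypothetical secondary VNIC, for illustration only.
    {"macAddr": "02:00:17:00:00:01", "privateIp": "10.0.1.5",
     "subnetCidrBlock": "10.0.1.0/24"},
]
config = build_network_config(vnics)
```

Leaving the primary NIC to DHCP sidesteps the missing gateway and DNS data; only the secondary entries depend on what IMDS reports.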

[ Test Plan ]

 Kinetic only (where 22.3.3 was released)
 * download the daily image for kinetic
 * upload the daily Ubuntu cloud image to the Oracle object store via the oci command line
 * import the daily image from the storage bucket via the Oracle CLI
 * launch the imported custom image as a Flex instance type
 * validate that a WARNING about IMDS is shown, implying that net config is written from IMDS content instead of /run/net-ens3.conf
 * validate the failure on 22.3.3: no default route set and invalid netplan config
 * set up network DHCP on the primary interface manually

Bionic, Focal, Jammy and Kinetic
 * upgrade cloud-init to 22.3.4
 * rm /etc/netplan/50-cloud-init.yaml
 * sudo cloud-init clean --logs --reboot
 * Validate network config, default routes and nslookup canonical.com

[ Where problems could occur ]

 * This behavior was a regression introduced only in 22.3.3, which reached kinetic and the -proposed series. The fix returns to the previously published behavior of cloud-init 22.2, which was DHCP on the primary ethernet device, so there should be no regression here beyond fixing 22.3.3 in kinetic and the -proposed streams.

[ Other Info ]

[ Original Description ]
Testing U22.10 Cloud image:

 https://cloud-images.ubuntu.com/kinetic/20220914/kinetic-server-cloudimg-amd64.img

on Oracle Cloud, with cloud-init 22.3-13-g70ce6442-0ubuntu1~22.10.1.

cloud-init is able to find correct network information (note the use of /21 as netmask):

ephemeral.py[DEBUG]: Attempting setup of ephemeral network on ens3 with 100.103.27.215/21 brd 100.103.31.255
subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '100.103.27.215/21', 'broadcast', '100.103.31.255', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'ens3', 'up'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '0.0.0.0/0', 'via', '100.103.24.1', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '169.254.0.0/16', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/22.3-13-g70ce6442-0ubuntu1~22.10.1', 'Authorization': 'Bearer Oracle'}} configuration
url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/instance/ (200, 4154b) after 1 attempts
url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v2/vnics/' with {'url': 'http://169.254.169.254/opc/v2/vnics/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/22.3-13-g70ce6442-0ubuntu1~22.10.1', 'Authorization': 'Bearer Oracle'}} configuration
url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/vnics/ (200, 280b) after 1 attempts
subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', '169.254.0.0/16', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', '0.0.0.0/0', 'via', '100.103.24.1', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'ens3', 'down'] with allowed return codes [0] (shell=False, capture=True)
subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'del', '100.103.27.215/21', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)

However, at later stage it forgets about the netmask:

stages.py[DEBUG]: applying net config names for {'config': [{'name': 'ens3', 'type': 'physical', 'mac_address': '02:00:17:06:ae:e9', 'mtu': 9000, 'subnets': [{'type': 'static', 'address': '100.103.27.215'}]}], 'version': 1}

creates:

$ cat /run/systemd/network/10-netplan-ens3.*
[Match]
MACAddress=02:00:17:06:ae:e9

[Link]
Name=ens3
WakeOnLan=off
MTUBytes=9000
[Match]
MACAddress=02:00:17:06:ae:e9
Name=ens3

[Link]
MTUBytes=9000

[Network]
LinkLocalAddressing=ipv6
Address=100.103.27.215/24

Note /24 here.

If one is unlucky and gets an IP in the upper part of the subnet (outside the /24 that contains the gateway),
the ip command will refuse to set the default gateway in the routing table, as the IP of the gateway is outside the configured subnet.

Hence, we end up with no working network configuration:

root@v:# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:17:06:ae:e9 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 100.103.27.215/24 brd 100.103.27.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::17ff:fe06:aee9/64 scope link
       valid_lft forever preferred_lft forever
root@v:# ip r
100.103.27.0/24 dev ens3 proto kernel scope link src 100.103.27.215
root@v:#
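
The gateway problem described above can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch: the helper name `gateway_on_link` is hypothetical, the addresses are the ones from the logs in this report, and `100.103.24.186` is used as an example of an address in the same /24 as the gateway.

```python
import ipaddress


def gateway_on_link(address_with_prefix, gateway):
    """Return True if the gateway lies inside the interface's subnet."""
    interface = ipaddress.ip_interface(address_with_prefix)
    return ipaddress.ip_address(gateway) in interface.network


# With the /21 from IMDS the gateway 100.103.24.1 is on-link.
correct = gateway_on_link("100.103.27.215/21", "100.103.24.1")
# With the rendered /24 the same gateway falls outside 100.103.27.0/24.
broken = gateway_on_link("100.103.27.215/24", "100.103.24.1")
# An address in the gateway's own /24 hides the wrong prefix.
lucky = gateway_on_link("100.103.24.186/24", "100.103.24.1")

print(correct, broken, lucky)  # prints: True False True
```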

Any ideas what's causing this or how to debug more deeply?

Thanks in advance.

Terje Røsten (terjeros) wrote :

FYI: reverting to cloud-init 22.1-14-g2e17a0d6-0ubuntu1~22.04.5 (even on U22.10) seems to resolve the issue.

James Falcon (falcojr) wrote :

Thanks for the bug report. On an affected instance, can you run the following commands and provide the output?

curl --header "Authorization:Bearer Oracle" http://169.254.169.254/opc/v2/vnics

cat /etc/netplan/50-cloud-init.yaml

cat /run/net-*
(if that doesn't work, cat any file that starts with "net" and ends with ".conf" in /run)

Finally, please run `cloud-init collect-logs` and upload the tarball here.

Terje Røsten (terjeros) wrote :

(after running dhclient -v ens3 to get networking up):

$ curl --header "Authorization:Bearer Oracle" http://169.254.169.254/opc/v2/vnics
[
  {
    "macAddr": "02:00:17:0F:17:EB",
    "privateIp": "100.103.26.6",
    "subnetCidrBlock": "100.103.24.0/21",
    "virtualRouterIp": "100.103.24.1",
    "vlanTag": 3945,
    "vnicId": "ocid1.vnic.oc1.iad.abuwcljtmtfxijv75gnssqeliui6sow3gzskerirhua4i4jg6nllnexbkynq"
  }
]

$ cat /etc/netplan/50-cloud-init.yaml

# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens3:
            addresses:
            - 100.103.26.6/24
            match:
                macaddress: 02:00:17:0f:17:eb
            mtu: 9000
            set-name: ens3
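
For comparison, a small sketch of how the expected static address could be derived from the IMDS record above by combining `privateIp` with the prefix length of `subnetCidrBlock`. The helper name `expected_address` is hypothetical, and the JSON is trimmed to the fields used here.

```python
import ipaddress
import json

# Trimmed copy of the IMDS response quoted above.
imds_response = """
[{"macAddr": "02:00:17:0F:17:EB",
  "privateIp": "100.103.26.6",
  "subnetCidrBlock": "100.103.24.0/21",
  "virtualRouterIp": "100.103.24.1"}]
"""


def expected_address(vnic):
    """Return (address/prefix, dotted netmask) for one VNIC record."""
    network = ipaddress.ip_network(vnic["subnetCidrBlock"])
    return f"{vnic['privateIp']}/{network.prefixlen}", str(network.netmask)


vnic = json.loads(imds_response)[0]
address, netmask = expected_address(vnic)

# The address that was actually rendered into 50-cloud-init.yaml.
rendered = "100.103.26.6/24"
print(address, netmask, address == rendered)
# prints: 100.103.26.6/21 255.255.248.0 False
```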

James Falcon (falcojr) wrote :

Thanks for the additional info. I see the problem. We'll get this fixed ASAP.

Changed in cloud-init (Ubuntu):
status: New → Triaged
importance: Undecided → High
James Falcon (falcojr) wrote :

One more question. Can you provide me the instance type and image used to launch this instance?

Chad Smith (chad.smith) wrote :

Upstream pull request addressing this bug should land shortly
https://github.com/canonical/cloud-init/pull/1735

James Falcon (falcojr)
Changed in cloud-init (Ubuntu):
status: Triaged → Fix Committed
Terje Røsten (terjeros) wrote :

Hi!

Thanks for the quick fix!

Here is some more input:

Shape data:

"shape": "VM.Standard.E4.Flex",
"shapeConfig": {
  "maxVnicAttachments": 2,
  "memoryInGBs": 8.0,
  "networkingBandwidthInGbps": 1.0,
  "ocpus": 1.0
},

Image is:
  https://cloud-images.ubuntu.com/kinetic/20220914/kinetic-server-cloudimg-amd64.img

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 22.3.3-0ubuntu1~22.10.1

---------------
cloud-init (22.3.3-0ubuntu1~22.10.1) kinetic; urgency=medium

  * New upstream bug fix release.
    + Release 22.3.3 (LP: #1986703)
    + Fix Oracle DS not setting subnet when using IMDS (#1735)
      (LP: #1989686)
    + azure: define new attribute for pre-22.3 pickles (#1725)
    + sources/azure: ensure instance id is always correct (#1727)
      [Chris Patterson]

 -- Brett Holman <email address hidden> Mon, 19 Sep 2022 13:13:06 -0600

Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
Terje Røsten (terjeros) wrote :

Hi again!

Testing new image:

 https://cloud-images.ubuntu.com/kinetic/20220925/kinetic-server-cloudimg-amd64.img

with cloud-init 22.3.3-0ubuntu1~22.10.1.

The netmask is now correct; however, the default route is still not set:

$ ip a show ens3

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:17:0c:8e:c1 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 100.103.24.186/21 brd 100.103.31.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::17ff:fe0c:8ec1/64 scope link
       valid_lft forever preferred_lft forever

$ ip r
100.103.24.0/21 dev ens3 proto kernel scope link src 100.103.24.186

From logs:

[ 19.896466] cloud-init[581]: Cloud-init v. 22.3.3-0ubuntu1~22.10.1 running 'init' at Tue, 27 Sep 2022 10:16:42 +0000. Up 19.87 seconds.
[ 19.907026] cloud-init[581]: ci-info: ++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
[ 19.913354] cloud-init[581]: ci-info: +--------+------+-------------------------+---------------+--------+-------------------+
[ 19.919590] cloud-init[581]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 19.926204] cloud-init[581]: ci-info: +--------+------+-------------------------+---------------+--------+-------------------+
[ 19.933926] cloud-init[581]: ci-info: | ens3 | True | 100.103.24.186 | 255.255.248.0 | global | 02:00:17:0c:8e:c1 |
[ 19.943601] cloud-init[581]: ci-info: | ens3 | True | fe80::17ff:fe0c:8ec1/64 | . | link | 02:00:17:0c:8e:c1 |
[ 19.951073] cloud-init[581]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ 19.960127] cloud-init[581]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 19.969361] cloud-init[581]: ci-info: +--------+------+-------------------------+---------------+--------+-------------------+
[ 19.977345] cloud-init[581]: ci-info: +++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++
[ 19.984171] cloud-init[581]: ci-info: +-------+--------------+---------+---------------+-----------+-------+
[ 19.993051] cloud-init[581]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
[ 20.001401] cloud-init[581]: ci-info: +-------+--------------+---------+---------------+-----------+-------+
[ 20.009636] cloud-init[581]: ci-info: | 0 | 100.103.24.0 | 0.0.0.0 | 255.255.248.0 | ens3 | U |
[ 20.017465] cloud-init[581]: ci-info: +-------+--------------+---------+---------------+-----------+-------+
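
The missing default route reported above can be detected mechanically from `ip r` output. The following is a sketch only: the helper name `default_routes` is hypothetical, and the `healthy` sample is an assumed example of what a working DHCP setup might print, not output taken from this report.

```python
def default_routes(ip_route_output):
    """Return the lines of `ip route` output that describe a default route."""
    return [
        line.strip()
        for line in ip_route_output.splitlines()
        if line.split()[:1] == ["default"]
    ]


# Output of `ip r` reported above for 22.3.3: only the on-link subnet route.
observed = "100.103.24.0/21 dev ens3 proto kernel scope link src 100.103.24.186\n"

# Hypothetical output for a working DHCP setup, for comparison only.
healthy = (
    "default via 100.103.24.1 dev ens3 proto dhcp src 100.103.24.186 metric 100\n"
    + observed
)

missing = not default_routes(observed)
present = len(default_routes(healthy)) == 1
```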

James Falcon (falcojr) wrote :

Hi Terje, thanks for the update. We'll get this one fixed as well.

Thanks for providing your shape and image data. If possible, I'd like to get even more details on how you're launching your instance. The reason I ask is that the instances you're launching are retrieving their networking information from Oracle's metadata service. You can see this source of information if you 'curl --header "Authorization: Bearer Oracle" http://169.254.169.254/opc/v2/vnics' on a launched instance. However, this isn't the primary way cloud-init receives its networking information on Oracle. Networking information is usually retrieved from files in /run/net*. E.g., on my launched instance:

ubuntu@instance20220927140125:~$ cat /run/net-ens3.conf
DEVICE='ens3'
PROTO='dhcp'
IPV4PROTO='dhcp'
IPV4ADDR='10.0.2.174'
IPV4NETMASK='255.255.255.0'
IPV4BROADCAST='10.0.2.255'
IPV4GATEWAY='10.0.2.1'
IPV4DNS0='169.254.169.254'
ROOTSERVER='10.0.2.1'
HOSTNAME=''
DNSDOMAIN='default.oraclevcn.com'
DOMAINSEARCH='default.oraclevcn.com'

Cloud-init isn't supposed to get its primary networking information from the metadata service, and will only do so if the /run/net* files are unavailable, which as far as we're aware, shouldn't ever happen. The reason for the extra churn here is that I have no easy way of testing these changes because I didn't even know you could launch such an instance. So if you could share your launch information with me, that would be very helpful.

Are you using the CLI to launch? If so, can I get the exact command you're using (with any sensitive data redacted)? If you're going through the console, can you indicate all of the options you've selected? Do you have access to any non-public features that I should be aware of?
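
To illustrate what the /run/net-ens3.conf file shown above carries, here is a minimal sketch that parses its shell-style KEY='value' lines. It is not cloud-init's own parser; the function name `parse_net_conf` is hypothetical and the sample is a trimmed copy of the file above.

```python
import ipaddress
import shlex


def parse_net_conf(text):
    """Parse shell-style KEY='value' assignments into a dict."""
    values = {}
    # shlex handles the single quotes; comments=True skips '#' comments.
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep:
            values[key] = value
    return values


sample = """\
DEVICE='ens3'
PROTO='dhcp'
IPV4ADDR='10.0.2.174'
IPV4NETMASK='255.255.255.0'
IPV4GATEWAY='10.0.2.1'
IPV4DNS0='169.254.169.254'
HOSTNAME=''
"""

conf = parse_net_conf(sample)
# Combine address and dotted netmask into CIDR notation.
cidr = ipaddress.ip_interface(
    f"{conf['IPV4ADDR']}/{conf['IPV4NETMASK']}"
).with_prefixlen
print(conf["PROTO"], cidr, conf["IPV4GATEWAY"])
# prints: dhcp 10.0.2.174/24 10.0.2.1
```

Unlike the IMDS vnics data, this file includes the gateway and DNS server, which is why falling back to IMDS alone left instances without a default route.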

Terje Røsten (terjeros) wrote :

Hi!

We have a range of Ubuntu releases running, for U18.04 and U20.04 I see that /run/net-ens3.conf
is present and with contents similar to above.

For U22.04 and U22.10 instances there are no /run/net-*.conf present.

The U22.04 instances are using the standard image:
 https://docs.oracle.com/en-us/iaas/images/image/6f9a42e0-86a4-4ecd-9c6a-c2bd2f3c15f8/

While U22.10 is using
 https://cloud-images.ubuntu.com/kinetic/20220925/kinetic-server-cloudimg-amd64.img

which is imported to Oracle Object Store and converted to image by:

LAUNCH_MODE="PARAVIRTUALIZED"
image_type="qcow2"

oci os object put -ns $namespace -bn $bucket_name --name "$IMAGE_NAME" --file "$FILE"

oci compute image import from-object-uri \
 --display-name="$IMAGE" \
 --uri https://objectstorage.us-ashburn-1.oraclecloud.com/n/$namespace/b/$bucket_name/o/"$IMAGE_NAME" \
 --compartment-id="$compartment_id" \
 --launch-mode="$LAUNCH_MODE" \
 --source-image-type="$image_type" \
 --defined-tags '{ "customimages": { "upstream" : "'$URL'" }}'

Then instances are created by clicking "Create instance" in the
Compute -> CustomImages view in the OCI console for the created image.

Shape used is E4.Flex, networking is more or less standard, however with
"Do not assign a public IPv4 address".

Note: due to default gateway not set, instance will be reachable from instances in the same subnet only.
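
The `--uri` value passed to the import command above follows a fixed pattern of region, namespace, bucket and object name. A small sketch of building it, with the caveat that the helper name `object_uri` is hypothetical and that percent-encoding every path component is an assumption made here for safety, not something stated in this report:

```python
from urllib.parse import quote


def object_uri(region, namespace, bucket, object_name):
    """Build an Object Storage URI in the form used by the command above."""
    return (
        f"https://objectstorage.{region}.oraclecloud.com"
        f"/n/{quote(namespace, safe='')}"
        f"/b/{quote(bucket, safe='')}"
        f"/o/{quote(object_name, safe='')}"
    )


# Placeholder namespace and bucket names, for illustration only.
uri = object_uri(
    "us-ashburn-1", "mynamespace", "mybucket",
    "kinetic-server-cloudimg-amd64.img",
)
print(uri)
```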

Chad Smith (chad.smith)
description: updated
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Terje, or anyone else affected,

Accepted cloud-init into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/22.3.4-0ubuntu1~18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-bionic
Chris Halse Rogers (raof) wrote :

Hello Terje, or anyone else affected,

Accepted cloud-init into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/22.3.4-0ubuntu1~20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Changed in cloud-init (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-needed-jammy
Chris Halse Rogers (raof) wrote :

Hello Terje, or anyone else affected,

Accepted cloud-init into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/22.3.4-0ubuntu1~22.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Terje Røsten (terjeros) wrote : Re: cloud-init sets wrong netmask causing borken network config on Oracle Cloud

Default gateway is still missing:

$ cloud-init init
Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init' at Tue, 04 Oct 2022 10:43:43 +0000. Up 6162.65 seconds.
++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++
+--------+------+------------------------+---------------+--------+-------------------+
| Device | Up | Address | Mask | Scope | Hw-Address |
+--------+------+------------------------+---------------+--------+-------------------+
| ens3 | True | 100.103.31.188 | 255.255.248.0 | global | 02:00:17:09:06:85 |
| ens3 | True | fe80::17ff:fe09:685/64 | . | link | 02:00:17:09:06:85 |
| lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
| lo | True | ::1/128 | . | host | . |
+--------+------+------------------------+---------------+--------+-------------------+
+++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++
+-------+--------------+---------+---------------+-----------+-------+
| Route | Destination | Gateway | Genmask | Interface | Flags |
+-------+--------------+---------+---------------+-----------+-------+
| 0 | 100.103.24.0 | 0.0.0.0 | 255.255.248.0 | ens3 | U |
+-------+--------------+---------+---------------+-----------+-------+
+++++++++++++++++++Route IPv6 info+++++++++++++++++++
+-------+-------------+---------+-----------+-------+
| Route | Destination | Gateway | Interface | Flags |
+-------+-------------+---------+-----------+-------+
| 1 | fe80::/64 | :: | ens3 | U |
| 3 | local | :: | ens3 | U |
| 4 | multicast | :: | ens3 | U |
+-------+-------------+---------+-----------+-------+

root@u:~# ip r
100.103.24.0/21 dev ens3 proto kernel scope link src 100.103.31.188
root@u:~#

Chad Smith (chad.smith)
summary: - cloud-init sets wrong netmask causing borken network config on Oracle
+ cloud-init sets wrong netmask causing broken network config on Oracle
Cloud
James Falcon (falcojr) wrote :

Terje, if you upgraded from a previous cloud-init version, the datasource info (including network config) is cached by the instance and not refetched or regenerated. Can you run "cloud-init clean --reboot" (the --logs flag is useful too, but will wipe out your earlier cloud-init logs) and see if you get any different results after the instance reboots?

Chad Smith (chad.smith) wrote :

This testing approach across upgrade is referenced in https://cloudinit.readthedocs.io/en/latest/topics/debugging.html#manual-sru-verification-procedure.

I'd expect that the last `cloud-init init` you ran on this instance logged that the network config was not re-written or applied to the system. You'd probably see a log like the following around the time at which you ran `cloud-init init`.

2022-10-05 14:11:20,579 - stages.py[DEBUG]: No network config applied. Neither a new instance nor datasource network update allowed

James Falcon (falcojr) wrote :

Terje, thanks for all of the helpful feedback here. Based on the command and uptime listed, as well as our internal testing, we still believe this issue to be fixed. If you see this issue after running "cloud-init clean", or if you have launched a fresh instance containing this latest version, please let us know so that we can investigate further. If we don't hear from you by end of week, we'll assume that this is fixed and can go ahead with releasing it.

Chad Smith (chad.smith) wrote :

## kinetic validation 22.3.4

I was able to validate the kinetic failure path on 22.3.3 using the procedure below, and success across the upgrade to 22.3.4:

$ wget https://cloud-images.ubuntu.com/kinetic/20220925/kinetic-server-cloudimg-amd64.img
$ oci os object put -ns intcanonical -bn blackboxsw-bucket --file kinetic-server-cloudimg-amd64.img
$ oci compute image import from-object-uri --display-name="blackboxsw-test-kinetic" --uri https://objectstorage.us-phoenix-1.oraclecloud.com/n/intcanonical/b/blackboxsw-bucket/o/kinetic-server-cloudimg-amd64.img --compartment-id="ocid1.compartment.oc1..aaaaaaaagydpgegyncqufyw45spizdp7t4bxpy752csiioelfyyykvedkf4a" --launch-mode="PARAVIRTUALIZED" --source-image-type="qcow2"

# Launch with cloud-config setting a bogus password (passw0rd) because we expect to have to use the serial console to log in
$ cat > sethostname.yaml <<EOF
## template: jinja
#cloud-config
ssh_import_id: [chad.smith]
hostname: SRU-worked-{{v1.cloud_name}}
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF

# Launch instance with poor default password as we expect network to come up broken

$ oci compute instance launch --availability-domain="qIZq:PHX-AD-3" --shape="VM.Standard2.1" --image-id="ocid1.image.oc1.phx.aaaaaaaazgo7n42d6zl4ucig4w5spqnmaexakaqc6trsukgbxaslynrsvqsq" --subnet-id="ocid1.subnet.oc1.phx.aaaaaaaacosethjtrcdct2imjdbrsruvfhhiaq2xf64d4klhuy54aflokqaq" --ssh-authorized-keys-file=".ssh/id_rsa.pub" --user-data-file="./sethostname.yaml" -c ocid1.tenancy.oc1..aaaaaaaao7f7cccogqrg5emjxkxmctzbnhl6zdkkx36yq2jgxnm4p5vmysbq --display-name bbsw-kinetic-dbg

# Run sample script from serial console to capture current broken state
$ cat oracle-sru-verification.sh
#!/bin/bash
set -ex
echo "===== confirm failure"
dpkg-query --show cloud-init
ip a
ip r
cat /etc/netplan/50-cloud-init.yaml
for file in /run/systemd/network/*; do
 cat "$file"
done
grep -E 'WARNING|Applying' /var/log/cloud-init.log

echo "===== fix network manually"
MAC=$(cat /sys/class/net/ens3/address)
cat > 50-cloud-init.yaml <<EOF
# Temporarily fix netplan setting dhcp on primary NIC
network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            match:
                macaddress: $MAC
            set-name: ens3
EOF
sudo cp 50-cloud-init.yaml /etc/netplan
sudo netplan generate
sudo netplan apply
echo "===== confirm success across upgrade/reboot"
sudo add-apt-repository ppa:cloud-init-dev/proposed -y
sudo apt install cloud-init -y
sudo rm /etc/netplan/50-cloud-init.yaml
sudo cloud-init clean --logs --reboot

$ ssh ubuntu@VM_IP # Assume we can now get in (a measure of success)

# Check prior script run in failed config case

+ echo '===== confirm failure'
===== confirm failure
+ dpkg-query --show cloud-init
cloud-init 22.3.3-0ubuntu1~22.10.1
+ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
  ...


Chad Smith (chad.smith)
description: updated
Chad Smith (chad.smith) wrote :

# Bionic, Focal and Jammy validation of 22.3.4 upgrade path not breaking network config for imported images in Oracle

# Download daily images
$ wget https://cloud-images.ubuntu.com/jammy/20221004/jammy-server-cloudimg-amd64.img
$ wget https://cloud-images.ubuntu.com/focal/20221005/focal-server-cloudimg-amd64.img
$ wget https://cloud-images.ubuntu.com/bionic/20221006/bionic-server-cloudimg-amd64.img

# Upload to Oracle bucket
$ for series in bionic focal jammy; do oci os object put -ns intcanonical -bn blackboxsw-bucket --file $series-server-cloudimg-amd64.img; done

# import bucket img files as customimages
$ for series in bionic focal jammy; do oci compute image import from-object-uri --display-name="blackboxsw-test-$series" --uri https://objectstorage.us-phoenix-1.oraclecloud.com/n/intcanonical/b/blackboxsw-bucket/o/$series-server-cloudimg-amd64.img --compartment-id="ocid1.compartment.oc1..aaaaaaaagydpgegyncqufyw45spizdp7t4bxpy752csiioelfyyykvedkf4a" --launch-mode="PARAVIRTUALIZED" --source-image-type="qcow2"; done

# Launch custom image ids for bionic, focal, jammy

$ oci compute instance launch --availability-domain="qIZq:PHX-AD-3" --shape="VM.Standard2.1" --image-id=ocid1.image.oc1.phx.aaaaaaaaet2o7k3acpywoex2shrc525wui6a3nuzd2cohony7dxxfcfm42qa --subnet-id="ocid1.subnet.oc1.phx.aaaaaaaacosethjtrcdct2imjdbrsruvfhhiaq2xf64d4klhuy54aflokqaq" --ssh-authorized-keys-file="/root/.ssh/id_rsa.pub" --user-data-file="/root/sethostname.yaml" -c ocid1.tenancy.oc1..aaaaaaaao7f7cccogqrg5emjxkxmctzbnhl6zdkkx36yq2jgxnm4p5vmysbq --display-name bbsw-bionic-dbg
$ oci compute instance launch --availability-domain="qIZq:PHX-AD-3" --shape="VM.Standard2.1" --image-id=ocid1.image.oc1.phx.aaaaaaaawrioy6r6l6lazrdxwcwy5j2mqu7q36k6ymbmt5l6hcsff645mkua --subnet-id="ocid1.subnet.oc1.phx.aaaaaaaacosethjtrcdct2imjdbrsruvfhhiaq2xf64d4klhuy54aflokqaq" --ssh-authorized-keys-file="/root/.ssh/id_rsa.pub" --user-data-file="/root/sethostname.yaml" -c ocid1.tenancy.oc1..aaaaaaaao7f7cccogqrg5emjxkxmctzbnhl6zdkkx36yq2jgxnm4p5vmysbq --display-name bbsw-focal-dbg
$ oci compute instance launch --availability-domain="qIZq:PHX-AD-3" --shape="VM.Standard2.1" --image-id=ocid1.image.oc1.phx.aaaaaaaa6fgohx4btsfar4vxoczflyewtpypxa4rttcos7u7hvxd55weyh4a --subnet-id="ocid1.subnet.oc1.phx.aaaaaaaacosethjtrcdct2imjdbrsruvfhhiaq2xf64d4klhuy54aflokqaq" --ssh-authorized-keys-file="/root/.ssh/id_rsa.pub" --user-data-file="/root/sethostname.yaml" -c ocid1.tenancy.oc1..aaaaaaaao7f7cccogqrg5emjxkxmctzbnhl6zdkkx36yq2jgxnm4p5vmysbq --display-name bbsw-jammy-dbg

# Validate 22.2 release behavior, then upgrade to 22.3.4 and confirm retained default routes, DNS and warnings about rendering network from IMDS, using this script

#!/bin/bash
# Confirm released version network config on cloud-init 22.1
BIONIC_VM=ubuntu@129.146.99.255
FOCAL_VM=ubuntu@158.101.17.232
JAMMY_VM=ubuntu@129.146.53.240

for vm in $BIONIC_VM $FOCAL_VM $JAMMY_VM; do
  echo -- Validate current state
  ssh $vm lsb_release -sc
  ssh $vm -- dpkg-query --show cloud-init
  ssh $vm -- cloud-init status --long
  ssh $vm -- cat /etc/netplan/50-cloud-init.yaml
  ssh $vm -- ip a
  ssh $vm -- ip r
  echo --- Confirm no warnings or traces
  ssh $vm -- ...

tags: added: verification-done verification-done-bionic verification-done-focal verification-done-jammy
removed: verification-needed verification-needed-bionic verification-needed-focal verification-needed-jammy
Terje Røsten (terjeros) wrote :

Hi again!

I upgraded to cloud-init 22.3.4-0ubuntu1 and did:

$ hostname something-else
$ cloud-init clean --logs --reboot

After reboot, networking seems to work fine.

Thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 22.3.4-0ubuntu1~22.04.1

---------------
cloud-init (22.3.4-0ubuntu1~22.04.1) jammy; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.4 (LP: #1986703)
    + Fix Oracle DS primary interface when using IMDS (#1757)
      (LP: #1989686)

cloud-init (22.3.3-0ubuntu1~22.04.1) jammy; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.3
    + Fix Oracle DS not setting subnet when using IMDS (#1735)
    + azure: define new attribute for pre-22.3 pickles (#1725)
    + sources/azure: ensure instance id is always correct (#1727)
      [Chris Patterson]

cloud-init (22.3-13-g70ce6442-0ubuntu1~22.04.1) jammy; urgency=medium

  * d/control: add python3-debconf to Depends and Build-Depends
  * d/cloud-init.lintian-overrides: lintian fixes:
    + Fix systemd-service-file-refers-to-unusual-wantedby-target format.
  * d/cloud-init.postinst
    + Lintian: Fix command-with-path-in-maintainer-script for grub-install.
  * d/source/lintian-overrides: lintian fixes:
    + silence binary-nmu-debian-revision-in-source bug:
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1014584
  * d/p/expire-on-hashed-users.patch:
    Add patch to ensure password expire doesn't apply to hashed users
  * drop the following cherry-picks now included:
    + cpick-a2e62738-Fix-cc_phone_home-requiring-tries-1500
  * New upstream snapshot. (LP: #1987318)
    + Fix v2 interface matching when no MAC
    + test: reduce number of network dependencies in flaky test (#1702)
    + docs: publish cc_ubuntu_autoinstall docs to rtd (#1696)
    + net: Fix EphemeraIPNetwork (#1697)
    + test: make ansible test work across older versions (#1691)
    + Networkd multi-address support/fix (#1685) [Teodor Garzdin]
    + make: drop broken targets (#1688)
    + net: Passthough v2 netconfigs in netplan systems (#1650)
    + NM ipv6 connection does not work on Azure and Openstack (#1616)
      [Emanuele Giuseppe Esposito]
    + Fix check_format_tip (#1679)
    + DataSourceVMware: fix var use before init (#1674) [Andrew Kutz]
    + rpm/copr: ensure RPM represents new clean.d dir artifacts (#1680)
    + test: avoid centos leaked check of /etc/yum.repos.d/epel-testing.repo
      (#1676)
    + Release 22.3 (#1662)
    + sources: obj.pkl cache should be written anyime get_data is run
      (#1669)
    + schema: drop release number from version file (#1664)
    + pycloudlib: bump to quiet azure HTTP info logs (#1668)
    + test: fix wireguard integration tests (#1666)
    + Github is deprecating the 18.04 runner starting 12.1 (#1665)
    + integration tests: Ensure one setup for all tests (#1661)
    + tests: ansible test fixes (#1660)
    + Prevent concurrency issue in test_webhook_hander.py (#1658)
    + Workaround net_setup_link race with udev (#1655)
    + test: drop erroneous lxd assertion, verify command succeeded (#1657)
    + Fix Chrony usage on Centos Stream (#1648) [Sven Haardiek]
    + sources/azure: handle network unreachable errors for saveable PPS
      (#1642) [Chris Patterson]
    + Return cc_set_hostname to PER_INSTANCE frequency (#1651)
    + test: Collect integration test time b...

Changed in cloud-init (Ubuntu Jammy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 22.3.4-0ubuntu1~20.04.1

---------------
cloud-init (22.3.4-0ubuntu1~20.04.1) focal; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.4 (LP: #1986703)
    + Fix Oracle DS primary interface when using IMDS (#1757)
      (LP: #1989686)

cloud-init (22.3.3-0ubuntu1~20.04.1) focal; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.3
    + Fix Oracle DS not setting subnet when using IMDS (#1735)
    + azure: define new attribute for pre-22.3 pickles (#1725)
    + sources/azure: ensure instance id is always correct (#1727)
      [Chris Patterson]

cloud-init (22.3-13-g70ce6442-0ubuntu1~20.04.1) focal; urgency=medium

  * d/control:
    - add python3-debconf to Depends and Build-Depends
    - Build-Depends: bump debhelper-compat to v10
  * d/control: lintian fixes:
    + upgrade debhelper-compat to 10 and move it to d/control
    + d/compat: removed in favor of d/control
  * d/cloud-init.postinst:
    + Lintian: Disable uses-dpkg-database-directly on legit use of it in
      distros/debian.py
  * d/cloud-init.postinst: lintian fixes:
    + Fix command-with-path-in-maintainer-script for grub-install
  * d/p/expire-on-hashed-users.patch:
    Add patch to ensure password expire doesn't apply to hashed users
  * d/source/lintian-overrides: lintian fixes:
    + silence binary-nmu-debian-revision-in-source bug:
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1014584
  * drop the following cherry-picks now included:
    + cpick-a2e62738-Fix-cc_phone_home-requiring-tries-1500
  * New upstream snapshot. (LP: #1987318)
    + Fix v2 interface matching when no MAC
    + test: reduce number of network dependencies in flaky test (#1702)
    + docs: publish cc_ubuntu_autoinstall docs to rtd (#1696)
    + net: Fix EphemeraIPNetwork (#1697)
    + test: make ansible test work across older versions (#1691)
    + Networkd multi-address support/fix (#1685) [Teodor Garzdin]
    + make: drop broken targets (#1688)
    + net: Passthough v2 netconfigs in netplan systems (#1650)
    + NM ipv6 connection does not work on Azure and Openstack (#1616)
      [Emanuele Giuseppe Esposito]
    + Fix check_format_tip (#1679)
    + DataSourceVMware: fix var use before init (#1674) [Andrew Kutz]
    + rpm/copr: ensure RPM represents new clean.d dir artifacts (#1680)
    + test: avoid centos leaked check of /etc/yum.repos.d/epel-testing.repo
      (#1676)
    + Release 22.3 (#1662)
    + sources: obj.pkl cache should be written anyime get_data is run
      (#1669)
    + schema: drop release number from version file (#1664)
    + pycloudlib: bump to quiet azure HTTP info logs (#1668)
    + test: fix wireguard integration tests (#1666)
    + Github is deprecating the 18.04 runner starting 12.1 (#1665)
    + integration tests: Ensure one setup for all tests (#1661)
    + tests: ansible test fixes (#1660)
    + Prevent concurrency issue in test_webhook_hander.py (#1658)
    + Workaround net_setup_link race with udev (#1655)
    + test: drop erroneous lxd assertion, verify command succeeded (#1657)
    + Fix Chrony usage on Centos Stream (#1648)
      [Sven Haardiek]
    ...

Changed in cloud-init (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 22.3.4-0ubuntu1~18.04.1

---------------
cloud-init (22.3.4-0ubuntu1~18.04.1) bionic; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.4 (LP: #1986703)
    + Fix Oracle DS primary interface when using IMDS (#1757)
      (LP: #1989686)

cloud-init (22.3.3-0ubuntu1~18.04.1) bionic; urgency=medium

  * New upstream bugfix release. (LP: #1987318)
    + Release 22.3.3
    + Fix Oracle DS not setting subnet when using IMDS (#1735)
    + azure: define new attribute for pre-22.3 pickles (#1725)
    + sources/azure: ensure instance id is always correct (#1727)
      [Chris Patterson]

cloud-init (22.3-13-g70ce6442-0ubuntu1~18.04.1) bionic; urgency=medium

  * d/control: add python3-debconf to Depends and Build-Depends
  * d/cloud-init.postinst:
    + Lintian: Fix command-with-path-in-maintainer-script for grub-install
  * d/p/renderer-do-not-prefer-netplan refresh to activators change
  * d/p/expire-on-hashed-users.patch:
    Add patch to ensure password expire doesn't apply to hashed users
  * d/source/lintian-overrides: lintian fixes:
    + silence binary-nmu-debian-revision-in-source bug:
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1014584
  * refresh patches:
    + debian/patches/ec2-dont-apply-full-imds-network-config.patch
    + debian/patches/openstack-no-network-config.patch
    + debian/patches/renderer-do-not-prefer-netplan.patch
  * drop the following cherry-picks now included:
    + cpick-a2e62738-Fix-cc_phone_home-requiring-tries-1500
  * New upstream snapshot. (LP: #1987318)
    + Fix v2 interface matching when no MAC
    + test: reduce number of network dependencies in flaky test (#1702)
    + docs: publish cc_ubuntu_autoinstall docs to rtd (#1696)
    + net: Fix EphemeraIPNetwork (#1697)
    + test: make ansible test work across older versions (#1691)
    + Networkd multi-address support/fix (#1685) [Teodor Garzdin]
    + make: drop broken targets (#1688)
    + net: Passthough v2 netconfigs in netplan systems (#1650)
    + NM ipv6 connection does not work on Azure and Openstack (#1616)
      [Emanuele Giuseppe Esposito]
    + Fix check_format_tip (#1679)
    + DataSourceVMware: fix var use before init (#1674) [Andrew Kutz]
    + rpm/copr: ensure RPM represents new clean.d dir artifacts (#1680)
    + test: avoid centos leaked check of /etc/yum.repos.d/epel-testing.repo
      (#1676)
    + Release 22.3 (#1662)
    + sources: obj.pkl cache should be written anyime get_data is run
      (#1669)
    + schema: drop release number from version file (#1664)
    + pycloudlib: bump to quiet azure HTTP info logs (#1668)
    + test: fix wireguard integration tests (#1666)
    + Github is deprecating the 18.04 runner starting 12.1 (#1665)
    + integration tests: Ensure one setup for all tests (#1661)
    + tests: ansible test fixes (#1660)
    + Prevent concurrency issue in test_webhook_hander.py (#1658)
    + Workaround net_setup_link race with udev (#1655)
    + test: drop erroneous lxd assertion, verify command succeeded (#1657)
    + Fix Chrony usage on Centos Stream (#1648)
      [Sven Haardiek]
    + sources/azure: handle network unreachable errors f...

Changed in cloud-init (Ubuntu Bionic):
status: Fix Committed → Fix Released
Terje Røsten (terjeros) wrote :

Just verified image below works out of the box on Oracle Cloud:

 https://cloud-images.ubuntu.com/kinetic/20221019/kinetic-server-cloudimg-amd64.img

Thanks guys!
