ConfigDrive: cloud-init fails to configure bond from network_data.json

Bug #1605749 reported by Mathieu Gagné
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Medium      Unassigned
cloud-init (Ubuntu)  Fix Released  Medium      Unassigned
  Xenial             Fix Released  Medium      Unassigned

Bug Description

cloud-init fails to configure bond interfaces from network_data.json

There are a couple of reasons:

Bond links found in network_data.json do not have a name attribute. cloud-init doesn't require the name attribute to exist in links. [1] However, cloud-init later expects every link to have a name attribute and crashes when one is missing. [2] The name attribute is not part of the OpenStack network_data.json specification and will therefore never be provided.

If a link name is provided, the generated ENI configuration has a couple of issues:

1) cloud-init currently treats the entries of the bond_links attribute in a bond link as physical interface names rather than as link ids, as expected.

This means you end up with 4 physical interfaces configured on the server: 2 existing physical interfaces (e.g. eno1 and eno2) and 2 physical interfaces based on the names found in bond_links (in this case, eth0 and eth1). The latter don't exist on the server, so the configured bond interface tries to enslave nonexistent links and fails to come up.

2) The "auto" stanza is missing from bond and bond slave interfaces. Interfaces are never started/configured properly at boot.

3) Once 1) and 2) are fixed, it looks like cloud-init runs the network configuration again in dsmode=net and fails at multiple steps:

3.1) get_interfaces_by_mac runs again and tries to detect all known MAC addresses by listing every entry found in /sys/class/net/. At this point the bond is up and the file 'bond_masters' exists. '/sys/class/net/bond_masters/address' does not exist (because /sys/class/net/bond_masters is a file, not a directory), so get_interface_mac throws an uncaught exception, aborting the configuration process.

3.2) Once 3.1) is fixed, configuration fails again, for a different reason. Once the bond is configured, all slave interfaces have their MAC addresses updated so that they are identical. convert_net_json then fails at the "need_names" step with the exception "No mac_address or name entry for", because the MAC address of one of the physical interfaces is no longer found.
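
To illustrate 3.1) and 3.2), here is a minimal sketch of a MAC lookup that tolerates both conditions. It is not cloud-init's code: the function names read_mac and interfaces_by_mac are hypothetical, and it assumes the kernel exposes a slave's original address in bonding_slave/perm_hwaddr and marks a bond master with a bonding directory.

```python
import os


def read_mac(sys_net, ifname):
    """Return the MAC of ifname, or None if no address file can be read."""
    # Assumption: a bond slave keeps its original MAC in
    # bonding_slave/perm_hwaddr; other interfaces only have 'address'.
    for rel in ("bonding_slave/perm_hwaddr", "address"):
        try:
            with open(os.path.join(sys_net, ifname, rel)) as fh:
                return fh.read().strip().lower()
        except (FileNotFoundError, NotADirectoryError):
            # 'bond_masters' is a regular file, so opening
            # bond_masters/address fails with ENOTDIR.
            continue
    return None


def interfaces_by_mac(sys_net="/sys/class/net"):
    """Map MAC address -> interface name, skipping unusable entries."""
    found = {}
    for ifname in sorted(os.listdir(sys_net)):
        # Assumption: only bond masters have a 'bonding' directory. Skip
        # them, because a bond shares its MAC with one of its slaves.
        if os.path.isdir(os.path.join(sys_net, ifname, "bonding")):
            continue
        mac = read_mac(sys_net, ifname)
        if mac:
            found[mac] = ifname
    return found
```

Taking the sysfs root as a parameter keeps the lookup testable against a temporary directory instead of a live system.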

A network_data.json for testing purposes is attached to this bug.

For reference, here is the MAC address mapping on the server:
- eno1: 0c:c4:7a:34:6e:3c
- eno2: 0c:c4:7a:34:6e:3d

Current rendered ENI is:

    auto lo
    iface lo inet loopback
        dns-nameservers 1.1.1.191 1.1.1.4

    iface eno1 inet manual
        mtu 1500

    iface eno2 inet manual
        mtu 1500

    iface bond0 inet manual
        bond_xmit_hash_policy layer3+4
        bond_miimon 100
        bond_mode 4
        bond-slaves none

    auto eth0
    iface eth0 inet manual
        bond_miimon 100
        bond-master bond0
        bond_mode 4
        bond_xmit_hash_policy layer3+4

    auto eth1
    iface eth1 inet manual
        bond_miimon 100
        bond-master bond0
        bond_mode 4
        bond_xmit_hash_policy layer3+4

    auto bond0.602
    iface bond0.602 inet static
        netmask 255.255.255.248
        address 2.2.2.13
        vlan-raw-device bond0
        hwaddress fa:16:3e:b3:72:30
        vlan_id 602
        post-up route add default gw 2.2.2.9 || true
        pre-down route del default gw 2.2.2.9 || true

    auto bond0.612
    iface bond0.612 inet static
        netmask 255.255.255.248
        address 10.0.1.5
        vlan-raw-device bond0
        hwaddress fa:16:3e:66:ab:a6
        vlan_id 612
        post-up route add -net 192.168.1.0 netmask 255.255.255.255 gw 10.0.1.1 || true
        pre-down route del -net 192.168.1.0 netmask 255.255.255.255 gw 10.0.1.1 || true

[1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/helpers/openstack.py#L547
[2] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/net/network_state.py#L284

Revision history for this message
Mathieu Gagné (mgagne) wrote :
Mathieu Gagné (mgagne)
description: updated
Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

I pushed some code to
 https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name
that I think is working. The rendered ENI looks like:

auto lo
iface lo inet loopback
    dns-nameservers 1.1.1.191 1.1.1.4

auto oeth0
iface oeth0 inet manual
    bond-master bond0
    bond_miimon 100
    bond_mode 4
    bond_xmit_hash_policy layer3+4
    mtu 1500

auto oeth1
iface oeth1 inet manual
    bond-master bond0
    bond_miimon 100
    bond_mode 4
    bond_xmit_hash_policy layer3+4
    mtu 1500

iface bond0 inet manual
    bond-slaves none
    bond_miimon 100
    bond_mode 4
    bond_xmit_hash_policy layer3+4

auto bond0.602
iface bond0.602 inet static
    address 2.2.2.13
    netmask 255.255.255.248
    hwaddress fa:16:3e:b3:72:30
    vlan-raw-device bond0
    vlan_id 602
    post-up route add default gw 2.2.2.9 || true
    pre-down route del default gw 2.2.2.9 || true

auto bond0.612
iface bond0.612 inet static
    address 10.0.1.5
    netmask 255.255.255.248
    hwaddress fa:16:3e:66:ab:a6
    vlan-raw-device bond0
    vlan_id 612
    post-up route add -net 192.168.1.0 netmask 255.255.255.255 gw 10.0.1.1 || true
    pre-down route del -net 192.168.1.0 netmask 255.255.255.255 gw 10.0.1.1 || true

Revision history for this message
Mathieu Gagné (mgagne) wrote :

So I tried your patch and so far, I suspect the only thing missing is the "auto bond0" stanza. As soon as I added it and rebooted, it worked. Will patch on my side and test further.
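
As an illustration of the missing stanza, here is a minimal sketch of an ENI stanza renderer that emits the "auto" line by default. The function render_eni_stanza and its parameters are hypothetical and are not taken from cloud-init's renderer.

```python
def render_eni_stanza(name, method="manual", options=None, auto=True):
    """Render one /etc/network/interfaces stanza as text."""
    lines = []
    if auto:
        # Without this line ifupdown does not bring the interface up at boot.
        lines.append("auto %s" % name)
    lines.append("iface %s inet %s" % (name, method))
    for key, value in (options or {}).items():
        lines.append("    %s %s" % (key, value))
    return "\n".join(lines)


print(render_eni_stanza("bond0", options={
    "bond-slaves": "none",
    "bond_miimon": 100,
    "bond_mode": 4,
    "bond_xmit_hash_policy": "layer3+4",
}))
```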

Revision history for this message
Scott Moser (smoser) wrote :

For reference, here's what I have as the generated network_config:
{
 "config": [
  {
   "mac_address": "0c:c4:7a:34:6e:3c",
   "mtu": 1500,
   "name": "oeth0",
   "subnets": [],
   "type": "physical"
  },
  {
   "mac_address": "0c:c4:7a:34:6e:3d",
   "mtu": 1500,
   "name": "oeth1",
   "subnets": [],
   "type": "physical"
  },
  {
   "bond_interfaces": [
    "oeth0",
    "oeth1"
   ],
   "name": "bond0",
   "params": {
    "bond_miimon": 100,
    "bond_mode": "4",
    "bond_xmit_hash_policy": "layer3+4"
   },
   "subnets": [],
   "type": "bond"
  },
  {
   "mac_address": "fa:16:3e:b3:72:30",
   "name": "bond0.602",
   "subnets": [
    {
     "address": "2.2.2.13",
     "ipv4": true,
     "netmask": "255.255.255.248",
     "routes": [
      {
       "gateway": "2.2.2.9",
       "netmask": "0.0.0.0",
       "network": "0.0.0.0"
      }
     ],
     "type": "static"
    }
   ],
   "type": "vlan",
   "vlan_id": 602,
   "vlan_link": "bond0"
  },
  {
   "mac_address": "fa:16:3e:66:ab:a6",
   "name": "bond0.612",
   "subnets": [
    {
     "address": "10.0.1.5",
     "ipv4": true,
     "netmask": "255.255.255.248",
     "routes": [
      {
       "gateway": "10.0.1.1",
       "netmask": "255.255.255.255",
       "network": "192.168.1.0"
      }
     ],
     "type": "static"
    }
   ],
   "type": "vlan",
   "vlan_id": 612,
   "vlan_link": "bond0"
  },
  {
   "address": "1.1.1.191",
   "type": "nameserver"
  },
  {
   "address": "1.1.1.4",
   "type": "nameserver"
  }
 ],
 "version": 1
}

Revision history for this message
Scott Moser (smoser) wrote :

The branch at https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name now has a vlan case and a bond case that both resolve 'link' references rather than assuming link_id == interface_name.

Specifically, 'bond_links' and 'vlan_link' are references to an entry in the "links" array that has the provided 'id'.
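
A minimal sketch of that resolution, not the code in the branch: the function names, the generated "bondN" naming, and the sample links (including the ids "bond-id-1" and "vlan602") are assumptions for illustration. The MAC addresses and the eth0/eth1 ids are the ones from the bug description.

```python
def resolve_link_names(links, name_by_mac):
    """Map each link 'id' to the interface name used on the host."""
    names = {}
    bond_count = 0
    for link in links:
        if link["type"] == "bond":
            # Bond links carry no name, so generate one.
            names[link["id"]] = "bond%d" % bond_count
            bond_count += 1
        elif link["type"] != "vlan":
            # Physical links are matched to host interfaces by MAC.
            names[link["id"]] = name_by_mac[link["ethernet_mac_address"]]
    for link in links:
        if link["type"] == "vlan":
            # vlan_link is a link id, not an interface name.
            parent = names[link["vlan_link"]]
            names[link["id"]] = "%s.%s" % (parent, link["vlan_id"])
    return names


def bond_members(bond_link, names):
    """Translate the link ids in bond_links to interface names."""
    return [names[link_id] for link_id in bond_link["bond_links"]]


# Hypothetical sample data shaped like the "links" array.
links = [
    {"id": "eth0", "type": "phy", "ethernet_mac_address": "0c:c4:7a:34:6e:3c"},
    {"id": "eth1", "type": "phy", "ethernet_mac_address": "0c:c4:7a:34:6e:3d"},
    {"id": "bond-id-1", "type": "bond", "bond_links": ["eth0", "eth1"]},
    {"id": "vlan602", "type": "vlan", "vlan_link": "bond-id-1", "vlan_id": 602},
]
name_by_mac = {"0c:c4:7a:34:6e:3c": "eno1", "0c:c4:7a:34:6e:3d": "eno2"}
names = resolve_link_names(links, name_by_mac)
```

With this data the bond's members resolve to the host's real interfaces (eno1 and eno2) instead of the nonexistent eth0 and eth1.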

Mathieu Gagné (mgagne)
description: updated
Revision history for this message
Scott Moser (smoser) wrote :

fixed in 0.7.8.

Changed in cloud-init:
status: Confirmed → Fix Released
Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
status: New → Fix Released
Changed in cloud-init (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Mathieu, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Martin Pitt (pitti) wrote :

Hello Mathieu, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Mathieu Gagné (mgagne) wrote :

Works for me. Tested with 2 bonded NICs with 2 tagged VLANs.

Revision history for this message
Scott Moser (smoser) wrote :

marked verification done per mgagne comment.
Thanks!

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/control: add Breaks of older versions of walinuxagent (LP: #1623570)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/control: fix missing dependency on python3-serial,
    and make SmartOS datasource work.
  * debian/cloud-init.templates fix capitalisation in template so
    dpkg-reconfigure works to select OpenStack. (LP: #1575727)
  * d/README.source, d/control, d/new-upstream-snapshot, d/rules: sync
    with yakkety for changes due to move to git.
  * d/rules: change PYVER=python3 to PYVER=3 to adjust to upstream change.
  * debian/rules, debian/cloud-init.install: remove install file
    to ensure expected files are collected into cloud-init deb.
    (LP: #1615745)
  * debian/dirs: remove obsolete / unused file.
  * upstream move from bzr to git.
  * New upstream snapshot.
    - Allow link type of null in network_data.json [Jon Grimm] (LP: #1621968)
    - DataSourceOVF: fix user-data as base64 with python3 (LP: #1619394)
    - remove obsolete .bzrignore
    - systemd: Better support package and upgrade. (LP: #1576692, #1621336)
    - tests: cleanup tempdirs in apt_source tests
    - apt config conversion: treat empty string as not provided. (LP: #1621180)
    - Fix typo in default keys for phone_home [Roland Sommer] (LP: #1607810)
    - salt minion: update default pki directory for newer salt minion.
      (LP: #1609899)
    - bddeb: add --release flag to specify the release in changelog.
    - apt-config: allow both old and new format to be present.
      [Christian Ehrhardt] (LP: #1616831)
    - python2.6: fix dict comprehension usage in _lsb_release. [Joshua Harlow]
    - Add a module that can configure spacewalk. [Joshua Harlow]
    - add install option for openrc [Matthew Thode]
    - Generate a dummy bond name for OpenStack (LP: #1605749)
    - network: fix get_interface_mac for bond slave, read_sys_net for ENOTDIR
    - azure dhclient-hook cleanups
    - Minor cleanups to atomic_helper and add unit tests.
    - Fix Gentoo net config generation [Matthew Thode]
    - distros: fix get_primary_arch method use of os.uname [Andrew Jorgensen]
    - Apt: add new apt configuration format [Christian Ehrhardt]
    - Get Azure endpoint server from DHCP client [Brent Baude]
    - DigitalOcean: use the v1.json endpoint [Ben Howard]
    - MAAS: add vendor-data support (LP: #1612313)
    - Upgrade to a configobj package new enough to work [Joshua Harlow]
    - ConfigDrive: recognize 'tap' as a link type. (LP: #1610784)
    - NoCloud: fix bug providing network-interfaces via meta-data.
      (LP: 1577982)
    - Add distro tags on config modules that should have it [Joshua Harlow]
    - ChangeLog: update changelog for previous commit.
    - add ntp config module [Ryan Harper]
    - SmartOS: more improvement...

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
James Falcon (falcojr) wrote :