unable to deploy a machine with vmhost if a bond interface was created

Bug #1992185 reported by Christian
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Released | High | Alberto Donato | |
| 3.2 | Fix Released | High | Alberto Donato | |

Bug Description

Since 3.2, I can no longer deploy a new physical machine:

1. Once my machine is ready, I go to the network section and create a bond combining my boot interface with another one.
2. At deployment time, the OS gets installed (20.04) and, after reboot, cloud-init completes successfully, but the machine gets stuck in the "deploying" state.

Note that if I deploy without creating the bond, it works.

I looked at regiond.log and found this exception:

```
2022-10-07 11:04:24 metadataserver.api_twisted: [critical] Failed to process status message instantly.
        Traceback (most recent call last):
          File "/usr/lib/python3.8/threading.py", line 870, in run
            self._target(*self._args, **self._kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 821, in worker
            return target()
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 46, in work
            task()
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 190, in doWork
            task()
        --- <exception caught here> ---
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
            return func(*args, **kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 202, in wrapper
            result = func(*args, **kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 564, in _processMessageNow
            self._processMessage(node, message)
          File "/snap/maas/23947/lib/python3.8/site-packages/maasserver/utils/orm.py", line 756, in call_within_transaction
            return func_outside_txn(*args, **kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/maasserver/utils/orm.py", line 559, in retrier
            return func(*args, **kwargs)
          File "/usr/lib/python3.8/contextlib.py", line 75, in inner
            return func(*args, **kwds)
          File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 472, in _processMessage
            _create_vmhost_for_deployment(node)
          File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 210, in _create_vmhost_for_deployment
            ip = node.ip_addresses(ifaces=ifaces + [boot_if])[0]
        builtins.IndexError: list index out of range
```

I looked at the code, and there is a change between 3.1 and 3.2 that fails:

```
    # the IP is associated to the bridge the boot interface is in, not the
    # interface itself
    boot_if = node.get_boot_interface()
    ifaces = list(boot_if.children.all())
    ip = node.ip_addresses(ifaces=ifaces + [boot_if])[0]
```

I extracted the JSON output of a maas-cli "machine read" for my interfaces. My boot interface looks like this:
```json
{
  "name": "eno1",
  "system_id": "mx48cn",
  "type": "physical",
  "vlan": {
    "vid": 123,
    ...
  },
  "link_connected": true,
  "children": [
    "bond0"
  ],
  "enabled": true,
}
```

And here is my bond0 interface:

```json
  {
    "parents": [
      "eno1",
      "eno2"
    ],
    "name": "bond0",
    "vendor": null,
    "vlan": {
      "vid": 123
    },
    "link_connected": false,
    "children": [
      "br0"
    ],
    "enabled": true
  }
```

And finally my br0 (created automatically by MAAS):

```json
  {
    "links": [
      {
        "ip_address": "1.2.3.4",
        "subnet": {
          "vlan": {
            "vid": 123
          }
        }
      }
    ],
    "firmware_version": null,
    "sriov_max_vf": 0,
    "product": null,
    "params": {
      "bridge_stp": false,
      "bridge_type": "standard"
    },
    "discovered": null,
    "link_speed": 0,
    "interface_speed": 0,
    "parents": [
      "bond0"
    ],
    "name": "br0",
    "vlan": {
      "vid": 123
    },
    "link_connected": false
  }
```

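As you can see, the IP is linked to br0, which is a child of bond0 rather than a direct child of the boot interface eno1. The MAAS Python code therefore ends up with an empty list and raises the exception, because it assumes there is always at least one entry in the list.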
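The failure can be reproduced outside MAAS with a small model of the topology above. This is only a sketch: the `Iface` class and the `ip_addresses` helper are hypothetical stand-ins, not the MAAS models, and they mimic just the shape of the lookup quoted from `_create_vmhost_for_deployment` (boot interface plus its direct children).

```python
from dataclasses import dataclass, field


@dataclass
class Iface:
    """Hypothetical stand-in for a MAAS interface record."""
    name: str
    ips: list = field(default_factory=list)
    children: list = field(default_factory=list)


def ip_addresses(ifaces):
    """Collect the IPs linked to the given interfaces only (assumed behaviour)."""
    return [ip for iface in ifaces for ip in iface.ips]


# Topology from the report: eno1 -> bond0 -> br0, with the IP on br0.
br0 = Iface("br0", ips=["1.2.3.4"])
bond0 = Iface("bond0", children=[br0])
eno1 = Iface("eno1", children=[bond0])

# Same shape as the failing lookup: direct children plus the boot interface.
candidates = eno1.children + [eno1]
found = ip_addresses(candidates)
print(found)  # []

try:
    ip = found[0]
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

Only bond0 and eno1 are inspected, and neither carries an address, so indexing the empty result fails exactly as in the traceback.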

Tags: bug-council

Changed in maas:
status: New → Triaged
importance: Undecided → High
Christian (cboitel) wrote :

Note that:
1. The system fails the same way even if I do not ask to deploy it as a KVM host.
2. The system deploys fine if I do not use bond interfaces.

Christian (cboitel) wrote :

My mistake: the system fails with bond interfaces only if I ask to deploy it as a KVM host.

Christian (cboitel) wrote :

We should look at the boot interface and find any bridge children in the same VLAN.

Note that one may configure multiple VLANs on the boot interface, which leads to multiple bridges being found among the children. We must take only IPs in the same VLAN as the boot interface.
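A minimal sketch of that suggestion, assuming a simplified model in which each interface carries a single VLAN ID. The `Iface` class and the `descendants` and `vmhost_ip` helpers are hypothetical names for illustration; this is not the fix that was merged into MAAS.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Iface:
    """Hypothetical interface record: name, VLAN ID, linked IPs, children."""
    name: str
    vid: int
    ips: list = field(default_factory=list)
    children: list = field(default_factory=list)


def descendants(iface):
    """Yield every interface stacked on top of iface, at any depth."""
    for child in iface.children:
        yield child
        yield from descendants(child)


def vmhost_ip(boot_if) -> Optional[str]:
    """Return the first IP found on the boot interface or on any descendant
    in the same VLAN, or None instead of raising IndexError."""
    for iface in [boot_if, *descendants(boot_if)]:
        if iface.vid == boot_if.vid and iface.ips:
            return iface.ips[0]
    return None


# eno1 -> bond0 -> br0 (VLAN 123), plus a tagged VLAN 200 with its own bridge.
br200 = Iface("br200", vid=200, ips=["10.0.0.5"])
bond0_200 = Iface("bond0.200", vid=200, children=[br200])
br0 = Iface("br0", vid=123, ips=["1.2.3.4"])
bond0 = Iface("bond0", vid=123, children=[bond0_200, br0])
eno1 = Iface("eno1", vid=123, children=[bond0])

print(vmhost_ip(eno1))  # 1.2.3.4
```

The VLAN 200 bridge is visited first in this layout, so the VLAN filter is what prevents its address from being picked. Returning `None` leaves the caller to decide how to report a host with no usable address.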

Nobuto Murata (nobuto) wrote :

Subscribed ~field-high.

Confirmed the failure and the same traceback. With this issue, bonding cannot be used, so a VM host must run on a single interface, which can be a SPOF.

maas: 1:3.2.6-12016-g.19812b4da-0ubuntu1~20.04.1 on focal
deploying machine: focal. No error is logged on the deployed node side, but it looks like the MAAS control plane is failing to determine the IP address to talk to the host.

The code block is outside of the `if lxd, else kvm` block, and we've confirmed that it affects both LXD and Libvirt backends, which means using another backend is unfortunately not a workaround.

https://git.launchpad.net/maas/tree/src/metadataserver/api_twisted.py?id=8000dfbb56afc1f7d70f47aa6d8802c6d69b740d#n210

====

```
2022-10-19 08:14:40 metadataserver.api_twisted: [critical] Failed to process status message instantly.
        Traceback (most recent call last):
          File "/usr/lib/python3.8/threading.py", line 870, in run
            self._target(*self._args, **self._kwargs)
          File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 821, in worker
            return target()
          File "/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 46, in work
            task()
          File "/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 190, in doWork
            task()
        --- <exception caught here> ---
          File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
            return func(*args, **kwargs)
          File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 202, in wrapper
            result = func(*args, **kwargs)
          File "/usr/lib/python3/dist-packages/metadataserver/api_twisted.py", line 564, in _processMessageNow
            self._processMessage(node, message)
          File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 756, in call_within_transaction
            return func_outside_txn(*args, **kwargs)
          File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 559, in retrier
            return func(*args, **kwargs)
          File "/usr/lib/python3.8/contextlib.py", line 75, in inner
            return func(*args, **kwds)
          File "/usr/lib/python3/dist-packages/metadataserver/api_twisted.py", line 472, in _processMessage
            _create_vmhost_for_deployment(node)
          File "/usr/lib/python3/dist-packages/metadataserver/api_twisted.py", line 210, in _create_vmhost_for_deployment
        ...
```


summary: - unable to deploy a machine with kvm if a bond interface was created
+ unable to deploy a machine with vmhost if a bond interface was created
Nobuto Murata (nobuto) wrote :

Attaching the whole curtin config for the VM host to be deployed.

And it has the bond section as follows:

```yaml
network:
  bonds:
    bond0:
      interfaces:
      - ens10
      - ens4
      macaddress: 52:54:00:3e:6d:61
      mtu: 1500
      parameters:
        down-delay: 0
        gratuitious-arp: 1
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer2
        up-delay: 0
  bridges:
    br-bond0:
      addresses:
      - 192.168.151.122/24
      gateway4: 192.168.151.1
      interfaces:
      - bond0
      macaddress: 52:54:00:3e:6d:61
      mtu: 1500
      nameservers:
        addresses:
        - 192.168.151.1
        search:
        - maas
      parameters:
        forward-delay: 15
        stp: false
```
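To make the layering in this config explicit, here is a rough sketch that walks a trimmed copy of it as a plain Python dict (so no YAML parser is needed). The helper names `address_holders` and `physical_members` are made up for illustration; the point is only that the address sits on the bridge, two levels above the physical NICs.

```python
# Trimmed copy of the curtin network section above, as a plain dict.
network = {
    "bonds": {"bond0": {"interfaces": ["ens10", "ens4"]}},
    "bridges": {
        "br-bond0": {
            "addresses": ["192.168.151.122/24"],
            "interfaces": ["bond0"],
        }
    },
}


def address_holders(network):
    """Map each device that has addresses configured to those addresses."""
    holders = {}
    for section in network.values():
        for name, cfg in section.items():
            if cfg.get("addresses"):
                holders[name] = cfg["addresses"]
    return holders


def physical_members(network, name):
    """Resolve a bridge or bond down to the physical NICs underneath it."""
    for section in ("bridges", "bonds"):
        cfg = network.get(section, {}).get(name)
        if cfg is not None:
            members = []
            for member in cfg.get("interfaces", []):
                members.extend(physical_members(network, member))
            return members
    return [name]  # not a bridge or bond, so treat it as a physical NIC


print(address_holders(network))   # {'br-bond0': ['192.168.151.122/24']}
print(physical_members(network, "br-bond0"))  # ['ens10', 'ens4']
```

Neither ens10 nor ens4 nor bond0 holds an address, which matches the observation that a lookup limited to the boot interface and its direct children comes back empty.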

tags: added: bug-council
Adam Collard (adam-collard) wrote :

Related to LP:1970962

Changed in maas:
assignee: nobody → Alberto Donato (ack)
Yoshi Kadokawa (yoshikadokawa) wrote :

Seeing the same issue on MAAS v3.1 as well. When no bond is configured for the boot interface, provisioning as a KVM host succeeds.

Alberto Donato (ack)
Changed in maas:
status: Triaged → In Progress
milestone: none → 3.3.0
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.3.0 → 3.3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
milestone: 3.3.0-beta1 → 3.3.0-beta2
Alvin Cura (alvinc) wrote :

I have been having this issue too on 3.2.6-12016-g.19812b4da. I was not using a bond, but I had predefined bridges on VLANs. I am removing all bridges and VLANs to try a clean deployment.

Alvin Cura (alvinc) wrote :

No dice. regiond.log tells me:

```
2023-01-23 20:28:03 metadataserver.api_twisted: [critical] Failed to process status message instantly.
 Traceback (most recent call last):
   File "/usr/lib/python3.8/threading.py", line 870, in run
     self._target(*self._args, **self._kwargs)
   File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 821, in worker
     return target()
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 46, in work
     task()
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 190, in doWork
     task()
 --- <exception caught here> ---
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
     result = inContext.theWork()
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
     inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
     return func(*args,**kw)
   File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
     return func(*args, **kwargs)
   File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 202, in wrapper
     result = func(*args, **kwargs)
   File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 564, in _processMessageNow
     self._processMessage(node, message)
   File "/snap/maas/23947/lib/python3.8/site-packages/maasserver/utils/orm.py", line 756, in call_within_transaction
     return func_outside_txn(*args, **kwargs)
   File "/snap/maas/23947/lib/python3.8/site-packages/maasserver/utils/orm.py", line 559, in retrier
     return func(*args, **kwargs)
   File "/usr/lib/python3.8/contextlib.py", line 75, in inner
     return func(*args, **kwds)
   File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 472, in _processMessage
     _create_vmhost_for_deployment(node)
   File "/snap/maas/23947/lib/python3.8/site-packages/metadataserver/api_twisted.py", line 266, in _create_vmhost_for_deployment
     discover_and_sync_vmhost(pod, node.owner)
   File "/snap/maas/23947/lib/python3.8/site-packages/maasserver/vmhost.py", line 77, in discover_and_sync_vmhost
     raise PodProblem(str(error))
 maasserver.exceptions.PodProblem: Failed talking to pod: 'NoneType' object has no attribute 'get'
```
