[2.5, enhancement] Error when creating pod with multiple NICs with different IP mode

Bug #1806969 reported by Vladimir Grevtsev
This bug affects 3 people
Affects    Status         Importance   Assigned to     Milestone
MAAS       Fix Released   Medium       Newell Jensen
2.6        Fix Released   Undecided    Newell Jensen

Bug Description

Problem statement:

Based on the tests at https://github.com/maas/maas/blob/master/src/provisioningserver/utils/tests/test_constraints.py#L112, I picked up the constraint string format and built this command:

$ maas admin pod compose 51 hostname=multi-nic-test-7 cores=2 memory=4096 storage=40 interfaces='eth0:space=oam-space;eth1:space=internal-space,mode=unconfigured'

Unable to compose machine because: Failed talking to pod: Unable to compose multi-nic-test-7: error: Failed to start domain multi-nic-test-7
error: error creating macvtap interface <email address hidden> (52:54:00:97:d0:e1): Device or resource busy

Trying the same with "subnet", not with "space":

$ maas admin pod compose 51 hostname=multi-nic-test-7 cores=2 memory=4096 storage=40 interfaces='eth0:subnet=oam;eth1:space=internal,mode=unconfigured'
Not Found

Expected result: The pod is created with the following NIC configuration: https://www.dropbox.com/s/q98iuvcdbt14hpg/multi-nic-test-7.maas%20%20ln-sv-infr02%20MAAS%202018-12-05%2017-26-01.png

Actual result: CLI command fails

A workaround was found for this:

1) Create the pod without the "mode" constraint.
2) Then link the interface manually: maas admin interface link-subnet {pod_id} {interface_id} id={link_id} mode=LINK_UP force=True subnet={desired_subnet_cidr}
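
As an illustration only, step 2 of the workaround can be scripted by assembling the argument list for the command quoted above. The function name and the sample values below are hypothetical; the positional arguments and the key=value options are copied from the workaround as reported, not verified against the MAAS CLI reference.

```python
import shlex


def link_subnet_command(profile, pod_id, interface_id, link_id, subnet_cidr):
    """Build the argv for the link-subnet workaround quoted in the report.

    The placeholder names mirror the report ({pod_id}, {interface_id},
    {link_id}, {desired_subnet_cidr}); this helper is a hypothetical sketch.
    """
    return [
        "maas", profile, "interface", "link-subnet",
        str(pod_id), str(interface_id),
        f"id={link_id}",
        "mode=LINK_UP",
        "force=True",
        f"subnet={subnet_cidr}",
    ]


# Sample values are made up for the example.
cmd = link_subnet_command("admin", "abc123", 7, 42, "10.0.0.0/24")
print(shlex.join(cmd))
```

Building a list rather than a single string avoids quoting problems if the result is later passed to subprocess.run.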

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

+ field-high: a workaround is available, but expected functionality is degraded.

tags: added: field-high
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Just a clarification: this is not expected functionality. When a user attaches an interface to a network, the expectation was that it would come with an IP address.

This seems like an extension of the functionality to support this. We will target this to 2.5.1.

Thanks!

Changed in maas:
milestone: none → 2.5.1
assignee: nobody → Mike Pontillo (mpontillo)
importance: Undecided → Medium
status: New → Triaged
summary: - [2.5] Error when creating pod with multiple NICs with different IP mode
+ [2.5, enhancement] Error when creating pod with multiple NICs with
+ different IP mode
Changed in maas:
status: Triaged → In Progress
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I'm working on a patch to fix this, but I do have one concern.

Consider a constraint string such as:

    interfaces='eth0:space=foo;eth1:space=bar,mode=unconfigured'

In this case, MAAS will not know which subnet to link eth1 to. So you will end up with an interface on the correct L2 space, but with no knowledge of which subnet to link it to. Therefore you end up with a "disconnected" interface on no subnet.

I feel like, from the bug description, you would prefer a link to the subnet, but with an unconfigured IP mode. There are a few things I can think of that we could do to reduce the ambiguity, including one or more of the following:

 - Require the user to specify the desired subnet in the constraint string
 - Make it so that specifying `mode=unconfigured` chooses an arbitrary subnet on the `bar` space.
 - Add another mode constraint such as `mode=disconnected` to specify that the interface should not be linked to a subnet at all, even if one is specified or implied by the constraints.

Thoughts?
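
The ambiguity described in this comment can be sketched as follows. This is a simplified, hypothetical parser for the label:key=value,key=value;... format used in the thread, not the MAAS implementation; it assumes each key appears at most once per interface.

```python
from typing import Dict, List


def parse_interfaces(constraint: str) -> Dict[str, Dict[str, str]]:
    """Parse 'label:key=value,key=value;label:...' into a nested dict."""
    result: Dict[str, Dict[str, str]] = {}
    for chunk in constraint.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        label, _, body = chunk.partition(":")
        pairs: Dict[str, str] = {}
        for item in body.split(","):
            key, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {item!r}")
            pairs[key.strip()] = value.strip()
        result[label.strip()] = pairs
    return result


def ambiguous_unconfigured(parsed: Dict[str, Dict[str, str]]) -> List[str]:
    """Return labels that request mode=unconfigured without naming a subnet."""
    return [
        label
        for label, pairs in parsed.items()
        if pairs.get("mode") == "unconfigured" and "subnet" not in pairs
    ]


parsed = parse_interfaces("eth0:space=foo;eth1:space=bar,mode=unconfigured")
print(ambiguous_unconfigured(parsed))
```

For the constraint string above, only eth1 is flagged: it names a space and a mode, but no subnet to link to.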

Revision history for this message
Mike Pontillo (mpontillo) wrote :

After discussing this, the decision was to try to make this as consistent as possible by selecting an arbitrary subnet (if available) and creating a link with no IP address.

Note: the effect on the MAAS data model in this case will be the creation of an IP address with the type STICKY and a null IP address.
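
A minimal sketch of that decision, using made-up stand-ins for the data model (the Subnet and Link classes and the link_unconfigured function are hypothetical and are not MAAS models): pick any subnet in the requested space and record a STICKY link whose IP address is null.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subnet:
    cidr: str
    space: str


@dataclass
class Link:
    subnet: Subnet
    alloc_type: str
    ip: Optional[str]


def link_unconfigured(space: str, subnets: List[Subnet]) -> Optional[Link]:
    """Link to an arbitrary subnet in `space` with no IP address.

    Returns None when the space has no subnet ("if available" in the comment).
    """
    for subnet in subnets:
        if subnet.space == space:
            # STICKY type with a null IP, as described in the note above.
            return Link(subnet=subnet, alloc_type="STICKY", ip=None)
    return None


# Example subnets are invented for illustration.
subnets = [Subnet("10.0.0.0/24", "foo"), Subnet("10.1.0.0/24", "bar")]
link = link_unconfigured("bar", subnets)
print(link)
```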

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
Przemyslaw Hausman (phausman) wrote :

With MAAS 2.6.0 it does not work.

2019-07-01-22:33:22 root DEBUG maas root pod compose 2 hostname=vault-2 cores=4 memory=6144 storage=20.0 zone=3 interfaces=eth0:space=oam-space;eth1:space=internal-space
2019-07-01-22:33:27 root ERROR Command failed: pod compose 2 hostname=vault-2 cores=4 memory=6144 storage=20.0 zone=3 interfaces=''eth0:space=oam-space;eth1:space=internal-space''
2019-07-01-22:33:27 root ERROR b'Unable to compose machine because: Failed talking to pod: Unable to compose vault-2: error: Failed to start domain vault-2\nerror: error creating macvtap interface <email address hidden> (52:54:00:a0:82:36): Device or resource busy'
Traceback (most recent call last):
  File "/usr/local/bin/fce", line 11, in <module>
    load_entry_point('foundationcloudengine', 'console_scripts', 'fce')()
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/main.py", line 139, in entry_point
    sys.exit(main(sys.argv[1:]))
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/main.py", line 130, in main
    opts.func(opts)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/build.py", line 65, in build_main
    args.steps)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/build.py", line 44, in build_and_validate_if_needed
    layer.build_outer(only_steps)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/layers/baselayer.py", line 118, in build_outer
    self.build(only_steps=only_steps)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/layers/maaslayer.py", line 2310, in build
    super(MaasLayer, self).run_steps(only_steps)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/layers/steppedbaselayer.py", line 51, in run_steps
    step.build()
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/layers/maaslayer.py", line 1294, in build
    zone['id'],
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/maas_cli.py", line 435, in add_pod_vm
    return cmd(maas_profile, command)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/maas_cli.py", line 117, in cmd
    raise error
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/maas_cli.py", line 112, in cmd
    output = raw_cmd(maas_profile, split_command)
  File "/home/ubuntu/foundation/foundationcloudengine/foundationcloudengine/maas_cli.py", line 106, in raw_cmd
    return subprocess.check_output(maas_cmd)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['maas', 'root', 'pod', 'compose', '2', 'hostname=vault-2', 'cores=4', 'memory=6144', 'storage=20.0', 'zone=3', 'interfaces=eth0:space=oam-space;eth1:space=internal-space']' returned non-zero exit status 2.

This is a debug log from /var/log/libvirt/libvirt.log. In the XML, bond1.1166 appears with type='direct' and mode='bridge'.

# infra-2 (Not OK)

2019-07-01 22:33:25.508+0000: 1992: debug : virDomainAttachDeviceFl...


Changed in maas:
assignee: Mike Pontillo (mpontillo) → nobody
status: Fix Released → New
Revision history for this message
Adam Collard (adam-collard) wrote :

> With MAAS 2.6.0 it does not work.

AFAICT, the command you're running isn't creating a pod with a different ip mode? Why do you think this is the same bug?

Revision history for this message
Przemyslaw Hausman (phausman) wrote :

Hi Adam, thank you for looking into that issue!

I do get the same error message "error creating macvtap interface <email address hidden> (52:54:00:a0:82:36): Device or resource busy".

In both cases (mine and Vladimir's) MAAS was trying to configure bondX.NNN instead of the bridge created on top of it.

When you look at the output of my 'ip link', you'll see that I have a 'brinternal' bridge created on top of bond1.1166. MAAS should have chosen the bridge, not the VLAN.

Interestingly, the issue did not occur on the other infra node, i.e. MAAS correctly picked the interface (bridge), as you can see in the libvirt logs.
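
The expected selection can be sketched as below. This is a hypothetical illustration of the behaviour being asked for, not MAAS code: the HostInterface record and attach_target function are invented, and the interface names are taken from this comment (the parent of bond1.1166 is assumed to be bond1).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HostInterface:
    name: str
    kind: str            # "physical", "bond", "vlan" or "bridge"
    parents: List[str]


def attach_target(candidate: str, interfaces: List[HostInterface]) -> Tuple[str, str]:
    """Prefer a bridge built on `candidate`; otherwise attach via macvtap."""
    for iface in interfaces:
        if iface.kind == "bridge" and candidate in iface.parents:
            return iface.name, "bridge"
    return candidate, "macvtap"


host = [
    HostInterface("bond1", "bond", []),
    HostInterface("bond1.1166", "vlan", ["bond1"]),
    HostInterface("brinternal", "bridge", ["bond1.1166"]),
]
print(attach_target("bond1.1166", host))
```

With the bridge present, the sketch returns brinternal; without it, it falls back to a macvtap attachment on the VLAN itself.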

Hope that makes sense.

Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Revision history for this message
Newell Jensen (newell-jensen) wrote :

Przemyslaw and Vladimir,

You need to use the mode=unconfigured constraint, as mentioned by mpontillo. From the stack trace supplied, it doesn't look like you are.

As an example, it should be something like this:

    interfaces='eth0:space=foo;eth1:space=bar,mode=unconfigured'

If you still have issues when using mode=unconfigured, let me know and we can reopen the bug.

Thanks,

Newell

Changed in maas:
status: New → Fix Released
Revision history for this message
Newell Jensen (newell-jensen) wrote :

Okay, I think I was suspecting the same thing as Adam was in #6 above. Taking a deeper look now.

Changed in maas:
status: Fix Released → New
status: New → Triaged
status: Triaged → In Progress
Revision history for this message
Newell Jensen (newell-jensen) wrote :

After taking a deeper look: can you verify that the KVM hosts are managed by MAAS? Please see https://docs.maas.io/2.6/en/manage-kvm-host-networking for more details, notably the introduction paragraphs and the warning.

Changed in maas:
status: In Progress → Fix Released
Revision history for this message
Przemyslaw Hausman (phausman) wrote :

In our case KVM hosts have not been deployed by MAAS. These are 3 infra nodes. In fact, MAAS is installed onto those machines and later configured to create 3 pods.

Revision history for this message
Newell Jensen (newell-jensen) wrote :

Thanks for responding. Since you have MAAS installed on these three infra nodes (if I am following you correctly), there shouldn't be any issues since that makes these nodes controllers in MAAS. The only way I see this bug showing up is if you installed MAAS on this node, created the KVM host/pod, and then subsequently created the bridges without restarting the MAAS services so that MAAS did not know that it actually had a bridged interface.

Revision history for this message
Nicolas Pochet (npochet) wrote :

I could reproduce the bug described.
With the following netplan, it does not trigger the bug:
network:
    ethernets:
        ens3:
            addresses: []
            dhcp4: false
        ens9:
            addresses: []
            dhcp4: false

    bridges:
        broam:
            interfaces:
              - ens3
            addresses:
            - 192.168.105.10/24
            gateway4: 192.168.105.1
            nameservers:
                addresses:
                - 8.8.8.8
        brmaas:
            interfaces:
              - ens9
            addresses:
            - 192.168.106.10/24

    version: 2

When using a bridge on top of a VLAN like in the following netplan configuration, it triggers the bug:
network:
    ethernets:
        ens3:
            addresses: []
            dhcp4: false
        ens9:
            addresses: []
            dhcp4: false
    vlans:
        ens9.1:
            id: 1
            link: ens9
    bridges:
        broam:
            interfaces:
              - ens3
            addresses:
            - 192.168.105.10/24
            gateway4: 192.168.105.1
            nameservers:
                addresses:
                - 8.8.8.8
        brmaas:
            interfaces:
              - ens9.1
            addresses:
            - 192.168.106.10/24

    version: 2
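
To check a host configuration for the triggering layout, a small helper can list bridges whose members are VLAN interfaces. This is a hypothetical sketch operating on an already-parsed netplan dictionary (trimmed to the keys it needs); it is a diagnostic aid only and not part of MAAS or netplan.

```python
from typing import Dict, List


def bridges_on_vlans(netplan: dict) -> Dict[str, List[str]]:
    """Map each bridge name to its member interfaces that are VLANs."""
    network = netplan.get("network", {})
    vlans = network.get("vlans", {})
    found: Dict[str, List[str]] = {}
    for name, bridge in network.get("bridges", {}).items():
        members = [m for m in bridge.get("interfaces", []) if m in vlans]
        if members:
            found[name] = members
    return found


# The second netplan from this comment, reduced to the relevant keys.
config = {
    "network": {
        "version": 2,
        "ethernets": {"ens3": {}, "ens9": {}},
        "vlans": {"ens9.1": {"id": 1, "link": "ens9"}},
        "bridges": {
            "broam": {"interfaces": ["ens3"]},
            "brmaas": {"interfaces": ["ens9.1"]},
        },
    }
}
print(bridges_on_vlans(config))
```

For this configuration, only brmaas is reported, since its member ens9.1 is defined under vlans.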

When trying to create the VM, with the following command:
maas root pod compose 1 hostname=juju-1 cores=1 memory=1024 storage=8.0 zone=1 interfaces="eth0:space=oam-space;eth1:space=maas-space"
I get the following error:

Unable to compose machine because: Failed talking to pod: Unable to compose juju-1: error: Failed to start domain juju-1
error: error creating macvtap interface <email address hidden> (52:54:00:00:b9:9d): Device or resource busy

This happens even though a bridge exists and seems to be recognized by MAAS in the Controllers tab.
I'll attach a sosreport of the MAAS server.

Revision history for this message
Nicolas Pochet (npochet) wrote :

sosreport of the MAAS server attached.

Revision history for this message
Newell Jensen (newell-jensen) wrote :

Nicolas,

Thanks for reporting this. Did you try restarting the MAAS services after running netplan apply?

$ sudo systemctl restart maas-regiond && sudo systemctl restart maas-rackd

After doing that, MAAS should recommission the machine to gather the new networking information and then save it to the database. You mentioned that you can see in the Controllers->Interfaces tab that the bridge is recognized, but does it appear to be on the correct VLAN?

Revision history for this message
Newell Jensen (newell-jensen) wrote :

Additionally, you can test the branch I have up for review that is linked to the bug to see if it fixes the issue for you.

Changed in maas:
status: Fix Released → In Progress
Revision history for this message
Przemyslaw Hausman (phausman) wrote :

> Did you try restarting the MAAS services after running netplan apply?

Speaking for myself -- yes, I did try restarting regiond and rackd multiple times. The issue was still there after the restart, I'm afraid.

Revision history for this message
Nicolas Pochet (npochet) wrote :

Newell,

I did restart the maas services between both tests. To be 100% sure I also rebooted the machine to re-do the test with the bridge on top of the VLAN interface.

Yes, it appeared to be on the correct VLAN.

Revision history for this message
Nicolas Pochet (npochet) wrote :

Newell,

I tried your patch by doing the following:

cd /usr/lib/python3/dist-packages/maasserver/

sudo patch -p3 --dry-run -i ~/maas.patch
sudo patch -p3 -i ~/maas.patch

I restarted all the MAAS services:

sudo systemctl restart maas-*

I even rebooted the machine.

I still get the same error:

maas root pod compose 1 hostname=juju-1 cores=1 memory=1024 storage=8.0 zone=1 interfaces="eth0:space=oam-space;eth1:space=maas-space"
Unable to compose machine because: Failed talking to pod: Unable to compose juju-1: error: Failed to start domain juju-1
error: error creating macvtap interface <email address hidden> (52:54:00:77:79:28): Device or resource busy

Revision history for this message
Newell Jensen (newell-jensen) wrote :

Nicolas,

Is there any way I can get access to this system to test on it?

Revision history for this message
Nicolas Pochet (npochet) wrote :

Newell,

The VM is running on my laptop, so it might be difficult for me to provide you access.

To reproduce the issue, you might:
- Create a nested KVM VM (or have access to a physical host) with Ubuntu 18.04
- Configure the network configuration as described in #14
- Install MAAS from ppa:maas/stable
- Install KVM on it
- Allow the maas user to get access to KVM
- Create two network spaces (related to the two bridges available)
- Create a VM with two interfaces with the MAAS CLI

If you need it, I can provide you the qcow2 image of my VM.

Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Newell Jensen (newell-jensen) wrote :

All,

A new patch for this has been out for a while. If you have any issues, please let me know.

Changed in maas:
status: Fix Committed → Fix Released