[2.7] regression: Unable to compose machine because: Failed talking to pod

Bug #1859210 reported by Alexander Balderson
This bug affects 1 person

Affects:      MAAS
Status:       Fix Released
Importance:   High
Assigned to:  Alberto Donato
Milestone:    2.7.0rc2

Bug Description

During 2.7 rc1 release testing, our stable runs are failing while trying to create machines on the pods:

2020-01-10-09:38:26 root DEBUG maas root pods read
2020-01-10-09:38:27 root DEBUG maas root zones read
2020-01-10-09:38:28 root DEBUG maas root pods create type=virsh name=leafeon power_address=qemu+ssh://maas@10.244.40.30/system zone=default
2020-01-10-09:38:32 root DEBUG maas root pod update 1 memory_over_commit_ratio=10 cpu_over_commit_ratio=10
2020-01-10-09:38:35 root DEBUG maas root machines read
2020-01-10-09:38:36 foundationcloudengine.layers.maaslayer INFO Creating elastic-1 in leafeon
2020-01-10-09:38:36 root DEBUG maas root pod compose 1 hostname=elastic-1 cores=2 memory=24576 storage=500.0 zone=1
2020-01-10-09:38:40 root ERROR Command failed: pod compose 1 hostname=elastic-1 cores=2 memory=24576 storage=500.0 zone=1
2020-01-10-09:38:40 root ERROR b'Unable to compose machine because: Failed talking to pod: Unable to compose elastic-1: error: Failed to start domain elastic-1\nerror: Unable to add bridge eno1 port vnet0: Operation not supported'

You can see the direct calls made in the log.
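
For reference, the same failure should be reproducible outside MAAS by starting the composed domain over the same virsh connection (a sketch, assuming libvirt kept the domain definition after the failed start; the grep is only there to show the interface attachment):

$ virsh -c qemu+ssh://maas@10.244.40.30/system dumpxml elastic-1 | grep -A3 '<interface'
$ virsh -c qemu+ssh://maas@10.244.40.30/system start elastic-1

If the domain XML names eno1 as the source bridge, the start should fail with the same "Unable to add bridge eno1 port vnet0" error.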

tags: added: cdo-qa foundations-engine
Alberto Donato (ack) wrote:

hi, could you please attach the "ip a" output on the maas node and a "virsh net-dumpxml" for the virsh network being used by the nodes?
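
For anyone following along, the requested information can be gathered on the pod host roughly like this (the network name, "default" here, is a placeholder; use whichever network the nodes are attached to):

$ ip a
$ virsh -c qemu+ssh://maas@10.244.40.30/system net-list --all
$ virsh -c qemu+ssh://maas@10.244.40.30/system net-dumpxml default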

Changed in maas:
status: New → Incomplete
Alexander Balderson (asbalderson) wrote:

ubuntu@leafeon:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master broam state UP group default qlen 1000
    link/ether 1c:98:ec:21:98:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1e98:ecff:fe21:9854/64 scope link
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:98:ec:21:98:55 brd ff:ff:ff:ff:ff:ff
4: eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:08:0c brd ff:ff:ff:ff:ff:ff
    inet 10.245.214.65/20 brd 10.245.223.255 scope global eno49
       valid_lft forever preferred_lft forever
    inet6 fe80::8edc:d4ff:feb3:80c/64 scope link
       valid_lft forever preferred_lft forever
5: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:98:ec:21:98:56 brd ff:ff:ff:ff:ff:ff
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:98:ec:21:98:57 brd ff:ff:ff:ff:ff:ff
7: eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:08:0d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8edc:d4ff:feb3:80d/64 scope link
       valid_lft forever preferred_lft forever
8: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:11:0a:68:67:c8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::211:aff:fe68:67c8/64 scope link
       valid_lft forever preferred_lft forever
9: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:11:0a:68:67:c9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::211:aff:fe68:67c9/64 scope link
       valid_lft forever preferred_lft forever
10: broam: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1c:98:ec:21:98:54 brd ff:ff:ff:ff:ff:ff
    inet 10.244.40.30/21 brd 10.244.47.255 scope global broam
       valid_lft forever preferred_lft forever
    inet 10.244.40.34/32 brd 10.244.47.255 scope global broam
       valid_lft forever preferred_lft forever
    inet6 fe80::1e98:ecff:fe21:9854/64 scope link
       valid_lft forever preferred_lft forever
11: brinternal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:08:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.21/25 brd 192.168.33.127 scope global brinternal
       valid_lft forever preferred_lft forever
    inet6 fe80::8edc:d4ff:feb3:80d/64 scope link
       valid_lft forever preferred_lft forever
12: eno50.2733@eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brinternal state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:08:0d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8ed...


Changed in maas:
status: Incomplete → New
Changed in maas:
milestone: none → 2.7.0rc2
Alberto Donato (ack) wrote:

Could you please also attach the "machine read" output for the machine hosting the pod?
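
That would be, roughly (a sketch; the system ID placeholder has to be filled in from the "machines read" output):

$ maas root machines read | grep -E '"hostname"|"system_id"'
$ maas root machine read <system-id>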

Changed in maas:
status: New → Incomplete
Alberto Donato (ack) wrote:

Also, could you please detail how the virsh networking (and the broam bridge and its interfaces) is set up?

I tried to reproduce the setup with a container with an extra interface, added that interface to the "broam" bridge, and copied the virsh network definition.
I enabled dhcp in maas for that subnet. I can compose the VM, but commissioning times out, so I guess there's something wrong in my network setup.
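
Enabling DHCP for a subnet via the maas CLI looks roughly like this (a sketch; the IP range, fabric ID, VLAN tag, and rack system ID are placeholders for the test environment):

$ maas root ipranges create type=dynamic start_ip=<first-ip> end_ip=<last-ip>
$ maas root vlan update <fabric-id> <vid> dhcp_on=True primary_rack=<rack-system-id>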

Alexander Balderson (asbalderson) wrote:

I think these netplan files should cover the networking setup

$ cat 50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    bridges:
        brinternal:
            addresses:
            - 192.168.33.21/25
            interfaces:
            - eno50.2733
            macaddress: 8c:dc:d4:b3:08:0d
            mtu: 1500
            nameservers:
                addresses:
                - 10.245.208.5
                search:
                - maas
                - prod.solutionsqa
            parameters:
                forward-delay: 15
                stp: false
        broam:
            addresses:
            - 10.244.40.30/21
            interfaces:
            - eno1
            macaddress: 1c:98:ec:21:98:54
            mtu: 1500
            nameservers:
                addresses:
                - 10.245.208.5
                search:
                - maas
                - prod.solutionsqa
            parameters:
                forward-delay: 15
                stp: false
            routes:
            - metric: 0
              to: 10.245.160.0/21
              via: 10.244.40.1
            - table: 1
              to: 0.0.0.0/0
              via: 10.244.40.1
            routing-policy:
            - from: 10.244.40.0/21
              priority: 100
              table: 1
            - from: 10.244.40.0/21
              table: 254
              to: 10.244.40.0/21
    ethernets:
        eno1:
            match:
                macaddress: 1c:98:ec:21:98:54
            mtu: 1500
            set-name: eno1
        eno2:
            match:
                macaddress: 1c:98:ec:21:98:55
            mtu: 1500
            set-name: eno2
        eno3:
            match:
                macaddress: 1c:98:ec:21:98:56
            mtu: 1500
            set-name: eno3
        eno4:
            match:
                macaddress: 1c:98:ec:21:98:57
            mtu: 1500
            set-name: eno4
        eno49:
            addresses:
            - 10.245.214.85/20
            gateway4: 10.245.208.5
            match:
                macaddress: 8c:dc:d4:b3:08:0c
            mtu: 1500
            nameservers:
                addresses:
                - 10.245.208.5
                search:
                - maas
                - prod.solutionsqa
            routes:
            - metric: 0
              to: 192.168.224.0/22
              via: 10.245.208.1
            set-name: eno49
        eno50:
            match:
                macaddress: 8c:dc:d4:b3:08:0d
            mtu: 1500
            set-name: eno50
        ens1f0:
            match:
                macaddress: 00:11:0a:68:67:c8
            mtu: 1500
            set-name: ens1f0
        ens1f1:
            match:
                macaddress: 00:11:0a:68:67:c9
            mtu: 1500
            set-name: ens1f1
    version: 2
    vlans:
        eno50.2733:
            id: 2733
    ...
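
One way to sanity-check that the applied state matches this file (a sketch; "netplan try" rolls back on timeout if not confirmed, so it is safe to run remotely):

$ sudo netplan try
$ bridge link show
$ ip -d link show broam

"bridge link show" should list eno1 as a port of broam, and the detailed link output should show broam itself as a bridge.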


Changed in maas:
status: Incomplete → Opinion
status: Opinion → New
Alberto Donato (ack) wrote:

Thanks, could you also attach the output from "node read" for the maas node and maas logs?

Changed in maas:
status: New → Incomplete
Alexander Balderson (asbalderson) wrote:

I'm including updated output since it's from a different test:

2020-01-16-22:44:36 root DEBUG maas root pods create type=virsh name=meinfoo power_address=qemu+ssh://maas@10.244.40.31/system zone=zone2
2020-01-16-22:44:39 root DEBUG maas root pod update 2 memory_over_commit_ratio=10 cpu_over_commit_ratio=10
2020-01-16-22:44:42 root DEBUG maas root machines read
2020-01-16-22:44:43 foundationcloudengine.layers.maaslayer INFO Creating elastic-2 in meinfoo
2020-01-16-22:44:43 root DEBUG maas root pod compose 2 hostname=elastic-2 cores=2 memory=24576 storage=500.0 zone=2
2020-01-16-22:44:47 root ERROR Command failed: pod compose 2 hostname=elastic-2 cores=2 memory=24576 storage=500.0 zone=2
2020-01-16-22:44:47 root ERROR b'Unable to compose machine because: Failed talking to pod: Unable to compose elastic-2: error: Failed to start domain elastic-2\nerror: Unable to add bridge eno1 port vnet0: Operation not supported'

here is the node read:
https://pastebin.ubuntu.com/p/P2DH2XFqj2/

and the logs from the run are attached.
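
The libvirt error suggests the composed domain's interface points at eno1 as its source bridge, while on the host eno1 is a plain NIC enslaved to broam, so the port add fails. A quick way to confirm that on the pod host (a sketch):

$ ip -d link show eno1 | grep -E 'master|bridge'
$ virsh -c qemu+ssh://maas@10.244.40.31/system dumpxml elastic-2 | grep -A2 '<interface'

If eno1 shows up as a bridge_slave (rather than a bridge) and the domain XML has <source bridge='eno1'/>, that matches the "Operation not supported" failure.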

Changed in maas:
status: Incomplete → New
Alberto Donato (ack) wrote:

Hi, sorry, maybe it wasn't very clear from my previous message: I was asking for the "node read" output for the machine running the maas rack controller (where the pod is running). I think the pastebin is for the VM?

Alberto Donato (ack) wrote:

Looking at the db (I imported the dump from the logs), it seems there's an issue in MAAS's view of the interfaces on two of the three nodes: https://paste.ubuntu.com/p/BY2kQZg2YM/

On those, eno1 is reported both as a physical interface and as a bridge.
This is probably related to LP: #1847794, where leftover entries remain due to an issue with refreshing interfaces on the controller.

During FCE setup, does maas get installed before or after the interfaces/bridges are configured?
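
A quick way to check for the duplicate rows in the imported dump (a sketch; the database name and the table/column names are from memory for MAAS 2.x and may differ between versions):

$ sudo -u postgres psql maasdb -c \
    "SELECT node_id, name, type FROM maasserver_interface WHERE name = 'eno1' ORDER BY node_id;"

A healthy node should show a single row of type "physical" for eno1; the affected nodes apparently also carry a "bridge" row with the same name.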

Alexander Balderson (asbalderson) wrote:

maas gets deployed AFTER the setup of the interfaces and bridges.
We use a meta-maas to deploy the 3 maas nodes, then install the maas under test on top of them.

Alberto Donato (ack)
Changed in maas:
status: New → Triaged
Alexander Balderson (asbalderson) wrote:

Here is a log with extra logging enabled.

Alberto Donato (ack)
Changed in maas:
assignee: nobody → Alberto Donato (ack)
importance: Undecided → Critical
importance: Critical → High
Alberto Donato (ack)
Changed in maas:
status: Triaged → In Progress
Alberto Donato (ack) wrote:

Thanks for attaching the logs, but it doesn't seem that this run failed with the error from this bug?

Do you happen to have logs for a run where it does fail?

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released