multipathd bcache disks do not get picked up by multipath-tools during boot

Bug #1887558 reported by Mirek
This bug affects 9 people
Affects                  Status          Importance  Assigned to            Milestone
MAAS                     Fix Released    High        Alexsander de Souza
MAAS 3.3                 Fix Committed   High        Alexsander de Souza
MAAS 3.4                 Fix Released    High        Alexsander de Souza
bcache-tools             New             Undecided   Unassigned
bcache-tools (Ubuntu)    New             Undecided   Unassigned

Bug Description

[ Impact ]

When using multipathd disks as bcache, multipath-tools does not pick the disks up at boot time.

This is due to an interaction between bcache-tools and multipath-tools. The udev rules for bcache-tools (specifically /lib/udev/rules.d/69-bcache.rules) claim the disks during boot, so multipath-tools fails to interact with them.
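
A rough diagnostic sketch of the symptom (device names are illustrative examples, not taken from a specific machine): once the bcache rule has won the race, the path devices carry the bcache filesystem signature and bcache has already registered them, so multipathd cannot claim them.

    # example checks on one of the affected path devices (here assumed to be /dev/sda)
    udevadm info /dev/sda | grep ID_FS_TYPE   # expect: E: ID_FS_TYPE=bcache
    ls /sys/fs/bcache/                        # a cache-set UUID here suggests bcache already registered the disk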

[ Test Case ]

1) get an image
$ wget https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/noble-live-server-amd64.iso

2) Create bcache disk
$ fallocate -l 20G image.img
$ make-bcache -C image.img

3) Boot into iso with multipathed bcache disk
$ kvm -m 2048 -boot d -cdrom ./noble-live-server-amd64.iso -device virtio-scsi-pci,id=scsi -drive file=image.img,if=none,id=sda,format=raw,file.locking=off -device scsi-hd,drive=sda,serial=0001 -drive if=none,id=sdb,file=image.img,format=raw,file.locking=off -device scsi-hd,drive=sdb,serial=0001

Select "Try or install Ubuntu" wait for the OS to load. Press F2 to escape to a shell to check multipathed devices.


Revision history for this message
Lee Trager (ltrager) wrote :

MAAS does not currently support multipath devices. Curtin, the tool MAAS uses to perform installations, does. We'd have to add support for gathering information on multipath devices during commissioning, a way to properly configure multipath storage, and a way to test multipath devices.

Changed in maas:
status: New → Triaged
importance: Undecided → Wishlist
Revision history for this message
Mirek (mirek186) wrote :

I guess you could either probe for any mapper devices and detect multipath that way, or do it via the serial number when collecting info about storage: if the same serial appears more than once, just grab the first one or look for a mapper device.
Do you know where in the code it is so I could have a look myself?
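
A minimal sketch of the serial-number heuristic (purely illustrative, not MAAS code; it assumes lsblk reports a SERIAL column for the disks in question):

    # flag serials that appear on more than one disk; duplicates hint at multiple paths to the same LUN
    lsblk -dno NAME,SERIAL | awk 'NF == 2 { count[$2]++; disks[$2] = disks[$2] " " $1 }
        END { for (s in count) if (count[s] > 1) print "serial " s ":" disks[s] }'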

Revision history for this message
Lee Trager (ltrager) wrote :

I looked at adding support for multipath on IBM Z. In that case each LPAR has access to multipath devices from all other LPARs. There is no way to locally determine which multipath device is for which LPAR, nor is there a way to determine if any are in use. IBM told me that it's expected that a storage administrator tells users which multipath device to use. They were adding an API to get that information, but it would still be configured manually in the JBOD.

The code for storage is in various places and would require a lot of work to add multipath support:

* Storage data actually comes from LXD - https://github.com/lxc/lxd/blob/master/lxd/resources/storage.go
* The data is processed in the metadata server - https://git.launchpad.net/maas/tree/src/metadataserver/builtin_scripts/hooks.py
* It has to be modeled. We model block devices and physical block devices. For multipath we'd most likely need to design a new model - https://git.launchpad.net/maas/tree/src/maasserver/models
* The API would have to be updated to interact with the new model - https://git.launchpad.net/maas/tree/src/maasserver/api/blockdevices.py
* The websocket would also have to be updated - https://git.launchpad.net/maas/tree/src/maasserver/websockets/handlers/node.py
* The preseed, which generates Curtin config, would require changes - https://git.launchpad.net/maas/tree/src/maasserver/preseed_storage.py
* We'd also want to update the UI to show that this is a multipath device - https://github.com/canonical-web-and-design/maas-ui

Lee Trager (ltrager)
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
milestone: none → 2.10.0
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.0.0 → 3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
Mirek (mirek186) wrote :

Hi, sorry for the late reply, but the issue is still there. The machine-resources binary is still only listing each disk individually. I think the correct way of doing it would be to first check multipath (e.g. multipath -ll), and if any disks are found there, discard them from the standard SATA disk output. I can see MAAS is now using the LXD resource API, and LXD doesn't care about multipath, so maybe fixing it there would fix it for MAAS at the same time.
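
A rough sketch of that filtering idea (illustrative only, not MAAS/LXD code; it assumes the usual sysfs layout):

    # disks printed here are already slaves of a device-mapper node (e.g. a multipath map)
    # and could be skipped when enumerating standalone disks
    for slave in /sys/block/dm-*/slaves/*; do
        [ -e "$slave" ] && basename "$slave"
    done | sort -u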

Revision history for this message
Carlos Bravo (carlosbravo) wrote :

Hi, as of MAAS 3.1 and 3.2 Beta the issue is still present. I have several multipath devices and they still show up as multiple individual devices instead of an mpath device.

Revision history for this message
Junien F (axino) wrote :

Hi,

This is still a bug in MAAS 3.2.6. This will be a problem for PS6.

Thanks !

Changed in maas:
status: Fix Released → Triaged
importance: Wishlist → Undecided
assignee: Lee Trager (ltrager) → nobody
milestone: 3.0.0-beta1 → none
importance: Undecided → Wishlist
Changed in maas:
importance: Wishlist → Medium
milestone: none → 3.4.0
Revision history for this message
Björn Tillenius (bjornt) wrote :

Junien, and Mirek, could you both please attach the output of machine-resources and 'multipath -ll'? It would be good to see if both your setups require the same fix.

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Junien F (axino) wrote :
Changed in maas:
status: Incomplete → Triaged
Revision history for this message
Andy Wu (qch2012) wrote :

We hit this issue during the PS6 deployment: MAAS 3.2.6 lists all multipath devices separately, so it is not possible to configure bcache using multipath directly in MAAS.

The workaround (credit to Junien Fririck) is to do the bcache config post-deployment, as follows (with a quick sanity check at the end):

1. make-bcache -C /dev/nvme6n1p1 -B /dev/mapper/mpatha --writeback
2. copy /lib/udev/rules.d/69-bcache.rules to the /etc/udev/rules.d/ folder and modify the rules so they do not register backing devices (e.g. sd*), so those devices can still be used by multipathd

    cp /lib/udev/rules.d/69-bcache.rules /etc/udev/rules.d/.
    sed -i 's/sr\*/sr\*|sd\*/' /etc/udev/rules.d/69-bcache.rules

    # line 7 of the modified rules file now looks like this
    KERNEL=="fd*|sr*|sd*", GOTO="bcache_end"

3. normally, if bcache is configured in MAAS, curtin will create udev rules to link the bcache device to /dev/disk/by-dname; here we need to do it manually

    uuid=$(bcache-super-show /dev/mapper/mpatha | grep dev.uuid | awk '{print $2}')
    echo "SUBSYSTEM==\"block\", ACTION==\"add|change\", ENV{CACHED_UUID}==\"$uuid\", SYMLINK+=\"disk/by-dname/bcache-osd1\"" > /etc/udev/rules.d/bcache-osd1.rules

4. trigger udev rules

    sudo udevadm trigger

5. update initramfs

    update-initramfs -u -k all
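
6. after a reboot, sanity-check the result (a sketch; the mpath and dname names follow the examples above)

    multipath -ll                          # mpatha should be assembled again
    ls -l /dev/disk/by-dname/bcache-osd1   # symlink created by the manual udev rule
    lsblk /dev/mapper/mpatha               # the bcache device should now sit on top of the mpath device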

Revision history for this message
Junien F (axino) wrote :

Note that this workaround only works if you don't have sdX devices as part of bcache (in our case, we only have nvme* and mpath* devices). If you have both mpath and sdX devices as part of bcache, then you'd need something more elaborate.

tags: added: bug-council canonical-bootstack
Changed in maas:
importance: Medium → High
tags: removed: bug-council
Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

This bug is caused by /lib/udev/rules.d/69-bcache.rules (from bcache-tools) grabbing all block devices where ID_FS_TYPE=bcache, including the disks backing the multipath device. Later, when the multipathd daemon comes up and tries to initialize the mpath devices, it fails because those disks are already in use.

no longer affects: bcache-tools
Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in bcache-tools (Ubuntu):
status: New → Confirmed
Revision history for this message
Camille Rodriguez (camille.rodriguez) wrote :

Hi, can we get an update on this bug ?

Revision history for this message
Paride Legovini (paride) wrote :

Hi @Alexsander, my understanding of comment 12 is that bcache-tools is claiming some devices it shouldn't claim; however, I see that you removed the bcache-tools bug task shortly after. It looks like you had an idea of where this should be fixed; if that's the case, can you share more about it? Thanks!

Changed in bcache-tools (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

Hi @paride, I removed the bcache-tools task unintentionally; we don't have a good solution to work around this bcache-tools behaviour.

Changed in bcache-tools (Ubuntu):
status: Incomplete → New
Changed in maas:
status: Triaged → Fix Committed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hey,
Thorsten was so kind to reach out and ask for help, this is a tricky situation.

The issue seems well outlined: the bdev udev rules are greedy and grab all devices before the multipath service kicks in.

Sadly there isn't a great proposal or solution yet.
Nor have I read about an easy way to reproduce it.

Discussions seem to happen in MM ~maas-bcache-tools-jbod-multipath-bug but I haven't joined that yet.

In regard to testing, I'd expect a modified variant of my multipath/tgt autopkgtest to work: use iscsi to create multiple paths to the same disk, which we can then set up as bcache and check the behaviour (without having full systems deployed via MAAS).

Tagging it for the server team to have a look, but until there is a breakthrough on an approach I'm unsure about progress expectations.

tags: added: server-todo
Changed in maas:
milestone: 3.4.x → 3.5.0
no longer affects: maas/3.2
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released
Bryce Harrington (bryce)
tags: added: server-triage-discuss
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

> I'd expect a modified variant of my multipath/tgt autopkgtest should work using iscsi to create multiple
> paths to the same disk

We have been chasing such a setup for a while (Sergio and I), but didn't manage to get it working. Do you have a simple recipe for such a setup, Christian?

Bryce Harrington (bryce)
tags: removed: server-triage-discuss
Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

I attempted to reproduce this issue in a VM and wasn't successful. I essentially copied the tgtbasedmpaths test, but made the backing disk a bcache cache device before setting up the targets.

1) create vm
$ lxc launch ubuntu-daily:noble --vm n-vm
$ lxc shell n-vm
# apt install -y lsscsi multipath-tools open-iscsi tgt

2) setup virtual multipathd disk
```
targetname="iqn.2016-11.foo.com:target.iscsi"
cwd=$(pwd)
testdir="/mnt/tgtmpathtest"
localhost="127.0.0.1"
portal="${localhost}:3260"
maxpaths=4
backfn="backingfile"
expectwwid="60000000000000000e00000000010001"
testdisk="/dev/disk/by-id/wwn-0x${expectwwid}"

### Setup mpath devices

# Restart tgtd to make sure modules are all loaded
service tgt restart || echo "Failed to restart tgt" >&2

# prep SINGLE test file
truncate --size 100M ${backfn}

make-bcache -C ${backfn}

# create target
tgtadm --lld iscsi --op new --mode target --tid 1 -T "${targetname}"
# allow all to bind the target
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
# set backing file
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b "${cwd}/${backfn}"

# scan for targets (locally)
iscsiadm --mode discovery --type sendtargets --portal ${localhost}

# login
echo "login #1"
iscsiadm --mode node --targetname "${targetname}" --portal ${portal} --login
# duplicate this session (always 1)
for i in $(seq 2 ${maxpaths})
do
    echo "extra login #${i}"
    iscsiadm --mode session -r 1 --op new
done

udevadm settle
sleep 5 # sleep a bit to allow device to be created.
```

And can confirm the disks are using bcache FS by:

# udevadm info /dev/dm-0 | grep ID_FS_TYPE
E: ID_FS_TYPE=bcache

Things look fine and as they should to me
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 10G 0 disk
├─sda1 8:1 0 9G 0 part /
├─sda14 8:14 0 4M 0 part
├─sda15 8:15 0 106M 0 part /boot/efi
└─sda16 259:0 0 913M 0 part /boot
sdb 8:16 0 100M 0 disk
└─mpatha 252:0 0 100M 0 mpath
sdc 8:32 0 100M 0 disk
└─mpatha 252:0 0 100M 0 mpath
sdd 8:48 0 100M 0 disk
└─mpatha 252:0 0 100M 0 mpath
sde 8:64 0 100M 0 disk
└─mpatha 252:0 0 100M 0 mpath
# multipath -ll
mpatha (360000000000000000e00000000010001) dm-0 IET,VIRTUAL-DISK
size=100M features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 10:0:0:1 sde 8:64 active ready running
|-+- policy='service-time 0' prio=1 status=enabled
| `- 7:0:0:1 sdb 8:16 active ready running
|-+- policy='service-time 0' prio=1 status=enabled
| `- 8:0:0:1 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 9:0:0:1 sdd 8:48 active ready running
# ls /dev/mapper
control mpatha

I'm not sure if I'm setting up this test incorrectly or not.

In fact, if I run `udevadm test` I see the bcache udev rules fail.
# udevadm test /dev/mapper/mpatha
< cut here >
dm-0: /usr/lib/udev/rules.d/69-bcache.rules:21 RUN 'kmod load bcache'
dm-0: /usr/lib/udev/rules.d/69-bcache.rules:22 RUN 'bcache-register $tempnode'
dm-0: /usr/lib/udev/rules.d/69-bcache.rules:26 Importing properties from results of 'bcache-export-...


Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

Fundamentally, my setup is different from the scenario described in the bug. I was initially confused about why multipathing was even happening, so I will explain the scenario here in case someone else is also confused.

The hardware is a JBOD enclosure where each disk has redundant I/O, hence the multipathing.

On boot, multipath correctly groups
* /dev/sd{a,b} -> /dev/mapper/mpatha
* /dev/sd{c,d} -> /dev/mapper/mpathb
* /dev/sd{e,f} -> /dev/mapper/mpathc
* /dev/sd{g,h} -> /dev/mapper/mpathd

At this point, the system has 4 NVMe drives to use as caching devices, with the JBOD drives as the backing devices:
make-bcache -C /dev/nvme0n1p1 -B /dev/mapper/mpatha --writeback
make-bcache -C /dev/nvme1n1p1 -B /dev/mapper/mpathb --writeback
make-bcache -C /dev/nvme2n1p1 -B /dev/mapper/mpathc --writeback
make-bcache -C /dev/nvme3n1p1 -B /dev/mapper/mpathd --writeback

At this point, everything is happy. We have multipath devices inside a JBOD enclosure, where the physical disks are tagged as bcache devices.

The issue seems to happen when you reboot the system: the bcache udev rules grab the drives and multipath fails to create mpath{a,b,c,d}, because the drives are already in use by bcache.

I don't think I see this on my system because this is an issue during boot, when multipathd is not yet running.

With that said, I don't think the fact that the disks are part of a JBOD enclosure matters here; rather, on boot, disks with redundant I/O that are marked with the bcache FS do not get initialized correctly. So I think there is still a way to reproduce this in a VM, which would be nice so we can turn it into a proper dep8 test.

The problem with reboot testing in my setup is that the virtual iSCSI device is not available at boot; I have to log in manually. I'm not the most familiar with tgtadm and iscsiadm, so I'm not sure whether these tools can make the device available at boot time, which seems to be needed for this test.
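
One possible avenue (untested here; just how open-iscsi is normally configured, reusing the targetname/portal values from the reproducer script above) is to mark the node for automatic login so the session is restored at boot:

    # sketch: ask open-iscsi to log in to the target automatically on startup
    iscsiadm --mode node --targetname "${targetname}" --portal "${portal}" \
        --op update --name node.startup --value automatic
    systemctl enable --now open-iscsi.service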

Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

I have successfully created a kvm based reproducer.

1) get an image
$ wget https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/noble-live-server-amd64.iso

2) Create bcache disk
$ fallocate -l 20G image.img
$ make-bcache -C image.img

3) Boot into iso with multipathed bcache disk
$ kvm -m 2048 -boot d -cdrom ./noble-live-server-amd64.iso -device virtio-scsi-pci,id=scsi -drive file=image.img,if=none,id=sda,format=raw,file.locking=off -device scsi-hd,drive=sda,serial=0001 -drive if=none,id=sdb,file=image.img,format=raw,file.locking=off -device scsi-hd,drive=sdb,serial=0001

Select "Try or install Ubuntu" wait for the OS to load. Press F2 to escape to a shell to check multipathed devices.

You will see exactly what this bug describes: /dev/sd{a,b} are present with no multipathing.

Redo these steps without running make-bcache on image.img and you will see a proper multipath device, /dev/mapper/mpatha, present.

Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

Any objection to renaming this bug title to something more accurate such as "multipathd bcache disks do not get picked up by multipath-tools during boot"?

summary: - Multipath JBOD storage devices are not shown via /dev/mapper but each
- path as a single device.
+ multipathd bcache disks do not get picked up by multipath-tools during
+ boot
description: updated
description: updated