/dev/by-dname symlinks based on wwn/wwid

Bug #1735839 reported by Dmitrii Shcherbakov
This bug affects 2 people
Affects  Status        Importance  Assigned to  Milestone
MAAS     Invalid       Undecided   Unassigned   -
curtin   Fix Released  High        Unassigned   -

Bug Description

Currently /dev/disk/by-dname symlinks are not created if a device does not have a partition table or a superblock.

There needs to be a way to auto-create by-dname symlinks based on unique wwn/eui identifiers, via udev rules installed during the deployment stage.
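
To illustrate what is being requested, a wwn-keyed rule could be generated roughly as in the sketch below (the generate_dname_rule helper and the example identifier are hypothetical, not existing curtin code):

# Sketch only: compose a udev rule that creates /dev/disk/by-dname/<name>
# from a WWN, independent of any partition table or superblock on the disk.
def generate_dname_rule(name, wwn):
    """Return a udev rule line that matches a disk by ID_WWN (hypothetical helper)."""
    return ('SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", '
            'ENV{ID_WWN}=="%s", SYMLINK+="disk/by-dname/%s"' % (wwn, name))

print(generate_dname_rule("optane0", "eui.01000000010000005cd2e407a30d4f51"))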

Old description:
Block devices can be identified by a GPT placed on them.

Each partition table contains a UUID which uniquely identifies a device.

sudo blkid -o udev -p /dev/nvme0n1
ID_PART_TABLE_UUID=8c5e98a4-0d1e-40ae-8f43-80544a2c204b
ID_PART_TABLE_TYPE=gpt

gdisk:
Command (? for help): p
Disk /dev/nvme0n1: 1000215216 sectors, 476.9 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 8C5E98A4-0D1E-40AE-8F43-80544A2C204B
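
For reference, the same property can be read programmatically; a minimal sketch assuming blkid is available and the caller has permission to probe the device:

import subprocess

def part_table_uuid(device):
    """Parse `blkid -o udev -p <device>` output (KEY=value lines) for ID_PART_TABLE_UUID."""
    out = subprocess.check_output(
        ['blkid', '-o', 'udev', '-p', device], universal_newlines=True)
    props = dict(line.split('=', 1) for line in out.splitlines() if '=' in line)
    return props.get('ID_PART_TABLE_UUID')

print(part_table_uuid('/dev/nvme0n1'))  # e.g. 8c5e98a4-0d1e-40ae-8f43-80544a2c204b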

We could leverage that for software that needs to create its own partitions after deployment in MAAS, for example. In that case there would be no need to create a new partition table post-deployment, and the block device, carrying no partitions or file systems, would be ready for any use.

Example: Ceph needs to use block devices to create partitions, but "zapping" of a partition table is configurable. If a partition table is pre-zapped during the deployment and a new clean one is created, ceph-disk can just create partitions on the existing partition table. This may help us work around this: pad.lv/1604501

Serial numbers are not entirely reliable as they depend on the block driver, whereas in this case we rely on a superblock written at deployment time. Even if we have an odd device without reliable means of identification, we can rely on a persistent UUID we ourselves put onto that device.

MAAS also needs to grow the ability to configure empty block devices which will get a clean partition table on demand.

I think it's worthwhile to create such a udev rule for Ubuntu Server in general: /dev/disk/by-gpt-uuid/<uuid>

Tags: cpe-onsite


description: updated
Ryan Harper (raharper)
Changed in curtin:
importance: Undecided → Wishlist
status: New → Triaged
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

For dname links and regular disk devices it looks like ID_PART_TABLE_UUID is already used:

https://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/head:/curtin/commands/block_meta.py#L229
def make_dname(volume, storage_config):
...
    if vol.get('type') == "disk":
        rule.append(compose_udev_equality('ENV{DEVTYPE}', "disk"))
        rule.append(compose_udev_equality('ENV{ID_PART_TABLE_UUID}', ptuuid))

For other devices this is not present by default but they have other identification methods.

   elif vol.get('type') == "bcache":
        rule.append(compose_udev_equality("ENV{DEVNAME}", path))

Still, this is not usable for disks without partitions from MAAS.
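
For context, a simplified approximation of how such a rule is assembled (compose_udev_equality here is a stand-in for curtin's helper of the same name, not the actual implementation):

def compose_udev_equality(key, value):
    """Approximation of curtin's helper: render a udev equality match."""
    return '%s=="%s"' % (key, value)

def disk_dname_rule(name, ptuuid):
    # Mirrors the block_meta.py excerpt above: match the disk by its partition-table UUID.
    rule = ['SUBSYSTEM=="block"', 'ACTION=="add|change"']
    rule.append(compose_udev_equality('ENV{DEVTYPE}', 'disk'))
    rule.append(compose_udev_equality('ENV{ID_PART_TABLE_UUID}', ptuuid))
    rule.append('SYMLINK+="disk/by-dname/%s"' % name)
    return ', '.join(rule)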

Lee Trager (ltrager)
Changed in maas:
status: New → Triaged
importance: Undecided → Wishlist
milestone: none → 2.4.x
Revision history for this message
Scott Moser (smoser) wrote :

The issue with us providing /dev/disk/by-dname/ links based on content inside the disk is that the content can move. Ultimately, when we added 'by-dname' links, we intended it to be "disk name", i.e. the user would provide a name for the disk in MAAS and could then assume that /dev/disk/by-dname/<name> would point to that disk.

The implementation, however, ended up relying on contents of the disk.

The problem with this approach is that the contents can move.

Consider:
 a.) user boots system with disks named 'blue' and 'red' and 'boot'. Red is initially provisioned as empty.
 b.) user moves data from 'blue' to 'red'. Exactly how doesn't matter, but consider 'dd if=/dev/disk/by-dname/blue of=/dev/disk/by-dname/red', and then wipes /dev/disk/by-dname/blue.
 c.) user reboots
 d.) user formats /dev/disk/by-dname/blue to put new data on it.

If the data *on* blue was used to identify it, the user will have just destroyed their data, as /dev/disk/by-dname/blue will point to the 'red' disk.

What we decided was that to work our way out of this issue, we should/would provide a /dev/disk/by-<something> that would *only* consider serial numbers or wwid as basis for its names.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Having dnames tied to wwids (/sys/class/block/<nvme-name>/wwid) seems like a good solution, as currently there are no dname symlinks at all unless some form of superblock is present on a device.

The use-case for that is software (SDS like Ceph, databases etc.) which manages raw block devices and does not expect partition tables or any other superblocks on them and may error out or delete superblocks/ptables that are already present.

This happens with charm-based Ceph deployments, for example - charms error out if they detect a non-pristine device (one that has some lower LBAs occupied by non-zero values). So pre-creating a partition table to trigger the dname logic is not a universal workaround.

Juju storage also depends on /dev/disk/by-dname symlink availability, as this is the only way to reliably retrieve a persistent device path from MAAS as a cloud provider.
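
A namespace wwid can be read directly from sysfs, which is what a wwid-based rule would be keyed on; a minimal sketch (assumes the Linux sysfs layout referenced above):

def block_device_wwid(dev):
    """Read the kernel-exposed wwid for a block device, e.g. dev='nvme0n1'.
    Returns None if the attribute is absent."""
    try:
        with open('/sys/class/block/%s/wwid' % dev) as f:
            return f.read().strip()
    except IOError:
        return None

print(block_device_wwid('nvme0n1'))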

summary: - feature request: /dev/by-dname symlinks based on GUID partition table UUIDs
+ feature request: /dev/by-dname symlinks based on wwn/wwid
summary: - feature request: /dev/by-dname symlinks based on wwn/wwid
+ /dev/by-dname symlinks based on wwn/wwid
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

subscribed ~field-critical

Considering https://bugs.launchpad.net/juju/+bug/1691694 and some other design issues which, for now, make Juju storage not applicable to day-2 operations when used with MAAS, this LP has been re-qualified into a bug by Field Engineering: besides Juju storage, there is no way to logically tie dnames in MAAS to wwn/eui identifiers (stored as the idpath parameter) and use them in charm config values to persistently identify storage devices that have no partition tables/superblocks.

There are two types of NVMe devices below (SSDPE21K375GA and SSDPE2KX040T7), yet they all appear as /dev/nvme<something>. dname symlinks pointing to persistent identifiers are needed for them to be logically meaningful.

ls -al /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 1040 Jul 23 09:55 .
drwxr-xr-x 8 root root 160 Jul 23 09:55 ..
lrwxrwxrwx 1 root root 9 Jul 23 10:02 ata-XF1230-1A0960_7CW009J0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009J0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009J0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009J0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009J0-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 9 Jul 23 10:02 ata-XF1230-1A0960_7CW009NY -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009NY-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009NY-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009NY-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jul 23 10:02 ata-XF1230-1A0960_7CW009NY-part4 -> ../../sda4
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-name-fnos-nvme08:0 -> ../../md0
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-name-fnos-nvme08:1 -> ../../md1
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-name-fnos-nvme08:2 -> ../../md2
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-name-fnos-nvme08:3 -> ../../md3
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-uuid-09b65367:fd20d811:43cd7b64:9e775cbb -> ../../md0
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-uuid-7242c36e:8e717cf2:528c60ba:2cfd1711 -> ../../md2
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-uuid-7839e948:130f8c65:0ba12c3c:6d8931d3 -> ../../md1
lrwxrwxrwx 1 root root 9 Jul 23 10:02 md-uuid-bb9db413:cd06ea5b:8d689c5a:d2fa83cd -> ../../md3
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE21K375GA_PHKE730200GL375AGN -> ../../nvme8n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802000Y84P0IGN -> ../../nvme4n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802001054P0IGN -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802101JV4P0IGN -> ../../nvme9n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802101JX4P0IGN -> ../../nvme5n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802101KJ4P0IGN -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802101RP4P0IGN -> ../../nvme7n1
lrwxrwxrwx 1 root root 13 Jul 23 10:02 nvme-INTEL_SSDPE2KX040T7_PHLF802101RT4P0IGN -> ../../nv...


description: updated
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I may have found a workaround for a very specific case I am working on which may drop this to ~field-high:

1) naming a device in MAAS (e.g. nvme9n1p1 -> optane);
2) creating a whole-disk partition (which results in a partition table with a partition UUID being created);
3) deploying a node.

This results in a partition uuid-based symlink (no device symlink though) being created by curtin.

$ ls -al /dev/disk/by-dname/ | grep nvme
lrwxrwxrwx 1 root root 15 Jul 23 12:08 optane-part1 -> ../../nvme9n1p1

Using this as a workaround may only work for ceph-volume-based charm-ceph-osd deployments where it does not matter if a device special file is a partition or the whole device as LVM is used to slice it into chunks.

I will update this bug accordingly if I am able to work with that.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I am going to set this to ~field-high now as the whole-disk partition approach allowed the project affected by this to move forward.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Based on https://bugs.launchpad.net/curtin/+bug/1735839/comments/2

by-dname was intended to mean "by device name" but ended up being "by data name".

Curtin docs promise disk naming and partition naming:

https://curtin.readthedocs.io/en/latest/topics/storage.html
"If the name key is present, curtin will create a udev rule that makes a symbolic link to the disk with the given name value. This makes it easy to find disks on an installed system. The links are created in /dev/disk/by-dname/<name>. A link to each partition on the disk will also be created at /dev/disk/by-dname/<name>-part<number>..."

> "What we decided was that to work our way out of this issue, we should/would provide a /dev/disk/by-<something> that would *only* consider serial numbers or wwid as basis for its names."

I think "device name" vs "data name" usage should depend on an application and we need 2 symlink types:

* a "devicename" symlink should mean a wwn or serial number symlink for hw devices themselves, for virtual devices (partitions, bcache, md-raid etc.) - a superblock unique ID symlink;
* a "dataname" symlink should mean a GPT UUID/MBR disk identifier symlink for hw devices with a partition table, partition UUID symlink for partitions, superblock UUIDs for virtual devices.

There are several classes of block devices that I see:

1) hardware block devices with a unique factory identifier (wwn, EUI etc.);

lrwxrwxrwx 1 root root 14 Jul 25 16:28 nvme-eui.01000000010000005cd2e407a30d4f51 -> ../../nvme10n1
lrwxrwxrwx 1 root root 9 Jul 25 16:28 wwn-0x5000c500813becef -> ../../sda

2) hardware-level virtual block devices (created by RAID controllers) - WWN stability will depend on the raid controller's implementation but generally a RAID-controller superblock is written to every device in a RAID.

journalctl -k | grep -i lsi
JUN 08 16:38:57 ubuntu kernel: scsi 0:2:1:0: Direct-Access LSI MR9271-8i 3.46 PQ: 0 ANSI: 5

lspci | grep -i lsi
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)

tree /dev/disk/by-id/ | grep sda
├── scsi-3600605b006deb2f0ff00001f01ff0cdb -> ../../sda
└── wwn-0x600605b006deb2f0ff00001f01ff0cdb -> ../../sda

3) virtual block devices. Availability of persistent and unique identifiers depends on hypervisor support and persistent configuration that a hypervisor provides (libvirt domain xml files -> qemu args, VMDK file metadata etc.)

For QEMU, IDE and SCSI devices can utilize a wwn passed via qemu command line and virtio devices can utilize a serial number parameter.

https://git.qemu.org/?p=qemu.git;a=commit;h=95ebda85e09ed2b7f00deb2adbdafa5ccf5db948
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=64cc22841e72d37d577416f5836155ecd0a9bfb6
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=9a502563eef7d7c2c9120b237059426e229eefe9

https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=6e73850b01ee7d5816a803d684e9d669dad036f3

virtio:

#...
-drive file=/var/lib/libvirt/images/maas-1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,serial=a76f8801-92d3-4aff-9c96-a51a054517a0 # ...

cat /sys/class/block/vda/serial
a76f8801-92d3-4aff-9

(virtio)scsi:

virsh dumpxml maas | grep wwn
      <wwn>5...


Changed in maas:
status: Triaged → Invalid
importance: Wishlist → Undecided
milestone: 2.4.x → none
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Marking this as invalid for MAAS given that:

1. MAAS doesn't create the udev rules matching disk names.
2. MAAS already sends UUIDs to curtin; as to whether curtin uses them or new ones are created, I don't know.

That said, can you please explain the problem you are actually hitting? The bug report explains what you want to achieve and why, but doesn't highlight what problem you are currently facing.

Also, since I believe you would be using Juju, and MAAS already has a UUID attached to a disk and knows which dname it has, why shouldn't Juju use that information to provide what you need, instead of requesting that a UUID be created on the disk?

And lastly, should this really be tagged as 'field-*'? IIRC, since this is not a regression or a proper bug but a feature request for functionality that doesn't yet exist, it should be tagged as such.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Andres,

The problem we have is quite simple: we cannot rely on Juju storage as of today and can only use charm config options which require lists of devices, i.e.

osd-devices: /dev/disk/by-<some-logical-name-from-maas>/logicalname0 /dev/disk/by-<some-logical-name-from-maas>/logicalname1 /dev/disk/by-<some-logical-name-from-maas>/logicalname2
osd-journal: # ... space-separated device symlinks
bluestore-wal: # ... space-separated device symlinks
bluestore-db: # ... space-separated device symlinks

There is a single config string for all units of ceph-osd for a given node type, so we have to use logical naming: device IDs are unique per node, so by-id links cannot be used directly. Logical names have to be used because we have an all-NVMe setup with two device types: "regular" P4500 NVMe data devices and P4800X Optane devices. These devices are not plugged into slots uniformly across machines, so /dev/nvme<devnum>n<namespace-num> entries look the same on all nodes but may point to a different device type depending on the node.

# 375G Optane device is nvme8n1 but nvme4n1 on a different node
lrwxrwxrwx 1 root root 13 Jul 25 16:28 nvme-INTEL_SSDPE21K375GA_PHKE730200JK375AGN -> ../../nvme8n1
# 4T P4500 device
lrwxrwxrwx 1 root root 13 Jul 25 16:28 nvme-INTEL_SSDPE2KX040T7_PHLF802000X84P0IGN -> ../../nvme9n1

So the following configuration would result in an Optane device being used as a data device on some nodes, which would silently create an invalid Ceph device configuration:

osd-devices: /dev/nvme0n1 /dev/nvme1n1 # ... until /dev/nvme7n1
osd-journal: /dev/nvme8n1

Tags and Juju storage would have solved this problem quite nicely but only if there were no other issues to address.

There have been some operational concerns when using Juju storage with MAAS as a provider that got communicated to us from the operations teams and also some current bugs:

https://bugs.launchpad.net/juju/+bug/1783084
https://bugs.launchpad.net/juju/+bug/1778033
https://bugs.launchpad.net/juju/+bug/1691694

We currently cannot use partition tags due to the lack of that feature in MAAS (https://bugs.launchpad.net/maas/+bug/1713239), and we rely on NVMe device partitions to allocate some NVMe storage for bcache and some for the filestore journal or bluestore WAL & DB. So we cannot use Juju storage with MAAS-provided partitions either.

Likewise, we cannot rely on pre-created file systems in MAAS and fstab UUID-based entries because:

1) with Ceph encryption configured by charms, file systems are created on top of device-mapper block devices that MAAS is not aware of;
2) the new stable Ceph backend (BlueStore) does not rely on kernel-mounted file systems, so file system UUIDs are not usable.

I agree that MAAS already exposes relevant information for physical devices and composable devices (md-raid, bcache etc.) to curtin.

However, having MAAS provide serials or WWNs for virtio or virtio-scsi devices would be a MAAS feature request in my view, as there is some work involved in rendering a proper libvirt domain XML.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

To better understand the context of this issue, I briefly discussed it with Dmitrii on IRC and we have identified 3 areas:

 A. Currently, we are relying on charm options to specify which storage devices to use for ceph data and journal or wal/db devices (/dev/disk/by-<logical-name-in-maas>/logicalname0 /dev/disk/by-<logical-name-in-maas>/logicalname1 and so on). That said:
 - The workaround is to rename the block devices in MAAS (which renames the partition prefix and the dname symlink).
 - The desired fix is to use Juju storage instead (which is also supported by the ceph-osd charm), which would require solving [2] in MAAS.

 B. Curtin currently writes symlinks (by-dname) based on GPT partition table UUIDs, partition UUIDs and other superblock identifiers. Curtin, however, doesn't create them for clean devices. As such, if curtin were to *also* create symlinks based on WWN or serial ID, it would help with these use cases. This means *this* bug is valid for curtin, but invalid for MAAS.

 C. MAAS should preseed disk UUIDs for KVM pod VMs, so that once B is solved, disks get proper symlinks. This requires a new bug to be filed against MAAS.

[2]: https://bugs.launchpad.net/maas/+bug/1713239

Ryan Harper (raharper)
Changed in curtin:
importance: Wishlist → High
status: Triaged → In Progress
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 3f03b1dd to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=3f03b1dd

Changed in curtin:
status: In Progress → Fix Committed
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Manually checked on a physical node with one device after replacing the changed curtin files on a MAAS node (the code isn't in -proposed yet at the time of writing); it seems to work:

ubuntu@redacted:~$ blkid
/dev/sda1: LABEL="root" UUID="2b708114-354c-4ece-9d2e-3b7c202cbf17" TYPE="ext4" PARTUUID="18b7f148-01"

ubuntu@redacted:~$ tree /dev/disk/by-dname/
/dev/disk/by-dname/
├── sda -> ../../sda
├── sda-part1 -> ../../sda1
└── sdb -> ../../sdb

0 directories, 3 files

ubuntu@emere:~$ ls -al /dev/disk/by-id/ | grep sdb
lrwxrwxrwx 1 root root 9 Dec 3 14:14 scsi-3600508b1001c914bbd22b1f25777ef8c -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 3 14:14 wwn-0x600508b1001c914bbd22b1f25777ef8c -> ../../sdb

ubuntu@emere:~$ cat /etc/udev/rules.d/sdb.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="3600508b1001c914bbd22b1f25777ef8c", ENV{ID_WWN_WITH_EXTENSION}=="0x600508b1001c914bbd22b1f25777ef8c", SYMLINK+="disk/by-dname/sdb"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="3600508b1001c914bbd22b1f25777ef8c", ENV{ID_WWN_WITH_EXTENSION}=="0x600508b1001c914bbd22b1f25777ef8c", SYMLINK+="disk/by-dname/sdb-part%n"

I'll try to get a node with an NVMe device (or create a virtual one) and check there as well.

Revision history for this message
Ryan Harper (raharper) wrote : Fixed in curtin version 18.2.

This bug is believed to be fixed in curtin in version 18.2. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

With this change curtin will error out (resulting in failed deployments in MAAS) if a serial or wwn is not specified on a block device. In my case this failure happened on a VM created manually via virsh (not pods) with block devices that do not have explicit serial or wwn values:

        Running command ['blkid', '-o', 'export', '/dev/vda'] with allowed return codes [0, 2] (capture=True)
        Running command ['udevadm', 'info', '--query=property', '/dev/vda'] with allowed return codes [0] (capture=True)
        An error occured handling 'vda': RuntimeError - Cannot create disk tag udev rule for /dev/vda [id=vda], missing 'serial' or 'wwn' value
        finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: configuring disk: vda
        TIMED BLOCK_META: 4.837
        finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
        Traceback (most recent call last):
          File "/curtin/curtin/commands/main.py", line 201, in main
            ret = args.func(args)
          File "/curtin/curtin/log.py", line 97, in wrapper
            return log_time("TIMED %s: " % msg, func, *args, **kwargs)
          File "/curtin/curtin/log.py", line 79, in log_time
            return func(*args, **kwargs)
          File "/curtin/curtin/commands/block_meta.py", line 83, in block_meta
            return meta_custom(args)
          File "/curtin/curtin/commands/block_meta.py", line 1601, in meta_custom
            handler(command, storage_config_dict)
          File "/curtin/curtin/commands/block_meta.py", line 506, in disk_handler
            make_dname(info.get('id'), storage_config)
          File "/curtin/curtin/commands/block_meta.py", line 271, in make_dname
            byid = make_dname_byid(path, error_msg="id=%s" % vol.get('id'))
          File "/curtin/curtin/commands/block_meta.py", line 250, in make_dname_byid
            "missing 'serial' or 'wwn' value" % error_msg)
        RuntimeError: Cannot create disk tag udev rule for /dev/vda [id=vda], missing 'serial' or 'wwn' value
        Cannot create disk tag udev rule for /dev/vda [id=vda], missing 'serial' or 'wwn' value

Stderr: ''

After adding serial numbers, curtin still uses the old by-id value (supplied by MAAS from its database) to try to look up a device by serial, so recommissioning is needed after the serial values are updated:

          File "/curtin/curtin/block/__init__.py", line 686, in lookup_disk
            raise ValueError("no disk with serial '%s' found" % serial_udev)
        ValueError: no disk with serial 'drive-scsi0-0-0-1' found
        no disk with serial 'drive-scsi0-0-0-1' found

Adding a serial to VMs created via virsh pods was fixed in MAAS for 2.5:
https://bugs.launchpad.net/maas/+bug/1785755

I think we may break previous MAAS versions though.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

+1. Curtin should be backward compatible and not exclusively require this.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Another thing I noticed is that the udev rule for NVMe namespaces is not based on the wwid of a namespace - it is tied to the serial of the controller. For NVMe devices with a single namespace this is OK, but NVMe controllers can create multiple namespaces, each with its own unique identifier (e.g. in the NVMe-oF use case).

https://paste.ubuntu.com/p/WJYpnWmnxp/
for i in `ls -1 /dev/vd* /dev/sd* /dev/nvme*n*` ; do echo "Info for device $i" ; cat /etc/udev/rules.d/`basename $i`.rules ; echo "Unique attribute: " ; udevadm info -a --path $(udevadm info -q path -n $i) | grep -P 'serial|wwid' ; done

Info for device /dev/nvme0n1
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec0", SYMLINK+="disk/by-dname/nvme0n1"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec0", SYMLINK+="disk/by-dname/nvme0n1-part%n"
Unique attribute:
    ATTR{wwid}=="nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001"
    ATTRS{serial}=="nvmec0 "

nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001 is a wwid of a namespace.

So, I think we should prefer ID_WWN over ID_SERIAL in general and use ID_SERIAL as a fallback:

udevadm info --query=property --name /dev/nvme0n1
DEVLINKS=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_nvmec0 /dev/disk/by-path/pci-0000:00:03.0-nvme-1 /dev/disk/by-id/nvme-nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001 /dev/disk/by-dname/nvme0n1
DEVNAME=/dev/nvme0n1
DEVPATH=/devices/pci0000:00/0000:00:03.0/nvme/nvme0/nvme0n1
DEVTYPE=disk
ID_MODEL=QEMU NVMe Ctrl
ID_PATH=pci-0000:00:03.0-nvme-1
ID_PATH_TAG=pci-0000_00_03_0-nvme-1
ID_SERIAL=QEMU NVMe Ctrl_nvmec0
ID_SERIAL_SHORT=nvmec0
ID_WWN=nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001
MAJOR=259
MINOR=0
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=1792441
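
The properties above can be collected and this preference applied in code; a minimal sketch assuming udevadm is available (illustrative, not curtin's implementation):

import subprocess

def udev_properties(device):
    """Return udev properties of a device as a dict (KEY=value lines from udevadm)."""
    out = subprocess.check_output(
        ['udevadm', 'info', '--query=property', '--name', device],
        universal_newlines=True)
    return dict(line.split('=', 1) for line in out.splitlines() if '=' in line)

def preferred_id(device):
    props = udev_properties(device)
    # Prefer the namespace-unique ID_WWN; fall back to the controller-based ID_SERIAL.
    return props.get('ID_WWN') or props.get('ID_SERIAL')

print(preferred_id('/dev/nvme0n1'))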

NVMe spec references:

[1] "5.15.2 Identify Namespace data structure
Namespace Globally Unique Identifier (NGUID): This field contains a 128-bit value that is globally unique and assigned to the namespace when the namespace is created. This field remains fixed throughout the life of the namespace and is preserved across namespace and controller operations (e.g., controller reset, namespace format, etc.)."

A namespace ID is extracted from a controller and exported via sysfs via a wwid node:
https://github.com/torvalds/linux/blob/v4.14/drivers/nvme/host/core.c#L802-L818
https://github.com/torvalds/linux/blob/v4.14/drivers/nvme/host/core.c#L2054-L2081

[2] Namespaces also have non-unique (across controllers or NVMe subsystems) numeric namespace identifiers (NSID) that are similar to SCSI LUNs (nvme<controller-num>n<namespace-identifier>), see “Section 6.1.2 Valid and Invalid NSIDs” in the NVMe specification:

“Associated with each controller namespace is a namespace ID, labeled as NSID 1 and NSID 2, that is used by the controller to reference a specific namespace.

Changed in curtin:
status: Fix Released → New
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Andres, more on comment #15:

ID_SERIAL and ID_SERIAL_SHORT are generated for SCSI devices if not provided via domain XML but not for virtio-blk.

https://bugs.launchpad.net/maas/+bug/1807505
https://bugs.launchpad.net/maas/+bug/1807505/comments/3

Revision history for this message
Ryan Harper (raharper) wrote :

I didn't think there was a MAAS release that would provide only a path: to disks. This is somewhat surprising since paths are just not stable.

While we can warn when we attempt to create a dname for a disk without serial/wwn and move on, I do think MAAS should require serial/wwn. No one looks at the curtin log file on successful deploys, so this leaves the node open to unreliable dnames if someone modifies the partition table (which is what this change was meant to fix) or plugs in a USB drive. Should we inject 'unstable' into the dname value?

/dev/disk/by-dname/UNSTABLE-sda -> ../../sda

In terms of rule preference, we have some conflicting values. On some SCSI disks, we get:

ID_SERIAL
ID_SERIAL_SHORT
ID_WWN
ID_WWN_WITH_EXTENSION

So, I'd like to see if we can build a reasonable order of preference. My initial thought based on my experience would be to prefer values in this order:

ID_WWN_WITH_EXTENSION
ID_WWN
ID_SERIAL
ID_SERIAL_SHORT

And lastly, curtin currently will embed as many of these values as are found.

ENV{DEVTYPE}=="disk", ENV{ID_WWN_WITH_EXTENSION}=="val", ENV{ID_WWN}=="val"

I think that by combining these values in the rule, we'll ensure we don't misidentify a disk.
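
A sketch of what that ordering and combination could look like in rule-generation code (illustrative only, not the change that was merged):

PREFERENCE = ('ID_WWN_WITH_EXTENSION', 'ID_WWN', 'ID_SERIAL', 'ID_SERIAL_SHORT')

def dname_disk_rule(name, props):
    """Build a by-dname rule embedding every persistent identifier found, most specific first."""
    matches = ['SUBSYSTEM=="block"', 'ACTION=="add|change"', 'ENV{DEVTYPE}=="disk"']
    found = [key for key in PREFERENCE if props.get(key)]
    if not found:
        raise ValueError("no persistent identifier found for disk %s" % name)
    for key in found:
        matches.append('ENV{%s}=="%s"' % (key, props[key]))
    matches.append('SYMLINK+="disk/by-dname/%s"' % name)
    return ', '.join(matches)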

Ryan Harper (raharper)
Changed in curtin:
status: New → Incomplete
Revision history for this message
Joshua Powers (powersj) wrote :

Dmitrii, Andres,

Ryan has provided some options in #18 to resolve this (e.g. a warning message or use of other identifier values). Consider the following background:

a) Prior to https://bugs.launchpad.net/bugs/1735839, dname was unstable in the face of users/charms wiping the partition table of devices

b) curtin introduced a change where dname udev rules include WWN or SERIAL values for disks so that users (and charms) can *trust* that if they set a dname it will be present and persistent

c) curtin now raises a RuntimeException when it encounters a device which has requested a dname, but the device does *NOT* include a persistent attribute like a WWN or SERIAL

d) The issue with the new behavior is that for some previous MAAS releases there are KVM nodes whose disks do not have a serial, and on these nodes deployment fails as the storage config also includes a dname for the disks

Below are three options we see to move this forward:

1) Revert all the changes associated with LP: #1735839; however, this is the least desirable outcome, though Field may have a workaround for this in place already.

2) Log a warning when curtin detects a request for a dname on devices that do not have a serial. This would allow dnames to continue to be unstable on disks without serial numbers, and there would be no indication to users or deployments that the dname is unstable on certain disks. This seems somewhat worse in that we have stated that dnames are now stable, but in some cases they are not and users have no idea until it fails them.

3) Keep the exception in place and release an SRU to previous MAAS versions so that new KVM pods have serial numbers, and so that existing KVM nodes which are already defined have their guest XML updated to include a serial number such that, upon deployment, the dname becomes stable.

Curtin's preference is for (3) so we can establish a guarantee for dname symlinks. That said, we are fine with either (1) or (2) as well, but would like Field and MAAS to discuss, come to an agreement on what to do, and update the bug with the decision. Thanks!

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

From the field engineering perspective, (2) would suffice as it would preserve backwards compatibility and solve the problem for contemporary versions of MAAS.

I think (2) could be modified to instead avoid creating symlinks for devices that do not have persistent identifiers. Existing 2.2 to 2.4 installations would still be able to deploy virtual machines but dname symlinks would not be created. Pod VMs have 1 block device and dname symlinks are not generally used with them anyway.

This would provide a way to still provision virtual machines but avoid creating symlinks for devices without persistent ids (they must not be used anyway).

For physical block devices with valid VPD pages containing a serial or wwid there would be no difference, as symlinks were not available for them anyway unless a partition table was present.

(3) would require two SRUs (one for Xenial, one for Bionic) and also a specific ordering of updates (MAAS first, then Curtin). This would set us back in terms of getting the newer Curtin version for a while, I imagine.

Anything against the modified version of (2) I proposed?
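
In code, this modified option (2) amounts to warning and skipping instead of raising when no persistent identifier is found; a minimal sketch (the function and constant names are hypothetical, not curtin's actual symbols):

import logging

LOG = logging.getLogger('curtin')

PERSISTENT_KEYS = ('ID_WWN_WITH_EXTENSION', 'ID_WWN', 'ID_SERIAL', 'ID_SERIAL_SHORT')

def dname_rule_or_skip(name, props):
    """Build a by-dname udev rule only when a persistent identifier exists;
    otherwise warn and skip instead of failing the deployment."""
    matches = ['SUBSYSTEM=="block"', 'ACTION=="add|change"', 'ENV{DEVTYPE}=="disk"']
    found = False
    for key in PERSISTENT_KEYS:
        if props.get(key):
            matches.append('ENV{%s}=="%s"' % (key, props[key]))
            found = True
    if not found:
        LOG.warning("no serial/wwn for disk %s; skipping by-dname symlink", name)
        return None
    matches.append('SYMLINK+="disk/by-dname/%s"' % name)
    return ', '.join(matches)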

Revision history for this message
Ryan Harper (raharper) wrote :

> (2) could be modified to instead avoid creating symlinks for devices that do not have persistent identifiers.

Existing MAAS users 2.2 through 2.4 will upgrade curtin and new deployments of VMs won't have dnames. Is this acceptable? Won't users file that as a bug?

That may be acceptable if we have a MAAS version released where dnames are stable (and enforced, though the question remains where to enforce this; in curtin or MAAS)?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

> Is this acceptable? Won't users file that as a bug?

I suspect that there is either a very small number of users or no users at all that depend on dname symlinks in VMs. Based on field engineering experience there are none, because when VMs are created with a single block device (where the root fs resides), file system paths are typically used in automation, not dname symlinks. None of our current bundles rely on by-dname symlinks with pod VMs.

I don't like to use ad-hoc statistics to reason about behavioral changes but it is what it is considering the situation.

> That may be acceptable if we have a MAAS version released where dnames are stable (and enforced, though the question remains where to enforce this; in curtin or MAAS)?

We cannot enforce that in every case, e.g. when I use virsh provider + my own domain XML without serial/wwid specified for block devices, I don't expect MAAS to parse domain XML and tell me there is something missing.

However, I would warn a user via MAAS based on the output of the 00-maas-07-block-devices commissioning script: e.g. if '"SERIAL": "drive-scsi0-0-0-0"' is present, it is clearly not a persistent ID, and the user should be told that it cannot be relied on for persistent device symlinks.

Likewise, if there is a physical device with a damaged EEPROM or otherwise inaccessible Vital Product Data (VPD), in which case serial/wwid wouldn't be accessible, MAAS should tell the user about that.

To summarize: it's better to warn explicitly than fail commissioning.

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1735839] Re: /dev/by-dname symlinks based on wwn/wwid

On Wed, Jan 30, 2019 at 6:20 PM Dmitrii Shcherbakov <
<email address hidden>> wrote:

> > Is this acceptable? Won't users file that as a bug?
>
> I suspect that either there is a very small amount of users or no users
> at all that depend on dname symlinks in VMs. Based on field engineering
> experience there are none because if VMs are created with a single block
> device (where root fs resides), file system paths are typically used in
> automation, not dname symlinks. None of our current bundles rely on by-
> dname symlinks with pod VMs.
>
> I don't like to use ad-hoc statistics to reason about behavioral changes
> but it is what it is considering the situation.
>

If you and MAAS are OK with this; I won't object. I'll just return any
bugs to you =)

>
> > That may be acceptable if we have a MAAS version released where dnames
> are stable (and enforced, though the question remains where to enforce
> this; in curtin or MAAS)?
>
> We cannot enforce that in every case, e.g. when I use virsh provider +
> my own domain XML without serial/wwid specified for block devices, I
> don't expect MAAS to parse domain XML and tell me there is something
> missing.
>
> However, I would warn a user via MAAS based on outputs of 00-maas-07
> -block-devices commissioning script: e.g. if '"SERIAL": "drive-
> scsi0-0-0-0"' is present, it is clearly not a persistent ID, and a user
> should be told that it cannot be relied on for persistent device
> symlinks.
>
> Likewise, if there is a physical device with a damaged EEPROM or
> otherwise inaccessible Vital Product Data (VPD), in which case
> serial/wwid wouldn't be accessible, MAAS should tell the user about
> that.
>
> To summarize: it's better to warn explicitly than fail commissioning.
>

Even better to have MAAS prevent users from including 'name' values for disks which don't have a serial attribute. I guess I'm also looking for MAAS to do its part in ensuring that dname is stable by refusing to allow users to create a 'name' on storage devices if they don't have a persistent attribute.


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

> If you and MAAS are OK with this; I won't object. I'll just return any
bugs to you =)

I am quite confident in this. Using persistent device links with clean devices is a fairly narrow (but important) use-case - i.e. we use it for Ceph only and we have never done this in VMs so far.

Another way to look at it is: there was never a commitment to create by-dname symlinks without a reliable identifier to back them up. So we are fixing a bug which reads as "symlinks get created without a persistent identifier with virtio-blk devices".

At the same time, if symlinks do not get created anymore for this case, we are not breaking MAAS in terms of VM deployment, thus fulfilling the promise to MAAS about curtin compatibility.

When it comes to Juju storage, it retrieves "by-id" symlinks stored in MAAS for physical or emulated devices so we are not breaking anything here either.

I suspect Andres' concern in #15 is more about not being able to deploy, rather than not having symlinks.

> Even better to have MAAS prevent users from including 'name' values for disks which don't have a serial attribute.

I think this would be valuable to have and I would really like to see this. We should discuss this during the next sprint but it should not be a blocking feature for this lp.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

My perspective is that if curtin introduced a change of behavior in when to create dnames that breaks currently working software, then curtin needs to handle backwards compatibility.

That means the new version of curtin needs to work exactly the same way as it used to with older versions of MAAS.

That is, if no serial or WWN is present for a disk, curtin should fall back to creating dnames the way it did before.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 35fd01f0 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=35fd01f0

Changed in curtin:
status: Incomplete → Fix Committed
Revision history for this message
Chris Sanders (chris.sanders) wrote :

I've subscribed field-high; this is an active issue for expanding clouds with both Optane and NVMe devices present.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Tested on (master at the time of writing): https://git.launchpad.net/curtin/commit/?id=7afd77faf99739cc2fe7ab0c256d2e153bccb604

LGTM

Note: see the attached domain.xml of the test machine; sda (the root disk) and vda do not have a serial attached. All devices except for sda were renamed in MAAS to <commissioning-dev-name>-renamed.

maas <profile> machine read xeqrrw # post-deployment (successful without some serials which is expected)
https://paste.ubuntu.com/p/B69nKQ6zqg/

maas maas machine get-curtin-config xeqrrw
https://paste.ubuntu.com/p/dc8BJRs5Hd/

ubuntu@maas-vhost6:~$ tree /dev/disk/by-dname/
/dev/disk/by-dname/
├── nvme0n1-renamed -> ../../nvme0n1
├── nvme1n1-renamed -> ../../nvme1n1
├── sda -> ../../sda
├── sda-part1 -> ../../sda1
├── sdb-renamed -> ../../sdb
├── sdc-renamed -> ../../sdc
├── sdd-renamed -> ../../sdd
├── sde-renamed -> ../../sde
└── vdb-renamed -> ../../vdb

ubuntu@maas-vhost6:~$ for i in `ls -1 /dev/nvme* | grep -P 'nvme[0-9]n1$'` ; do echo $i ; cat /sys/class/block/`basename $i`/device/serial ; done
/dev/nvme0n1
nvmec0
/dev/nvme1n1
nvmec1

ubuntu@maas-vhost6:~$ for i in `ls -1 /sys/class/block/ | grep -P 'sd[a-z]$'` ; do echo $i ; sudo sg_vpd --page=0x80 /dev/$i ; done
sda
Unit serial number VPD page:
fetching VPD page failed: Illegal request
sdb
Unit serial number VPD page:
  Unit serial number: disk0
sdc
Unit serial number VPD page:
  Unit serial number: disk1
sdd
Unit serial number VPD page:
  Unit serial number: disk4
sde
Unit serial number VPD page:
  Unit serial number: disk5

for i in `ls -1 /dev/vd* | grep -P 'vd[a-z]$'` ; do echo $i ; cat /sys/class/block/`basename $i`/serial ; done
/dev/vda
/dev/vdb
disk3

root@maas-vhost6:~# grep "missing 'serial' or 'wwn' value" curtin-install.log
Cannot create disk tag udev rule for /dev/vda [id=vda-renamed], missing 'serial' or 'wwn' value

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Also the resulting udev rules for #28:

for i in `ls -d /etc/udev/rules.d/*` ; do echo $i ; cat $i ; printf '\n\n' ; done
/etc/udev/rules.d/nvme0n1-renamed.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_WWN}=="nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec0", ENV{ID_SERIAL_SHORT}=="nvmec0", SYMLINK+="disk/by-dname/nvme0n1-renamed"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_WWN}=="nvme.8086-6e766d656330-51454d55204e564d65204374726c-00000001", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec0", ENV{ID_SERIAL_SHORT}=="nvmec0", SYMLINK+="disk/by-dname/nvme0n1-renamed-part%n"

/etc/udev/rules.d/nvme1n1-renamed.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_WWN}=="nvme.8086-6e766d656331-51454d55204e564d65204374726c-00000001", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec1", ENV{ID_SERIAL_SHORT}=="nvmec1", SYMLINK+="disk/by-dname/nvme1n1-renamed"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_WWN}=="nvme.8086-6e766d656331-51454d55204e564d65204374726c-00000001", ENV{ID_SERIAL}=="QEMU NVMe Ctrl_nvmec1", ENV{ID_SERIAL_SHORT}=="nvmec1", SYMLINK+="disk/by-dname/nvme1n1-renamed-part%n"

/etc/udev/rules.d/sda-part1.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_PART_ENTRY_UUID}=="31e824cf-01", SYMLINK+="disk/by-dname/sda-part1"

/etc/udev/rules.d/sda.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_PART_TABLE_UUID}=="31e824cf", SYMLINK+="disk/by-dname/sda"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0", ENV{ID_SERIAL_SHORT}=="drive-scsi0-0-0-0", SYMLINK+="disk/by-dname/sda"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0", ENV{ID_SERIAL_SHORT}=="drive-scsi0-0-0-0", SYMLINK+="disk/by-dname/sda-part%n"

/etc/udev/rules.d/sdb-renamed.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_disk0", ENV{ID_SERIAL_SHORT}=="disk0", SYMLINK+="disk/by-dname/sdb-renamed"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_disk0", ENV{ID_SERIAL_SHORT}=="disk0", SYMLINK+="disk/by-dname/sdb-renamed-part%n"

/etc/udev/rules.d/sdc-renamed.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_disk1", ENV{ID_SERIAL_SHORT}=="disk1", SYMLINK+="disk/by-dname/sdc-renamed"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="0QEMU_QEMU_HARDDISK_disk1", ENV{ID_SERIAL_SHORT}=="disk1", SYMLINK+="disk/by-dname/sdc-renamed-part%n"

/etc/udev/rules.d/sdd-renamed.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="QEMU_HARDDISK_disk4", ENV{ID_SERIAL_SHORT}=="disk4", SYMLINK+="disk/by-dname/sdd-renamed"

SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVT...


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Just verified that it works as required on Bionic after updating to the SRU-ed package:

https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1817964

Changed in curtin:
status: Fix Committed → Fix Released