cannot partition existing raid

Bug #1932976 reported by Michael Hudson-Doyle
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
curtin
Fix Released
Undecided
Unassigned

Bug Description

"preserve = true" for a disk means "preserve the partition table".

For a raid it means "preserve the raid array" but ALSO "preserve the partition table".

So you can't reformat a pre-existing RAID :(

Related branches

summary: - cannot parition existing raid
+ cannot paritition existing raid
summary: - cannot paritition existing raid
+ cannot partition existing raid
Revision history for this message
Ryan Harper (raharper) wrote :

Hrm, I didn't think this was true (that preserve means keeping partition table).

looking at raid_handler, if you set wipe: superblock, it will wipe a verified.

```
create_raid = True
if preserve:
    raid_verify(md_devname, raidlevel, device_paths, spare_device_paths)
    LOG.debug('raid %s already present, skipping create', md_devname)
    create_raid = False

if create_raid:
    mdadm.mdadm_create(md_devname, raidlevel,
                       device_paths, spare_device_paths, container_dev,
                       info.get('mdname', ''), metadata)

wipe_mode = info.get('wipe')
if wipe_mode:
    if wipe_mode == 'superblock' and create_raid:
        # Newly created raid devices already wipe member superblocks at
        # their data offset (this is equivalent to wiping the assembled
        # device, see curtin.block.mdadm.zero_device for more details.
        pass
    else:
        LOG.debug('Wiping raid device %s mode=%s', md_devname, wipe_mode)
        block.wipe_volume(md_devname, mode=wipe_mode, exclusive=False)
```

Curtin will verify the array, lower the create_raid flag, and then if wipe
is set, it will clear the array; it only *skips* clearing if wipe=superblock
and we've just created the array (which already wipes the underlying disks).

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Right, but the problem is in disk_handler. Consider this code:

    if config.value_as_boolean(info.get('preserve')):
        # Handle preserve flag, verifying if ptable specified in config
        if ptable and ptable != PTABLE_UNSUPPORTED:
            current_ptable = block.get_part_table_type(disk)
            LOG.debug('disk: current ptable type: %s', current_ptable)
            if current_ptable not in PTABLES_SUPPORTED:
                raise ValueError(
                    "disk '%s' does not have correct partition table or "
                    "cannot be read, but preserve is set to true. "
                    "cannot continue installation." % info.get('id'))

After the wipe in raid_handler, block.get_part_table_type(disk) will return None so the error gets raised. Perhaps the conditional here should check wipe as well? (And maybe not check preserve, come to think of it...)

Revision history for this message
Ryan Harper (raharper) wrote :

I'm missing the storage config you're trying to process. IIUC, our existing preserve: true raid scenario looks like this (see curtin/examples/tests/preserve-raid.yaml) but let's extend it and create a partition on the raid

- type: disk
  id: sda
  preserve: true
- type: disk
  id: sdb
  preserve: true
- type: raid
  id: raid-md1
  raidlevel: raid1
  devices: [sda, sdb]
  preserve: true
- type: partition
  id raid1-md1-part1
  device: raid-md1
  size: 1G
- type: format
 ....

The disk handler never runs against a raid device, just like it doesn't run against lvm or any other composed device.

I think if you extend the existing preserve-raid vmtest to create a different partition layout but keep the raid mirror; that should just work and if not we'll see the traceback on where it failed.

Revision history for this message
Ryan Harper (raharper) wrote :

Could you explain what "reformat a pre-existing RAID" means?

I'm confident that we can add/remove partitions to a preserved raid; we can apply a format to a preserved raid (single block device) or to existing or new raid partitions.

Ryan Harper (raharper)
Changed in curtin:
status: New → Incomplete
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1932976] Re: cannot partition existing raid

On Sat, 26 Jun 2021 at 02:50, Ryan Harper <email address hidden>
wrote:

> I'm missing the storage config you're trying to process.

Yes, fair. The reason I'm running into this is to handle machines with
VROC, which often ship with containers and volumes pre-congfigured (or
configured with the UEFI application for this, i.e. pre-OS boot). So I can
show you a config but you probably don't have access to a machine that can
run it...

IIUC, our
> existing preserve: true raid scenario looks like this (see
> curtin/examples/tests/preserve-raid.yaml) but let's extend it and create
> a partition on the raid
>

I pushed a new vmtest to the branch in review (not using the new flag added
by the review). I haven't run it yet, I will do so soon.

> The disk handler never runs against a raid device, just like it doesn't
> run against lvm or any other composed device.
>

Sure it does:
https://git.launchpad.net/curtin/tree/curtin/commands/block_meta.py#n1612

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So I've run the test now and this is how it fails: https://paste.ubuntu.com/p/QvDzvMFzJQ/ (this is how subiquity fails on the VROC machine too)

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for the additional information.

We have at least two scenarios:

a) keep raid array and partition table as-is {preserve: true, wipe: None}
b) keep raid array and wipe partition table {preserve: true, wipe: superblock}

That looks like we could update disk_handler to run the preserve path if preserve: true and wipe is None (unset).

We could also extend this wipe check only to composable types (ie, we could check type:XXX in disk_handler and only check preserve if type is disk).

I think the latter works better (ignoring preserve setting if info's type doesn't match the handler).

Thoughts?

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yes, I think keying off wipe makes some amount of sense. I'm not wild about special casing type == 'disk' but can see that it would be more backwards compatible.

I wonder if it would be clearer to have a ptable_handler that can be called by all of disk_handler, raid_handler, bcache_handler. There wouldn't be much left in disk_handler I guess but maybe that's OK!

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 82765ab3 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=82765ab3

Changed in curtin:
status: Incomplete → Fix Committed
Revision history for this message
Dan Bungert (dbungert) wrote : Fixed in curtin version 21.3.

This bug is believed to be fixed in curtin in version 21.3. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.