Code review comment for ~mwhudson/curtin:raid-min-levels

Ryan Harper (raharper) wrote:

> Hi Ryan,
>

Hi Thimo,

> I am the author of the PR
> (https://github.com/canonical/subiquity/pull/929) as well as the
> originating bug report
> (https://bugs.launchpad.net/subiquity/+bug/1880023). Thanks for
> considering the changes. A few comments from my side:
> I think this change is first of all a fix for a bug. The bug is
> that the code and the comment above state these values are
> "minimum number of devices for a given raid level". The original
> numbers are incorrect in this regard (see
> https://github.com/canonical/subiquity/pull/929 for
> pointers to the relevant information sources).

Thanks for the pointers. It's certainly true that mdadm accepts
fewer drives for some configurations than our current minimums
require. We could accept the adjustments to these more accurate
numbers.

However, I'd like to consider what the lower numbers actually
give users. I'll reply in-line below.

> Regarding your RAID5 comment: a RAID5 with two components is not
> missing parity. The parity is just coincidentally the data itself
> which means the data on the device is arranged (parity
> distribution on equal data will not have an effect) like
> a RAID1. Nevertheless the metadata is RAID5 and also the upgrade
> path using "grow" is more obvious (RAID5 two component to RAID5
> three component). Of course it is possible to do the
> intermediate step (RAID1 two components to RAID5 two components)
> before.

A two-drive RAID5 is effectively a RAID1 with a different
metadata format. Why not just create a RAID1 with your two
disks? IIUC, you can already convert a RAID1 to RAID5 later.
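
As a rough sketch of that conversion path (assuming /dev/md0 is
an existing two-disk RAID1 and /dev/sdc is a hypothetical third
disk):

mdadm --grow /dev/md0 --level=5          # takeover: RAID1 -> two-disk RAID5
mdadm --add /dev/md0 /dev/sdc            # attach the new disk as a spare
mdadm --grow /dev/md0 --raid-devices=3   # expand the array onto it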

I believe users who understand that they are creating a RAID5
in degraded mode, and that they can grow their RAID5 later,
also know they can convert a RAID1 into a RAID5.

Expanding an array is out of scope for the installer, whereas
producing non-degraded arrays in the least confusing
configuration is very much in scope.

>
> As a side-note, "mdadm" complains if the creation is missing some
> devices and it is not explicitly told "missing", so I am
> generally wondering why it is necessary to repeat the check in
> the installer (due to missing dry-run feature of mdadm?).

The 'missing' keyword is interesting. It does allow creating an
array with more slots than there are disks present. I still
find this a dubious "feature" for the initial installer. We can
do the following:

mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      disk1 disk2 missing

But this does not work:

mdadm --create /dev/md0 --level=5 --raid-devices=5 \
      disk1 disk2 missing missing missing

[ 887.817306] md/raid:md0: not enough operational devices (3/5 failed)
mdadm: RUN_ARRAY failed: Input/output error

The manual says this:

"For a RAID4 or RAID5 array at most one slot
 can be "missing"; for a RAID6 array at most two slots."

Any user who has created a RAID array with disks missing will
eventually need to issue an mdadm --add or --grow command, at
which point they could convert the array from mirror to parity
type as well.
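
For completeness, a sketch of that follow-up step (the device
name is hypothetical); adding a disk to an array created with a
"missing" slot rebuilds into that slot:

mdadm --add /dev/md0 /dev/sdd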

>
> Regarding the layout support, I think this is a much appreciated
> feature but it is not a dependency to fix the current issue.
>
> Since we are at the layout topic: I am not sure if there is much
> interest for the layout feature beyond RAID10. Thus it might also
> be an option to have two(/three) different RAID10 modes like
> RAID10near and RAID10far. The main question then would be if it
> is safe to assume the number of copies to be 2 (from mdadm man
> page: "2 is normal, 3 can be useful").
>

For min-drives on RAID5, layout support is not related. For
RAID10, my understanding is that the default layout would not
give you optimal single-thread performance; the linux-raid
thread I found[1] indicated that f2 would be best (though this
may be dated).

1. https://marc.info/?l=linux-raid&m=141051176728630&w=2
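
For illustration, a sketch of creating a two-disk RAID10 with
the far layout and two copies (f2); the device names are
hypothetical:

mdadm --create /dev/md0 --level=10 --layout=f2 \
      --raid-devices=2 /dev/sda /dev/sdb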

A big portion of what curtin provides is a language in which to
describe your storage configuration. Subiquity is a consumer of
this language and, in short, is a user-facing tool that helps
construct curtin's storage configuration files.

Generally curtin delegates behavior choices to the constructor
of the configuration. That is, as Michael pointed out, curtin
shouldn't second-guess that someone wants to construct a
two-disk RAID5 array, etc. I generally agree with that
perspective.

The other side of this is that, without guidance, users end up
constructing things that don't actually work or are
sub-optimal, and the blame falls on the installer for letting
users footgun their server. Subiquity goes to great lengths to
prevent users from constructing configurations which result in
an unbootable server, even if curtin is more than happy to
produce one.

If we are to enable these special cases, then we must provide
configuration language sufficient to describe everything that
is possible, not just the special cases, and require that it be
explicit.

For example, the 2-disk RAID5 config (a 3-slot array with one
device missing) could look like:

- type: raid
  level: 5
  raid_devices: 3
  devices:
    - /dev/sda
    - /dev/sdb

And the RAID10 array with a specific layout could be:

- type: raid
  level: 10
  raid_devices: 2
  devices:
    - /dev/sda
    - /dev/sdb
  layout: f2

However, we'd like to sanity check these configurations. For
example, the following:

- type: raid
  level: 5
  raid_devices: 5
  devices:
    - /dev/sdb
    - /dev/sdc

will fail at runtime (as I showed above), but that is a poor
experience when curtin's block schema can and should validate
the configuration and reject it, indicating that RAID5 allows
at most one missing device.
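
A rough sketch of such a check, in shell purely for
illustration (this is not curtin's actual schema code, and it
only encodes the rule quoted from the manual above):

check_missing() {
    level=$1; raid_devices=$2; present=$3
    case "$level" in
        4|5) max_missing=1 ;;   # RAID4/5: at most one "missing" slot
        6)   max_missing=2 ;;   # RAID6: at most two
        *)   max_missing=0 ;;   # conservatively require all present
    esac
    missing=$((raid_devices - present))
    if [ "$missing" -gt "$max_missing" ]; then
        echo "raid$level: $missing missing, at most $max_missing allowed" >&2
        return 1
    fi
}

check_missing 5 5 2   # rejected: three missing devices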

Likewise, adding layout support will need additional checks and
validations so that the install experience isn't painful and
the language we use to describe what we want isn't ambiguous,
allowing curtin to incrementally add features over time.
