Merge ~mwhudson/curtin:raid-min-levels into curtin:master

Proposed by Michael Hudson-Doyle
Status: Needs review
Proposed branch: ~mwhudson/curtin:raid-min-levels
Merge into: curtin:master
Diff against target: 34 lines (+4/-7)
2 files modified
curtin/block/mdadm.py (+2/-4)
tests/unittests/test_block_mdadm.py (+2/-3)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
curtin developers Pending
Review via email: mp+400931@code.launchpad.net

Commit message

mdadm: RAID levels 5 and 10 can both be created with 2 devices

RAID5 with 2 devices is a bit strange but it's not really curtin's place
to check this IMO.

RAID10 with 2 is not that strange though, perhaps linux "RAID10" should
have been called something else so it doesn't get confused with
"RAID1+0" so much, which is really something else.

Description of the change

I guess I should add a vmtest to verify my assertion that the RAIDs described can, in fact, be created.

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Ryan Harper (raharper) wrote :

I'm kinda -1 on this. Is there a launchpad bug for this?

From mdadm manpage:

-n, --raid-devices=
              Specify the number of active devices in the array. This, plus the number of spare devices (see below) must equal the number of component-devices (includ‐
              ing "missing" devices) that are listed on the command line for --create. Setting a value of 1 is probably a mistake and so requires that --force be spec‐
              ified first. A value of 1 will then be allowed for linear, multipath, RAID0 and RAID1. It is never allowed for RAID4, RAID5 or RAID6.

RAID5 without the third is missing the parity and is RAID1 AFAICT. Looking for sources to see where I'm wrong about that.

RAID10 with two disks will require (IMO) supporting layouts (which is good, but that's a feature add with testing I think).

https://marc.info/?l=linux-raid&m=141051176728630&w=2

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Well I'm doing this because an actual user is asking for this https://github.com/canonical/subiquity/pull/929. I'm not sure curtin should really stand between users and their desires, however strange they may be.

I do need to bash raid10 layouts into my head at some point.

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for the link to the discussion.

I think layouts are a requirement for relaxing the minimum drive check. If the config includes a layout specification, then we can assume the config is aware of the number of drives/data-copy requirements.

Revision history for this message
Thimo E (thimoe) wrote :

Hi Ryan,
I am the author of the PR (https://github.com/canonical/subiquity/pull/929) as well as the originating bug report (https://bugs.launchpad.net/subiquity/+bug/1880023).
Thanks for considering the changes. A few comments from my side:
I think this change is first of all a fix for a bug. The bug is that the code and the comment above state these values are "minimum number of devices for a given raid level". The original numbers are incorrect in this regard (see https://github.com/canonical/subiquity/pull/929 for pointers to the relevant information sources).

Also there are valid use-cases for both reductions. For (linux) RAID10 it is obviously single stream read speed and for RAID5 the main reason is expandability.

Regarding your RAID5 comment: a RAID5 with two components is not missing parity. The parity is just coincidentally the data itself which means the data on the device is arranged (parity distribution on equal data will not have an effect) like a RAID1. Nevertheless the metadata is RAID5 and also the upgrade path using "grow" is more obvious (RAID5 two component to RAID5 three component). Of course it is possible to do the intermediate step (RAID1 two components to RAID5 two components) before.

As a side-note, "mdadm" complains if the creation is missing some devices and it is not explicitly told "missing", so I am generally wondering why it is necessary to repeat the check in the installer (due to missing dry-run feature of mdadm?).

Regarding the layout support, I think this is a much appreciated feature but it is not a dependency to fix the current issue.
Since we are at the layout topic: I am not sure if there is much interest for the layout feature beyond RAID10. Thus it might also be an option to have two(/three) different RAID10 modes like RAID10near and RAID10far. The main question then would be if it is safe to assume the number of copies to be 2 (from mdadm man page: "2 is normal, 3 can be useful").

BR

Revision history for this message
Ryan Harper (raharper) wrote :
Download full text (5.8 KiB)

> Hi Ryan,
>

Hi Thimo,

> I am the author of the PR
> (https://github.com/canonical/subiquity/pull/929) as well as the
> originating bug report
> (https://bugs.launchpad.net/subiquity/+bug/1880023). Thanks for
> considering the changes. A few comments from my side:
> I think this change is first of all a fix for a bug. The bug is
> that the code and the comment above state these values are
> "minimum number of devices for a given raid level". The original
> numbers are incorrect in this regard (see
> https://github.com/canonical/subiquity/pull/929 for
> pointers to the relevant information sources).

Thanks for the pointers. It's certainly true that mdadm does
accept lower number of drives for some configurations; lower than
the current minimum numbers. We could accept the adjustments to
these more accurate numbers.

However I'd like to consider the details around what the lower
number gives users. I'll reply in-line below as you discuss the
details below.

> Regarding your RAID5 comment: a RAID5 with two components is not
> missing parity. The parity is just coincidentally the data itself
> which means the data on the device is arranged (parity
> distribution on equal data will not have an effect) like
> a RAID1. Nevertheless the metadata is RAID5 and also the upgrade
> path using "grow" is more obvious (RAID5 two component to RAID5
> three component). Of course it is possible to do the
> intermediate step (RAID1 two components to RAID5 two components)
> before.

A two-drive RAID5 is RAID1 with different metadata format. Why
not just create a RAID1 with your two disks? IIUC, you can
already convert RAID1 to RAID5.

I believe users who do understand they are creating RAID5 in
degraded mode and that they can grow their RAID5 later also
know they can convert a RAID1 into RAID5 as well.

The expansion of an array is out-of-scope for the installer where
as producing non-degraded in the least confusing configurations
is very much in-scope.

>
> As a side-note, "mdadm" complains if the creation is missing some
> devices and it is not explicitly told "missing", so I am
> generally wondering why it is necessary to repeat the check in
> the installer (due to missing dry-run feature of mdadm?).

The 'missing' keyword is interesting. This does allow creating
an array of larger number of disks without all being present.
I'm still finding this to be a dubious "feature" for the initial
installer. We can do the following:

mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      disk1 disk2 missing

But this does not work

mdadm --create /dev/md0 --level=5 --raid-devices=5 \
      disk1 disk2 missing missing missing

[ 887.817306] md/raid:md0: not enough operational devices (3/5 failed)
mdadm: RUN_ARRAY failed: Input/output error

The manual says this:

"For a RAID4 or RAID5 array at most one slot
 can be "missing"; for a RAID6 array at most two slots."

For any user who's created a RAID array with disks missing;
they'll eventually need to issue a mdadm --add or --grow command
in which case they could convert the RAID array from mirror to
parity type as well.

>
> Regarding the layout support, I think this is a much appreciated
> feature but i...

Read more...

Revision history for this message
Thimo E (thimoe) wrote :
Download full text (7.1 KiB)

Hi Ryan,

first of all thank you for your answer and sorry for the long reply:

> [...]
> A two-drive RAID5 is RAID1 with different metadata format. Why
> not just create a RAID1 with your two disks? IIUC, you can
> already convert RAID1 to RAID5.
>
> I believe users who do understand they are creating RAID5 in
> degraded mode and that they can grow their RAID5 later also
> know they can convert a RAID1 into RAID5 as well.
>
> The expansion of an array is out-of-scope for the installer where
> as producing non-degraded in the least confusing configurations
> is very much in-scope.
One important point first: A two component RAID5 is absolutely NOT a degraded array.
Since this is really an important point, let me detail this further:
With two components you have two possibilities to create a RAID5:
 a) two component RAID5 (non-degraded)
 b) three component RAID5 (degraded)

There are several differences between both configurations:
The capacity of b) is double the capacity of a).
On the other hand if you lose one drive, in a) case your data is still existing - b) already lost it.
Also, you will have a performance impact on b) since the RAID has to reconstruct (CPU!) (due to degraded state)
the data blocks which would be put to the third component device.
Regarding the data arrangement on the component devices, it would look like the following:
a)
0 1 <-- component
A P1(=A)
P2(=B) B
C P3(=C)
....

b)
0 1 2(missing) <-- component
A B P1(=A^B)
C P2(=C^D) D
P3(=E^F) E F
....

So I am _not_ asking for the degraded array feature which is for sure an "expert"/corner case feature.

Let me hijack your "semi-educated" user standpoint:
A user with the following requirements:
 - Availability (survives one component failure)
 - Expandability (Can add space on demand)

will stumble quite easily over the "grow" feature of mdadm for RAID5. Thus he might - after minor research - select two disk RAID5 happily if the installer allows. On the other hand - if the installer only allows RAID1, it is not so easy to find/imagine that this actually also satisfies his expandability requirement.

>> As a side-note, "mdadm" complains if the creation is missing some
>> devices and it is not explicitly told "missing", so I am
>> generally wondering why it is necessary to repeat the check in
>> the installer (due to missing dry-run feature of mdadm?).

> The 'missing' keyword is interesting. This does allow creating
> an array of larger number of disks without all being present.
> I'm still finding this to be a dubious "feature" for the initial
> installer.
> [...]
As mentioned above, I did not want to propose the creation of degraded arrays.
I was wondering, why you implement the checks and error messages in curtin instead of just issuing the mdadm command and in case of an error propagating back the error messages from mdadm (which are often quite explanatory).

>> Regarding the layout support, I think this is a much appreciated
>> feature but it is not a dependency to fix the current issue.
>>
>> Since we are at the layout topic: I am not sure if there is much
>> interest for the layout feature beyond RAID10...

Read more...

Revision history for this message
Ryan Harper (raharper) wrote :
Download full text (9.7 KiB)

Thimo,

Thanks for the thoughtful reply. I've replied more below to your questions.
I would say here as a summary (tl;dr):

I'm OK with curtin supporting two-disk RAID5 and RAID10. I'd like to see
this in the form of adding raid_devices: N field to the type: raid structure.

I don't have direct control over subiquity; but I would strongly advice
against enabling two-disk RAID5/10 in the UI mostly to prevent user confusion
about expected capacity or redundancy. For the advanced user who does
understand what's happening I would have them use the Automated Installer[1]
option which lets users supply custom storage configurations (non-interactive
install). Michael, I'm interested in your thoughts on this approach.

> Hi Ryan,
>
> first of all thank you for your answer and sorry for the long reply:
>
> > [...]
> > A two-drive RAID5 is RAID1 with different metadata format. Why
> > not just create a RAID1 with your two disks? IIUC, you can
> > already convert RAID1 to RAID5.
> >
> > I believe users who do understand they are creating RAID5 in
> > degraded mode and that they can grow their RAID5 later also
> > know they can convert a RAID1 into RAID5 as well.
> >
> > The expansion of an array is out-of-scope for the installer where
> > as producing non-degraded in the least confusing configurations
> > is very much in-scope.

> One important point first: A two component RAID5 is absolutely NOT a degraded array.
> ....
>
> So I am _not_ asking for the degraded array feature which is for sure an "expert"/corner case feature.

Thank you for the follow-up. I was mistaken which I found out later as I
ran a few mdadm commands.

>
> Let me hijack your "semi-educated" user standpoint:
> A user with the following requirements:
> - Availability (survives one component failure)
> - Expandability (Can add space on demand)
>
> will stumble quite easily over the "grow" feature of mdadm for RAID5. Thus
> he might - after minor research - select two disk RAID5 happily if the
> installer allows. On the other hand - if the installer only allows RAID1, it
> is not so easy to find/imagine that this actually also satisfies his
> expandability requirement.

I suspect the same "semi-educuated" user will also find the examples in
the manpage for expanding or migrating from mirror to parity arrays as well.

>
> >> As a side-note, "mdadm" complains if the creation is missing some
> >> devices and it is not explicitly told "missing", so I am
> >> generally wondering why it is necessary to repeat the check in
> >> the installer (due to missing dry-run feature of mdadm?).
>
> > The 'missing' keyword is interesting. This does allow creating
> > an array of larger number of disks without all being present.
> > I'm still finding this to be a dubious "feature" for the initial
> > installer.
> > [...]
> As mentioned above, I did not want to propose the creation of degraded
> arrays. I was wondering, why you implement the checks and error messages in
> curtin instead of just issuing the mdadm command and in case of an error
> propagating back the error messages from mdadm (which are often quite
> explanatory).

The main reason why curtin wraps and validates various configurat...

Read more...

Revision history for this message
Thimo E (thimoe) wrote :

Hi Ryan,

> The main reason why curtin wraps and validates various configurations is that
> it's used inside of other frontend installers which have user-facing UIs
> in which not all users will grok what exactly is going wrong.
>
> The other goal is to catch as many errors in configuration without actually
> having to boot and run this. MAAS, for example, uses curtin in bare-metal
> deployments where some of the hardware may take quite some time (many
> minutes) to power-cycle, boot, start the install only to find out it
> failed due to misconfiguration.
>
> With the configuration language and our storage schema MAAS and Subiquity
> can validate the configuration before starting an install.
OK, understood.

> > Still, I would propose to handle this in a subsequent PR to not intermix a
> > "bugfix" with new feature.
> That's fine; it can be implemented separately.
Is it OK to merge this small fix, then?

> This is a good point. Yes you're correct it would create the degraded array.
> If curtin is going to allow someone to create a two-disk RAID5 which can use
> --grow; why wouldn't we support a 3-disk raid5 created degraded so user can
> add a hotspare later to recover it?
Of course I do not want to hold someone back doing strange things,
but here it should be clearly documented that this potentially leads to RAIDs without redundancy.

> > > And the RAID10 array with specific layout could be
> > >
> > > - type: raid
> > > level: 10
> > > raid_devices: 2
> > > devices:
> > > - /dev/sda
> > > - /dev/sdb
> > > layout: f2
> > Same here, raid_devices would be unnecessary unless you want to support it.
>
> raid_devices is required if we are to allow independent control over the
> value of --raid-devices and telling mdadm how many drives to expect.
One downside of this raid_devices entry would be that it is easy to create non-rudundant RAIDs by accident when copying installation templates.
Even though I would be fine with this, you could also make this more explicit/visible by introducing the "missing" keyword, e.g.:
- type: raid
  level: 5
  raid_devices: 3
  devices:
    - /dev/sda
    - /dev/sdb
would be:
- type: raid
  level: 5
  devices:
    - /dev/sda
    - /dev/sdb
    - missing

BR

Unmerged commits

0ce2ac9... by Michael Hudson-Doyle on 2021-04-11

mdadm: RAID levels 5 and 10 can both be created with 2 devices

RAID5 with 2 devices is a bit strange but it's not really curtin's place
to check this IMO.

RAID10 with 2 is not that strange though, perhaps linux "RAID10" should
have been called something else so it doesn't get confused with
"RAID1+0" so much, which is really something else.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/curtin/block/mdadm.py b/curtin/block/mdadm.py
2index a6ac970..7bdde02 100644
3--- a/curtin/block/mdadm.py
4+++ b/curtin/block/mdadm.py
5@@ -529,11 +529,9 @@ def md_raidlevel_short(raidlevel):
6 def md_minimum_devices(raidlevel):
7 ''' return the minimum number of devices for a given raid level '''
8 rl = md_raidlevel_short(raidlevel)
9- if rl in [0, 1, 'linear', 'stripe', 'container']:
10+ if rl in [0, 1, 'linear', 'stripe', 'container', 5, 10]:
11 return 2
12- if rl in [5]:
13- return 3
14- if rl in [6, 10]:
15+ if rl == 6:
16 return 4
17
18 return -1
19diff --git a/tests/unittests/test_block_mdadm.py b/tests/unittests/test_block_mdadm.py
20index b04cf82..6e375d4 100644
21--- a/tests/unittests/test_block_mdadm.py
22+++ b/tests/unittests/test_block_mdadm.py
23@@ -800,9 +800,8 @@ class TestBlockMdadmMdHelpers(CiTestCase):
24
25 def test_md_minimum_devices(self):
26 min_to_rl = {
27- 2: [0, 1, 'linear', 'stripe'],
28- 3: [5],
29- 4: [6, 10],
30+ 2: [0, 1, 'linear', 'stripe', 5, 10],
31+ 4: [6],
32 }
33
34 for rl in [0, 1, 5, 6, 10, 'linear', 'stripe']:

Subscribers

People subscribed via source and target branches