Merge ~raharper/curtin:fix/lvm-over-raid into curtin:master

Proposed by Ryan Harper on 2018-07-26
Status: Merged
Approved by: Scott Moser on 2018-08-17
Approved revision: ed603a4e7eba552cb4345961c8dca5a71de8e343
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~raharper/curtin:fix/lvm-over-raid
Merge into: curtin:master
Diff against target: 353 lines (+223/-24)
8 files modified
curtin/block/clear_holders.py (+3/-0)
curtin/block/lvm.py (+23/-5)
examples/tests/dirty_disks_config.yaml (+30/-3)
examples/tests/lvmoverraid.yaml (+98/-0)
tests/unittests/test_block_lvm.py (+13/-13)
tests/unittests/test_clear_holders.py (+5/-2)
tests/vmtests/test_lvm_raid.py (+50/-0)
tests/vmtests/test_lvm_root.py (+1/-1)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve on 2018-08-17
Scott Moser 2018-07-26 Approve on 2018-08-17
Review via email: mp+351384@code.launchpad.net

Commit message

clear-holders: rescan for lvm devices after assembling raid arrays

LVM devices can only be found after assembling raid arrays. Add a call
to lvm_scan, then activate any discovered VGs and LVs.

LP: #1783413

Ryan Harper (raharper) wrote :

Fixing up trusty; its lvm2 is fickle without lvmetad.

Scott Moser (smoser) :
Ryan Harper (raharper) wrote :

OK, I think I've got trusty sorted out.

https://jenkins.ubuntu.com/server/job/curtin-vmtest-devel-debug/80/console

It's now passing, so I think we can review this.

Scott Moser (smoser) wrote :

I had some comments inline in my review above.
Please address or explain.

review: Needs Information
Ryan Harper (raharper) wrote :

> is there a reason not to have lvm_scan do the udevadm_settle ?
> it seems like all callers are probably interested in having that happen.
>
> maybe we even have race conditions elsewhere as I don't see anywhere else
> that does use settle after calling this.

In blockmeta, creating a volgroup does not create any device nodes, but creating the
logical volumes will. We don't currently do a devsync() (which does a settle using
--exit-if-exists=/path/to/device/expected to get a fast exit path once the
device node has been created) at the end of lvm logical volume creation, but
the subsequent user of the lvm device calls get_path_to_storage(), which
will do a devsync().

I don't think we have any race conditions due to the calls to scan in block-meta;
the devsync ensures we don't.

And in general, adding the settle to the scan can't use the more specific
--exit-if-exists parameter, since in clear-holders we don't know what device
to expect.

So I generally think we should leave the need for a settle up to the caller,
outside of the scan itself.
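For reference, the settle pattern described above can be sketched roughly like this (hypothetical helper name and injectable runner for illustration; curtin's real devsync() differs in detail):

```python
import subprocess

def settle(exists=None, runner=subprocess.check_call):
    """Build and run a 'udevadm settle' command.

    When we know which device node to expect, --exit-if-exists gives a
    fast exit path once that node appears; in clear-holders we do not
    know what to expect, so a plain settle is used.
    """
    cmd = ['udevadm', 'settle']
    if exists:
        cmd.append('--exit-if-exists=%s' % exists)
    runner(cmd)
    return cmd
```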

> can we make this '--activate=ay' instead of 2 parms?
> It's much clearer that way. I thought you'd typo'd
> your commit message with a rogue 'ay' in it.

Yes

> comments needed in dirty-disk yaml for shutting down lvm/raid

Yes, will add comments.

Scott Moser (smoser) wrote :

> > is there a reason not to have lvm_scan do the udevadm_settle ?
> > it seems like all callers are probably interested in having that happen.
> >
> > maybe we even have race conditions elsewhere as I don't see anywhere else
> > that does use settle after calling this.
>
> In blockmeta, creating a volgroup does not create any device nodes, but

But if that is the first time that 'lvm_scan' is run, I'm not sure
how that would not possibly create device nodes.

> creating the logical volumes will. We don't currently do a devsync()
> (which does a settle using the --exit-if-exists=/path/to/device/expected

I generally think that '--exit-if-exists=' is a bug.
consider this:
 a.) /dev/sda has 1 partition on it, so '/dev/sda1' exists (from previous
install).
 b.) we repartition /dev/sda, blockdev --rereadpt, and then call
   'udevadm settle --exit-if-exists=/dev/sda1'
 c.) udevadm checks to see if /dev/sda1 exists. It does (it always did)
 d.) udev events continue, and re-read the partition table, delete all
     the old partitions and re-create them.
 e.) while 'd' is happening our code thinks it is fine to use /dev/sda1

I think we have this general fallacy in logic at places.

/dev/sda1 above could just as well be /dev/vg0 or any other device.

> to have a fast exit path if the device node has been created) at the end
> of lvm logical volume creation, but the subsequent call of the user
> of the lvm device uses get_path_to_storage() which will do a devsync().

udevadm settle is *not* a slow operation if there are no events in the
queue. It is a very important one if there *are* events in the queue.

  $ time udevadm settle
  real 0m0.013s
  user 0m0.004s
  sys 0m0.001s

So... I'd much rather have any functions that may cause udevadm events
call settle by default and be safe than try to shave off an extra
1/100th of a second in an OS install.

Ryan Harper (raharper) wrote :

On Wed, Aug 1, 2018 at 8:13 AM Scott Moser <email address hidden> wrote:
>
> > > is there a reason not to have lvm_scan do the udevadm_settle ?
> > > it seems like all callers are probably interested in having that happen.
> > >
> > > maybe we even have race conditions elsewhere as I don't see anywhere else
> > > that does use settle after calling this.
> >
> > In blockmeta, creating a volgroup does not create any device nodes, but
>
> But if that is the first time that 'lvm_scan' is run, I'm not sure
> how that would not possibly create device nodes.

A volume group is not directly accessible as a block device. You have
to create an LV on a VG to get an accessible block device.

>
> > creating the logical volumes will. We don't currently do a devsync()
> > (which does a settle using the --exit-if-exists=/path/to/device/expected
>
> I generally think that '--exit-if-exists=' is a bug.
> consider this:
> a.) /dev/sda has 1 partition on it, so '/dev/sda1' exists (from previous
> install).
> b.) we repartition /dev/sda, blockdev --rereadpt, and then call
> 'udevadm settle --exit-if-exists=/dev/sda1'

We wipe the disk's partition table, and then force a reread of the
partition table, ensuring that the disk no longer has *any* partitions,
whatever their names or locations were.

The disk is clean before we create any new partitions.

> c.) udevadm checks to see if /dev/sda1 exists. It does (it always did)
> d.) udev events continue, and re-read the partition table, delete all
> the old partitions and re-create them.
> e.) while 'd' is happening our code thinks it is fine to use /dev/sda1
>
> I think we have this general fallacy in logic at places.

I disagree. The order of events in curtin is as follows:

1) clear-holders calls wipe_superblock on /dev/sda (which has a /dev/sda1)
2) if the device being wiped has partitions (it does), then we wipe the
   partition table, reread it, settle, and query the disk for any
   partitions, repeating until there are no partitions reported from the
   kernel; we fail if we cannot wipe the partitions
3) once all disks are clear (no partitions are left on any disk specified
   with wipe: superblock):
4) we create a partition table (empty) on /dev/sda
5) we create a first partition on /dev/sda, say /dev/sda1
6) we query the path to /dev/sda1 via get_path_to_storage(), which calls
   devsync(), which does a udevadm settle --exit-if-exists

There isn't a logic bug here because we know that *we* were the ones
to create the partition and we know what the expected /dev path is.
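The wipe-and-verify loop in step 2 can be sketched as follows (hypothetical helper names, with the wipe/reread/query steps injected so the retry logic is visible; curtin's real clear_holders code differs in detail):

```python
def wipe_until_clear(disk, wipe_table, reread_and_settle, get_partitions,
                     max_tries=5):
    # repeatedly wipe the partition table and re-check until the kernel
    # reports no partitions on the disk; fail if it never comes clean
    for _ in range(max_tries):
        wipe_table(disk)
        reread_and_settle(disk)
        if not get_partitions(disk):
            return True
    raise RuntimeError('could not clear partitions on %s' % disk)
```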

>
> /dev/sda1 above could just as well be /dev/vg0 or any other device.

It won't be a volume group, but yes, any other block device that we
create. The logic above holds for those as well. Where do you think
it fails after accounting for how we wipe devices and ensure they're
clear (including partitions)?

>
> > to have a fast exit path if the device node has been created) at the end
> > of lvm logical volume creation, but the subsequent call of the user
> > of the lvm device uses get_path_to_storage() which will do a devsync().
>
> udevadm settle is *not* a slow op...


Scott Moser (smoser) wrote :

> > > creating the logical volumes will. We don't currently do a devsync()
> > > (which does a settle using the --exit-if-exists=/path/to/device/expected
> >
> > I generally think that '--exit-if-exists=' is a bug.
> > consider this:
> > a.) /dev/sda has 1 partition on it, so '/dev/sda1' exists (from previous
> > install).
> > b.) we repartition /dev/sda, blockdev --rereadpt, and and then call
> > 'udevadm settle --exit-if-exists=/dev/sda1'
>
> We wipe the disk's partition table, and then force a reread of the
> partition table
> and ensure that the disk no longer has *any* partitions, whatever
> their name was or the location.
>
> The disk is clean before we create any new partitions.
>
> > c.) udevadm checks to see if /dev/sda1 exists. It does (it always did)
> > d.) udev events continue, and re-read the partition table, delete all
> > the old partitions and re-create them.
> > e.) while 'd' is happening our code thinks it is fine to use /dev/sda1
> >
> > I think we have this general fallacy in logic at places.
>
> I disagree. The order of events that happen in curtin are as follows:
>
> 1) clear-holders calls wipe_superblock on /dev/sda (which has a /dev/sda1)
> 2) if the device being wiped has partitions (it does), then we wipe
> the partition table
> and then reread the partition table, settle and query the disk
> for any partitions, repeat
> until there are no partitions reported from the kernel, we fail
> if we cannot wipe the partitions.
> 3) after all disks are clear (no partitions are left on any disk
> specified which has wipe superblock
> 4) we create a partition table (empty) on /dev/sda
> 5) we create a first partition on /dev/sda, say /dev/sda1
> 6) we query the path to /dev/sda1 via the get_path_to_storage() which
> calls devsync() which
> does a udevadm settle --exit-if-exists
>
> There isn't a logic bug here because we know that *we* were the ones
> to create the partition and we know what the expected /dev path is.
>
> >
> > /dev/sda1 above could just as well be /dev/vg0 or any other device.
>
> It won't be a volume group, but yes, any other block device that we
> create. The logic above holds for those as well. Where do you think
> it fails after accounting for how we wipe devices and ensure they're
> clear (including partitions)?
>
>
> >
> > > to have a fast exit path if the device node has been created) at the end
> > > of lvm logical volume creation, but the subsequent call of the user
> > > of the lvm device uses get_path_to_storage() which will do a devsync().
> >
> > udevadm settle is *not* a slow operation if there are no events in the
> > queue. It is a very important one if there *is* events in the queue.
> >
> > $ time udevadm settle
> > real 0m0.013s
> > user 0m0.004s
> > sys 0m0.001s
>
> At worst, the default timeout for udevadm settle is 120
> seconds; and without an --exit-if-exists, we wait on *all* events in
> the queue at the time of the settle call.

>
> >
> > So... I'd much rather have any functions that may cause udevadm events
> > to call settle by default and be safe than try to shave off an extra
> > 1/100th of a second in a OS...


Ryan Harper (raharper) wrote :

On Wed, Aug 8, 2018 at 9:58 AM Scott Moser <email address hidden> wrote:
>
> > > > creating the logical volumes will. We don't currently do a devsync()
> > > > (which does a settle using the --exit-if-exists=/path/to/device/expected
> > >
> > > I generally think that '--exit-if-exists=' is a bug.
> > > consider this:
> > > a.) /dev/sda has 1 partition on it, so '/dev/sda1' exists (from previous
> > > install).
> > > b.) we repartition /dev/sda, blockdev --rereadpt, and and then call
> > > 'udevadm settle --exit-if-exists=/dev/sda1'
> >
> > We wipe the disk's partition table, and then force a reread of the
> > partition table
> > and ensure that the disk no longer has *any* partitions, whatever
> > their name was or the location.
> >
> > The disk is clean before we create any new partitions.
> >
> > > c.) udevadm checks to see if /dev/sda1 exists. It does (it always did)
> > > d.) udev events continue, and re-read the partition table, delete all
> > > the old partitions and re-create them.
> > > e.) while 'd' is happening our code thinks it is fine to use /dev/sda1
> > >
> > > I think we have this general fallacy in logic at places.
> >
> > I disagree. The order of events that happen in curtin are as follows:
> >
> > 1) clear-holders calls wipe_superblock on /dev/sda (which has a /dev/sda1)
> > 2) if the device being wiped has partitions (it does), then we wipe
> > the partition table
> > and then reread the partition table, settle and query the disk
> > for any partitions, repeat
> > until there are no partitions reported from the kernel, we fail
> > if we cannot wipe the partitions.
> > 3) after all disks are clear (no partitions are left on any disk
> > specified which has wipe superblock
> > 4) we create a partition table (empty) on /dev/sda
> > 5) we create a first partition on /dev/sda, say /dev/sda1
> > 6) we query the path to /dev/sda1 via the get_path_to_storage() which
> > calls devsync() which
> > does a udevadm settle --exit-if-exists
> >
> > There isn't a logic bug here because we know that *we* were the ones
> > to create the partition and we know what the expected /dev path is.
> >
> > >
> > > /dev/sda1 above could just as well be /dev/vg0 or any other device.
> >
> > It won't be a volume group, but yes, any other block device that we
> > create. The logic above holds for those as well. Where do you think
> > it fails after accounting for how we wipe devices and ensure they're
> > clear (including partitions)?
> >
> >
> > >
> > > > to have a fast exit path if the device node has been created) at the end
> > > > of lvm logical volume creation, but the subsequent call of the user
> > > > of the lvm device uses get_path_to_storage() which will do a devsync().
> > >
> > > udevadm settle is *not* a slow operation if there are no events in the
> > > queue. It is a very important one if there *is* events in the queue.
> > >
> > > $ time udevadm settle
> > > real 0m0.013s
> > > user 0m0.004s
> > > sys 0m0.001s
> >
> > At worst, the default timeout for udevadm settle is 120
> > seconds; and without an --exit-if-exists, we wait on *all* events in
> > the queue at the time ...


Scott Moser (smoser) wrote :

> > Nonsense.
> > have you ever seen 'udevadm settle' take 120 seconds? Can you give
> > an example of when it *could* do that? The only thing I can reasonably
>
> https://ask.fedoraproject.org/en/question/81006/slow-boot-time-udev-settle/
> https://askubuntu.com/questions/888010/about-systemd-udev-settle-service-and-
> slow-booting
> https://bugzilla.redhat.com/show_bug.cgi?id=735866
> https://bbs.archlinux.org/viewtopic.php?id=189369
> https://forums.opensuse.org/showthread.php/506828-Udev-3-minute-delay-on-boot
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754987
> https://lists.debian.org/debian-user/2015/04/msg01592.html

Every single example link above is in regard to slow boot times.
Boot is a time when there is a flurry of udev events; curtin does
not run in that time frame, which is a *huge* outlier in the general
life of a system.

In order to get actual numbers on how long we spend time waiting for
udev events that I think are important, I put up:
 https://code.launchpad.net/~smoser/curtin/+git/curtin/+merge/352791

A full vmtest run produced 13098 calls to udevadm settle over 257
installs. Of those calls, 1555 took over 0.1 seconds, the longest
coming in during TrustyHWEXTestRaid5Bcache at 1.809 seconds.
 http://paste.ubuntu.com/p/t2k6tDw6x7/

Here is a log of total time spent in settle along with total time
spent in 'curtin install'.
 http://paste.ubuntu.com/p/rNsqfpmxkr/
Admittedly the total time we spend "waiting" can be 10% of the
time of our install, but I'd still argue that those waits are
almost certainly valid.

> Settle can block for up to 120 seconds whenever one or more events
> in the queue at the time settle is called do not get processed.
> The event in queue may be unrelated to storage; it can be any event
> in the system (net, block, usb); settle doesn't care.

And do you think that in the time frame when an installer is running
there is likely a flurry of net or usb events coming in?
Go ahead, find a busy system that is not booting and run 'udevadm monitor'.
See if you see a single event over the course of 10 minutes or so.

I gave that a try: I ran 'lxc launch' on a lxd host with zfs. I was honestly
floored at how many events came through (1163 UDEV, 1163 KERNEL).
I ran a loop of 'time udevadm settle' during that, and it never took more
than 1/100th of a second.
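The 'time udevadm settle' loop can be wrapped in a tiny measurement helper (illustrative only; the runner is injectable so the timing logic does not require udev):

```python
import time

def timed_settle(runner, timeout=120):
    # invoke 'udevadm settle' via the supplied runner and return how many
    # seconds the call spent waiting on the udev event queue
    cmd = ['udevadm', 'settle', '--timeout=%d' % timeout]
    start = time.monotonic()
    runner(cmd)
    return time.monotonic() - start
```

With an empty queue this returns in milliseconds; with queued, unprocessed events it can block up to the timeout (120 seconds by default), which is the behavior both sides are debating.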

> > come up with are disk based operations on large md or something.
> > Those are exactly the reasons we want to settle. Anything else that
> > was seemingly unrelated but is taking significant time honestly
> > probably something important to an installer. I'd much rather
> > see a log entry that showed a timeout failed on a settle than
> > try to diagnose some related failure.
>
> Recall this discussion:
> https://bugs.launchpad.net/cloud-init/+bug/1766287
> https://code.launchpad.net/~raharper/cloud-init/+git/cloud-
> init/+merge/344198

Both of our responses on that bug were that we'd much rather
have *stable* behavior than fast behavior. That is what I'm arguing for here,
so I'm confused by your pointing to it.

> My argument is that we should only call settle when we *need* it.
> Extra calls to settle just because at best waste a...


Ryan Harper (raharper) wrote :

jenkins vmtest run approves:

https://jenkins.ubuntu.com/server/job/curtin-vmtest-devel-debug/91/console

----------------------------------------------------------------------
Ran 3198 tests in 13805.907s

OK (SKIP=288)
Mon, 13 Aug 2018 18:33:50 +0000: vmtest end [0] in 13809s

Scott Moser (smoser) wrote :

Bah. A comment I had made a week ago inline didn't get saved.

This does look really good at this point, though.

Ryan Harper (raharper) wrote :

On Fri, Aug 17, 2018 at 7:51 AM Scott Moser <email address hidden> wrote:
>
> bah. a comment i had made a week ago inline that didn't get saved.
>
> this does look really good though at this point.
>
>
> Diff comments:
>
> > diff --git a/examples/tests/lvmoverraid.yaml b/examples/tests/lvmoverraid.yaml
>
> By adding the lvm stop and mdadm stop you are pushing us through a
> path that we'd previously not been through. But aren't you also ensuring
> we do not hit the previously tested (and more common) code path?
> At least for lvm, wouldn't the ephemeral environment normally have
> the pvscan running?

Thanks for raising this.

In the previous "common" case, the LVMs were never removed (the kernel
still knew of them, device nodes existed) since we never performed
the stop; and they were never "rediscovered" by a scan, since the lvm
tools examine /etc/lvm/* for config before looking at the disk. I
confirmed this: attempting to reproduce the failure only worked
once I added the above code to the dirty-disk mode.

I believe that what we are doing now matches more closely what
curtin would see on first boot of a system which has LVM on block
devices but where the OS (our ephemeral image) has no record of it, and
it can *only* be found via our lvm_scan() + vg_activate().
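Concretely, the rediscovery path in this branch reduces to the commands below (a sketch mirroring the diff's lvm_scan() and activate_volgroups(); the lvmetad check is simplified to a boolean):

```python
def lvm_rediscover_cmds(lvmetad_running):
    # mirror lvm_scan(): rescan PVs and VGs, adding --cache only when
    # lvmetad is running so its cached state is refreshed too
    cmds = [['pvscan'], ['vgscan', '--mknodes']]
    if lvmetad_running:
        cmds = [cmd + ['--cache'] for cmd in cmds]
    # then activate_volgroups(): bring discovered VGs/LVs online
    cmds.append(['vgchange', '--activate=y'])
    return cmds
```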

>
> > new file mode 100644
> > index 0000000..a1d41e9
>
>
> --
> https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/351384
> You are the owner of ~raharper/curtin:fix/lvm-over-raid.

Scott Moser (smoser) wrote :

Sounds reasonable.

On Fri, Aug 17, 2018 at 10:24 AM Ryan Harper <email address hidden>
wrote:

> On Fri, Aug 17, 2018 at 7:51 AM Scott Moser <email address hidden>
> wrote:
> >
> > bah. a comment i had made a week ago inline that didn't get saved.
> >
> > this does look really good though at this point.
> >
> >
> > Diff comments:
> >
> > > diff --git a/examples/tests/lvmoverraid.yaml
> b/examples/tests/lvmoverraid.yaml
> >
> > By adding the lvm stop and mdadm stop you are pushing us through a very
> close path that we'd previously not been through. But aren't you also
> ensuring we do not hit the previously tested (and more common) code path?
> At least for lvm, wouldn't the ephemeral environment normally have
> the pvscan running?
>
> Thanks for raising this.
>
> In the previous "common" case, the LVMs were never removed (kernel
> still knew of them, device nodes existed) since we've never performed
> the stop; and they were never "rediscovered" by a scan since the lvm
> tools examine /etc/lvm/* for config before looking at the disk. I
> confirmed this when attempting to reproduce the failure only worked
> once I added the above code to the dirty-disk mode.
>
> I believe that what we are doing now matches more closely to what
> curtin would see on first-boot of a system which has LVM on block
> devices but the OS (our ephemeral image) will have no record of it and
> it can *only* be found via our lvm_scan() + vg_activate().
>
>
>
> >
> > > new file mode 100644
> > > index 0000000..a1d41e9
> >
> >
> > --
> > https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/351384
> > You are the owner of ~raharper/curtin:fix/lvm-over-raid.
>
> --
> https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/351384
> You are reviewing the proposed merge of ~raharper/curtin:fix/lvm-over-raid
> into curtin:master.
>

Scott Moser (smoser) wrote :

sounds good.

review: Approve

FAILED: Autolanding.
More details in the following jenkins job:
https://jenkins.ubuntu.com/server/job/curtin-autoland-test/15/
Executed test runs:
    None: https://jenkins.ubuntu.com/server/job/admin-lp-git-autoland/52/console

review: Needs Fixing (continuous-integration)
review: Approve (continuous-integration)

Preview Diff

1diff --git a/curtin/block/clear_holders.py b/curtin/block/clear_holders.py
2index 9d73b28..a2042d5 100644
3--- a/curtin/block/clear_holders.py
4+++ b/curtin/block/clear_holders.py
5@@ -624,6 +624,9 @@ def start_clear_holders_deps():
6 # all disks and partitions should be sufficient to remove the mdadm
7 # metadata
8 mdadm.mdadm_assemble(scan=True, ignore_errors=True)
9+ # scan and activate for logical volumes
10+ lvm.lvm_scan()
11+ lvm.activate_volgroups()
12 # the bcache module needs to be present to properly detect bcache devs
13 # on some systems (precise without hwe kernel) it may not be possible to
14 # lad the bcache module bcause it is not present in the kernel. if this
15diff --git a/curtin/block/lvm.py b/curtin/block/lvm.py
16index 8643245..eca64f6 100644
17--- a/curtin/block/lvm.py
18+++ b/curtin/block/lvm.py
19@@ -57,14 +57,32 @@ def lvmetad_running():
20 '/run/lvmetad.pid'))
21
22
23-def lvm_scan():
24+def activate_volgroups():
25+ """
26+ Activate available volgroups and logical volumes within.
27+
28+ # found
29+ % vgchange -ay
30+ 1 logical volume(s) in volume group "vg1sdd" now active
31+
32+ # none found (no output)
33+ % vgchange -ay
34+ """
35+
36+ # vgchange handles syncing with udev by default
37+ # see man 8 vgchange and flag --noudevsync
38+ out, _ = util.subp(['vgchange', '--activate=y'], capture=True)
39+ if out:
40+ LOG.info(out)
41+
42+
43+def lvm_scan(activate=True):
44 """
45 run full scan for volgroups, logical volumes and physical volumes
46 """
47- # the lvm tools lvscan, vgscan and pvscan on ubuntu precise do not
48- # support the flag --cache. the flag is present for the tools in ubuntu
49- # trusty and later. since lvmetad is used in current releases of
50- # ubuntu, the --cache flag is needed to ensure that the data cached by
51+ # prior to xenial, lvmetad is not packaged, so even if a tool supports
52+ # flag --cache it has no effect. In Xenial and newer the --cache flag is
53+ # used (if lvmetad is running) to ensure that the data cached by
54 # lvmetad is updated.
55
56 # before appending the cache flag though, check if lvmetad is running. this
57diff --git a/examples/tests/dirty_disks_config.yaml b/examples/tests/dirty_disks_config.yaml
58index 75d44c3..fb9a0d6 100644
59--- a/examples/tests/dirty_disks_config.yaml
60+++ b/examples/tests/dirty_disks_config.yaml
61@@ -27,6 +27,31 @@ bucket:
62 # disable any rpools to trigger disks with zfs_member label but inactive
63 # pools
64 zpool export rpool ||:
65+ - &lvm_stop |
66+ #!/bin/sh
67+ # This function disables any existing lvm logical volumes that
68+ # have been created during the early storage config stage
69+ # and simulates the effect of booting into a system with existing
70+ # (but inactive) lvm configuration.
71+ for vg in `pvdisplay -C --separator = -o vg_name --noheadings`; do
72+ vgchange -an $vg ||:
73+ done
74+ # disable the automatic pvscan, we want to test that curtin
75+ # can find/enable logical volumes without this service
76+ command -v systemctl && systemctl mask lvm2-pvscan\@.service
77+ # remove any existing metadata written from early disk config
78+ rm -rf /etc/lvm/archive /etc/lvm/backup
79+ - &mdadm_stop |
80+ #!/bin/sh
81+ # This function disables any existing raid devices which may
82+ # have been created during the early storage config stage
83+ # and simulates the effect of booting into a system with existing
84+ # but inactive mdadm configuration.
85+ for md in /dev/md*; do
86+ mdadm --stop $md ||:
87+ done
88+ # remove any existing metadata written from early disk config
89+ rm -f /etc/mdadm/mdadm.conf
90
91 early_commands:
92 # running block-meta custom from the install environment
93@@ -34,9 +59,11 @@ early_commands:
94 # the disks exactly as in this config before the rest of the install
95 # will just blow it all away. We have clean out other environment
96 # that could unintentionally mess things up.
97- blockmeta: [env, -u, OUTPUT_FSTAB,
98+ 01-blockmeta: [env, -u, OUTPUT_FSTAB,
99 TARGET_MOUNT_POINT=/tmp/my.bdir/target,
100 WORKING_DIR=/tmp/my.bdir/work.d,
101 curtin, --showtrace, -v, block-meta, --umount, custom]
102- enable_swaps: [sh, -c, *swapon]
103- disable_rpool: [sh, -c, *zpool_export]
104+ 02-enable_swaps: [sh, -c, *swapon]
105+ 03-disable_rpool: [sh, -c, *zpool_export]
106+ 04-lvm_stop: [sh, -c, *lvm_stop]
107+ 05-mdadm_stop: [sh, -c, *mdadm_stop]
108diff --git a/examples/tests/lvmoverraid.yaml b/examples/tests/lvmoverraid.yaml
109new file mode 100644
110index 0000000..a1d41e9
111--- /dev/null
112+++ b/examples/tests/lvmoverraid.yaml
113@@ -0,0 +1,98 @@
114+storage:
115+ config:
116+ - grub_device: true
117+ id: disk-0
118+ model: QEMU_HARDDISK
119+ name: 'main_disk'
120+ serial: disk-a
121+ preserve: false
122+ ptable: gpt
123+ type: disk
124+ wipe: superblock
125+ - grub_device: false
126+ id: disk-2
127+ name: 'disk-2'
128+ serial: disk-b
129+ preserve: false
130+ type: disk
131+ wipe: superblock
132+ - grub_device: false
133+ id: disk-1
134+ name: 'disk-1'
135+ serial: disk-c
136+ preserve: false
137+ type: disk
138+ wipe: superblock
139+ - grub_device: false
140+ id: disk-3
141+ name: 'disk-3'
142+ serial: disk-d
143+ preserve: false
144+ type: disk
145+ wipe: superblock
146+ - grub_device: false
147+ id: disk-4
148+ name: 'disk-4'
149+ serial: disk-e
150+ preserve: false
151+ type: disk
152+ wipe: superblock
153+ - device: disk-0
154+ flag: bios_grub
155+ id: part-0
156+ preserve: false
157+ size: 1048576
158+ type: partition
159+ - device: disk-0
160+ flag: ''
161+ id: part-1
162+ preserve: false
163+ size: 4G
164+ type: partition
165+ - devices:
166+ - disk-2
167+ - disk-1
168+ id: raid-0
169+ name: md0
170+ raidlevel: 1
171+ spare_devices: []
172+ type: raid
173+ - devices:
174+ - disk-3
175+ - disk-4
176+ id: raid-1
177+ name: md1
178+ raidlevel: 1
179+ spare_devices: []
180+ type: raid
181+ - devices:
182+ - raid-0
183+ - raid-1
184+ id: vg-0
185+ name: vg0
186+ type: lvm_volgroup
187+ - id: lv-0
188+ name: lv-0
189+ size: 3G
190+ type: lvm_partition
191+ volgroup: vg-0
192+ - fstype: ext4
193+ id: fs-0
194+ preserve: false
195+ type: format
196+ volume: part-1
197+ - fstype: ext4
198+ id: fs-1
199+ preserve: false
200+ type: format
201+ volume: lv-0
202+ - device: fs-0
203+ id: mount-0
204+ path: /
205+ type: mount
206+ - device: fs-1
207+ id: mount-1
208+ path: /home
209+ type: mount
210+ version: 1
211+
212diff --git a/tests/unittests/test_block_lvm.py b/tests/unittests/test_block_lvm.py
213index 341f2fa..22fb064 100644
214--- a/tests/unittests/test_block_lvm.py
215+++ b/tests/unittests/test_block_lvm.py
216@@ -75,24 +75,24 @@ class TestBlockLvm(CiTestCase):
217 @mock.patch('curtin.block.lvm.util')
218 def test_lvm_scan(self, mock_util, mock_lvmetad):
219 """check that lvm_scan formats commands correctly for each release"""
220+ cmds = [['pvscan'], ['vgscan', '--mknodes']]
221 for (count, (codename, lvmetad_status, use_cache)) in enumerate(
222- [('precise', False, False), ('precise', True, False),
223- ('trusty', False, False), ('trusty', True, True),
224- ('vivid', False, False), ('vivid', True, True),
225- ('wily', False, False), ('wily', True, True),
226+ [('precise', False, False),
227+ ('trusty', False, False),
228 ('xenial', False, False), ('xenial', True, True),
229- ('yakkety', True, True), ('UNAVAILABLE', True, True),
230 (None, True, True), (None, False, False)]):
231 mock_util.lsb_release.return_value = {'codename': codename}
232 mock_lvmetad.return_value = lvmetad_status
233 lvm.lvm_scan()
234- self.assertEqual(
235- len(mock_util.subp.call_args_list), 2 * (count + 1))
236- for (expected, actual) in zip(
237- [['pvscan'], ['vgscan', '--mknodes']],
238- mock_util.subp.call_args_list[2 * count:2 * count + 2]):
239- if use_cache:
240- expected.append('--cache')
241- self.assertEqual(mock.call(expected, capture=True), actual)
242+ expected = [cmd for cmd in cmds]
243+ for cmd in expected:
244+ if lvmetad_status:
245+ cmd.append('--cache')
246+
247+ calls = [mock.call(cmd, capture=True) for cmd in expected]
248+ self.assertEqual(len(expected), len(mock_util.subp.call_args_list))
249+ mock_util.subp.has_calls(calls)
250+ mock_util.subp.reset_mock()
251+
252
253 # vi: ts=4 expandtab syntax=python
254diff --git a/tests/unittests/test_clear_holders.py b/tests/unittests/test_clear_holders.py
255index 6c29171..21f76be 100644
256--- a/tests/unittests/test_clear_holders.py
257+++ b/tests/unittests/test_clear_holders.py
258@@ -779,10 +779,12 @@ class TestClearHolders(CiTestCase):
259 mock_gen_holders_tree.return_value = self.example_holders_trees[1][1]
260 clear_holders.assert_clear(device)
261
262+ @mock.patch('curtin.block.clear_holders.lvm')
263 @mock.patch('curtin.block.clear_holders.zfs')
264 @mock.patch('curtin.block.clear_holders.mdadm')
265 @mock.patch('curtin.block.clear_holders.util')
266- def test_start_clear_holders_deps(self, mock_util, mock_mdadm, mock_zfs):
267+ def test_start_clear_holders_deps(self, mock_util, mock_mdadm, mock_zfs,
268+ mock_lvm):
269 mock_zfs.zfs_supported.return_value = True
270 clear_holders.start_clear_holders_deps()
271 mock_mdadm.mdadm_assemble.assert_called_with(
272@@ -790,11 +792,12 @@ class TestClearHolders(CiTestCase):
273 mock_util.load_kernel_module.assert_has_calls([
274 mock.call('bcache'), mock.call('zfs')])
275
276+ @mock.patch('curtin.block.clear_holders.lvm')
277 @mock.patch('curtin.block.clear_holders.zfs')
278 @mock.patch('curtin.block.clear_holders.mdadm')
279 @mock.patch('curtin.block.clear_holders.util')
280 def test_start_clear_holders_deps_nozfs(self, mock_util, mock_mdadm,
281- mock_zfs):
282+ mock_zfs, mock_lvm):
283 """test that we skip zfs modprobe on unsupported platforms"""
284 mock_zfs.zfs_supported.return_value = False
285 clear_holders.start_clear_holders_deps()
286diff --git a/tests/vmtests/test_lvm_raid.py b/tests/vmtests/test_lvm_raid.py
287new file mode 100644
288index 0000000..0c50941
289--- /dev/null
290+++ b/tests/vmtests/test_lvm_raid.py
291@@ -0,0 +1,50 @@
292+# This file is part of curtin. See LICENSE file for copyright and license info.
293+
294+from .releases import base_vm_classes as relbase
295+from .test_mdadm_bcache import TestMdadmAbs
296+from .test_lvm import TestLvmAbs
297+
298+import textwrap
299+
300+
301+class TestLvmOverRaidAbs(TestMdadmAbs, TestLvmAbs):
302+ conf_file = "examples/tests/lvmoverraid.yaml"
303+ active_mdadm = "2"
304+ nr_cpus = 2
305+ dirty_disks = True
306+ extra_disks = ['10G'] * 4
307+
308+ collect_scripts = TestLvmAbs.collect_scripts
309+ collect_scripts += TestMdadmAbs.collect_scripts + [textwrap.dedent("""
310+ cd OUTPUT_COLLECT_D
311+ ls -al /dev/md* > dev_md
312+ cp -a /etc/mdadm etc_mdadm
313+ cp -a /etc/lvm etc_lvm
314+ """)]
315+
316+ fstab_expected = {
317+ '/dev/vg1/lv1': '/srv/data',
318+ '/dev/vg1/lv2': '/srv/backup',
319+ }
320+ disk_to_check = [('main_disk', 1),
321+ ('md0', 0),
322+ ('md1', 0)]
323+
324+ def test_lvs(self):
325+ self.check_file_strippedline("lvs", "lv-0=vg0")
326+
327+ def test_pvs(self):
328+ self.check_file_strippedline("pvs", "vg0=/dev/md0")
329+ self.check_file_strippedline("pvs", "vg0=/dev/md1")
330+
331+
332+class CosmicTestLvmOverRaid(relbase.cosmic, TestLvmOverRaidAbs):
333+ __test__ = True
334+
335+
336+class BionicTestLvmOverRaid(relbase.bionic, TestLvmOverRaidAbs):
337+ __test__ = True
338+
339+
340+class XenialGATestLvmOverRaid(relbase.xenial_ga, TestLvmOverRaidAbs):
341+ __test__ = True
342diff --git a/tests/vmtests/test_lvm_root.py b/tests/vmtests/test_lvm_root.py
343index 8ca69d4..bc8b047 100644
344--- a/tests/vmtests/test_lvm_root.py
345+++ b/tests/vmtests/test_lvm_root.py
346@@ -113,7 +113,7 @@ class XenialTestUefiLvmRootXfs(relbase.xenial, TestUefiLvmRootAbs):
347 }
348
349
350-@VMBaseClass.skip_by_date("1652822", fixby="2019-06-01")
351+@VMBaseClass.skip_by_date("1652822", fixby="2019-06-01", install=False)
352 class XenialTestUefiLvmRootXfsBootXfs(relbase.xenial, TestUefiLvmRootAbs):
353 """This tests xfs root and xfs boot with uefi.
354
