Merge lp:~raharper/curtin/trunk.fix-lvm-over-raid into lp:~curtin-dev/curtin/trunk

Proposed by Ryan Harper
Status: Merged
Merged at revision: 525
Proposed branch: lp:~raharper/curtin/trunk.fix-lvm-over-raid
Merge into: lp:~curtin-dev/curtin/trunk
Diff against target: 47 lines (+13/-6)
2 files modified
curtin/block/mdadm.py (+10/-6)
tests/vmtests/test_mdadm_bcache.py (+3/-0)
To merge this branch: bzr merge lp:~raharper/curtin/trunk.fix-lvm-over-raid
Reviewer Review Type Date Requested Status
Scott Moser (community) Approve
Chad Smith Approve
Server Team CI bot continuous-integration Approve
Review via email: mp+329765@code.launchpad.net

Description of the change

mdadm: handle write failures to sysfs entries when stopping mdadm

mdadm_stop command tries a number of methods to stop an existing
array. Writes to sysfs fail differently, this patch catches both
IOErrors and OSErrors when writing to sysfs entries, logs and continues
with the stop operation.

Updating the LVM over RAID vmtest to run in dirty_disk mode reproduces
a failure reported and is resolved with this branch.

LP: #1708052

To post a comment you must log in.
Revision history for this message
Ryan Harper (raharper) wrote :

vmtest on this branch running over the LVM over RAID test-case is here:

https://jenkins.ubuntu.com/server/view/curtin/job/curtin-vmtest-devel-amd64/25/

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

minor comments.

Revision history for this message
Ryan Harper (raharper) :
524. By Ryan Harper

Drop 'pass' in exception handling, expand logging message

Revision history for this message
Ryan Harper (raharper) wrote :

I updated per comments, however, I didn't log the exception stacktrace itself; this triggers exception detection in curtin's vmtest install.log parsing; it's not terribly useful since we're ignoring the error (it's an expected possibility while shutting down a block device).

Ready for review again.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
525. By Ryan Harper

merge from trunk

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
526. By Ryan Harper

merge from trunk

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Scott Moser (smoser) wrote :

I agree with Chad's request for the actual errors.

The only other thing that I would suggest, is that if we can determine in the exception block that this failure can be ignored, then we could/should not log it at WARN. Ie, if we tried and everything still ends up OK, then the warning is just loud noise that looks scary later.

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks, I'll move the LOG message.

Revision history for this message
Ryan Harper (raharper) wrote :

> I agree with Chad's request for the actual errors.
>
> The only other thing that I would suggest, is that if we can determine in the
> exception block that this failure can be ignored, then we could/should not log
> it at WARN. Ie, if we tried and everything still ends up OK, then the warning
> is just loud noise that looks scary later.

I'll replace the WARN with debug.

527. By Ryan Harper

LOG outside of try/except, reduce log messages to debug (expected errors)

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
528. By Ryan Harper

Include exception as string, indicate error is non-fatal

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Ryan Harper (raharper) wrote :

The log messages of sysfs write failures look like this:

[ 64.954808] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md2/md/sync_max failed [Errno 16] Device or resource busy
[ 65.302878] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md3/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md3/md/sync_action'
[ 65.305867] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md3/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md3/md/sync_max'

Revision history for this message
Chad Smith (chad.smith) wrote :

Thank you for the change here +1

review: Approve
Revision history for this message
Scott Moser (smoser) wrote :

This is generally what i was asking for.
It means that at some point if we ever saw "[Errno 2] No such file or directory" it might hint to something else.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'curtin/block/mdadm.py'
--- curtin/block/mdadm.py 2017-05-12 21:06:28 +0000
+++ curtin/block/mdadm.py 2017-09-07 16:10:35 +0000
@@ -273,7 +273,11 @@
273 LOG.debug('%s/sync_max = %s', sync_action, val)273 LOG.debug('%s/sync_max = %s', sync_action, val)
274 if val != "idle":274 if val != "idle":
275 LOG.debug("mdadm: setting array sync_action=idle")275 LOG.debug("mdadm: setting array sync_action=idle")
276 util.write_file(sync_action, content="idle")276 try:
277 util.write_file(sync_action, content="idle")
278 except (IOError, OSError) as e:
279 LOG.debug("mdadm: (non-fatal) write to %s failed %s",
280 sync_action, e)
277281
278 # Setting the sync_{max,min} may can help prevent the array from282 # Setting the sync_{max,min} may can help prevent the array from
279 # changing back to 'resync' which may prevent the array from being283 # changing back to 'resync' which may prevent the array from being
@@ -283,11 +287,11 @@
283 if val != "0":287 if val != "0":
284 LOG.debug("mdadm: setting array sync_{min,max}=0")288 LOG.debug("mdadm: setting array sync_{min,max}=0")
285 try:289 try:
286 util.write_file(sync_max, content="0")290 for sync_file in [sync_max, sync_min]:
287 util.write_file(sync_min, content="0")291 util.write_file(sync_file, content="0")
288 except IOError:292 except (IOError, OSError) as e:
289 LOG.warning('mdadm: failed to set sync_{max,min} values')293 LOG.debug('mdadm: (non-fatal) write to %s failed %s',
290 pass294 sync_file, e)
291295
292 # one wonders why this command doesn't do any of the above itself?296 # one wonders why this command doesn't do any of the above itself?
293 out, err = util.subp(["mdadm", "--manage", "--stop", devpath],297 out, err = util.subp(["mdadm", "--manage", "--stop", devpath],
294298
=== modified file 'tests/vmtests/test_mdadm_bcache.py'
--- tests/vmtests/test_mdadm_bcache.py 2017-08-02 15:46:35 +0000
+++ tests/vmtests/test_mdadm_bcache.py 2017-09-07 16:10:35 +0000
@@ -413,6 +413,9 @@
413 # we have to avoid a systemd hang due to the way it handles dmcrypt413 # we have to avoid a systemd hang due to the way it handles dmcrypt
414 extra_kern_args = "--- luks=no"414 extra_kern_args = "--- luks=no"
415 active_mdadm = "4"415 active_mdadm = "4"
416 # running in dirty mode catches some race/errors with mdadm_stop
417 nr_cpus = 2
418 dirty_disks = True
416 # initialize secondary disk419 # initialize secondary disk
417 extra_disks = ['5G', '5G', '5G']420 extra_disks = ['5G', '5G', '5G']
418 disk_to_check = [('main_disk', 1),421 disk_to_check = [('main_disk', 1),

Subscribers

People subscribed via source and target branches