Merge lp:~raharper/curtin/trunk.fix-lvm-over-raid into lp:~curtin-dev/curtin/trunk

Proposed by Ryan Harper
Status: Merged
Merged at revision: 525
Proposed branch: lp:~raharper/curtin/trunk.fix-lvm-over-raid
Merge into: lp:~curtin-dev/curtin/trunk
Diff against target: 47 lines (+13/-6)
2 files modified
curtin/block/mdadm.py (+10/-6)
tests/vmtests/test_mdadm_bcache.py (+3/-0)
To merge this branch: bzr merge lp:~raharper/curtin/trunk.fix-lvm-over-raid
Reviewer Review Type Date Requested Status
Scott Moser (community) Approve
Chad Smith Approve
Server Team CI bot continuous-integration Approve
Review via email: mp+329765@code.launchpad.net

Description of the change

mdadm: handle write failures to sysfs entries when stopping mdadm

mdadm_stop command tries a number of methods to stop an existing
array. Writes to sysfs fail differently, this patch catches both
IOErrors and OSErrors when writing to sysfs entries, logs and continues
with the stop operation.

Updating the LVM over RAID vmtest to run in dirty_disk mode reproduces
a failure reported and is resolved with this branch.

LP: #1708052

To post a comment you must log in.
Revision history for this message
Ryan Harper (raharper) wrote :

vmtest on this branch running over the LVM over RAID test-case is here:

https://jenkins.ubuntu.com/server/view/curtin/job/curtin-vmtest-devel-amd64/25/

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

minor comments.

Revision history for this message
Ryan Harper (raharper) :
524. By Ryan Harper

Drop 'pass' in exception handling, expand logging message

Revision history for this message
Ryan Harper (raharper) wrote :

I updated per comments, however, I didn't log the exception stacktrace itself; this triggers exception detection in curtin's vmtest install.log parsing; it's not terribly useful since we're ignoring the error (it's an expected possibility while shutting down a block device).

Ready for review again.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
525. By Ryan Harper

merge from trunk

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
526. By Ryan Harper

merge from trunk

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Scott Moser (smoser) wrote :

I agree with Chad's request for the actual errors.

The only other thing that I would suggest, is that if we can determine in the exception block that this failure can be ignored, then we could/should not log it at WARN. Ie, if we tried and everything still ends up OK, then the warning is just loud noise that looks scary later.

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks, I'll move the LOG message.

Revision history for this message
Ryan Harper (raharper) wrote :

> I agree with Chad's request for the actual errors.
>
> The only other thing that I would suggest, is that if we can determine in the
> exception block that this failure can be ignored, then we could/should not log
> it at WARN. Ie, if we tried and everything still ends up OK, then the warning
> is just loud noise that looks scary later.

I'll replace the WARN with debug.

527. By Ryan Harper

LOG outside of try/except, reduce log messages to debug (expected errors)

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
528. By Ryan Harper

Include exception as string, indicate error is non-fatal

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Ryan Harper (raharper) wrote :

The log messages of sysfs write failures look like this:

[ 64.954808] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md2/md/sync_max failed [Errno 16] Device or resource busy
[ 65.302878] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md3/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md3/md/sync_action'
[ 65.305867] cloud-init[1435]: mdadm: (non-fatal) write to /sys/class/block/md3/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md3/md/sync_max'

Revision history for this message
Chad Smith (chad.smith) wrote :

Thank you for the change here +1

review: Approve
Revision history for this message
Scott Moser (smoser) wrote :

This is generally what i was asking for.
It means that at some point if we ever saw "[Errno 2] No such file or directory" it might hint to something else.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'curtin/block/mdadm.py'
2--- curtin/block/mdadm.py 2017-05-12 21:06:28 +0000
3+++ curtin/block/mdadm.py 2017-09-07 16:10:35 +0000
4@@ -273,7 +273,11 @@
5 LOG.debug('%s/sync_max = %s', sync_action, val)
6 if val != "idle":
7 LOG.debug("mdadm: setting array sync_action=idle")
8- util.write_file(sync_action, content="idle")
9+ try:
10+ util.write_file(sync_action, content="idle")
11+ except (IOError, OSError) as e:
12+ LOG.debug("mdadm: (non-fatal) write to %s failed %s",
13+ sync_action, e)
14
15 # Setting the sync_{max,min} may can help prevent the array from
16 # changing back to 'resync' which may prevent the array from being
17@@ -283,11 +287,11 @@
18 if val != "0":
19 LOG.debug("mdadm: setting array sync_{min,max}=0")
20 try:
21- util.write_file(sync_max, content="0")
22- util.write_file(sync_min, content="0")
23- except IOError:
24- LOG.warning('mdadm: failed to set sync_{max,min} values')
25- pass
26+ for sync_file in [sync_max, sync_min]:
27+ util.write_file(sync_file, content="0")
28+ except (IOError, OSError) as e:
29+ LOG.debug('mdadm: (non-fatal) write to %s failed %s',
30+ sync_file, e)
31
32 # one wonders why this command doesn't do any of the above itself?
33 out, err = util.subp(["mdadm", "--manage", "--stop", devpath],
34
35=== modified file 'tests/vmtests/test_mdadm_bcache.py'
36--- tests/vmtests/test_mdadm_bcache.py 2017-08-02 15:46:35 +0000
37+++ tests/vmtests/test_mdadm_bcache.py 2017-09-07 16:10:35 +0000
38@@ -413,6 +413,9 @@
39 # we have to avoid a systemd hang due to the way it handles dmcrypt
40 extra_kern_args = "--- luks=no"
41 active_mdadm = "4"
42+ # running in dirty mode catches some race/errors with mdadm_stop
43+ nr_cpus = 2
44+ dirty_disks = True
45 # initialize secondary disk
46 extra_disks = ['5G', '5G', '5G']
47 disk_to_check = [('main_disk', 1),

Subscribers

People subscribed via source and target branches