Merge lp:~gandelman-a/charms/precise/ceph/fail_early into lp:~charmers/charms/precise/ceph/trunk

Proposed by Adam Gandelman
Status: Rejected
Rejected by: Adam Gandelman
Proposed branch: lp:~gandelman-a/charms/precise/ceph/fail_early
Merge into: lp:~charmers/charms/precise/ceph/trunk
Diff against target: 84 lines (+33/-6)
2 files modified
hooks/hooks.py (+32/-5)
revision (+1/-1)
To merge this branch: bzr merge lp:~gandelman-a/charms/precise/ceph/fail_early
Reviewer Review Type Date Requested Status
charmers Pending
Review via email: mp+170902@code.launchpad.net

Description of the change

This ensures a hook error if the call to ceph-disk-prepare fails, rather than failing silently and running into difficult-to-debug ceph errors later. Currently the charm continues to deploy even if this call fails, leading to a broken cluster later: http://paste.ubuntu.com/5787852/
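
As a minimal illustration of the behavioral difference (the device path below is only a placeholder):

    import subprocess

    # subprocess.call() just returns the exit code, so a failed
    # ceph-disk-prepare goes unnoticed and the hook still "succeeds".
    subprocess.call(['ceph-disk-prepare', '/dev/vdb'])

    # subprocess.check_call() raises CalledProcessError on a non-zero exit,
    # which fails the hook and surfaces the error in the juju logs.
    subprocess.check_call(['ceph-disk-prepare', '/dev/vdb'])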

Going further: if configured with osd-reformat='yes', it might be a good idea to run a bit of scrubbing on the disk in osdize() just prior to calling ceph-disk-prepare, to ensure it is unmounted and its partition table is zapped, along the lines of the sketch below. I have a patch that does this which I'm still testing and will put up for review. In the meantime, it would be good to simply fail early.
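
Roughly, the scrubbing idea looks like the following sketch (scrub_osd_device is a hypothetical name, and sgdisk --zap-all just stands in for whatever zap helper the charm ends up using):

    import subprocess

    def scrub_osd_device(dev):
        # Unmount anything mounted from the device or one of its partitions.
        for line in open('/proc/mounts').read().splitlines():
            device, mountpoint = line.split(' ')[0:2]
            if device.startswith(dev):
                subprocess.check_call(['umount', mountpoint])
        # Zap the partition table so ceph-disk-prepare starts from a clean device.
        subprocess.check_call(['sgdisk', '--zap-all', dev])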

Revision history for this message
James Page (james-page) wrote :

Not sure this is actually the cause of your broken cluster; the OSD disk is not being correctly detected as already initialized during subsequent executions of the config-changed hook.

That said, this should be a check_call.

I'm currently refactoring this charm to use charm-helpers - OK if I pull this change in with that piece?

Revision history for this message
James Page (james-page) wrote :

OK - I think I see the first break; ceph-disk would fail to prepare a non-existent/bad disk (such as a config-drive).

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Thanks, James. I've only noticed this breakage on the newer version of ceph. As you pointed out on IRC, ceph-disk-prepare no longer zaps the disk prior to preparation.

I'm okay with this waiting until your refactoring; I'll push the disk-zapping changes up to this branch anyway. FYI - I noticed that, post-zap, ceph-disk-prepare could sometimes hang for a while waiting on 'udevadm settle'.

I've also pushed some storage-related helpers up to lp:charm-helpers via MP https://code.launchpad.net/~gandelman-a/charm-helpers/storage_helpers/+merge/171144. These might be useful in the ceph charm as well.

62. By Adam Gandelman

Ensure OSD device is free before preparing.

Unmerged revisions

62. By Adam Gandelman

Ensure OSD device is free before preparing.

61. By Adam Gandelman

Fail hook if ceph-disk-prepare fails.

Preview Diff

=== modified file 'hooks/hooks.py'
--- hooks/hooks.py	2013-06-20 21:15:17 +0000
+++ hooks/hooks.py	2013-06-24 18:24:30 +0000
@@ -70,7 +70,7 @@
     e_mountpoint = utils.config_get('ephemeral-unmount')
     if (e_mountpoint and
         filesystem_mounted(e_mountpoint)):
-        subprocess.call(['umount', e_mountpoint])
+        unmount_filesystem(e_mountpoint)
 
     osd_journal = utils.config_get('osd-journal')
     if (osd_journal and
@@ -174,18 +174,19 @@
                        'Path {} does not exist - bailing'.format(dev))
         return
 
-    if (ceph.is_osd_disk(dev) and not
-            reformat_osd()):
+    if ceph.is_osd_disk(dev) and not reformat_osd():
         utils.juju_log('INFO',
                        'Looks like {} is already an OSD, skipping.'
                        .format(dev))
         return
 
-    if device_mounted(dev):
+    if device_mounted(dev) and not reformat_osd():
         utils.juju_log('INFO',
                        'Looks like {} is in use, skipping.'.format(dev))
         return
 
+    ensure_free_osd_device(dev)
+
     cmd = ['ceph-disk-prepare']
     # Later versions of ceph support more options
     if ceph.get_ceph_version() >= "0.48.3":
@@ -202,7 +203,7 @@
         # Just provide the device - no other options
         # for older versions of ceph
         cmd.append(dev)
-    subprocess.call(cmd)
+    subprocess.check_call(cmd)
 
 
 def device_mounted(dev):
@@ -213,6 +214,32 @@
     return subprocess.call(['grep', '-wqs', fs, '/proc/mounts']) == 0
 
 
+def unmount_filesystem(mountpoint):
+    return subprocess.check_call(['umount', mountpoint])
+
+
+def device_mountpoints(device):
+    '''
+    Return all currently mounted filesystems for all partitions of device
+    '''
+    mounted_fs = []
+    for mount in open('/proc/mounts').read().splitlines():
+        mount = mount.split(' ')
+        if mount[0].startswith(device):
+            mounted_fs.append(mount[1])
+    return mounted_fs
+
+
+def ensure_free_osd_device(dev):
+    '''
+    Ensures device is unmounted and zapped in preparation for ceph-disk-prepare
+    '''
+    subprocess.call(['stop', 'ceph-all'])
+    for mp in device_mountpoints(dev):
+        unmount_filesystem(mp)
+    ceph.zap_disk(dev)
+
+
 def mon_relation():
     utils.juju_log('INFO', 'Begin mon-relation hook.')
     emit_cephconf()

=== modified file 'revision'
--- revision	2013-06-20 23:58:01 +0000
+++ revision	2013-06-24 18:24:30 +0000
@@ -1,1 +1,1 @@
-92
+103
