Merge lp:~gandelman-a/charms/precise/ceph/fail_early into lp:~charmers/charms/precise/ceph/trunk

Proposed by Adam Gandelman
Status: Rejected
Rejected by: Adam Gandelman
Proposed branch: lp:~gandelman-a/charms/precise/ceph/fail_early
Merge into: lp:~charmers/charms/precise/ceph/trunk
Diff against target: 84 lines (+33/-6)
2 files modified
hooks/hooks.py (+32/-5)
revision (+1/-1)
To merge this branch: bzr merge lp:~gandelman-a/charms/precise/ceph/fail_early
Reviewer Review Type Date Requested Status
charmers Pending
Review via email: mp+170902@code.launchpad.net

Description of the change

This ensures a hook error if the call to ceph-disk-prepare fails, rather than failing silently and running into difficult-to-debug ceph errors later. Currently the charm continues to deploy even if this call fails, leading to a broken cluster later: http://paste.ubuntu.com/5787852/
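
As a minimal illustration of the behavioral difference (the device path below is only a placeholder):

    import subprocess

    # subprocess.call() just returns the exit code, so a failed
    # ceph-disk-prepare goes unnoticed and the hook still "succeeds".
    subprocess.call(['ceph-disk-prepare', '/dev/vdb'])

    # subprocess.check_call() raises CalledProcessError on a non-zero exit,
    # which fails the hook and surfaces the error in the juju logs.
    subprocess.check_call(['ceph-disk-prepare', '/dev/vdb'])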

Going further: if configured with osd-reformat='yes', it might be a good idea to run a bit of scrubbing on the disk in osdize() just prior to calling ceph-disk-prepare, to ensure it is unmounted and its partition table is zapped, along the lines of the sketch below. I have a patch that does this which I'm still testing and will put up for review. In the meantime, it would be good to simply fail early.
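
Roughly, the scrubbing idea looks like the following sketch (scrub_osd_device is a hypothetical name, and sgdisk --zap-all just stands in for whatever zap helper the charm ends up using):

    import subprocess

    def scrub_osd_device(dev):
        # Unmount anything mounted from the device or one of its partitions.
        for line in open('/proc/mounts').read().splitlines():
            device, mountpoint = line.split(' ')[0:2]
            if device.startswith(dev):
                subprocess.check_call(['umount', mountpoint])
        # Zap the partition table so ceph-disk-prepare starts from a clean device.
        subprocess.check_call(['sgdisk', '--zap-all', dev])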

Revision history for this message
James Page (james-page) wrote :

Not sure this is actually the cause of your broken cluster; the OSD disk is not being correctly detected as already initialized during subsequent executions of the config-changed hook.

That said, this should be a check_call.

I'm currently refactoring this charm to use charm-helpers - OK if I pull this change in with that piece?

Revision history for this message
James Page (james-page) wrote :

OK - I think I see the first break; ceph-disk would fail to prepare a non-existent/bad disk (such as a config-drive).

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Thanks, James. I've only noticed this breakage on the newer version of ceph. As you pointed out on IRC, ceph-disk-prepare no longer zaps the disk prior to preparation.

I'm okay with this waiting until your refactoring; I'll push the disk-zapping changes up to this branch anyway. FYI - I noticed that, post-zap, ceph-disk-prepare could sometimes hang for a while waiting on 'udevadm settle'.

I've also pushed some storage-related helpers up to lp:charm-helpers via MP https://code.launchpad.net/~gandelman-a/charm-helpers/storage_helpers/+merge/171144. These might be useful in the ceph charm as well.

62. By Adam Gandelman

Ensure OSD device is free before preparing.

Unmerged revisions

62. By Adam Gandelman

Ensure OSD device is free before preparing.

61. By Adam Gandelman

Fail hook if ceph-disk-prepare fails.

Preview Diff

=== modified file 'hooks/hooks.py'
--- hooks/hooks.py	2013-06-20 21:15:17 +0000
+++ hooks/hooks.py	2013-06-24 18:24:30 +0000
@@ -70,7 +70,7 @@
     e_mountpoint = utils.config_get('ephemeral-unmount')
     if (e_mountpoint and
         filesystem_mounted(e_mountpoint)):
-        subprocess.call(['umount', e_mountpoint])
+        unmount_filesystem(e_mountpoint)
 
     osd_journal = utils.config_get('osd-journal')
     if (osd_journal and
@@ -174,18 +174,19 @@
                        'Path {} does not exist - bailing'.format(dev))
         return
 
-    if (ceph.is_osd_disk(dev) and not
-            reformat_osd()):
+    if ceph.is_osd_disk(dev) and not reformat_osd():
         utils.juju_log('INFO',
                        'Looks like {} is already an OSD, skipping.'
                        .format(dev))
         return
 
-    if device_mounted(dev):
+    if device_mounted(dev) and not reformat_osd():
         utils.juju_log('INFO',
                        'Looks like {} is in use, skipping.'.format(dev))
         return
 
+    ensure_free_osd_device(dev)
+
     cmd = ['ceph-disk-prepare']
     # Later versions of ceph support more options
     if ceph.get_ceph_version() >= "0.48.3":
@@ -202,7 +203,7 @@
         # Just provide the device - no other options
         # for older versions of ceph
         cmd.append(dev)
-    subprocess.call(cmd)
+    subprocess.check_call(cmd)
 
 
 def device_mounted(dev):
@@ -213,6 +214,32 @@
     return subprocess.call(['grep', '-wqs', fs, '/proc/mounts']) == 0
 
 
+def unmount_filesystem(mountpoint):
+    return subprocess.check_call(['umount', mountpoint])
+
+
+def device_mountpoints(device):
+    '''
+    Return all currently mounted filesystems for all partitions of device
+    '''
+    mounted_fs = []
+    for mount in open('/proc/mounts').read().splitlines():
+        mount = mount.split(' ')
+        if mount[0].startswith(device):
+            mounted_fs.append(mount[1])
+    return mounted_fs
+
+
+def ensure_free_osd_device(dev):
+    '''
+    Ensures device is unmounted and zapped in preparation for ceph-disk-prepare
+    '''
+    subprocess.call(['stop', 'ceph-all'])
+    for mp in device_mountpoints(dev):
+        unmount_filesystem(mp)
+    ceph.zap_disk(dev)
+
+
 def mon_relation():
     utils.juju_log('INFO', 'Begin mon-relation hook.')
     emit_cephconf()

=== modified file 'revision'
--- revision	2013-06-20 23:58:01 +0000
+++ revision	2013-06-24 18:24:30 +0000
@@ -1,1 +1,1 @@
-92
+103
