Merge lp:~seyeongkim/charms/trusty/ceph/lp1411652 into lp:~openstack-charmers-archive/charms/trusty/ceph/next

Proposed by Seyeong Kim
Status: Rejected
Rejected by: Edward Hope-Morley
Proposed branch: lp:~seyeongkim/charms/trusty/ceph/lp1411652
Merge into: lp:~openstack-charmers-archive/charms/trusty/ceph/next
Diff against target: 49 lines (+31/-0)
2 files modified
hooks/ceph.py (+22/-0)
hooks/ceph_hooks.py (+9/-0)
To merge this branch: bzr merge lp:~seyeongkim/charms/trusty/ceph/lp1411652
Reviewer Review Type Date Requested Status
Edward Hope-Morley Needs Fixing
Billy Olsen Needs Fixing
Chris Holcombe (community) Needs Fixing
Chris MacNaughton (community) Needs Information
Review via email: mp+281732@code.launchpad.net
Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #15563 ceph-next for xtrusia mp281732
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/15563/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #16664 ceph-next for xtrusia mp281732
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/16664/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #8538 ceph-next for xtrusia mp281732
    AMULET OK: passed

Build: http://10.245.162.77:8080/job/charm_amulet_test/8538/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #15701 ceph-next for xtrusia mp281732
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/15701/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #16815 ceph-next for xtrusia mp281732
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/16815/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #8586 ceph-next for xtrusia mp281732
    AMULET OK: passed

Build: http://10.245.162.77:8080/job/charm_amulet_test/8586/

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I'm very nervous about unilaterally removing monitors and OSDs in the stop hook. I think we may want to make these stop-hook steps actions instead of a hook that can be called implicitly.

What is the goal with this change? To remove nodes from a production Ceph cluster or maybe to remove OSDs for maintenance?

review: Needs Information
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

Thanks for your review.

As you can see in the bug report, the mon and osd (plus their crushmap and auth entries) still remain after a unit has been added and removed several times.

Do you mean this needs some kind of "shutdown" hook instead of the stop hook?
That doesn't seem like something I can do..

Thanks.

> I'm very nervous about unilaterally removing monitors and OSDs on the stop
> hook. I think that we may want to make this stop hook actions instead of a
> hook that can be called implicitly.
>
> What is the goal with this change? To remove nodes from a production Ceph
> cluster or maybe to remove OSDs for maintenance?

Revision history for this message
James Page (james-page) wrote :

A charm's 'stop' hook is called as the instance of the charm is destroyed on
a unit - so I think it's appropriate to remove the OSD from the cluster at
this point, as the next step taken by juju will be to destroy the machine
it's running on; dropping the OSD from the cluster configuration here LGTM.

On Mon, Jan 11, 2016 at 4:38 AM, Seyeong Kim <email address hidden>
wrote:

> Thanks for your review.
>
> As you can see the bug report, mon, osd ( + crushmap, auth ) are still
> remains after unit added & removed several times.
>
> You mean this needs kind of "shutdown" hook instead of stop hook?
> This seems not which i can do..
>
> Thanks.
>
> > I'm very nervous about unilaterally removing monitors and OSDs on the
> stop
> > hook. I think that we may want to make this stop hook actions instead of
> a
> > hook that can be called implicitly.
> >
> > What is the goal with this change? To remove nodes from a production Ceph
> > cluster or maybe to remove OSDs for maintenance?
> --
>
> https://code.launchpad.net/~xtrusia/charms/trusty/ceph/lp1411652/+merge/281732
> Your team OpenStack Charmers is requested to review the proposed merge of
> lp:~xtrusia/charms/trusty/ceph/lp1411652 into
> lp:~openstack-charmers/charms/trusty/ceph/next.
>

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

> On Jan 11, 2016, at 1:24 AM, James Page <email address hidden> wrote:
>
> A charms 'stop' hook is called as the instance of the charm is destroyed on
> a unit

Is there no other possible time that the stop hook will be called? I'm still nervous about it, but I suppose that when you're destroying machines you are hopefully aware that the cluster may become permanently unhealthy?

Chris

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I agree with Chris MacNaughton on this one. It would be really nice if Juju could say "no, I can't remove this OSD or Monitor" but allow a --force flag that lets you take it down anyway. It doesn't look like this destroys any data. Could you test whether the cluster will form again if you remove the OSDs and then add them back? I believe it will, but that's just a guess.

review: Needs Fixing
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

According to the documentation:

"stop runs immediately before the end of the unit's destruction sequence. It should be used to ensure that the charm's software is not running, and will not start again on reboot.

This hook is called when a service removal is requested by the client. It should implement the following logic:

Stop the service
Remove any files/configuration created during the service lifecycle
Prepare any backup(s) of the service that are required for restore purposes."

I think, then, that if re-adding the OSD works after this, we can say most of the logic is handled. I'd also like to see the stop hook remove the ceph-mon-all and ceph-osd-all services from upstart so that they don't run when the machine reboots.
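Something along these lines would do it (just a sketch — the job names are assumptions based on the trusty ceph packaging, and the override-file trick is standard Upstart rather than anything this proposal currently uses):

```python
import subprocess


def disable_ceph_upstart_jobs():
    # Assumed job names for the all-in-one ceph charm on trusty.
    for job in ('ceph-mon-all', 'ceph-osd-all'):
        # Ignore failures in case the job is already stopped.
        subprocess.call(['service', job, 'stop'])
        # An Upstart override file containing "manual" prevents the job from
        # being started automatically at boot without deleting its definition.
        with open('/etc/init/{0}.override'.format(job), 'w') as f:
            f.write('manual\n')
```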

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@xfactor973
Hello. I have only tested the charms on an OpenStack environment and on MAAS, with one LXC container per unit.
There was no particular problem, but after your comment I think there could be a problem when one machine hosts multiple units.
I haven't checked that case yet; I'll do so this week and modify the code if anything turns up.
Thanks.

> I agree with Chris MacNaughton on this one. It would be really nice if Juju
> could say no I can't remove this OSD or Monitor but allows a --force flag that
> allows you to take it down. It doesn't look like this destroys any data.
> Could you test to see if you remove the OSDs and then add them again if the
> cluster will form again? I believe it will but it's just a guess.

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

Ok.. I was wrong; there seems to be no way to deploy multiple ceph OSD units to one machine.

In that case, it works fine across several rounds of destroying and adding ceph.

But I'm still testing more now..

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

I tried this several times.

It was fine and the cluster stayed stable.

I think it also needs testing by someone other than me.

Revision history for this message
Chris Holcombe (xfactor973) wrote :

@xtrusia I'll try to deploy this on ec2 in the next day or so to see how it performs. I'm sure it's fine. :)

Revision history for this message
Chris Holcombe (xfactor973) wrote :

Ok, I ran this patch through its paces. I have an idea that shouldn't be hard to implement and that I think will improve things.

One thing I noticed when I removed a node was that it wasn't completely taken out of the crush map. I had to do that manually:

```
root@ip-172-31-36-39:~# ceph osd tree
# id  weight  type name                 up/down  reweight
-1    3       root default
-2    0           host ip-172-31-30-72
-3    1           host ip-172-31-9-197
2     1               osd.2             up       1
-4    1           host ip-172-31-36-39
1     1               osd.1             up       1
-5    1           host ip-172-31-46-49
3     1               osd.3             up       1
-6    0           host ip-172-31-24-18
root@ip-172-31-36-39:~# ceph osd crush remove ip-172-31-30-72
removed item id -2 name 'ip-172-31-30-72' from crush map
```
I'd really love it if this patch polled to let ceph remap the data while the OSDs are being removed. I wrote some pseudo code to give you the idea: https://gist.github.com/cholcombe973/468572a9bf73efb6f87d We could give the user an option to wait x minutes, or to remove the node regardless of whether the cluster is unhealthy. This would at least give the cluster a chance to rebalance onto the other OSDs. The only downside is that rebalances sometimes take days or weeks to complete.
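To make the idea concrete here (a rough sketch, not the gist verbatim — the timeout and interval values are arbitrary examples):

```python
import json
import subprocess
import time


def wait_for_health_ok(timeout=3600, interval=30):
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(['ceph', 'health', '--format=json'])
        status = json.loads(out)
        # Older ceph releases report 'overall_status'; newer ones report 'status'.
        if status.get('overall_status', status.get('status')) == 'HEALTH_OK':
            return True
        time.sleep(interval)
    return False


def remove_osd_gracefully(osd_id):
    # Marking the OSD out triggers data remapping away from it.
    subprocess.check_call(['ceph', 'osd', 'out', 'osd.{0}'.format(osd_id)])
    if not wait_for_health_ok():
        # The caller decides whether to force removal from an unhealthy cluster.
        raise RuntimeError('cluster did not return to HEALTH_OK in time')
    subprocess.check_call(['ceph', 'osd', 'crush', 'remove',
                           'osd.{0}'.format(osd_id)])
```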

review: Needs Fixing
Revision history for this message
Chris Holcombe (xfactor973) wrote :

A possible future patch we could roll is having the ceph charm re-add disks that were previously removed from the cluster. We can do this by checking the fsid in /var/lib/ceph/osd/ceph-*/ceph_fsid: if it matches the current ceph cluster, then add the disk back in. That would mean removing a unit by accident isn't the end of the world for your cluster.
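Roughly something like this (just a sketch — it only identifies the OSD directories that belong to the running cluster; the re-activation step would still need to be written):

```python
import glob
import os
import subprocess


def osds_belonging_to_this_cluster():
    # 'ceph fsid' prints the fsid of the cluster this node can reach.
    cluster_fsid = subprocess.check_output(['ceph', 'fsid']).decode().strip()
    matching = []
    # Each OSD records the fsid of its cluster in ceph_fsid.
    for fsid_file in glob.glob('/var/lib/ceph/osd/ceph-*/ceph_fsid'):
        with open(fsid_file) as f:
            if f.read().strip() == cluster_fsid:
                matching.append(os.path.dirname(fsid_file))
    return matching
```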

Revision history for this message
Billy Olsen (billy-olsen) wrote :

This change generally looks good to me, minus the couple of changes needed per the inline comments and some unit tests.

The OSDs and Mons should have their auth removed so they couldn't access the cluster even if they wanted to. If they try to connect because a daemon is started, they'll get denied, which is expected. It would be nice if the related ceph systemd/upstart services were disabled (not removed, unless the package is removed), but I don't see that as a requirement for this particular change.

In the current code, removing a ceph unit completely trashes the unit and doesn't clean up anything properly within ceph. This change significantly improves things in that it removes almost everything related to the node from the crushmap, the monitor cluster, etc. This is goodness. (Remove the crush bucket for the host to get the last dangling bits; it won't have any weight, so it shouldn't be *that* big of a deal in the crush algorithm, but it could certainly affect retries.)

As far as polling for a healthy cluster and for the placement groups to be rebalanced goes, I think that's a great feature, but I don't think it should be part of this particular merge proposal. The ceph service units have auth access to the cluster, but the ceph-osd units do not (and in a real-world deployment there would be ~3 to 5 ceph units to >10 ceph-osd units). This would introduce an inconsistency in how the ceph and ceph-osd units behave with respect to one another. I think this is a change that should be implemented outside of this merge proposal.

So bottom line, I'm marking this as 'Needs Fixing' - but I don't think we need to add the polling or service removal as part of this particular change. Address the points inline below, add unit tests, and we can land an incremental improvement and use future MPs to address the enhancements requested in the review.

review: Needs Fixing
Revision history for this message
Billy Olsen (billy-olsen) wrote :

fwiw, there are a few things at play here... the data on the OSD is intentionally not touched so that it can be reused later. The charm defaults to not reformatting the disk in an effort to prevent data loss, so a user can bring the OSD back into the cluster. We'd need to check whether this is handled automagically upon adding the device. In a MAAS environment, I believe the disks are wiped clean when the node is reclaimed by MAAS (at least as an option). Thus, a juju destroy-unit ceph would instead be more catastrophic for the OSDs themselves.

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

Hey Billy,

Some of this may change in the near future, as we're planning to deprecate the current ceph charm in favor of a ceph-mon charm plus a ceph-osd charm to clarify what deployments should look like. At that point we can cleanly separate the service-removal logic between OSD-specific and monitor-specific deployments.

Chris
> On Feb 3, 2016, at 5:14 AM, Billy Olsen <email address hidden> wrote:
>
> fwiw, there's a few things at play here... the data on the OSD is intentionally not touched so that it can be reused later. Charm defaults to not reformatting the disk in an effort to prevent data loss so a user can bring the OSD back into the cluster. We'd need to check if this is handled automagically upon adding the device. In a MAAS environment, I believe the disks are wiped clean when the node is reclaimed by MAAS (an option at least). Thus, a juju destroy-unit ceph will instead be more catastrophic to the OSD themselves.
> --
> https://code.launchpad.net/~xtrusia/charms/trusty/ceph/lp1411652/+merge/281732
> You are reviewing the proposed merge of lp:~xtrusia/charms/trusty/ceph/lp1411652 into lp:~openstack-charmers/charms/trusty/ceph/next.

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I agree, lets skip the polling feature for now and add it in a future merge.

128. By Seyeong Kim

add removing crush map by hostname

129. By Seyeong Kim

merge from upstream

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #115 ceph-next for xtrusia mp281732
    UNIT OK: passed

Build: http://10.245.162.36:8080/job/charm_unit_test/115/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #132 ceph-next for xtrusia mp281732
    LINT OK: passed

Build: http://10.245.162.36:8080/job/charm_lint_check/132/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #9 ceph-next for xtrusia mp281732
    AMULET OK: passed

Build: http://10.245.162.36:8080/job/charm_amulet_test/9/

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Hi Seyeong, the OpenStack charm development workflow has changed and is now done through git repos under the OpenStack project [0]. Can you please resubmit this to both of these locations:

  * http://github.com/openstack/charm-ceph
  * http://github.com/openstack/charm-ceph-mon

Note that you need to apply this to both charms, since we have not yet deprecated the ceph charm and so still need to keep the two in sync in terms of mon code. Thanks.

[0] https://github.com/openstack-charmers/openstack-community/blob/master/README.dev-charms.md

review: Needs Fixing

Unmerged revisions

129. By Seyeong Kim

merge from upstream

128. By Seyeong Kim

add removing crush map by hostname

127. By Seyeong Kim

fix lint

126. By Seyeong Kim

add stop hook

Preview Diff

=== modified file 'hooks/ceph.py'
--- hooks/ceph.py	2016-01-12 14:17:36 +0000
+++ hooks/ceph.py	2016-02-04 08:22:15 +0000
@@ -487,3 +487,25 @@
 
 def filesystem_mounted(fs):
     return subprocess.call(['grep', '-wqs', fs, '/proc/mounts']) == 0
+
+
+def remove_mon():
+    hostname = get_unit_hostname()
+    subprocess.check_call(["ceph", "mon", "remove", hostname])
+
+
+def get_osdids():
+    osddir = os.listdir("/var/lib/ceph/osd/")
+    osdids = []
+    for osd in osddir:
+        osdids.append(osd.split('-')[1])
+    return osdids
+
+
+def shutdown_osd(osd_id):
+    hostname = get_unit_hostname()
+    subprocess.check_call(["ceph", "osd", "crush", "remove", "osd."+osd_id])
+    subprocess.check_call(["ceph", "osd", "crush", "remove", hostname])
+    subprocess.check_call(["ceph", "auth", "del", "osd."+osd_id])
+    subprocess.check_call(["service", "ceph-osd", "stop", "id="+osd_id])
+    subprocess.check_call(["ceph", "osd", "rm", osd_id])

=== modified file 'hooks/ceph_hooks.py'
--- hooks/ceph_hooks.py	2016-01-22 15:37:01 +0000
+++ hooks/ceph_hooks.py	2016-02-04 08:22:15 +0000
@@ -413,6 +413,15 @@
     ceph.start_osds(get_devices())
 
 
+@hooks.hook('stop')
+def stop():
+    osdids = ceph.get_osdids()
+    for osd_id in osdids:
+        ceph.shutdown_osd(osd_id)
+
+    ceph.remove_mon()
+
+
 @hooks.hook('nrpe-external-master-relation-joined')
 @hooks.hook('nrpe-external-master-relation-changed')
 def update_nrpe_config():
