Merge lp:~stub/charms/trusty/cassandra/wait-for-joining into lp:charms/trusty/cassandra

Proposed by Stuart Bishop
Status: Merged
Merge reported by: Stuart Bishop
Merged at revision: not available
Proposed branch: lp:~stub/charms/trusty/cassandra/wait-for-joining
Merge into: lp:charms/trusty/cassandra
Prerequisite: lp:~stub/charms/trusty/cassandra/ensure-thrift
Diff against target: 116 lines (+36/-17)
2 files modified
hooks/actions.py (+34/-15)
tests/test_actions.py (+2/-2)
To merge this branch: bzr merge lp:~stub/charms/trusty/cassandra/wait-for-joining
Reviewer: Cory Johns (community)
Review status: Needs Information
Review via email: mp+286993@code.launchpad.net

Commit message

Make bootstrap procedure safer

Description of the change

We were previously capping the replication factor of the system_auth keyspace at three. I have since found documentation recommending that the replication factor be increased so that every node has a copy of this data.

Also, a somewhat undocumented caveat of bootstrapping is that, at least when using vnodes, only one node should be JOINING at a time. We now keep hold of the lock until the bootstrapping node has become NORMAL, which also neatly stops other nodes from restarting while it is attempting to stream data.
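
As an illustration only (not code from this branch), making the system_auth replication factor match the node count comes down to an ALTER KEYSPACE followed by a repair. Below is a minimal sketch using the DataStax Python driver; the contact point, datacenter name and node count are assumed inputs, and the charm itself does this through its own helpers (set_auth_keyspace_replication and repair_auth_keyspace).

# Illustration only, not the charm's helpers: raise system_auth replication
# so every node in the datacenter holds a copy of the authentication data.
# contact_point, datacenter and num_nodes are assumed inputs for the example.
from cassandra.cluster import Cluster


def set_system_auth_rf(contact_point, datacenter, num_nodes):
    cluster = Cluster([contact_point])
    session = cluster.connect()
    # NetworkTopologyStrategy keyed by datacenter, with rf equal to the
    # node count so losing any single node never loses the auth data.
    cql = ("ALTER KEYSPACE system_auth WITH REPLICATION = "
           "{{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}}").format(
               dc=datacenter, rf=num_nodes)
    session.execute(cql)
    cluster.shutdown()
    # After raising the replication factor, 'nodetool repair system_auth'
    # still needs to be run on each node to populate the new replicas.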

Revision history for this message
Cory Johns (johnsca) wrote :

Overall this looks good, but I'm a bit concerned about the potential for a deadlock in the charm due to the JOINING / NORMAL wait. Would it be possible to add a timeout to that loop so that, if it is exceeded, the charm is put into a "waiting" state and retries on the next update-status hook?

review: Needs Information
Revision history for this message
Stuart Bishop (stub) wrote :

@johnsca Yes, I think that would be best. It may not be a minor change though, as I need to ensure that other units don't attempt to join until the waiting unit has completed, and the charmhelpers.coordinator lock is only held for the duration of the hook. Or perhaps Cassandra is robust enough in the supported versions that I can reliably add several units to the cluster at once and they will happily block until it is their turn to bootstrap (which would greatly simplify things). The other issue is making sure failed units stay in their blocked state, but that seems fairly minor. This will of course fall out much more naturally when I get a chance to rewrite the charm for charms.reactive.

It would still be good to land this as is IMO, as a deadlocked unit is preferable to the current behaviour of not noticing the issue and ploughing ahead with operations potentially dangerous to the whole cluster.
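
As a reference for the discussion above, a bounded version of the JOINING/NORMAL wait might look something like the sketch below. It is not part of this branch: get_node_status and status_set stand in for the charm's helpers.get_node_status() and helpers.status_set(), and the 30 minute timeout and the 'waiting' message are assumptions for illustration.

# Sketch only, not part of this branch: a bounded version of the
# JOINING/NORMAL wait. get_node_status and status_set stand in for the
# charm's helpers; the timeout value is an arbitrary assumption.
import time

JOIN_TIMEOUT = 30 * 60  # seconds (assumed value for illustration)


def wait_for_normal(get_node_status, status_set):
    deadline = time.time() + JOIN_TIMEOUT
    while time.time() < deadline:
        status = get_node_status()
        if status == 'NORMAL':
            return
        if status == 'JOINING':
            status_set('maintenance', 'Still joining cluster')
            time.sleep(10)
        else:
            # None (node shut down) or any other state means bootstrap failed.
            status_set('blocked', 'Failed to bootstrap ({})'.format(status))
            raise SystemExit(0)
    # Timed out: record a 'waiting' state and exit cleanly so the next
    # update-status hook re-runs this check instead of deadlocking the hook.
    status_set('waiting', 'Still joining cluster; will check again')
    raise SystemExit(0)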

Preview Diff

=== modified file 'hooks/actions.py'
--- hooks/actions.py 2016-02-24 10:02:13 +0000
+++ hooks/actions.py 2016-02-24 10:02:13 +0000
@@ -432,18 +432,13 @@
 def needs_reset_auth_keyspace_replication():
     '''Guard for reset_auth_keyspace_replication.'''
     num_nodes = helpers.num_nodes()
-    n = min(num_nodes, 3)
     datacenter = hookenv.config()['datacenter']
     with helpers.connect() as session:
         strategy_opts = helpers.get_auth_keyspace_replication(session)
         rf = int(strategy_opts.get(datacenter, -1))
         hookenv.log('system_auth rf={!r}'.format(strategy_opts))
-        # If the node count has increased, we will change the rf.
-        # If the node count is decreasing, we do nothing as the service
-        # may be being destroyed.
-        if rf < n:
-            return True
-        return False
+        # If the node count has changed, we should change the rf.
+        return rf != num_nodes
 
 
 @leader_only
@@ -453,23 +448,23 @@
 def reset_auth_keyspace_replication():
     # Cassandra requires you to manually set the replication factor of
     # the system_auth keyspace, to ensure availability and redundancy.
-    # We replication factor in this service's DC can be no higher than
-    # the number of bootstrapped nodes. We also cap this at 3 to ensure
-    # we don't have too many seeds.
+    # The recommendation is to set the replication factor so that every
+    # node has a copy.
     num_nodes = helpers.num_nodes()
-    n = min(num_nodes, 3)
     datacenter = hookenv.config()['datacenter']
     with helpers.connect() as session:
         strategy_opts = helpers.get_auth_keyspace_replication(session)
         rf = int(strategy_opts.get(datacenter, -1))
         hookenv.log('system_auth rf={!r}'.format(strategy_opts))
-        if rf != n:
+        if rf != num_nodes:
             strategy_opts['class'] = 'NetworkTopologyStrategy'
-            strategy_opts[datacenter] = n
+            strategy_opts[datacenter] = num_nodes
             if 'replication_factor' in strategy_opts:
                 del strategy_opts['replication_factor']
             helpers.set_auth_keyspace_replication(session, strategy_opts)
-            helpers.repair_auth_keyspace()
+            if rf < num_nodes:
+                # Increasing rf, need to run repair.
+                helpers.repair_auth_keyspace()
     helpers.set_active()
 
 
@@ -565,7 +560,9 @@
 
     Per documented procedure for adding new units to a cluster, wait 2
     minutes if the unit has just bootstrapped to ensure other units
-    do not attempt bootstrap too soon.
+    do not attempt bootstrap too soon. Also, wait until completed joining
+    to ensure we keep the lock and ensure other nodes don't restart or
+    bootstrap.
     '''
     if not helpers.is_bootstrapped():
         if coordinator.relid is not None:
@@ -573,6 +570,28 @@
         hookenv.log('Post-bootstrap 2 minute delay')
         time.sleep(120)  # Must wait 2 minutes between bootstrapping nodes.
 
+        join_msg_set = False
+        while True:
+            status = helpers.get_node_status()
+            if status == 'NORMAL':
+                break
+            elif status == 'JOINING':
+                if not join_msg_set:
+                    helpers.status_set('maintenance', 'Still joining cluster')
+                    join_msg_set = True
+                time.sleep(10)
+                continue
+            else:
+                if status is None:
+                    helpers.status_set('blocked',
+                                       'Unexpectedly shutdown during '
+                                       'bootstrap')
+                else:
+                    helpers.status_set('blocked',
+                                       'Failed to bootstrap ({})'
+                                       ''.format(status))
+                raise SystemExit(0)
+
     # Unconditionally call this to publish the bootstrapped flag to
     # the peer relation, as the first unit was bootstrapped before
     # the peer relation existed.

=== modified file 'tests/test_actions.py'
--- tests/test_actions.py 2016-02-24 10:02:13 +0000
+++ tests/test_actions.py 2016-02-24 10:02:13 +0000
@@ -535,7 +535,7 @@
         connect().__enter__.return_value = sentinel.session
         connect().__exit__.return_value = False
 
-        num_nodes.return_value = 4
+        num_nodes.return_value = 3
         get_auth_ks_rep.return_value = {'another': '8',
                                         'mydc': '3'}
         self.assertFalse(actions.needs_reset_auth_keyspace_replication())
@@ -565,7 +565,7 @@
         actions.reset_auth_keyspace_replication('')
         set_auth_ks_rep.assert_called_once_with(
             sentinel.session,
-            {'class': 'NetworkTopologyStrategy', 'another': '8', 'mydc': 3})
+            {'class': 'NetworkTopologyStrategy', 'another': '8', 'mydc': 4})
         repair.assert_called_once_with()
         set_active.assert_called_once_with()
