Etcd Charm

Charm sets refresh.timer to invalid value "None"

Bug #1823293 reported by Simon Fels on 2019-04-05

Affects      Status         Importance   Assigned to   Milestone
Etcd Charm   Fix Released   High         Mike Wilson

Bug Description

With the latest version of the etcd charm (revision 411) I see the following error in the logs:

unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log Invoking reactive handler: reactive/etcd.py:439:set_snapd_timer
unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log setting snapd_refresh timer to: None
unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log Set config refresh.timer=None for snap core
unit-etcd-1: 09:24:00 DEBUG unit.etcd/1.update-status error: cannot perform the following tasks:
unit-etcd-1: 09:24:00 DEBUG unit.etcd/1.update-status - Run configure hook of "core" snap (run hook "configure": cannot parse "None": "None" is not a valid weekday)
unit-etcd-1: 09:24:00 ERROR unit.etcd/1.juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-etcd-1/charm/reactive/etcd.py", line 451, in set_snapd_timer
    snap.set_refresh_timer(timer)
  File "lib/charms/layer/snap.py", line 231, in set_refresh_timer
    set(snapname='core', key='refresh.timer', value=timer)
  File "lib/charms/layer/snap.py", line 197, in set
    ['snap', 'set', snapname, '{}={}'.format(key, value)])
  File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snap', 'set', 'core', 'refresh.timer=None']' returned non-zero exit status 1.

This happens with three instances of etcd running.
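The last frames of the traceback show how the literal string "None" reaches snapd: the charm builds the `snap set` argument with `'{}={}'.format(key, value)`, and `str.format` renders Python's `None` as the text `None`. A minimal sketch of that step follows; the helper name `format_snap_setting` is made up for illustration and is not part of the snap layer.

```python
def format_snap_setting(key, value):
    """Build the key=value argument the same way the traceback shows."""
    # str.format() calls str() on each argument, so None becomes "None".
    return "{}={}".format(key, value)


# Reproduce the command from the CalledProcessError message.
cmd = ["snap", "set", "core", format_snap_setting("refresh.timer", None)]
print(cmd)  # prints ['snap', 'set', 'core', 'refresh.timer=None']
```

snapd then tries to parse "None" as a timer schedule, which is why the configure hook complains that it is not a valid weekday.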

Simon Fels (morphis) on 2019-04-05

description:	updated

Tim Van Steenburgh (tvansteenburgh) wrote on 2019-04-05:

Simon, I'm running CDK with etcd-411 and don't have this issue, so there must be something unique to your deployment that's causing this. Can you include `juju status` output please, or a copy of the bundle you deployed, including any configuration that's been set on the charms? Thanks!

Changed in charm-etcd:
status:	New → Triaged

Mike Wilson (knobby) wrote on 2019-04-10:

Seems similar to https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/657

Note that I just saw this in a deploy that also had a livepatch subordinate charm.

Tim Van Steenburgh (tvansteenburgh) on 2019-04-10

Changed in charm-etcd:
importance:	Undecided → High
assignee:	nobody → Mike Wilson (knobby)

Simon Fels (morphis) wrote on 2019-04-11:

Thanks Tim! I don't have the model active anymore. It's not a CDK deployment; it uses custom software that is unrelated to Kubernetes.

The etcd charm is configured in the bundle as:

  etcd:
    charm: 'cs:etcd-411'
    num_units: 1
    options:
      channel: 3.2/stable
    to:
      - '0'

I've seen the crash happen when the cluster has more than one etcd node. In the case above, the cluster had three etcd nodes.

I can try to reproduce this again if necessary.

Syed Mohammad Adnan Karim (karimsye) wrote on 2019-04-11:

The deploy Mike was talking about was one of my k8s deployments on top of OpenStack. I was able to solve the issue by setting the livepatch charm's snapd_refresh config to max. By default, the livepatch charm has snapd_refresh="" and the etcd charm has snapd_refresh="max".

Mike Wilson (knobby) wrote on 2019-04-11:

https://github.com/juju-solutions/layer-etcd/pull/148

Mike Wilson (knobby) wrote on 2019-04-11:

Thanks to kwmonroe's help this was an easy fix. The issue is that when the subordinate canonical-livepatch charm shows up, it overwrites the snapd_refresh config with an empty string, since that is its default. The etcd charm expects a non-empty value, so it ends up setting refresh.timer to an invalid value. The code fix is the same one used on the Kubernetes charms: notice that the string is empty and then populate it from the etcd charm's own value.
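The idea behind the fix can be sketched roughly as below. This is not the code from the pull request: the function names and the `charm_default` parameter are hypothetical, and the "max" default is taken from the etcd charm default mentioned earlier in this thread.

```python
def resolve_refresh_timer(configured, charm_default="max"):
    """Return a timer value that is safe to hand to `snap set`.

    Hypothetical helper: treats None, "" and whitespace-only strings as
    "not set" and falls back to the charm's own default in that case.
    """
    if configured is None or not str(configured).strip():
        return charm_default
    return str(configured).strip()


def build_refresh_timer_command(timer):
    """Build the `snap set` command seen in the traceback (illustrative)."""
    return ["snap", "set", "core", "refresh.timer={}".format(timer)]


# An empty string from the subordinate no longer reaches snapd unchanged.
print(build_refresh_timer_command(resolve_refresh_timer("")))
```

With a guard like this, a value that another charm has emptied is replaced before the command is built, while an explicitly configured schedule is passed through untouched.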

Really, this name collision is an unfortunate thing that needs better handling somewhere.

Mike Wilson (knobby) on 2019-04-15

Changed in charm-etcd:
status:	Triaged → In Progress

Mike Wilson (knobby) wrote on 2019-04-22:

kwmonroe also applied this fix to the Kubernetes charms: https://github.com/charmed-kubernetes/layer-kubernetes-master-worker-base/pull/2

Changed in charm-etcd:
status:	In Progress → Fix Committed

Tim Van Steenburgh (tvansteenburgh) on 2019-04-25

Changed in charm-etcd:
status:	Fix Committed → Fix Released

Remote bug watches

auto-github-juju-solutions-bundle-canonical-kubernetes #657 [closed area/kubernetes-worker field/normal]