Charm sets refresh.timer to invalid value "None"

Bug #1823293 reported by Simon Fels
This bug affects 1 person
Affects: Etcd Charm
Status: Fix Released
Importance: High
Assigned to: Mike Wilson
Milestone: (none listed)

Bug Description

With the latest version of the etcd charm (revision 411) I see the following error in the logs:

unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log Invoking reactive handler: reactive/etcd.py:439:set_snapd_timer
unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log setting snapd_refresh timer to: None
unit-etcd-1: 09:24:00 INFO unit.etcd/1.juju-log Set config refresh.timer=None for snap core
unit-etcd-1: 09:24:00 DEBUG unit.etcd/1.update-status error: cannot perform the following tasks:
unit-etcd-1: 09:24:00 DEBUG unit.etcd/1.update-status - Run configure hook of "core" snap (run hook "configure": cannot parse "None": "None" is not a valid weekday)
unit-etcd-1: 09:24:00 ERROR unit.etcd/1.juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-etcd-1/charm/reactive/etcd.py", line 451, in set_snapd_timer
    snap.set_refresh_timer(timer)
  File "lib/charms/layer/snap.py", line 231, in set_refresh_timer
    set(snapname='core', key='refresh.timer', value=timer)
  File "lib/charms/layer/snap.py", line 197, in set
    ['snap', 'set', snapname, '{}={}'.format(key, value)])
  File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snap', 'set', 'core', 'refresh.timer=None']' returned non-zero exit status 1.

This happens with three instances of etcd running.
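
The `refresh.timer=None` in the failing command is what Python string formatting produces when the value is the `None` object: `str.format` renders it as the literal text `None`, which snapd then tries to parse as a timer schedule. A minimal sketch of that step follows; `build_snap_set_cmd` is a hypothetical helper that mirrors the argument list shown at snap.py line 197 in the traceback above, not the layer's actual code, and it only builds the command without running it.

```python
def build_snap_set_cmd(snapname, key, value):
    """Build the argv list in the same shape as the traceback's snap.py:197.

    Hypothetical helper for illustration only; nothing is executed.
    """
    return ['snap', 'set', snapname, '{}={}'.format(key, value)]


# An unset timer (None) is rendered as the text "None".
cmd = build_snap_set_cmd('core', 'refresh.timer', None)
print(cmd)
```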

Simon Fels (morphis)
description: updated
Tim Van Steenburgh (tvansteenburgh) wrote :

Simon, I'm running CDK with etcd-411 and don't have this issue, so there must be something unique to your deployment that's causing this. Can you include `juju status` output please, or a copy of the bundle you deployed, including any configuration that's been set on the charms? Thanks!

Changed in charm-etcd:
status: New → Triaged
Mike Wilson (knobby) wrote :

Seems similar to https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/657

Note that I just saw this in a deploy that also had a livepatch subordinate charm.

Changed in charm-etcd:
importance: Undecided → High
assignee: nobody → Mike Wilson (knobby)
Simon Fels (morphis) wrote :

Thanks Tim! I no longer have the model active. It isn't a CDK deployment; it uses custom software that is unrelated to Kubernetes.

The etcd charm is configured in the bundle as:

  etcd:
    charm: 'cs:etcd-411'
    num_units: 1
    options:
      channel: 3.2/stable
    to:
      - '0'

I've seen the crash happen when the cluster has more than one etcd node. In the case above the cluster had three etcd nodes.

I can try to reproduce this again if necessary.

Syed Mohammad Adnan Karim (karimsye) wrote :

The deploy Mike was talking about was one of my k8s deployments on top of OpenStack. I was able to solve the issue by setting the livepatch charm's snapd_refresh config to max. By default, the livepatch charm has snapd_refresh="" and the etcd charm has snapd_refresh="max".

Mike Wilson (knobby) wrote :

Thanks to kwmonroe's help this was an easy fix. The issue here is that when the subordinate canonical-livepatch shows up, it overwrites the snapd_refresh config with an empty string, since that is its default. The etcd charm expects a non-empty value, so it gets confused at this point. The code fix is the same one used on the Kubernetes charms: notice that the string is empty and then populate it from the etcd charm's own value.
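
As a rough illustration of that approach (not the actual patch), the check could look like the sketch below. The function name `resolve_refresh_timer` and its parameters are hypothetical; the default of `max` is taken from the etcd charm default mentioned earlier in this report, and treating `None` the same as an empty string is an assumption made here for illustration.

```python
def resolve_refresh_timer(shared_value, charm_default='max'):
    """Return a non-empty value to use for refresh.timer.

    shared_value: the snapd_refresh value the unit currently sees, which
        may be '' after the subordinate applies its own default (None is
        also treated as unset here, by assumption).
    charm_default: the etcd charm's own snapd_refresh value.
    """
    if not shared_value:
        # Empty or unset: fall back to the etcd charm's value.
        return charm_default
    return shared_value


print(resolve_refresh_timer(''))
print(resolve_refresh_timer('fri,23:00-01:00'))
```

With a guard like this in front of the `snap set` call, an empty value from the subordinate would no longer reach snapd.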

Really, this name collision is an unfortunate thing that needs better handling somewhere.

Mike Wilson (knobby)
Changed in charm-etcd:
status: Triaged → In Progress
Mike Wilson (knobby) wrote :

kwmonroe also applied this fix to the Kubernetes charms: https://github.com/charmed-kubernetes/layer-kubernetes-master-worker-base/pull/2

Changed in charm-etcd:
status: In Progress → Fix Committed
Changed in charm-etcd:
status: Fix Committed → Fix Released