Juju 1.25.10: Conversation scope is missing sometimes - causes relation hook failure

Bug #1663633 reported by Alex Kavanagh
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju-core
Won't Fix
Undecided
Unassigned

Bug Description

Strange to describe bug. Essentially, it looks like 1.25.10 (and possible 1.25.9) have a bug where during a hook execution the relation is essentially missing for the hook that was invoked during that hook's execution.

This is happening with reactive charms, but doesn't occur (in my testing) with 2.1-beta5. I don't know if it's happening with non-reactive charms.

The problem is occurring with Amulet tests which bootstrap an environment, deploy the charms and relate them as quickly as possible. It doesn't happen every time; perhaps 80% of the time it will fail.

One other oddity, which may be pertinent, is that if you bootstrap the environment, leave it for a good 10 minutes or so and then run the Amulet tests without re-bootstrapping it then completes without issue.

Relevant evidence from ceph-fs charm (from /var/log/juju/unit-ceph-fs-0.log):

2017-02-09 14:59:18 INFO worker.uniter.jujuc server.go:172 running hook tool "juju-log" ["-l" "INFO" "Invoking reactive handler: hooks/relations/ceph-mds/requires.py:27:changed"]
2017-02-09 14:59:18 DEBUG worker.uniter.jujuc server.go:173 hook context id "ceph-fs/0-ceph-mds-relation-changed-2104537497373060467"; dir "/var/lib/juju/agents/unit-ceph-fs-0/charm"
...
snip
...
2017-02-09 14:59:20 INFO worker.uniter.jujuc server.go:172 running hook tool "relation-ids" ["--format=json" "mon"]
2017-02-09 14:59:20 DEBUG worker.uniter.jujuc server.go:173 hook context id "ceph-fs/0-ceph-mds-relation-changed-2104537497373060467"; dir "/var/lib/juju/agents/unit-ceph-fs-0/charm"
2017-02-09 14:59:20 INFO worker.uniter.jujuc server.go:172 running hook tool "status-set" ["blocked" "No MDS detected using current configuration"]
2017-02-09 14:59:20 DEBUG worker.uniter.jujuc server.go:173 hook context id "ceph-fs/0-ceph-mds-relation-changed-2104537497373060467"; dir "/var/lib/juju/agents/unit-ceph-fs-0/charm"
2017-02-09 14:59:20 DEBUG juju.worker.uniter.context context.go:326 [WORKLOAD-STATUS] blocked: No MDS detected using current configuration
2017-02-09 14:59:21 INFO juju.worker.uniter.context context.go:579 handling reboot
2017-02-09 14:59:21 INFO juju.worker.uniter.operation runhook.go:113 ran "ceph-mds-relation-changed" hook
2017-02-09 14:59:21 INFO juju.worker.uniter.operation executor.go:103 committing operation "run relation-changed (3; ceph-mon/0) hook"

and from the manila charm (var/log/juju/unit-manila-0.log):

2017-02-08 15:46:53 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] executing: running amqp-relation-changed hook
...
snip
...

2017-02-08 15:46:57 DEBUG worker.uniter.jujuc server.go:173 hook context id "manila/0-amqp-relation-changed-5381050132888847016"; dir "/var/lib/juju/agents/unit-manila-0/charm"
2017-02-08 15:46:57 INFO juju-log amqp:5: Writing file /etc/memcached.conf root:root 444
2017-02-08 15:47:04 INFO worker.uniter.jujuc server.go:172 running hook tool "juju-log" ["Application Version: 2.0.0"]
2017-02-08 15:47:04 DEBUG worker.uniter.jujuc server.go:173 hook context id "manila/0-amqp-relation-changed-5381050132888847016"; dir "/var/lib/juju/agents/unit-manila-0/charm"
2017-02-08 15:47:04 INFO juju-log amqp:5: Application Version: 2.0.0
2017-02-08 15:47:04 INFO amqp-relation-changed active
2017-02-08 15:47:04 INFO amqp-relation-changed inactive
2017-02-08 15:47:04 INFO amqp-relation-changed active
2017-02-08 15:47:04 INFO amqp-relation-changed inactive
2017-02-08 15:47:04 INFO worker.uniter.jujuc server.go:172 running hook tool "status-set" ["blocked" "Services not running that should be: manila-scheduler, manila-data"]
2017-02-08 15:47:04 DEBUG worker.uniter.jujuc server.go:173 hook context id "manila/0-amqp-relation-changed-5381050132888847016"; dir "/var/lib/juju/agents/unit-manila-0/charm"
2017-02-08 15:47:04 DEBUG juju.worker.uniter.context context.go:326 [WORKLOAD-STATUS] blocked: Services not running that should be: manila-scheduler, manila-data
2017-02-08 15:47:04 INFO amqp-relation-changed Traceback (most recent call last):
2017-02-08 15:47:04 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-manila-0/charm/hooks/amqp-relation-changed", line 19, in <module>
2017-02-08 15:47:04 INFO amqp-relation-changed main()
2017-02-08 15:47:04 INFO amqp-relation-changed File "/usr/local/lib/python3.5/dist-packages/charms/reactive/__init__.py", line 78, in main
2017-02-08 15:47:04 INFO amqp-relation-changed bus.dispatch()
2017-02-08 15:47:04 INFO amqp-relation-changed File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 434, in dispatch
2017-02-08 15:47:04 INFO amqp-relation-changed _invoke(other_handlers)
2017-02-08 15:47:04 INFO amqp-relation-changed File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 417, in _invoke
2017-02-08 15:47:04 INFO amqp-relation-changed handler.invoke()
2017-02-08 15:47:04 INFO amqp-relation-changed File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 291, in invoke
2017-02-08 15:47:04 INFO amqp-relation-changed self._action(*args)
2017-02-08 15:47:04 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-manila-0/charm/reactive/manila_handlers.py", line 130, in config_changed
2017-02-08 15:47:04 INFO amqp-relation-changed render_stuff(*args)
2017-02-08 15:47:04 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-manila-0/charm/reactive/manila_handlers.py", line 115, in render_stuff
2017-02-08 15:47:04 INFO amqp-relation-changed manila_plugin.clear_changed()
2017-02-08 15:47:04 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-manila-0/charm/hooks/relations/manila-plugin/requires.py", line 123, in clear_changed
2017-02-08 15:47:04 INFO amqp-relation-changed conversation = self.conversation()
2017-02-08 15:47:04 INFO amqp-relation-changed File "/usr/local/lib/python3.5/dist-packages/charms/reactive/relations.py", line 263, in conversation
2017-02-08 15:47:04 INFO amqp-relation-changed raise ValueError("Conversation with scope '%s' not found" % scope)
2017-02-08 15:47:04 INFO amqp-relation-changed ValueError: Conversation with scope 'rabbitmq-server/0' not found
2017-02-08 15:47:04 INFO juju.worker.uniter.context context.go:579 handling reboot
2017-02-08 15:47:04 ERROR juju.worker.uniter.operation runhook.go:107 hook "amqp-relation-changed" failed: exit status 1

Resources:
OSCI output for ceph-fs: https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-ceph-fs/431574/1/2246/index.html

OSCI output for manila-fs: https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-manila-generic/428839/3/2231/index.html

edited: Changed 1.2.9 -> 1.25.9 and 1.2.19 -> 1.25.10 as prompted by @anastasia below.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Alex,
Please confirm - when you say 1.2.9/12.10, you mean 1.25.9/1.25.10?

It does sound that there is a timing issue, potentially a race. As in your description, giving bootstrap enough time to settle seem to allow you to run Amulet testing successful, we cannot consider this issue as Critical and are unlikely to address it in 1.25 series.

As you say, the problem does not occur with Juju 2.1.

Changed in juju-core:
status: New → Incomplete
summary: - Juju 1.2.10: Conversation scope is missing sometimes - causes relation
+ Juju 1.25.10: Conversation scope is missing sometimes - causes relation
hook failure
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

@anastasia,
Sorry, yes, I did mean 1.25.9 and 1.25.10 -- not sure why I dropped the '5'! I've edited the original comment.

Yes, it does seem like a race condition, but it's a blocker for CI. I'm not sure if I can put it a workaround.

description: updated
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Alex Kavanagh (ajkavanagh),
It seems that some hooks that are run miss some required data until next hooks are executing. Since the system needs time to stabilise, especially when there are a few hooks activity going on, we can only guarantee *eventual consistency* :) This matches the behavior that you are observing - sometimes things work, sometimes not.
Obviously, things have improved in Juju 2.x based on your observations.

Changed in juju-core:
status: Incomplete → Won't Fix
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

@anastasia,

Okay, but there's definite weirdness going on. This is AFTER the joined hook has run. It's not that the data is missing, it's that charms.reactive can't find the relation during a subsequent hook (for the same relation). This is what seems to fail. It may be a charms.reactive problem, but I'm guessing that just uses the relation-ids to find the relation. I worked around it by catching the error, but I still think there is a bug lurking somewhere in there.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.