prometheus unit stuck in "maintenance", with no hook running

Bug #1668142 reported by Paul Collins
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
Prometheus Charm  Fix Released  High        Unassigned

Bug Description

Today during a deploy I discovered my environment was failing to settle because:

prometheus/0* maintenance idle 2 10.25.2.224 9090/tcp,12321/tcp Updating configuration

but without any config-changed hook actively running.

I did a manual "juju run --unit prometheus/0 status-set active", which unstuck things, but it's not clear how the unit ended up in the state in the first place.

One thing that did occur earlier was that scrape-jobs was set to a malformed (but possibly still valid YAML) value, but that happened much earlier than the last update:

    application-status:
      current: maintenance
      message: Updating configuration
      since: 27 Feb 2017 02:04:17Z

but maybe I'm not interpreting "since" correctly. Or maybe the reactive framework thought it should still be in maintenance and therefore keeps applying status-set, updating "since", but the unit is still in "active" and has not returned to "maintenance". Shruggity shrug.

Paul Collins (pjdc) changed the summary:
- prometheus unit stuck in "maintenance" with no hookl running
+ prometheus unit stuck in "maintenance", with no hook running
Stuart Bishop (stub) wrote :

The handler that sets the status to active never ran, which indicates that other handlers that should have run did not run either. The unit is in an unknown state, even though the manual status-set papered over the problem.

Stuart Bishop (stub) wrote :

The active status only gets set in restart_prometheus(), but reconfiguring may trigger only a reload, in which case the status is never reset from maintenance.
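
The failure mode described above can be sketched roughly as follows. This is a simplified, self-contained model and not the charm's actual code: only restart_prometheus() is named in this report, while Unit, write_config, the two reload handlers, config_changed and the "Ready" message are hypothetical names invented for illustration. The "fixed" handler shows one possible remedy and is not necessarily the fix that was committed.

```python
class Unit:
    """Hypothetical stand-in for a Juju unit's workload status."""

    def __init__(self):
        self.status = ("active", "Ready")

    def status_set(self, state, message):
        # Models the effect of the status-set hook tool.
        self.status = (state, message)


def write_config(unit):
    # Reconfiguration always enters maintenance first.
    unit.status_set("maintenance", "Updating configuration")


def restart_prometheus(unit):
    # Per the report, this is the only place the active status is set.
    unit.status_set("active", "Ready")


def reload_prometheus_buggy(unit):
    # Reload path: the daemon would be signalled here,
    # but the workload status is never touched.
    pass


def reload_prometheus_fixed(unit):
    # One possible remedy (an assumption): also reset the status after a reload.
    unit.status_set("active", "Ready")


def config_changed(unit, needs_restart, reload_handler):
    """Run one reconfiguration and return the resulting status name."""
    write_config(unit)
    if needs_restart:
        restart_prometheus(unit)
    else:
        reload_handler(unit)
    return unit.status[0]


if __name__ == "__main__":
    print(config_changed(Unit(), True, reload_prometheus_buggy))
    print(config_changed(Unit(), False, reload_prometheus_buggy))
    print(config_changed(Unit(), False, reload_prometheus_fixed))
```

In this model, a change that needs only a reload leaves the unit in maintenance with no hook running, which matches the symptom in the bug description.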

Changed in prometheus-charm:
status: New → Triaged
importance: Undecided → High
Stuart Bishop (stub)
Changed in prometheus-charm:
status: Triaged → Fix Committed
Tom Haddon (mthaddon)
Changed in prometheus-charm:
status: Fix Committed → Fix Released