service with no units stuck in lifecycle dying

Bug #1233457 reported by Kapil Thangavelu
This bug affects 4 people
Affects              Status         Importance   Assigned to     Milestone
juju-core            Fix Released   High         William Reade
  1.16               Fix Released   Critical     William Reade
juju-core (Ubuntu)   Fix Released   Undecided    Unassigned
  Saucy              Won't Fix      Undecided    Unassigned

Bug Description

[Impact]
Services with no service units get stuck in the 'dying' state, preventing their removal from a deployment.

[Test Case]
juju deploy mysql
juju terminate-machine --force <machineid of mysql>
juju destroy-service mysql

[Regression Potential]
Part of the upstream-tested 1.16.6 release. The change looks limited to the impacted code path only.

[Original Report]
Report from the field: a service with no units (previously destroyed) is stuck in lifecycle dying. Per status snippet:

mysql:
  charm: local:precise/mysql-309
  exposed: false
  life: dying
  relations:
    cluster:
    - mysql

Kapil Thangavelu (hazmat) wrote :

Poking at the underlying mongodb shows that the mysql service still has an extant relation and no units. Per William on IRC:

<fwereade> hazmat, to me the really critical thing is that one of those units apparently managed to leave scope without updating the relation doc's unitcount

Kapil Thangavelu (hazmat) wrote :

<fwereade> hazmat, the service was kept alive by the relation, which was kept alive by its unit count, which implied there'd be a unit to do the final leavescope and set off the dominos to take down the relation and the service
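
The dependency chain described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names, fields, and functions (`Relation`, `Service`, `enter_scope`, `leave_scope`, `remove_unit`, `simulate`) are hypothetical and are not juju-core's actual state code. It shows how a unit that is removed without leaving scope leaves the relation's unit count above zero, which in turn keeps the service in 'dying'.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Toy relation document: kept alive while unit_count > 0 (assumed model)."""
    name: str
    unit_count: int = 0
    removed: bool = False


@dataclass
class Service:
    """Toy service document: a dying service waits for its units and relations."""
    name: str
    life: str = "alive"
    units: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def destroy(self):
        self.life = "dying"
        self.maybe_remove()

    def maybe_remove(self):
        # Only a dying service with no units and no live relations goes away.
        relations_gone = all(r.removed for r in self.relations)
        if self.life == "dying" and not self.units and relations_gone:
            self.life = "dead"


def enter_scope(service, relation, unit):
    service.units.add(unit)
    relation.unit_count += 1


def leave_scope(service, relation):
    # The final leave-scope removes the relation, which unblocks the service.
    relation.unit_count -= 1
    if relation.unit_count == 0:
        relation.removed = True
    service.maybe_remove()


def remove_unit(service, unit):
    service.units.discard(unit)
    service.maybe_remove()


def simulate(leave_before_removal):
    """Return (service life, relation unit count) after destroy-service."""
    mysql = Service("mysql")
    cluster = Relation("cluster")
    mysql.relations.append(cluster)
    enter_scope(mysql, cluster, "mysql/0")
    if leave_before_removal:
        leave_scope(mysql, cluster)
    remove_unit(mysql, "mysql/0")
    mysql.destroy()
    return mysql.life, cluster.unit_count


print(simulate(True))   # ('dead', 0)
print(simulate(False))  # ('dying', 1)
```

In the second run nothing is left to perform the final leave-scope, so the toy service stays in 'dying' with its relation still present, matching the shape of the status snippet in the original report.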

John A Meinel (jameinel)
Changed in juju-core:
importance: Undecided → High
status: New → Triaged
Curtis Hovey (sinzui)
tags: added: destroy-service
Kapil Thangavelu (hazmat) wrote : Re: [Bug 1233457] Re: service with no units stuck in lifecycle dying

I have an export of the mongodb for this environment, if anyone needs it
for additional analysis.

On Sat, Oct 12, 2013 at 1:21 PM, Curtis Hovey <email address hidden> wrote:

> ** Tags added: destroy-service

Curtis Hovey (sinzui)
tags: added: cts-cloud-review
removed: cts
Changed in juju-core:
milestone: none → 1.17.0
Curtis Hovey (sinzui) wrote :

This issue relates to bug 1205451. In this case, the machine terminated before the state server could tell the agent that it is dead. In the other bug, the machine terminated for other reasons. In both cases, the state server does not recognise that the agent and machine are gone, so it only needs to remove the record of the agent.

Kapil Thangavelu (hazmat) wrote : Re: [Bug 1233457] [NEW] service with no units stuck in lifecycle dying

On Friday, October 25, 2013, Curtis Hovey wrote:

> This issue relates to bug 1205451. In this case, the machine terminated
> before the state server could tell the agent that it is dead. In the
> other bug, the machine terminated for other reasons. In both cases, The
> state-server does not recognise that the agent and machine are gone, so
> it only needs to remove the record of the agent.

William Reade (fwereade) wrote :

I don't *think* it's related to lp:1205451 -- according to the transaction log captured by hazmat, mysql/0 never actually tried to leave relation scope for that relation... but *did* otherwise shut down cleanly. This points to a bug in the Uniter itself; still investigating.

Curtis Hovey (sinzui)
tags: added: state-server
William Reade (fwereade) wrote :

Root cause remains undetermined, but we can still ensure units are not removed without leaving all their relation scopes. Fix in progress.
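
As a rough illustration of that invariant (not the actual fix, whose details are not given in this report), the sketch below makes unit removal leave every relation scope the unit is still in. The function name `remove_unit`, the `scopes` mapping, and the `db` relation are hypothetical and exist only for this example.

```python
def remove_unit(unit, scopes):
    """Remove `unit`, first leaving every relation scope it is still in.

    `scopes` maps a relation name to the set of unit names currently in
    scope (an assumed, simplified data model). Returns the sorted names of
    the relations whose scope had to be cleaned up.
    """
    cleaned = sorted(name for name, members in scopes.items() if unit in members)
    for name in cleaned:
        # Leaving scope here keeps the relation's membership consistent
        # even if the unit never left scope on its own.
        scopes[name].discard(unit)
    return cleaned


# Hypothetical state: mysql/0 is still in scope for two relations.
scopes = {"cluster": {"mysql/0"}, "db": {"mysql/0", "mysql/1"}}
cleaned = remove_unit("mysql/0", scopes)
print(cleaned)  # ['cluster', 'db']
print(scopes["cluster"])  # set()
```

With the scope emptied as part of removal, a relation such as `cluster` no longer depends on a departed unit to perform the final leave-scope.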

Changed in juju-core:
assignee: nobody → William Reade (fwereade)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → In Progress
William Reade (fwereade)
Changed in juju-core:
milestone: 1.17.0 → 2.0
Mark Ramm (mark-ramm)
Changed in juju-core:
importance: High → Critical
milestone: 2.0 → 1.17.0
Tim Penhey (thumper)
Changed in juju-core:
status: In Progress → Fix Committed
William Reade (fwereade)
Changed in juju-core:
milestone: 1.17.0 → 2.0
status: Fix Committed → In Progress
William Reade (fwereade)
Changed in juju-core:
milestone: 2.0 → 1.17.0
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
James Page (james-page)
Changed in juju-core (Ubuntu):
status: New → Fix Released
James Page (james-page)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju-core (Ubuntu Saucy):
status: New → Confirmed
Rolf Leggewie (r0lf) wrote :

Saucy has reached end of life and is no longer receiving any updates. Marking the Saucy task for this ticket as "Won't Fix".

Changed in juju-core (Ubuntu Saucy):
status: Confirmed → Won't Fix