destroy-unit --force

Bug #1089289 reported by William Reade
This bug affects 16 people
Affects: juju-core
Status: Won't Fix
Importance: High
Assigned to: William Reade
Milestone: none

Bug Description

At some stage, we need to implement this flag, which forcibly sets the Unit to Dead; this will be used to work around the possibility of non-responsive (or just plain broken) unit agents (which can otherwise block the destruction of machines, services, and relations).

This is potentially tricky because it may necessitate the cleanup of large numbers of relations, subordinates, the subordinates' relations, and potentially some services.

William Reade (fwereade)
description: updated
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

This has bitten me a few times already: units get wedged (with more than one underlying cause) and there's no recourse except destroying the environment.

Revision history for this message
William Reade (fwereade) wrote :

With reference to lp:1173224, I'm inclined to redefine the desired action as "the unit agent should run all hooks appropriate to Dying as usual, but ignore all errors". Tolerable?

Changed in juju-core:
status: New → Confirmed
Revision history for this message
David Britton (dpb) wrote :

It seems this state gets entered quite often if there is ever an error in your deployment. I can repeat it with:

juju deploy service
# make sure ^ has some kind of deployment error
juju destroy-service service # unit will not go away
juju resolved service
juju destroy-service service # unit is gone, but service is in 'dying' state
# at this point you are stuck.

Revision history for this message
William Reade (fwereade) wrote :

David, I haven't been able to reproduce that situation. I'd expect that a charm in which every hook failed would need to have 4 hooks resolved (install, config-changed, start, stop) before it was finally removed; although, in the course of investigating this, I did find that the unit agent can sometimes resolve more than one error in response to a single request.

Next time you encounter it, would you ping me in #juju-dev so I can try to investigate a bit more?
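
For illustration, a minimal sketch of the workflow that implies, assuming a charm whose hooks all fail (service and unit names are made up):

juju destroy-service broken-service   # the unit must still work through its remaining hooks
juju resolved broken-service/0        # repeat once per failed hook (install, config-changed, start, stop)
juju status broken-service            # the unit disappears from the output once it is finally removed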

Changed in juju-core:
status: Confirmed → Triaged
importance: Undecided → High
Revision history for this message
William Reade (fwereade) wrote :

See also lp:1190715

Nick Veitch (evilnick)
tags: added: doc
William Reade (fwereade)
summary: - remove-unit --force
+ destroy-unit --force
Curtis Hovey (sinzui)
tags: added: destroy-unit
Revision history for this message
Jeff Lane  (bladernr) wrote :

I can reliably create this in EC2 with the quantum-gateway charm:

ubuntu@ip-10-0-0-14:~$ juju status quantum-gateway
environment: amazon
machines:
  "14":
    agent-state: started
    agent-version: 1.16.0
    dns-name: ec2-54-205-199-95.compute-1.amazonaws.com
    instance-id: i-cfa5c3a8
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
services:
  quantum-gateway:
    charm: cs:precise/quantum-gateway-7
    exposed: false
    relations:
      amqp:
      - rabbitmq-server
      cluster:
      - quantum-gateway
      quantum-network-service:
      - nova-cloud-controller
      shared-db:
      - mysql
    units:
      quantum-gateway/0:
        agent-state: down
        agent-state-info: (installed)
        agent-version: 1.16.0
        life: dying
        machine: "14"

so I did this:

juju deploy --config $YAML quantum-gateway
juju add-relation quantum-gateway mysql
juju add-relation quantum-gateway nova-cloud-controller
juju add-relation quantum-gateway rabbitmq-server

my yaml file has this for quantum-gateway:
quantum-gateway:
  openstack-origin: cloud:precise-grizzly/updates
  ext-port: 'eth0'

and with this, every time I try to deploy quantum-gateway, the instance spins up and then the EC2 dashboard shows it failing the second status check (connectivity after boot).

I am unable to contact this node at all, via juju ssh, direct ssh, or any other means, so I suspect something in the charm may be rewriting the network config, but that's really just a guess, as I can't access the node to check the logs and see what happens.

Curtis Hovey (sinzui)
tags: added: docs
removed: doc
Curtis Hovey (sinzui)
tags: added: cts-cloud-review
William Reade (fwereade)
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
milestone: none → 2.0
status: Triaged → In Progress
Revision history for this message
William Reade (fwereade) wrote :

Sorry, the progress reported on this was for lp:1089291 -- I had a miswiring in my brain.

Changed in juju-core:
status: In Progress → Triaged
Revision history for this message
William Reade (fwereade) wrote :

A unit that has not yet started running can already be removed with a plain `destroy-unit`, but once it's started a forcible removal becomes unsafe -- any processes started by the charm will continue to run, will interact unpredictably with new units on the same machine, and may even become a security risk. `destroy-machine --force` allows a whole machine (and all its units) to be decommissioned at once, and is the only safe way to accomplish forcible removal of running units; hence, WONTFIX.
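
In practice, then, the supported recovery path is to take out the whole machine; a minimal sketch, assuming the stuck unit lives on machine 14 (the machine number is hypothetical):

juju destroy-machine --force 14   # decommissions the machine and every unit on it, responsive or not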

Changed in juju-core:
status: Triaged → Won't Fix
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0 → none
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

This was marked as a prerequisite for destroy-service --force. The machine is not reusable for further deploys (it's marked dirty). This overabundance of caution when cleaning up is causing usability issues. Either juju is a tool for deployments at scale (i.e. managing thousands of units) or it's not; having the admin manually clean up after juju, after already expressing intent, undermines that. Alternatively, we should document that killing the machine on force suffices, and default to that for destroy-service --force.

Revision history for this message
HeinMueck (cperz) wrote :

It would be funny, would it not make you weep.

- Juju can run one charm at a time. So you either have one machine or one unit per charm.
- You go with the flow and install some openstack components in units
- one breaks, you want to reinstall?

Well, no problem, just kill your fine infrastructure at once and rebuild it from scratch - your customers will love you for that.

Sorry guys, no kidding. How would you convince me that using maas and juju to set up an infrastructure is any good idea?

Wontfix = dontuse

Amazing.

Revision history for this message
William Reade (fwereade) wrote :

I don't quite understand the missing use case here.

If you want to deploy one unit per machine, you can, and then force-destroying the machine is equivalent to force-destroying the unit.

If you want to deploy multiple units per machine, you can, in two ways:

If you want to hulk-smash them together in the same OS, you risk unexpected interactions; and because force-destroying a unit on such a system would leave the machine in a dangerously unknowable state, we just forbid that and require that you clear down the whole machine.

BUT, if you want your units to be on the same hardware, but nicely isolated from one another -- which you probably want anyway -- you can deploy units into containers on the top-level machines; and then if a unit misbehaves you can force-destroy its container (which is just another machine to juju) and leave the parent machine -- and its other containers -- untouched.
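
A minimal sketch of that container-based layout, assuming an LXC-capable provider and made-up machine numbers:

juju deploy mysql --to lxc:0           # place the unit in an LXC container on machine 0
juju destroy-machine --force 0/lxc/0   # if that unit misbehaves, clear down only its container;
                                       # machine 0 and its other containers are untouched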

What's the scenario that causes you to have to rebuild anything from scratch?

Revision history for this message
HeinMueck (cperz) wrote :

My understanding was that this behavior is intended, that you go for destroying the machine with all units.

My use case is an OpenStack deployment (a small environment, not many machines, but also not much traffic), where all the "minor" services go onto one machine as separate units. Compute gets its own machine.

Now, I wanted to extend the original installation, and this required a configuration change to one of the components. As this setting only applies at deploy time, I had to redeploy this component. It did not work.

As I understand it, your intended workflow is to destroy the machine and all its units - thus "the cloud" - and redeploy.

It left me frustrated. Reading your statement, the case of managing a cloud installation on bare metal with MaaS and juju in particular seems questionable. Since for most customers workloads are "cheap", I understand the vote for destroy-and-rebuild in that domain.

Revision history for this message
Jason Meinzer (meinzerj) wrote :

I'm running into similar problems while testing juju and learning about OpenStack. I've had to completely destroy my MAAS, Juju, etc. environments more times than I can count. While this is fine during testing, I've started to wonder whether it's even possible to use Juju in production. Re-bootstrapping everything when you run into inevitable problems isn't a solution.

Revision history for this message
Fabrice Matrat (fabricematrat) wrote :

I managed to deploy a subordinate unit but destroyed it and the service just after.
The charm wasn't even installed, so the subordinate unit wants to run its install hook before destroying itself, which will never happen.
I can't force-destroy the subordinate unit, so the only thing left might be to delete the machine with all the units that are already well in place.
This is the kind of situation where I would really love a force destroy of a unit.
