Merge lp:~axwalk/juju-core/null-provider-destroy-environment into lp:~go-bot/juju-core/trunk

Proposed by Andrew Wilkins
Status: Work in progress
Proposed branch: lp:~axwalk/juju-core/null-provider-destroy-environment
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 179 lines (+124/-4)
3 files modified
provider/common/destroy.go (+111/-0)
provider/null/environ.go (+13/-0)
provider/null/environ_test.go (+0/-4)
To merge this branch: bzr merge lp:~axwalk/juju-core/null-provider-destroy-environment
Reviewer            Review Type    Date Requested    Status
Juju Engineering                                     Pending
Review via email: mp+193538@code.launchpad.net

Description of the change

provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units and all machines
without the environment-manager job; then the
environment-manager machine(s) will be destroyed
by sshing to them and killing jujud. jujud will
catch the signal and clean itself up.

The next CL will implement the manager machine
killing bit.

https://codereview.appspot.com/20720043/
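
For reference, the signal-catching half of that plan might look
roughly like the sketch below. This is hypothetical, not jujud's
actual teardown code; cleanUp is a placeholder.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

// cleanUp is a hypothetical stand-in for jujud's self-removal:
// stopping workers, deleting agent state, and so on.
func cleanUp() {
    fmt.Println("cleaning up and exiting")
}

func main() {
    // ... agent setup and workers would start here ...

    // Wait for the teardown signal; the CL's TODO suggests sending
    // SIGABRT over ssh with `pkill -SIGABRT jujud`.
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, syscall.SIGABRT)
    <-ch
    cleanUp()
}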

Revision history for this message
William Reade (fwereade) wrote :

I think this needs a bit more thought. This stuff all has to happen over
the API this cycle anyway; this may restrict our ability to ssh into
machines and kill them with signals. Didn't we decide that it'd be best
if a machine agent were to clean *itself* up once the machine were dead?
That'd solve the manual case trivially, and wouldn't matter in other
providers -- if the machine's actually being decommissioned, it doesn't
matter if we interrupt the final cleanup.
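
A minimal sketch of that self-cleanup idea, assuming a hypothetical
Machine watcher interface rather than juju-core's actual state API:

package agent

// Life mirrors the Alive/Dying/Dead lifecycle used in juju state.
type Life int

const (
    Alive Life = iota
    Dying
    Dead
)

// Machine is a hypothetical, minimal view of a machine entity.
type Machine interface {
    Watch() <-chan struct{} // notifies when the machine's state changes
    Refresh() error
    Life() Life
}

// runAgent waits until the agent's own machine goes Dead, then runs
// the supplied cleanup, so no external ssh-and-kill step is needed.
func runAgent(m Machine, cleanUp func() error) error {
    for range m.Watch() {
        if err := m.Refresh(); err != nil {
            return err
        }
        if m.Life() == Dead {
            return cleanUp()
        }
    }
    return nil
}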

https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go
File provider/common/destroy.go (right):

https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
the units and machines without
I think we should be doing all this on the other side of the API,
shouldn't we? This is basically an as-much-as-possible "clean" shutdown
of the environment, and we'll want to be exposing that capability over
the API server.

We'll *also* probably want a "dirty" shutdown that does
as-close-as-possible what we do today -- just saving machines with
manager jobs for last -- but that won't be viable when manually
provisioned machines are in the mix.

https://codereview.appspot.com/20720043/

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Reviewers: mp+193538_code.launchpad.net, fwereade,

Message:
On 2013/11/01 09:47:46, fwereade wrote:
> I think this needs a bit more thought. This stuff all has to happen
> over the API this cycle anyway;

Okay. I thought you said you didn't want to gate anything on
destroy-environment, so I just went for the quick and dirty solution.
FWIW, I think this bit that I have implemented can easily be moved
behind the API server.

> this may restrict our ability to ssh into machines and kill
> them with signals. Didn't we decide that it'd be best if a machine
> agent were to clean *itself* up once the machine were dead? That'd
> solve the manual case trivially, and wouldn't matter in other
> providers -- if the machine's actually being decommissioned, it
> doesn't matter if we interrupt the final cleanup.

Yes, this is what I would prefer to do. I was under the impression
destroy-environment via API was out of the picture.

So, if we do move destroy-environment behind the API, then we still need
to call DestroyUnitMachines and then tear down the API server, right?
We'd call a new "DestroyEnvironment" API, and the API server would call
Environ.Destroy. That would call DestroyUnitMachines and then tear
itself down.
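
A rough illustration of that chain (all of these names are
hypothetical; nothing below exists in trunk):

package apiserver

// Environ is a cut-down, hypothetical stand-in for environs.Environ
// with the proposed teardown split.
type Environ interface {
    DestroyUnitMachines() error    // units and non-manager machines first
    DestroyManagerMachines() error // the API server's own machine(s) last
}

// Server stands in for the API server hosting the new call.
type Server struct {
    environ Environ
}

// DestroyEnvironment is the proposed new API: do the clean teardown
// first, leaving the server's own machine for last so that the server
// can tear itself down.
func (s *Server) DestroyEnvironment() error {
    if err := s.environ.DestroyUnitMachines(); err != nil {
        return err
    }
    return s.environ.DestroyManagerMachines()
}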

> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go
> File provider/common/destroy.go (right):
>
> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
> provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> the units and machines without
> I think we should be doing all this on the other side of the API,
> shouldn't we? This is basically an as-much-as-possible "clean"
> shutdown of the environment, and we'll want to be exposing that
> capability over the API server.

Yes, I think that's ideal.

> We'll *also* probably want a "dirty" shutdown that does
> as-close-as-possible what we do today -- just saving machines with
> manager jobs for last -- but that won't be viable when manually
> provisioned machines are in the mix.

Not sure I understand this bit. Why do we need two types of shutdown?

Description:
provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units, and non-environment
manager machines; then, the environment-manager
machine(s) will be destroyed by sshing to them
and killing jujud. jujud will catch the signal,
and clean itself up.

The next CL will implement the manager machine
killing bit.

https://code.launchpad.net/~axwalk/juju-core/null-provider-destroy-environment/+merge/193538

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/20720043/

Affected files (+126, -4 lines):
   A [revision details]
   M provider/common/destroy.go
   M provider/null/environ.go
   M provider/null/environ_test.go

Revision history for this message
John A Meinel (jameinel) wrote :

...

> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
>
> provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> the units and machines without
> I think we should be doing all this on the other side of the API,
> shouldn't we? This is basically an as-much-as-possible "clean"
> shutdown of the environment, and we'll want to be exposing that
> capability over the API server.
>
> We'll *also* probably want a "dirty" shutdown that does
> as-close-as-possible what we do today -- just saving machines with
> manager jobs for last -- but that won't be viable when manually
> provisioned machines are in the mix.
>
> https://codereview.appspot.com/20720043/
>

I was thinking about that. I wondered if we might want the default
"juju destroy-environment" command to actually kick off the clean
destruction, monitor its progress, and then do a dirty one if things
seem to hang.

Thoughts?

John
=:->
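
A minimal sketch of that clean-then-dirty fallback, with both
shutdown modes as hypothetical stand-ins:

package destroy

import "time"

// CleanThenDirty runs the clean destruction, and falls back to a
// dirty one if the clean path fails or appears to hang.
func CleanThenDirty(clean, dirty func() error, timeout time.Duration) error {
    done := make(chan error, 1)
    go func() { done <- clean() }()
    select {
    case err := <-done:
        if err == nil {
            return nil
        }
        // Clean destruction failed; fall back to the dirty path.
    case <-time.After(timeout):
        // Clean destruction seems to have hung; abandon it.
    }
    return dirty()
}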

Revision history for this message
Andrew Wilkins (axwalk) wrote :

> ...
>
> > https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
> >
> > provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> > the units and machines without
> > I think we should be doing all this on the other side of the API,
> > shouldn't we? This is basically an as-much-as-possible "clean"
> > shutdown of the environment, and we'll want to be exposing that
> > capability over the API server.
> >
> > We'll *also* probably want a "dirty" shutdown that does
> > as-close-as-possible what we do today -- just saving machines with
> > manager jobs for last -- but that won't be viable when manually
> > provisioned machines are in the mix.
> >
> > https://codereview.appspot.com/20720043/
> >
>
> I was thinking about that. I wondered if we might want the default
> "juju destroy-environment" command to actually kick off the clean
> destruction, monitor its progress, and then do a dirty one if things
> seem to hang.
>
> Thoughts?

What would be the benefit? Is it just to handle manually provisioned
machines? Otherwise I don't see any value, unless there's a problem
with the way we do it now that I'm not aware of.

Unmerged revisions

2010. By Andrew Wilkins

provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units, and non-environment
manager machines; then, the environment-manager
machine(s) will be destroyed by sshing to them
and killing jujud. jujud will catch the signal,
and clean itself up.

The next CL will implement the manager machine
killing bit.

Preview Diff

=== modified file 'provider/common/destroy.go'
--- provider/common/destroy.go	2013-10-02 00:29:29 +0000
+++ provider/common/destroy.go	2013-11-01 03:27:23 +0000
@@ -4,7 +4,13 @@
 package common
 
 import (
+    "fmt"
+
     "launchpad.net/juju-core/environs"
+    coreerrors "launchpad.net/juju-core/errors"
+    "launchpad.net/juju-core/juju"
+    "launchpad.net/juju-core/state"
+    "launchpad.net/juju-core/utils"
 )
 
 // Destroy is a common implementation of the Destroy method defined on
@@ -24,3 +30,108 @@
     }
     return err
 }
+
+// DestroyUnitMachines destroys all of the units and machines without
+// the JobManageEnviron job.
+//
+// The supplied AttemptStrategy governs how long the entire operation
+// may take overall (attempt.Total), and how long to wait between
+// retries of each check (attempt.Delay).
+func DestroyUnitMachines(env environs.Environ, attemptStrategy utils.AttemptStrategy) error {
+    conn, err := juju.NewConn(env)
+    if err != nil {
+        return err
+    }
+    defer conn.Close()
+    machines, err := conn.State.AllMachines()
+    if err != nil {
+        return err
+    }
+    attempt := attemptStrategy.Start()
+
+    // Destroy units on all machines. Some providers may allow
+    // units to be hosted on machines with JobManageEnviron,
+    // hence the additional loop.
+    if err := destroyUnits(machines, attempt); err != nil {
+        return err
+    }
+    for _, m := range machines {
+        if !hasJob(m, state.JobManageEnviron) {
+            logger.Infof("destroying %v", m.Tag())
+            if err := m.Destroy(); err != nil {
+                logger.Errorf("failed to destroy %v", m.Tag())
+                return err
+            }
+        }
+    }
+    for _, m := range machines {
+        if !hasJob(m, state.JobManageEnviron) {
+            logger.Infof("waiting for %v to die", m.Tag())
+            if !waitNotFound(m.Refresh, attempt) {
+                return fmt.Errorf("%s was not removed from state in a timely manner", m.Tag())
+            }
+        }
+    }
+    return nil
+}
+
+// destroyUnits destroys all of the principal units on the specified machines,
+// and waits for them to be removed from state.
+func destroyUnits(machines []*state.Machine, attempt *utils.Attempt) error {
+    // First, advance all units to Dying.
+    var allUnits []*state.Unit
+    for _, m := range machines {
+        logger.Infof("destroying units on %v", m.Tag())
+        units, err := m.Units()
+        if err != nil {
+            logger.Errorf("failed to list units on %v: %v", m.Tag(), err)
+            return err
+        }
+        for _, u := range units {
+            if !u.IsPrincipal() {
+                continue
+            }
+            logger.Infof("destroying %v", u.Tag())
+            if err := u.Destroy(); err != nil {
+                logger.Errorf("failed to destroy %v: %v", u.Tag(), err)
+                return err
+            }
+            allUnits = append(allUnits, u)
+        }
+    }
+    // Now wait for the units to be removed from state.
+    for _, u := range allUnits {
+        logger.Infof("waiting for %v to die", u.Tag())
+        if !waitNotFound(u.Refresh, attempt) {
+            return fmt.Errorf("%s was not removed from state in a timely manner", u.Tag())
+        }
+    }
+    return nil
+}
+
+// waitNotFound calls the provided function in a loop, until either the
+// function returns an error that satisfies errors.IsNotFoundError, or no
+// more attempts are allowed.
+//
+// waitNotFound returns true if the function returned a satisfying error,
+// and false otherwise.
+func waitNotFound(f func() error, attempt *utils.Attempt) bool {
+    for {
+        if err := f(); coreerrors.IsNotFoundError(err) {
+            return true
+        }
+        if !attempt.Next() {
+            return false
+        }
+    }
+}
+
+// hasJob returns true iff the machine has a specified job.
+func hasJob(m *state.Machine, j state.MachineJob) bool {
+    for _, mj := range m.Jobs() {
+        if mj == j {
+            return true
+        }
+    }
+    return false
+}

=== modified file 'provider/null/environ.go'
--- provider/null/environ.go	2013-10-24 00:20:59 +0000
+++ provider/null/environ.go	2013-11-01 03:27:23 +0000
@@ -8,6 +8,7 @@
     "net"
     "path"
     "sync"
+    "time"
 
     "launchpad.net/loggo"
 
@@ -26,6 +27,7 @@
     "launchpad.net/juju-core/state"
     "launchpad.net/juju-core/state/api"
     "launchpad.net/juju-core/tools"
+    "launchpad.net/juju-core/utils"
     "launchpad.net/juju-core/worker/localstorage"
 )
 
@@ -181,6 +183,17 @@
 }
 
 func (e *nullEnviron) Destroy() error {
+    // Destroy units and machines that may host them. This will leave only
+    // the bootstrap host's machine agent intact, which we must tear down
+    // separately.
+    attemptStrategy := utils.AttemptStrategy{
+        Total: 60 * time.Second,
+        Delay: 250 * time.Millisecond,
+    }
+    if err := common.DestroyUnitMachines(e, attemptStrategy); err != nil {
+        return err
+    }
+    // TODO(axw) tear down machine 0 by ssh execing pkill -SIGABRT jujud.
     return errors.New("null provider destruction is not implemented yet")
 }
 
=== modified file 'provider/null/environ_test.go'
--- provider/null/environ_test.go	2013-10-21 21:49:04 +0000
+++ provider/null/environ_test.go	2013-11-01 03:27:23 +0000
@@ -78,10 +78,6 @@
     c.Assert(instances[0], gc.IsNil)
 }
 
-func (s *environSuite) TestDestroy(c *gc.C) {
-    c.Assert(s.env.Destroy(), gc.ErrorMatches, "null provider destruction is not implemented yet")
-}
-
 func (s *environSuite) TestLocalStorageConfig(c *gc.C) {
     c.Assert(s.env.StorageDir(), gc.Equals, "/var/lib/juju/storage")
     c.Assert(s.env.cfg.storageListenAddr(), gc.Equals, ":8040")
