Merge lp:~axwalk/juju-core/null-provider-destroy-environment into lp:~go-bot/juju-core/trunk

Proposed by Andrew Wilkins
Status: Work in progress
Proposed branch: lp:~axwalk/juju-core/null-provider-destroy-environment
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 179 lines (+124/-4)
3 files modified
provider/common/destroy.go (+111/-0)
provider/null/environ.go (+13/-0)
provider/null/environ_test.go (+0/-4)
To merge this branch: bzr merge lp:~axwalk/juju-core/null-provider-destroy-environment
Reviewer            Review Type    Date Requested    Status
Juju Engineering                                     Pending
Review via email: mp+193538@code.launchpad.net

Description of the change

provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units and all machines
without the environment-manager job; then the
environment-manager machine(s) will be destroyed
by sshing to them and killing jujud. jujud will
catch the signal and clean itself up.

The next CL will implement the manager machine
killing bit.

https://codereview.appspot.com/20720043/
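
For reference, the signal-catching half of that plan might look
roughly like the sketch below. This is hypothetical, not jujud's
actual teardown code; cleanUp is a placeholder.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

// cleanUp is a hypothetical stand-in for jujud's self-removal:
// stopping workers, deleting agent state, and so on.
func cleanUp() {
    fmt.Println("cleaning up and exiting")
}

func main() {
    // ... agent setup and workers would start here ...

    // Wait for the teardown signal; the CL's TODO suggests sending
    // SIGABRT over ssh with `pkill -SIGABRT jujud`.
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, syscall.SIGABRT)
    <-ch
    cleanUp()
}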

Revision history for this message
William Reade (fwereade) wrote :

I think this needs a bit more thought. This stuff all has to happen over
the API this cycle anyway; this may restrict our ability to ssh into
machines and kill them with signals. Didn't we decide that it'd be best
if a machine agent were to clean *itself* up once the machine were dead?
That'd solve the manual case trivially, and wouldn't matter in other
providers -- if the machine's actually being decommissioned, it doesn't
matter if we interrupt the final cleanup.
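
A minimal sketch of that self-cleanup idea, assuming a hypothetical
Machine watcher interface rather than juju-core's actual state API:

package agent

// Life mirrors the Alive/Dying/Dead lifecycle used in juju state.
type Life int

const (
    Alive Life = iota
    Dying
    Dead
)

// Machine is a hypothetical, minimal view of a machine entity.
type Machine interface {
    Watch() <-chan struct{} // notifies when the machine's state changes
    Refresh() error
    Life() Life
}

// runAgent waits until the agent's own machine goes Dead, then runs
// the supplied cleanup, so no external ssh-and-kill step is needed.
func runAgent(m Machine, cleanUp func() error) error {
    for range m.Watch() {
        if err := m.Refresh(); err != nil {
            return err
        }
        if m.Life() == Dead {
            return cleanUp()
        }
    }
    return nil
}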

https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go
File provider/common/destroy.go (right):

https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
the units and machines without
I think we should be doing all this on the other side of the API,
shouldn't we? This is basically an as-much-as-possible "clean" shutdown
of the environment, and we'll want to be exposing that capability over
the API server.

We'll *also* probably want a "dirty" shutdown that does
as-close-as-possible what we do today -- just saving machines with
manager jobs for last -- but that won't be viable when manually
provisioned machines are in the mix.

https://codereview.appspot.com/20720043/

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Reviewers: mp+193538_code.launchpad.net, fwereade,

Message:
On 2013/11/01 09:47:46, fwereade wrote:
> I think this needs a bit more thought. This stuff all has to happen
> over the API this cycle anyway;

Okay. I thought you said you didn't want to gate anything on
destroy-environment, so I just went for the quick and dirty solution.
FWIW, I think this bit that I have implemented can easily be moved
behind the API server.

> this may restrict our ability to ssh into machines and kill
> them with signals. Didn't we decide that it'd be best if a machine
> agent were to clean *itself* up once the machine were dead? That'd
> solve the manual case trivially, and wouldn't matter in other
> providers -- if the machine's actually being decommissioned, it
> doesn't matter if we interrupt the final cleanup.

Yes, this is what I would prefer to do. I was under the impression
destroy-environment via API was out of the picture.

So, if we do move destroy-environment behind the API, then we still need
to call DestroyUnitMachines and then tear down the API server, right?
We'd call a new "DestroyEnvironment" API, and the API server would call
Environ.Destroy. That would call DestroyUnitMachines and then tear
itself down.
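
A rough illustration of that chain (all of these names are
hypothetical; nothing below exists in trunk):

package apiserver

// Environ is a cut-down, hypothetical stand-in for environs.Environ
// with the proposed teardown split.
type Environ interface {
    DestroyUnitMachines() error    // units and non-manager machines first
    DestroyManagerMachines() error // the API server's own machine(s) last
}

// Server stands in for the API server hosting the new call.
type Server struct {
    environ Environ
}

// DestroyEnvironment is the proposed new API: do the clean teardown
// first, leaving the server's own machine for last so that the server
// can tear itself down.
func (s *Server) DestroyEnvironment() error {
    if err := s.environ.DestroyUnitMachines(); err != nil {
        return err
    }
    return s.environ.DestroyManagerMachines()
}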

> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go
> File provider/common/destroy.go (right):
>
> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
> provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> the units and machines without
> I think we should be doing all this on the other side of the API,
> shouldn't we? This is basically an as-much-as-possible "clean"
> shutdown of the environment, and we'll want to be exposing that
> capability over the API server.

Yes, I think that's ideal.

> We'll *also* probably want a "dirty" shutdown that does
> as-close-as-possible what we do today -- just saving machines with
> manager jobs for last -- but that won't be viable when manually
> provisioned machines are in the mix.

Not sure I understand this bit. Why do we need two types of shutdown?

Description:
provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units, and non-environment
manager machines; then, the environment-manager
machine(s) will be destroyed by sshing to them
and killing jujud. jujud will catch the signal,
and clean itself up.

The next CL will implement the manager machine
killing bit.

https://code.launchpad.net/~axwalk/juju-core/null-provider-destroy-environment/+merge/193538

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/20720043/

Affected files (+126, -4 lines):
   A [revision details]
   M provider/common/destroy.go
   M provider/null/environ.go
   M provider/null/environ_test.go

Revision history for this message
John A Meinel (jameinel) wrote :

...

> https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
>
> provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> the units and machines without
> I think we should be doing all this on the other side of the API,
> shouldn't we? This is basically an as-much-as-possible "clean"
> shutdown of the environment, and we'll want to be exposing that
> capability over the API server.
>
> We'll *also* probably want a "dirty" shutdown that does
> as-close-as-possible what we do today -- just saving machines with
> manager jobs for last -- but that won't be viable when manually
> provisioned machines are in the mix.
>
> https://codereview.appspot.com/20720043/
>

I was thinking about that. I wondered if we might want the default
"juju destroy-environment" command to actually kick off the clean
destruction, monitor its progress, and then do a dirty one if things
seem to hang.

Thoughts?

John
=:->
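
A minimal sketch of that clean-then-dirty fallback, with both
shutdown modes as hypothetical stand-ins:

package destroy

import "time"

// CleanThenDirty runs the clean destruction, and falls back to a
// dirty one if the clean path fails or appears to hang.
func CleanThenDirty(clean, dirty func() error, timeout time.Duration) error {
    done := make(chan error, 1)
    go func() { done <- clean() }()
    select {
    case err := <-done:
        if err == nil {
            return nil
        }
        // Clean destruction failed; fall back to the dirty path.
    case <-time.After(timeout):
        // Clean destruction seems to have hung; abandon it.
    }
    return dirty()
}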

Revision history for this message
Andrew Wilkins (axwalk) wrote :

> ...
>
> > https://codereview.appspot.com/20720043/diff/1/provider/common/destroy.go#newcode34
> >
> > provider/common/destroy.go:34: // DestroyUnitMachines destroys all of
> > the units and machines without
> > I think we should be doing all this on the other side of the API,
> > shouldn't we? This is basically an as-much-as-possible "clean"
> > shutdown of the environment, and we'll want to be exposing that
> > capability over the API server.
> >
> > We'll *also* probably want a "dirty" shutdown that does
> > as-close-as-possible what we do today -- just saving machines with
> > manager jobs for last -- but that won't be viable when manually
> > provisioned machines are in the mix.
> >
> > https://codereview.appspot.com/20720043/
> >
>
> I was thinking about that. I wondered if we might want the default
> "juju destroy-environment" command to actually kick off the clean
> destruction, monitor its progress, and then do a dirty one if things
> seem to hang.
>
> Thoughts?

What would be the benefit? Is it just to handle manually provisioned
machines? Otherwise I don't see any value, unless there's a problem
with the way we do it now that I'm not aware of.

Unmerged revisions

2010. By Andrew Wilkins

provider/null: first half of Environ.Destroy()

This is CL 1 of 2: destroy-environment will work
by first destroying all units, and non-environment
manager machines; then, the environment-manager
machine(s) will be destroyed by sshing to them
and killing jujud. jujud will catch the signal,
and clean itself up.

The next CL will implement the manager machine
killing bit.

Preview Diff

=== modified file 'provider/common/destroy.go'
--- provider/common/destroy.go	2013-10-02 00:29:29 +0000
+++ provider/common/destroy.go	2013-11-01 03:27:23 +0000
@@ -4,7 +4,13 @@
 package common
 
 import (
+    "fmt"
+
     "launchpad.net/juju-core/environs"
+    coreerrors "launchpad.net/juju-core/errors"
+    "launchpad.net/juju-core/juju"
+    "launchpad.net/juju-core/state"
+    "launchpad.net/juju-core/utils"
 )
 
 // Destroy is a common implementation of the Destroy method defined on
@@ -24,3 +30,108 @@
     }
     return err
 }
+
+// DestroyUnitMachines destroys all of the units and machines without
+// the JobManageEnviron job.
+//
+// The supplied AttemptStrategy governs how long the entire operation
+// may take overall (attempt.Total), and how long to wait between
+// retries of each check (attempt.Delay).
+func DestroyUnitMachines(env environs.Environ, attemptStrategy utils.AttemptStrategy) error {
+    conn, err := juju.NewConn(env)
+    if err != nil {
+        return err
+    }
+    defer conn.Close()
+    machines, err := conn.State.AllMachines()
+    if err != nil {
+        return err
+    }
+    attempt := attemptStrategy.Start()
+
+    // Destroy units on all machines. Some providers may allow
+    // units to be hosted on machines with JobManageEnviron,
+    // hence the additional loop.
+    if err := destroyUnits(machines, attempt); err != nil {
+        return err
+    }
+    for _, m := range machines {
+        if !hasJob(m, state.JobManageEnviron) {
+            logger.Infof("destroying %v", m.Tag())
+            if err := m.Destroy(); err != nil {
+                logger.Errorf("failed to destroy %v", m.Tag())
+                return err
+            }
+        }
+    }
+    for _, m := range machines {
+        if !hasJob(m, state.JobManageEnviron) {
+            logger.Infof("waiting for %v to die", m.Tag())
+            if !waitNotFound(m.Refresh, attempt) {
+                return fmt.Errorf("%s was not removed from state in a timely manner", m.Tag())
+            }
+        }
+    }
+    return nil
+}
+
+// destroyUnits destroys all of the principal units on the specified machines,
+// and waits for them to be removed from state.
+func destroyUnits(machines []*state.Machine, attempt *utils.Attempt) error {
+    // First, advance all units to Dying.
+    var allUnits []*state.Unit
+    for _, m := range machines {
+        logger.Infof("destroying units on %v", m.Tag())
+        units, err := m.Units()
+        if err != nil {
+            logger.Errorf("failed to list units on %v: %v", m.Tag(), err)
+            return err
+        }
+        for _, u := range units {
+            if !u.IsPrincipal() {
+                continue
+            }
+            logger.Infof("destroying %v", u.Tag())
+            if err := u.Destroy(); err != nil {
+                logger.Errorf("failed to destroy %v: %v", u.Tag(), err)
+                return err
+            }
+            allUnits = append(allUnits, u)
+        }
+    }
+    // Now wait for the units to be removed from state.
+    for _, u := range allUnits {
+        logger.Infof("waiting for %v to die", u.Tag())
+        if !waitNotFound(u.Refresh, attempt) {
+            return fmt.Errorf("%s was not removed from state in a timely manner", u.Tag())
+        }
+    }
+    return nil
+}
+
+// waitNotFound calls the provided function in a loop, until either the
+// function returns an error that satisfies errors.IsNotFoundError, or no
+// more attempts are allowed.
+//
+// waitNotFound returns true if the function returned a satisfying error,
+// and false otherwise.
+func waitNotFound(f func() error, attempt *utils.Attempt) bool {
+    for {
+        if err := f(); coreerrors.IsNotFoundError(err) {
+            return true
+        }
+        if !attempt.Next() {
+            return false
+        }
+    }
+}
+
+// hasJob returns true iff the machine has a specified job.
+func hasJob(m *state.Machine, j state.MachineJob) bool {
+    for _, mj := range m.Jobs() {
+        if mj == j {
+            return true
+        }
+    }
+    return false
+}

=== modified file 'provider/null/environ.go'
--- provider/null/environ.go	2013-10-24 00:20:59 +0000
+++ provider/null/environ.go	2013-11-01 03:27:23 +0000
@@ -8,6 +8,7 @@
     "net"
     "path"
     "sync"
+    "time"
 
     "launchpad.net/loggo"
 
@@ -26,6 +27,7 @@
     "launchpad.net/juju-core/state"
     "launchpad.net/juju-core/state/api"
     "launchpad.net/juju-core/tools"
+    "launchpad.net/juju-core/utils"
     "launchpad.net/juju-core/worker/localstorage"
 )
 
@@ -181,6 +183,17 @@
 }
 
 func (e *nullEnviron) Destroy() error {
+    // Destroy units and machines that may host them. This will leave only
+    // the bootstrap host's machine agent intact, which we must tear down
+    // separately.
+    attemptStrategy := utils.AttemptStrategy{
+        Total: 60 * time.Second,
+        Delay: 250 * time.Millisecond,
+    }
+    if err := common.DestroyUnitMachines(e, attemptStrategy); err != nil {
+        return err
+    }
+    // TODO(axw) tear down machine 0 by ssh execing pkill -SIGABRT jujud.
     return errors.New("null provider destruction is not implemented yet")
 }
 
=== modified file 'provider/null/environ_test.go'
--- provider/null/environ_test.go	2013-10-21 21:49:04 +0000
+++ provider/null/environ_test.go	2013-11-01 03:27:23 +0000
@@ -78,10 +78,6 @@
     c.Assert(instances[0], gc.IsNil)
 }
 
-func (s *environSuite) TestDestroy(c *gc.C) {
-    c.Assert(s.env.Destroy(), gc.ErrorMatches, "null provider destruction is not implemented yet")
-}
-
 func (s *environSuite) TestLocalStorageConfig(c *gc.C) {
     c.Assert(s.env.StorageDir(), gc.Equals, "/var/lib/juju/storage")
     c.Assert(s.env.cfg.storageListenAddr(), gc.Equals, ":8040")
