Merge lp:~dave-cheney/pyjuju/go-provisioning-strawman into lp:pyjuju/go

Proposed by Dave Cheney
Status: Merged
Approved by: Gustavo Niemeyer
Approved revision: 207
Merged at revision: 201
Proposed branch: lp:~dave-cheney/pyjuju/go-provisioning-strawman
Merge into: lp:pyjuju/go
Diff against target: 239 lines (+198/-4)
2 files modified
cmd/jujud/provisioning.go (+114/-3)
cmd/jujud/provisioning_test.go (+84/-1)
To merge this branch: bzr merge lp:~dave-cheney/pyjuju/go-provisioning-strawman
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+107714@code.launchpad.net

Description of the change

cmd/jujud: strawman provisioning agent

This is a strawman provisioning agent proposal.

Following Gustavo's suggestion, this revision does not include
any functionality to respond to machines changes.

https://codereview.appspot.com/6250068/

To post a comment you must log in.
Revision history for this message
Roger Peppe (rogpeppe) wrote :

this is looking nice.
a few minor remarks below.

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode17
cmd/jujud/provisioning.go:17: const PROVIDER_MACHINE_ID =
"provider-machine-id"
s/PROVIDER_MACHINE_ID/providerMachineId/

UNDERSCORED_CAPS aren't conventional.

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode120
cmd/jujud/provisioning.go:120: return fmt.Errorf("machine-%010d already
reports a provider id %q, skipping", m.Id(), id)
s/machine-%010d/machine %d/

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode208
cmd/jujud/provisioning.go:208: return "", fmt.Errorf("findProviderId:
machine-%010d key not found: %q", m.Id(), PROVIDER_MACHINE_ID)
i think 'machine %d' would work better - the %010 stuff is detail
internal to state.

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode210
cmd/jujud/provisioning.go:210: if _, ok := id.(string); !ok {
rather than doing the type conversion twice, perhaps:

if id, ok := id.(string); ok {
     return id, nil
}
return "", fmt.Errorf("machine %d has invalid value for %s: %#v",
m.Id(), PROVIDER_MACHINE_ID, id)

https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode221
cmd/jujud/provisioning.go:221: if len(insts) < 1 {
this can't happen.

https://codereview.appspot.com/6250068/

Revision history for this message
Dave Cheney (dave-cheney) wrote :

Thanks Rog. I'll address those in the pre req by adding machine.InstanceId() and pushing the logic into that state.

d

On 29/05/2012, at 18:44, Roger Peppe <email address hidden> wrote:

> this is looking nice.
> a few minor remarks below.
>
>
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go
> File cmd/jujud/provisioning.go (right):
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode17
> cmd/jujud/provisioning.go:17: const PROVIDER_MACHINE_ID =
> "provider-machine-id"
> s/PROVIDER_MACHINE_ID/providerMachineId/
>
> UNDERSCORED_CAPS aren't conventional.
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode120
> cmd/jujud/provisioning.go:120: return fmt.Errorf("machine-%010d already
> reports a provider id %q, skipping", m.Id(), id)
> s/machine-%010d/machine %d/
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode208
> cmd/jujud/provisioning.go:208: return "", fmt.Errorf("findProviderId:
> machine-%010d key not found: %q", m.Id(), PROVIDER_MACHINE_ID)
> i think 'machine %d' would work better - the %010 stuff is detail
> internal to state.
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode210
> cmd/jujud/provisioning.go:210: if _, ok := id.(string); !ok {
> rather than doing the type conversion twice, perhaps:
>
> if id, ok := id.(string); ok {
> return id, nil
> }
> return "", fmt.Errorf("machine %d has invalid value for %s: %#v",
> m.Id(), PROVIDER_MACHINE_ID, id)
>
> https://codereview.appspot.com/6250068/diff/2001/cmd/jujud/provisioning.go#newcode221
> cmd/jujud/provisioning.go:221: if len(insts) < 1 {
> this can't happen.
>
> https://codereview.appspot.com/6250068/
>
> --
> https://code.launchpad.net/~dave-cheney/juju/go-provisioning-strawman/+merge/107714
> You are the owner of lp:~dave-cheney/juju/go-provisioning-strawman.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

This is going in a great direction Dave, and I'm excited to see this.

At the same time, I'm concerned about the approach. I'm not comfortable
with us knowingly getting in critical chunks of the functionality while
knowing that it "lacks robust
error checking and retry logic", as I can't help you fixing issues and
catching up bugs that you tell me are known but you don't care about
right now.

I'd prefer if this spike, which is a great test bed, was now broken down
into smaller chunks that you are actually *sure* about, so that we and
the other developers can jointly review and agree is a good step
forward.

I don't mind if each of those steps are not entirely functional, for
example. We can have a first one that is just about connecting and
reconnecting to the environment reliably, for instance, ignoring any
machine management, and so on.

What follows is a few additional ideas on it:

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go#newcode17
cmd/jujud/provisioning.go:17: const PROVIDER_MACHINE_ID =
"provider-machine-id"
providerMachineId would be the convention, I think

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning_test.go
File cmd/jujud/provisioning_test.go (right):

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning_test.go#newcode18
cmd/jujud/provisioning_test.go:18: log.Debug = true
Shouldn't the suite be using the testing package's log helper instead?

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning_test.go#newcode126
cmd/jujud/provisioning_test.go:126: <-time.After(2 * time.Second)
That's not great. That's the kind of reason that we need a proper Stop
method on the provisioning agent, so that we can be sure that it is
happy to terminate, rather than waiting a random amount of time for it
to finish running. Otherwise, we're getting into exactly the same kind
of issue we got into in the Python implementation.

Please have a look at the underlying infrastructure of watchers for
inspiration. They reliably terminate, both erroring or not. They also do
that synchronously so that it is possible to wait, and to check for
errors from them.

We need something like that for the provisioning agent.

https://codereview.appspot.com/6250068/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Some more comments.

Also, it just occurred to me that, as a very first step getting just the
testing environment in place, with zero functionality, in a way that
reliably starts and stops, is already a fantastic task on itself.

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go#newcode53
cmd/jujud/provisioning.go:53: a.State, err =
state.Open(&a.Conf.StateInfo)
This is a rather extensive function that might be broken down further in
more manageable bits for clarity.

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go#newcode75
cmd/jujud/provisioning.go:75: // step 2. listen for changes to the
environment or the machine topology and action both.
Step 1 needs to be able to be more resilient, and step 2 must be able to
fallback to step 1 in case of issues with the connection. It must also
be able to deal with Stop of the watcher and re-watching, but that
should be somewhat natural.

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go#newcode122
cmd/jujud/provisioning.go:122: return fmt.Errorf("machine-%010d already
reports a provider id %q, skipping", m.Id(), id)
We should never talk about internal ZooKeeper keys to the user. This
should be saying "machine %d" instead.

https://codereview.appspot.com/6250068/diff/6001/cmd/jujud/provisioning.go#newcode134
cmd/jujud/provisioning.go:134: // store the reference from the provider
in ZK
Let's please not reference ZooKeeper all the time. It is the environment
state that we have at hand, with a nice abstraction that we created
precisely so we don't think in terms of ZooKeeper all the time.
ZooKeeper is hopefully going away soon, but the state will remain as-is.

https://codereview.appspot.com/6250068/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Thanks Dave. That is now looking like a great step forward that we can
polish for integration easily.

Some comments:

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode37
cmd/jujud/provisioning.go:37: st, err := state.Open(&a.Conf.StateInfo)
What happens if the connection is broken? We'll need to reestablish it
and restart from the beginning, since all the assumptions are now wrong.
Maybe this has to be moved into the outer loop or something. Will leave
details with you at this point.

That said, I suggest keeping this for the branch coming right after this
one. It feels like what we have here is a solid step forward, and we can
comfortably iterate over the state reestablishment in a new branch
before continuing with everything else.

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode56
cmd/jujud/provisioning.go:56: type environment struct {
Those two types feel unnecessary. It looks like your model would work
equally well if we had environWatcher and machinesWatcher both within
the Provisioner struct in place of the environment and machines fields,
plus stopMachinesWatcher and stopEnvironWatcher methods (the two
invalidate ones), with no further changes.

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode71
cmd/jujud/provisioning.go:71: func (e *environment) invalidate() {
This should return the error it finds. Call sites can selectively ignore
it, but methods such as Provisioner.Stop should return the problems it
finds, so we can have a handle on them if we want (testing should
definitely check, for instance).

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode73
cmd/jujud/provisioning.go:73: log.Printf("provisioner: environment
watcher exited: %v", e.watcher.Stop())
That Stop call is the most critical task done by this method. It
shouldn't be disguised at the end of a log call.

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode136
cmd/jujud/provisioning.go:136: func (p *Provisioner) innerLoop() {
Nice organization.

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode152
cmd/jujud/provisioning.go:152: log.Printf("provisioning: new
configuartion applied")
s/configuartion/environment configuration/ (note typo)

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode159
cmd/jujud/provisioning.go:159: }
I'm wondering if rather than "continue" in the problems above, we should
"break", and here have something that verifies if the state is still
sane? Otherwise this will be an infinite loop, won't it?

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode167
cmd/jujud/provisioning.go:167: p.environment.invalidate()
I suggest logic like this here:

p.tomb.Kill(nil)
err1 := p.stopEnvironWatcher()
err2 := p.stopMachinesWatcher()
err3 := p.tomb.Wait()
for err := []error{err3, err2, err1} {
     if err != nil {
         return err
     }
}
return nil

https://codereview.appspot.com/6250068/

204. By Dave Cheney on 2012-06-01

log stop errors

205. By Dave Cheney on 2012-06-03

merge from trunk

206. By Dave Cheney on 2012-06-03

responding to review comments

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :
Download full text (5.1 KiB)

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/5004/cmd/jujud/provisioning.go#newcode37
cmd/jujud/provisioning.go:37: st, err := state.Open(&a.Conf.StateInfo)
On 2012/06/01 01:58:27, dfc wrote:
> > What happens if the connection is broken? We'll need to reestablish
it and
> > restart from the beginning, since all the assumptions are now wrong.
Maybe
> this
> > has to be moved into the outer loop or something. Will leave details
with you
> at
> > this point.

> I looked into this more this morning and it looks like if the
underlying
> connection to ZK is interrupted then the c client reconnects behind
the scenes.
> I tested this by running the PA, then shutting down zk, then starting
it again a
> minute later. None of the watchers exited.

> I'm going to keep playing with this, but at this point, if gozk never
reports an
> error when it cant talk to the zk server, I can't think of a way to
detect
> connection failure.

We've covered this on juju-dev@. A TODO would be good here as well.

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go
File cmd/jujud/provisioning.go (right):

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode56
cmd/jujud/provisioning.go:56: // environChanges returns a channel that
will receive the new *ConfigNode
As we discussed, that logic can be simplified as we can bubble up onto
the top from any hiccups, rather than retrying in place. I'm happy both
for this branch to be changed to cover that, or for it to be integrated
with these and then refactored. I'll comment in-place and let the
decision with you.

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode57
cmd/jujud/provisioning.go:57: // when a change is detected.
// when a change in the environment configuration is detected.

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode65
cmd/jujud/provisioning.go:65: // stopEnvironWatcher stops and
invalidates the current environWatcher.
s/and invalidates//

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode69
cmd/jujud/provisioning.go:69: log.Printf("provisioning: environWatcher
reported error on Stop: %v", err)
This is Printf rather than Debugf, so we need to be a bit nicer to the
user (no method and field names). Something like this might do:

"environment configuration watcher: %v"

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode76
cmd/jujud/provisioning.go:76: // changes returns a channel that will
receive the new *ConfigNode when a
Needs updating.

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode89
cmd/jujud/provisioning.go:89: log.Printf("provisioning: machinesWatcehr
reported error on Stop: %v", err)
"machines watcher: %v"

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/provisioning.go#newcode107
cmd/jujud/provisioning.go:107: // TODO(dfc) we need a method like
state.IsValid() here to exit cleanly if
IsConnected?

https://codereview.appspot.com/6250068/diff/16001/cmd/jujud/...

Read more...

207. By Dave Cheney on 2012-06-05

Responding to review comments, final tweaks

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

That last set of changes are awesome. LGTM.

https://codereview.appspot.com/6250068/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'cmd/jujud/provisioning.go'
2--- cmd/jujud/provisioning.go 2012-05-08 19:29:43 +0000
3+++ cmd/jujud/provisioning.go 2012-06-05 08:03:19 +0000
4@@ -1,9 +1,16 @@
5 package main
6
7 import (
8- "fmt"
9 "launchpad.net/gnuflag"
10 "launchpad.net/juju/go/cmd"
11+ "launchpad.net/juju/go/environs"
12+ "launchpad.net/juju/go/log"
13+ "launchpad.net/juju/go/state"
14+ "launchpad.net/tomb"
15+
16+ // register providers
17+ _ "launchpad.net/juju/go/environs/dummy"
18+ _ "launchpad.net/juju/go/environs/ec2"
19 )
20
21 // ProvisioningAgent is a cmd.Command responsible for running a provisioning agent.
22@@ -27,5 +34,109 @@
23
24 // Run runs a provisioning agent.
25 func (a *ProvisioningAgent) Run(_ *cmd.Context) error {
26- return fmt.Errorf("MachineAgent.Run not implemented")
27-}
28+ // TODO(dfc) place the logic in a loop with a suitable delay
29+ st, err := state.Open(&a.Conf.StateInfo)
30+ if err != nil {
31+ return err
32+ }
33+ p := NewProvisioner(st)
34+ return p.Wait()
35+}
36+
37+type Provisioner struct {
38+ st *state.State
39+ environ environs.Environ
40+ tomb tomb.Tomb
41+
42+ environWatcher *state.ConfigWatcher
43+ machinesWatcher *state.MachinesWatcher
44+}
45+
46+// NewProvisioner returns a Provisioner.
47+func NewProvisioner(st *state.State) *Provisioner {
48+ p := &Provisioner{
49+ st: st,
50+ }
51+ go p.loop()
52+ return p
53+}
54+
55+func (p *Provisioner) loop() {
56+ defer p.tomb.Done()
57+
58+ p.environWatcher = p.st.WatchEnvironConfig()
59+ // TODO(dfc) we need a method like state.IsConnected() here to exit cleanly if
60+ // there is a connection problem.
61+ for {
62+ select {
63+ case <-p.tomb.Dying():
64+ return
65+ case config, ok := <-p.environWatcher.Changes():
66+ if !ok {
67+ err := p.environWatcher.Stop()
68+ if err != nil {
69+ p.tomb.Kill(err)
70+ }
71+ return
72+ }
73+ var err error
74+ p.environ, err = environs.NewEnviron(config.Map())
75+ if err != nil {
76+ log.Printf("provisioner loaded invalid environment configuration: %v", err)
77+ continue
78+ }
79+ log.Printf("provisioner loaded new environment configuration")
80+ p.innerLoop()
81+ }
82+ }
83+}
84+
85+func (p *Provisioner) innerLoop() {
86+ p.machinesWatcher = p.st.WatchMachines()
87+ // TODO(dfc) we need a method like state.IsConnected() here to exit cleanly if
88+ // there is a connection problem.
89+ for {
90+ select {
91+ case <-p.tomb.Dying():
92+ return
93+ case change, ok := <-p.environWatcher.Changes():
94+ if !ok {
95+ err := p.environWatcher.Stop()
96+ if err != nil {
97+ p.tomb.Kill(err)
98+ }
99+ return
100+ }
101+ config, err := environs.NewConfig(change.Map())
102+ if err != nil {
103+ log.Printf("provisioner loaded invalid environment configuration: %v", err)
104+ continue
105+ }
106+ p.environ.SetConfig(config)
107+ log.Printf("provisioner loaded new environment configuration")
108+ case machines, ok := <-p.machinesWatcher.Changes():
109+ if !ok {
110+ err := p.machinesWatcher.Stop()
111+ if err != nil {
112+ p.tomb.Kill(err)
113+ }
114+ return
115+ }
116+ p.processMachines(machines)
117+ }
118+ }
119+}
120+
121+// Wait waits for the Provisioner to exit.
122+func (p *Provisioner) Wait() error {
123+ return p.tomb.Wait()
124+}
125+
126+// Stop stops the Provisioner and returns any error encountered while
127+// provisioning.
128+func (p *Provisioner) Stop() error {
129+ p.tomb.Kill(nil)
130+ return p.tomb.Wait()
131+}
132+
133+func (p *Provisioner) processMachines(changes *state.MachinesChange) {}
134
135=== modified file 'cmd/jujud/provisioning_test.go'
136--- cmd/jujud/provisioning_test.go 2012-05-30 02:13:48 +0000
137+++ cmd/jujud/provisioning_test.go 2012-06-05 08:03:19 +0000
138@@ -2,13 +2,43 @@
139
140 import (
141 . "launchpad.net/gocheck"
142+ "launchpad.net/gozk/zookeeper"
143 "launchpad.net/juju/go/cmd"
144+ "launchpad.net/juju/go/environs/dummy"
145+ "launchpad.net/juju/go/state"
146+ "launchpad.net/juju/go/testing"
147 )
148
149-type ProvisioningSuite struct{}
150+type ProvisioningSuite struct {
151+ zkConn *zookeeper.Conn
152+ st *state.State
153+}
154
155 var _ = Suite(&ProvisioningSuite{})
156
157+func (s *ProvisioningSuite) SetUpTest(c *C) {
158+ zk, session, err := zookeeper.Dial(zkAddr, 15e9)
159+ c.Assert(err, IsNil)
160+ event := <-session
161+ c.Assert(event.Ok(), Equals, true)
162+ c.Assert(event.Type, Equals, zookeeper.EVENT_SESSION)
163+ c.Assert(event.State, Equals, zookeeper.STATE_CONNECTED)
164+
165+ s.zkConn = zk
166+ info := &state.Info{
167+ Addrs: []string{zkAddr},
168+ }
169+ testing.ZkRemoveTree(s.zkConn, "/")
170+ s.st, err = state.Initialize(info)
171+ c.Assert(err, IsNil)
172+
173+ dummy.Reset()
174+}
175+
176+func (s *ProvisioningSuite) TearDownTest(c *C) {
177+ s.zkConn.Close()
178+}
179+
180 func (s *ProvisioningSuite) TestParseSuccess(c *C) {
181 create := func() (cmd.Command, *AgentConf) {
182 a := &ProvisioningAgent{}
183@@ -22,3 +52,56 @@
184 err := ParseAgentCommand(a, []string{"nincompoops"})
185 c.Assert(err, ErrorMatches, `unrecognized args: \["nincompoops"\]`)
186 }
187+
188+func initProvisioningAgent() (*ProvisioningAgent, error) {
189+ args := []string{"--zookeeper-servers", zkAddr}
190+ c := &ProvisioningAgent{}
191+ return c, initCmd(c, args)
192+}
193+
194+func (s *ProvisioningSuite) TestProvisionerStartStop(c *C) {
195+ p := NewProvisioner(s.st)
196+ c.Assert(p.Stop(), IsNil)
197+}
198+
199+func (s *ProvisioningSuite) TestProvisionerEnvironmentChange(c *C) {
200+ p := NewProvisioner(s.st)
201+
202+ // seed /environment to point to dummy
203+ env, err := s.st.Environment()
204+ c.Assert(err, IsNil)
205+ env.Set("type", "dummy")
206+ env.Set("zookeeper", false)
207+ env.Set("name", "testing")
208+ _, err = env.Write()
209+ c.Assert(err, IsNil)
210+
211+ // twiddle with the environment
212+ env, err = s.st.Environment()
213+ c.Assert(err, IsNil)
214+ env.Set("name", "testing2")
215+ _, err = env.Write()
216+ c.Assert(err, IsNil)
217+ env.Set("name", "testing3")
218+ _, err = env.Write()
219+ c.Assert(p.Stop(), IsNil)
220+}
221+
222+func (s *ProvisioningSuite) TestProvisionerStopOnStateClose(c *C) {
223+ p := NewProvisioner(s.st)
224+
225+ // seed /environment to point to dummy
226+ env, err := s.st.Environment()
227+ c.Assert(err, IsNil)
228+ env.Set("type", "dummy")
229+ env.Set("zookeeper", false)
230+ env.Set("name", "testing")
231+ _, err = env.Write()
232+ c.Assert(err, IsNil)
233+
234+ s.st.Close()
235+
236+ c.Assert(p.Wait(), ErrorMatches, "watcher.*")
237+ c.Assert(p.Stop(), ErrorMatches, "watcher.*")
238+
239+}

Subscribers

People subscribed via source and target branches

to all changes: