Merge lp:~themue/juju-core/go-state-lifecycle-relation into lp:~juju/juju-core/trunk

Proposed by Frank Mueller
Status: Rejected
Rejected by: Gustavo Niemeyer
Proposed branch: lp:~themue/juju-core/go-state-lifecycle-relation
Merge into: lp:~juju/juju-core/trunk
Diff against target: 460 lines (+301/-14)
8 files modified
state/lifecycle.go (+121/-0)
state/lifecycle_test.go (+72/-0)
state/relation.go (+32/-0)
state/relation_internal_test.go (+1/-1)
state/relation_test.go (+56/-3)
state/service.go (+3/-5)
state/service_test.go (+4/-0)
state/state.go (+12/-5)
To merge this branch: bzr merge lp:~themue/juju-core/go-state-lifecycle-relation
Reviewer Review Type Date Requested Status
The Go Language Gophers Pending
Review via email: mp+117382@code.launchpad.net

Description of the change

state: lifecycle implementation started

The branch contains the type definition and some utility
functions for lifecycles in general. A first integration
into relation is done and the tests changed (very naive by
just setting the lifecycle state before removing). So the
tests of state are working while any outer relying code
may fail. This has to be done in a follow-up CL.

https://codereview.appspot.com/6455065/

To post a comment you must log in.
Revision history for this message
Roger Peppe (rogpeppe) wrote :

looks plausible. a few minor comments below.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode18
state/lifecycle.go:18: func (l Life) checkNext(next Life) error {
this seems a little more involved than it has to be.
perhaps something like this might be better?

func (l Life) checkNext(next Life) error {
 switch {
 case l == next:
  return nil
 case l == Alive && next == Dying:
  return nil
 case l == Dying && next == Dead:
  return nil
 }
 return fmt.Errorf("illegal lifecycle state change from %q to %q", l,
next)
}

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode38
state/lifecycle.go:38: func createLifeNode(zk *zookeeper.Conn, basepath
string) (err error) {
does the life state need to be stored in a separate node? most entities
already have an associated config node, and the life state could be
stored there. maybe there's a good reason for the separation though.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode40
state/lifecycle.go:40: defer errorContextf(&err, "cannot create
lifecycle state node %q", lifepath)
i wonder if the error context is necessary here, given that the
zookeeper error mentions the path and the create operation.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode48
state/lifecycle.go:48: defer errorContextf(&err, "cannot write lifecycle
state node %q", lifepath)
ditto

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode56
state/lifecycle.go:56: defer errorContextf(&err, "cannot read lifecycle
state node %q", lifepath)
ditto

https://codereview.appspot.com/6455065/diff/1/state/lifecycle_test.go
File state/lifecycle_test.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle_test.go#newcode21
state/lifecycle_test.go:21: c.Assert(Alive.checkNext(Dying), IsNil)
this seems like a good place for a table-driven test.

https://codereview.appspot.com/6455065/diff/1/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode101
state/relation.go:101: func (r *Relation) readLife() (err error) {
why is this a different method from Life?

https://codereview.appspot.com/6455065/

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

No changes to watchers in this CL?

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode18
state/lifecycle.go:18: func (l Life) checkNext(next Life) error {
s/checkNext/isNextValid/ perhaps?

https://codereview.appspot.com/6455065/diff/1/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode101
state/relation.go:101: func (r *Relation) readLife() (err error) {
On 2012/07/31 07:48:58, rog wrote:
> why is this a different method from Life?

Because of potential caching in the future, perhaps.

https://codereview.appspot.com/6455065/

Revision history for this message
William Reade (fwereade) wrote :
Download full text (3.9 KiB)

Good start, couple of issues

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode18
state/lifecycle.go:18: func (l Life) checkNext(next Life) error {
validChange?

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode38
state/lifecycle.go:38: func createLifeNode(zk *zookeeper.Conn, basepath
string) (err error) {
On 2012/07/31 07:48:58, rog wrote:
> does the life state need to be stored in a separate node? most
entities already
> have an associated config node, and the life state could be stored
there. maybe
> there's a good reason for the separation though.

ISTM that we'll be watching a *lot* of these, and handling events for
every unrelated change will be somewhat tedious and unnecessary so, or
reflection, I'm +1 on separate nodes.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode40
state/lifecycle.go:40: defer errorContextf(&err, "cannot create
lifecycle state node %q", lifepath)
On 2012/07/31 07:48:58, rog wrote:
> i wonder if the error context is necessary here, given that the
zookeeper error
> mentions the path and the create operation.

I reckon the extra information has some value, and is more readable
besides.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle_test.go
File state/lifecycle_test.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle_test.go#newcode21
state/lifecycle_test.go:21: c.Assert(Alive.checkNext(Dying), IsNil)
On 2012/07/31 07:48:58, rog wrote:
> this seems like a good place for a table-driven test.
+1

https://codereview.appspot.com/6455065/diff/1/state/lifecycle_test.go#newcode50
state/lifecycle_test.go:50: defer testing.ZkRemoveTree(s.ZkConn,
basepath)
If ZkConnSuite doesn't nuke ZK at the end of each test, surely it
should?

https://codereview.appspot.com/6455065/diff/1/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode79
state/relation.go:79: life Life
I'm not sure this pulls its weight.

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode119
state/relation.go:119: return writeLifeNode(r.st.zk, r.zkPath(), r.life)
We probably shouldn't alter r.life until we know the write succeeded.

https://codereview.appspot.com/6455065/diff/1/state/service.go
File state/service.go (right):

https://codereview.appspot.com/6455065/diff/1/state/service.go#newcode211
state/service.go:211: r := &Relation{s.st, key, Alive,
make([]RelationEndpoint, len(tr.Endpoints))}
Shouldn't you get the actual life value rather than just assume it's
alive? TBH this life field seems to be causing more problems than it
solves...

https://codereview.appspot.com/6455065/diff/1/state/service_test.go
File state/service_test.go (left):

https://codereview.appspot.com/6455065/diff/1/state/service_test.go#oldcode443
state/service_test.go:443: err = s.State.RemoveRelation(rel)
I haven't seen any new tests for RemoveRelation -- surely it should
panic if the relation is not Dead?

https://codereview.appspot.com/6455065/diff/1/state/state.go
File state/state.go (righ...

Read more...

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

https://codereview.appspot.com/6455065/diff/1/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode79
state/relation.go:79: life Life
On 2012/07/31 09:12:33, fwereade wrote:
> I'm not sure this pulls its weight.

Not sure what weight you are referring to, but this is the way it's done
in mstate after a few previous heavier iterations.

https://codereview.appspot.com/6455065/diff/1/state/service.go
File state/service.go (right):

https://codereview.appspot.com/6455065/diff/1/state/service.go#newcode211
state/service.go:211: r := &Relation{s.st, key, Alive,
make([]RelationEndpoint, len(tr.Endpoints))}
> Shouldn't you get the actual life value rather than just assume it's
> alive? TBH this life field seems to be causing more problems than it
> solves...

I think this is just part of the initial lifecycle bootstrap, after we
modify state client this would change in a subsequent CL.

https://codereview.appspot.com/6455065/diff/1/state/state.go
File state/state.go (right):

https://codereview.appspot.com/6455065/diff/1/state/state.go#newcode427
state/state.go:427: return fmt.Errorf("not found")
> I suspect this shouldn't be an error any more... we're just cleaning
up,
> we might not be the only concurrent caller, and if it's already gone
> then we're happy.

I thought there's one one agent responsible for calling Remove*, the
single one receiving the Dead event. I think this is an error to be
called more than once. The corrective agent might call this
independently,
but the CE should deal with it failing anyway.

https://codereview.appspot.com/6455065/

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode9
state/lifecycle.go:9: type Life string
Hmm, this simplifies returning errors, but the integer approach in
mstate makes state change checking trivial because all you need to do is

   if (newLife < life) { ...

https://codereview.appspot.com/6455065/

Revision history for this message
William Reade (fwereade) wrote :

On 2012/07/31 09:36:47, aram wrote:
> https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
> File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode9
> state/lifecycle.go:9: type Life string
> Hmm, this simplifies returning errors, but the integer approach in
mstate makes
> state change checking trivial because all you need to do is

> if (newLife < life) { ...

Does a string cost us so much as to not be worth the clarity?

https://codereview.appspot.com/6455065/

Revision history for this message
William Reade (fwereade) wrote :

On 2012/07/31 09:29:17, aram wrote:
> https://codereview.appspot.com/6455065/diff/1/state/relation.go
> File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode79
> state/relation.go:79: life Life
> On 2012/07/31 09:12:33, fwereade wrote:
> > I'm not sure this pulls its weight.

> Not sure what weight you are referring to, but this is the way it's
done in
> mstate after a few previous heavier iterations.

> https://codereview.appspot.com/6455065/diff/1/state/service.go
> File state/service.go (right):

https://codereview.appspot.com/6455065/diff/1/state/service.go#newcode211
> state/service.go:211: r := &Relation{s.st, key, Alive,
make([]RelationEndpoint,
> len(tr.Endpoints))}
> > Shouldn't you get the actual life value rather than just assume it's
> > alive? TBH this life field seems to be causing more problems than it
> > solves...

> I think this is just part of the initial lifecycle bootstrap, after we
> modify state client this would change in a subsequent CL.

If we just dropped the field and called Life everywhere we needed it,
then we could have code that wasn't actively inconsistent. We could add
the caching as required in a subsequent CL.

> https://codereview.appspot.com/6455065/diff/1/state/state.go
> File state/state.go (right):

https://codereview.appspot.com/6455065/diff/1/state/state.go#newcode427
> state/state.go:427: return fmt.Errorf("not found")
> > I suspect this shouldn't be an error any more... we're just cleaning
up,
> > we might not be the only concurrent caller, and if it's already gone
> > then we're happy.

> I thought there's one one agent responsible for calling Remove*, the
> single one receiving the Dead event. I think this is an error to be
> called more than once. The corrective agent might call this
independently,
> but the CE should deal with it failing anyway.

My point is just: there are concurrent callers of this. The end goal is
that the relation no longer exist. At least one of the callers will see
the relation already gone from the topology; if we treat this as an
error, but the other client failed before it could delete the ZK nodes,
how will we ever be able to collect the relation nodes?

https://codereview.appspot.com/6455065/

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

> Does a string cost us so much as to not be worth the clarity?

Actually it was about improving clarity. Here's rog's suggestion:

http://play.golang.org/p/ZLVcLY96Oy

https://codereview.appspot.com/6455065/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :
Download full text (3.3 KiB)

First round:

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode18
state/lifecycle.go:18: func (l Life) checkNext(next Life) error {
checkNext seems like a bad approach altogether. We need to know more
than whether a change is allowed to make a decision of what to do (e.g.
Alive => Dying when Dead is fine, but does nothing). I suggest inlining
logic into SetLife.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode38
state/lifecycle.go:38: func createLifeNode(zk *zookeeper.Conn, basepath
string) (err error) {
On 2012/07/31 09:12:33, fwereade wrote:
> On 2012/07/31 07:48:58, rog wrote:
> > does the life state need to be stored in a separate node? most
entities
> already
> > have an associated config node, and the life state could be stored
there.
> maybe
> > there's a good reason for the separation though.

> ISTM that we'll be watching a *lot* of these, and handling events for
every
> unrelated change will be somewhat tedious and unnecessary so, or
reflection, I'm
> +1 on separate nodes.

I agree with rog here. We have to watch a lot of things in general.
Maybe I just haven't seen the issue clearly yet, but it sounds to me
like separating the nodes will actually create *more* things for
watching, rather than less.

https://codereview.appspot.com/6455065/diff/1/state/lifecycle.go#newcode56
state/lifecycle.go:56: defer errorContextf(&err, "cannot read lifecycle
state node %q", lifepath)
This is all changing given we don't need independent nodes, but FWIW
let's please not sprawl errorContextf more than strictly necessary.
fmt.Errorf works well.

https://codereview.appspot.com/6455065/diff/1/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode79
state/relation.go:79: life Life
On 2012/07/31 09:29:17, aram wrote:
> On 2012/07/31 09:12:33, fwereade wrote:
> > I'm not sure this pulls its weight.

> Not sure what weight you are referring to, but this is the way it's
done in
> mstate after a few previous heavier iterations.

I believe mstate is different in that the value is mapped straight from
the database. The issue is related to the caching topic we debated at
the end of last week.

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode79
state/relation.go:79: life Life
Frank, have you seen the work by rog on agent tooling? It sounds like we
could have a single implementation of resourceLife that could be used
everywhere.

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode114
state/relation.go:114: if err := r.life.checkNext(life); err != nil {
This doesn't work, as it's testing a memory value. The database could
have shifted several times while we're sitting on our couch evaluating
this. It must be put in a retry change block.

Also, checkNext is a red-herring. You must do a local test here that
takes into account what you want the end result to be. Alive is *never*
acceptable. Dying is *always* fine, but only does anything if it is
currenlty Alive, as someone else could have shifted from A...

Read more...

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go#newcode96
state/lifecycle.go:96: value, found := config.Get(lifeKey)
Isn't there a race between config.Get() and config.{Set,Write}()?

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle_test.go
File state/lifecycle_test.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle_test.go#newcode90
state/lifecycle_test.go:90: c.Assert(func() { lc.SetLife(test.panicLife)
}, PanicMatches, panicMessage)
Did not know about PanicMatches, thanks!

https://codereview.appspot.com/6455065/diff/11001/state/state.go
File state/state.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/state.go#newcode419
state/state.go:419: return fmt.Errorf("cannot retrieve lifecacle state
of relation %q: %s", r, err)
s/lifecacle/lifecycle/

https://codereview.appspot.com/6455065/

Revision history for this message
Aram Hăvărneanu (aramh) wrote :

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go#newcode26
state/lifecycle.go:26: var transitions = [nLife][nLife]bool{
I love shorter names in general, but perhaps validTransitions is a
better name here?

https://codereview.appspot.com/6455065/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/lifecycle.go#newcode91
state/lifecycle.go:91: func (lc *lifecycle) SetLife(next Life) error {
The issues described last week are still not handled:

https://codereview.appspot.com/6455065/diff/1/state/relation.go#newcode114
state/relation.go:114: if err := r.life.checkNext(life); err != nil {
This doesn't work, as it's testing a memory value. The database could
have
shifted several times while we're sitting on our couch evaluating this.
It must
be put in a retry change block.

Also, checkNext is a red-herring. You must do a local test here that
takes into
account what you want the end result to be. Alive is *never* acceptable.
Dying
is *always* fine, but only does anything if it is currenlty Alive, as
someone
else could have shifted from Alive to Dying and to Dead concurrently,
and the
Dying effect is guaranteed even then. Setting to Dead is fine unless
it's Alive,
but only does anything if not already Dead.

https://codereview.appspot.com/6455065/diff/11001/state/relation.go
File state/relation.go (right):

https://codereview.appspot.com/6455065/diff/11001/state/relation.go#newcode111
state/relation.go:111: func (r *Relation) Life() Life {
We shouldn't have to declare those methods for every type. Embedding a
reasonable type should be enough. You can use Roger's work as a
reference.

https://codereview.appspot.com/6455065/diff/11001/state/relation.go#newcode125
state/relation.go:125: // Refresh reloads the cached data of the
relation.
// Refresh reloads the state of the lifecycle.

This is somewhat unfortunate since it's caching a specific setting and
ignoring the rest, but I guess it's a good compromise to maintain
compatibility with mstate for the moment.

https://codereview.appspot.com/6455065/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go#newcode45
state/lifecycle.go:45: // init initializes the lifecycle with the state
Alive .
Why do we need this? We should be creating the actual nodes with life
set to Alive in the first place, rather than changing them to Alive
after the fact via this method.

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go#newcode88
state/lifecycle.go:88: // Attention! The config map is defined by the
ConfigNode.
As we debated ad infinitum, there are other configuration nodes that we
manipulate without ConfigNode, so this isn't really a relevant note.

What might be more useful is:

// Note that there are unknown keys in the node.

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go#newcode98
state/lifecycle.go:98: if !validTransitions[current][next] {
Please add a test verifying that SetLife(Dying) when the current state
is Dead works and does nothing. This was mentioned in the very first
review I made and still seems unhandled.

https://codereview.appspot.com/6455065/diff/17001/state/service.go
File state/service.go (right):

https://codereview.appspot.com/6455065/diff/17001/state/service.go#newcode212
state/service.go:212: if err := r.Refresh(); err != nil {
This workflow doesn't make sense:

1) r := newRelation
2) r.Refresh
3) r.endpoint = ...

If we're going to be adding Refresh now, the sites that call it should
be sane, otherwise we're corrupting the whole logic in a way that we
won't be able to get out of in the near future. Please ping me online
for us to sort this out.

https://codereview.appspot.com/6455065/diff/17001/state/state.go
File state/state.go (right):

https://codereview.appspot.com/6455065/diff/17001/state/state.go#newcode348
state/state.go:348: err = newRelation(s, relationKey).lifecycle.init()
Why writing a second time to the node just for setting life=alive?

https://codereview.appspot.com/6455065/diff/17001/state/state.go#newcode401
state/state.go:401: if err = r.Refresh(); err != nil {
Same case as before.

https://codereview.appspot.com/6455065/

359. By Frank Mueller

state: merged trunk to stay up-to-date

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go
File state/lifecycle.go (right):

https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go#newcode45
state/lifecycle.go:45: // init initializes the lifecycle with the state
Alive .
On 2012/08/14 12:36:06, TheMue wrote:
> On 2012/08/13 15:48:49, niemeyer wrote:
> > Why do we need this? We should be creating the actual nodes with
life set to
> > Alive in the first place, rather than changing them to Alive after
the fact
> via
> > this method.

> Maybe this comment is misleading. init() is to write the initial Alive
into the
> config node. init() is called immediately after the creation of the
instance, so
> the config node should be empty so far. But I preferred to take care
that it's
> really so.

The comment isn't misleading. The logic itself is wrong. You can't
create a node that has no lifecycle state and then initialize it after
the fact. Meanwhile, someone else will pick it up, and will see wrong
state.

This, specifically, is a relevant race condition:

98 r := newRelation(st, key)
99 if err := r.lifecycle.init(); err != nil {

We've been going back and forth on this, and I'm feeling unproductive
reviewing this over and over about the same problems.

I suggest we stop work on state, put that aside, and focus entirely on
mstate. Let's implement watches and presence, and move forward.

https://codereview.appspot.com/6455065/

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

On 2012/08/28 17:24:11, TheMue wrote:
> On 2012/08/28 17:15:06, niemeyer wrote:
> > https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go
> > File state/lifecycle.go (right):
> >
> >
https://codereview.appspot.com/6455065/diff/17001/state/lifecycle.go#newcode45
> > state/lifecycle.go:45: // init initializes the lifecycle with the
state Alive
> .
> > On 2012/08/14 12:36:06, TheMue wrote:
> > > On 2012/08/13 15:48:49, niemeyer wrote:
> > > > Why do we need this? We should be creating the actual nodes with
life set
> to
> > > > Alive in the first place, rather than changing them to Alive
after the
> fact
> > > via
> > > > this method.
> > >
> > > Maybe this comment is misleading. init() is to write the initial
Alive into
> > the
> > > config node. init() is called immediately after the creation of
the
> instance,
> > so
> > > the config node should be empty so far. But I preferred to take
care that
> it's
> > > really so.
> >
> > The comment isn't misleading. The logic itself is wrong. You can't
create a
> node
> > that has no lifecycle state and then initialize it after the fact.
Meanwhile,
> > someone else will pick it up, and will see wrong state.
> >
> > This, specifically, is a relevant race condition:
> >
> > 98 r := newRelation(st, key)
> > 99 if err := r.lifecycle.init(); err != nil {
> >
> > We've been going back and forth on this, and I'm feeling
unproductive
> reviewing
> > this over and over about the same problems.
> >
> > I suggest we stop work on state, put that aside, and focus entirely
on mstate.
> > Let's implement watches and presence, and move forward.

> Take a look at newRelation(), it doesn't create a node, only the
struct with
> lifecycle initialized:

What is this logic doing Frank:

   46 func (lc *lifecycle) init() error {
   47 config, err := readConfigNode(lc.zk, lc.path)
   48 if err != nil {
   49 return err
   50 }
   51 config.Set(lifeKey, Alive)
   52 changes, err := config.Write()
   53 if err != nil {
   54 return err
   55 }

Either:

1) The node was created before without a lifecycle

or

2) An empty node is being read, and is being written down
    just with lifecycle information and nothing else

Both are wrong.

This is feeling very unproductive to me, Frank. This is in review for
several weeks, and the issues that are brought up are repeatedly
receiving replies that indicate lack of care instead of being fixed.

I'm marking the branch as rejected.

https://codereview.appspot.com/6455065/

Unmerged revisions

359. By Frank Mueller

state: merged trunk to stay up-to-date

358. By Frank Mueller

state: changed lifecycle from SetLife() to Kill() and Die()

357. By Frank Mueller

state: changed lifecycle after review, now uses RetryChange and is embedable

356. By Frank Mueller

state: merged trunk and worked off the review comments

355. By Frank Mueller

state: first lifecycle utils and implementation for relation

354. By Roger Peppe

state: implement MachineInfoWatcher

R=niemeyer
CC=
https://codereview.appspot.com/6447054

353. By William Reade

add RelationState type...

...and use it to reconcile HookQueue state with initial watcher event on
load.

As part of this, I added a RelationId to HookQueue, and return it in every
HookInfo, so that we can identify what relation the hook is actually meant
to run against.

R=niemeyer
CC=
https://codereview.appspot.com/6441047

352. By Roger Peppe

environs/ec2: start agents via agent-specific symbolic links

R=niemeyer
CC=
https://codereview.appspot.com/6454057

351. By William Reade

HookQueue now looks more like a watcher

R=niemeyer
CC=
https://codereview.appspot.com/6453043

350. By Roger Peppe

worker/machiner: fix test to add new Container method

R=
CC=
https://codereview.appspot.com/6440053

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'state/lifecycle.go'
2--- state/lifecycle.go 1970-01-01 00:00:00 +0000
3+++ state/lifecycle.go 2012-08-21 08:50:23 +0000
4@@ -0,0 +1,121 @@
5+package state
6+
7+import (
8+ "fmt"
9+ "launchpad.net/goyaml"
10+ "launchpad.net/gozk/zookeeper"
11+)
12+
13+// Life represents the lifecycle state of a state entity.
14+type Life string
15+
16+const (
17+ Alive Life = "alive"
18+ Dying Life = "dying"
19+ Dead Life = "dead"
20+)
21+
22+const lifeKey = "life"
23+
24+// lifecycle manages the lifecycle of state entities
25+// in the config node.
26+type lifecycle struct {
27+ zk *zookeeper.Conn
28+ path string
29+}
30+
31+// init initializes the lifecycle in the config node with the state Alive .
32+func (lc *lifecycle) init() error {
33+ config, err := readConfigNode(lc.zk, lc.path)
34+ if err != nil {
35+ return err
36+ }
37+ config.Set(lifeKey, Alive)
38+ changes, err := config.Write()
39+ if err != nil {
40+ return err
41+ }
42+ // Check if only 'life = Alive' has been added.
43+ if len(changes) != 1 ||
44+ changes[0].Type != ItemAdded ||
45+ (changes[0].OldValue != nil && changes[0].NewValue != Alive) {
46+ return fmt.Errorf("initial lifecycle state cannot be set")
47+ }
48+ return nil
49+}
50+
51+// Life returns the cached lifecycle state.
52+func (lc *lifecycle) Life() (Life, error) {
53+ config, err := readConfigNode(lc.zk, lc.path)
54+ if err != nil {
55+ return "", err
56+ }
57+ value, found := config.Get(lifeKey)
58+ if !found {
59+ panic("lifecycle state is not set")
60+ }
61+ return Life(value.(string)), nil
62+}
63+
64+// Kill changes the lifecycle to 'Dying' if it's 'Alive'
65+// or already 'Dying'. No change will happen if the lifecycle
66+// is 'Dead'.
67+func (lc *lifecycle) Kill() error {
68+ kill := func(oldYaml string, stat *zookeeper.Stat) (string, error) {
69+ // Note that there are unknown keys in the node.
70+ var config map[string]interface{}
71+ if err := goyaml.Unmarshal([]byte(oldYaml), &config); err != nil {
72+ return "", err
73+ }
74+ tmpLife, found := config[lifeKey]
75+ if !found {
76+ panic("lifecycle state is not set")
77+ }
78+ current := Life(tmpLife.(string))
79+ if current == Alive {
80+ config[lifeKey] = Dying
81+ newYaml, err := goyaml.Marshal(config)
82+ if err != nil {
83+ return "", err
84+ }
85+ return string(newYaml), nil
86+ }
87+ return oldYaml, nil
88+ }
89+ if err := lc.zk.RetryChange(lc.path, 0, zkPermAll, kill); err != nil {
90+ return err
91+ }
92+ return nil
93+}
94+
95+// Die changes the lifecycle to 'Dead' if it's 'Dying'.
96+func (lc *lifecycle) Die() error {
97+ die := func(oldYaml string, stat *zookeeper.Stat) (string, error) {
98+ // Note that there are unknown keys in the node.
99+ var config map[string]interface{}
100+ if err := goyaml.Unmarshal([]byte(oldYaml), &config); err != nil {
101+ return "", err
102+ }
103+ tmpLife, found := config[lifeKey]
104+ if !found {
105+ panic("lifecycle state is not set")
106+ }
107+ current := Life(tmpLife.(string))
108+ switch current {
109+ case Alive:
110+ panic("lifecycle state is alive, must not die")
111+ case Dying:
112+ config[lifeKey] = Dead
113+ newYaml, err := goyaml.Marshal(config)
114+ if err != nil {
115+ return "", err
116+ }
117+ return string(newYaml), nil
118+ }
119+ return oldYaml, nil
120+ }
121+ if err := lc.zk.RetryChange(lc.path, 0, zkPermAll, die); err != nil {
122+ return err
123+ }
124+ return nil
125+}
126
127=== added file 'state/lifecycle_test.go'
128--- state/lifecycle_test.go 1970-01-01 00:00:00 +0000
129+++ state/lifecycle_test.go 2012-08-21 08:50:23 +0000
130@@ -0,0 +1,72 @@
131+package state
132+
133+import (
134+ "fmt"
135+ . "launchpad.net/gocheck"
136+ "launchpad.net/gozk/zookeeper"
137+ "launchpad.net/juju-core/testing"
138+)
139+
140+type LifecycleSuite struct {
141+ testing.ZkConnSuite
142+ t *topology
143+}
144+
145+var _ = Suite(&LifecycleSuite{})
146+
147+func (s *LifecycleSuite) TearDownSuite(c *C) {
148+ s.ZkConn.Close()
149+}
150+
151+func (s *LifecycleSuite) SetUpTest(c *C) {
152+ zkPermAll := zookeeper.WorldACL(zookeeper.PERM_ALL)
153+ _, err := s.ZkConn.Create("/r", "", 0, zkPermAll)
154+ c.Assert(err, IsNil)
155+}
156+
157+func (s *LifecycleSuite) TearDownTest(c *C) {
158+ testing.ZkRemoveTree(s.ZkConn, "/r")
159+}
160+
161+var lifecyleTests = []struct {
162+ prepare func(lc lifecycle) error
163+ change func(lc lifecycle) error
164+ panicMatch string
165+ life Life
166+}{
167+ {nil, nil, "", Alive},
168+ {nil, func(lc lifecycle) error { return lc.Kill() }, "", Dying},
169+ {nil, func(lc lifecycle) error { return lc.Die() }, "lifecycle state is alive, must not die", ""},
170+ {func(lc lifecycle) error { return lc.Kill() }, func(lc lifecycle) error { return lc.Kill() }, "", Dying},
171+ {func(lc lifecycle) error { return lc.Kill() }, func(lc lifecycle) error { return lc.Die() }, "", Dead},
172+ {func(lc lifecycle) error { lc.Kill(); return lc.Die() }, func(lc lifecycle) error { return lc.Kill() }, "", Dead},
173+ {func(lc lifecycle) error { lc.Kill(); return lc.Die() }, func(lc lifecycle) error { return lc.Die() }, "", Dead},
174+}
175+
176+func (s *LifecycleSuite) TestLifecycle(c *C) {
177+ for i, test := range lifecyleTests {
178+ c.Logf("test %d", i)
179+ lc := lifecycle{
180+ zk: s.ZkConn,
181+ path: fmt.Sprintf("/r/%d", i),
182+ }
183+ err := lc.init()
184+ c.Assert(err, IsNil)
185+ life, err := lc.Life()
186+ c.Assert(err, IsNil)
187+ c.Assert(life, Equals, Alive)
188+ if test.prepare != nil {
189+ c.Assert(test.prepare(lc), IsNil)
190+ }
191+ if test.change != nil {
192+ if test.panicMatch != "" {
193+ c.Assert(func() { test.change(lc) }, PanicMatches, test.panicMatch)
194+ } else {
195+ c.Assert(test.change(lc), IsNil)
196+ life, err = lc.Life()
197+ c.Assert(err, IsNil)
198+ c.Assert(life, Equals, test.life)
199+ }
200+ }
201+ }
202+}
203
204=== modified file 'state/relation.go'
205--- state/relation.go 2012-08-17 12:10:32 +0000
206+++ state/relation.go 2012-08-21 08:50:23 +0000
207@@ -78,6 +78,28 @@
208 st *State
209 key string
210 endpoints []RelationEndpoint
211+ lifecycle
212+}
213+
214+func newRelation(st *State, key string) *Relation {
215+ r := &Relation{
216+ st: st,
217+ key: key,
218+ endpoints: []RelationEndpoint{},
219+ }
220+ r.lifecycle = lifecycle{
221+ zk: st.zk,
222+ path: r.zkConfigPath(),
223+ }
224+ return r
225+}
226+
227+func createRelation(st *State, key string) (*Relation, error) {
228+ r := newRelation(st, key)
229+ if err := r.lifecycle.init(); err != nil {
230+ return nil, err
231+ }
232+ return r, nil
233 }
234
235 func (r *Relation) String() string {
236@@ -155,6 +177,16 @@
237 }, nil
238 }
239
240+// zkPath returns the ZooKeeper base path for the relation.
241+func (r *Relation) zkPath() string {
242+ return fmt.Sprintf("/relations/%s", r.key)
243+}
244+
245+// zkConfigPath returns the ZooKeeper path for the relation configuration.
246+func (r *Relation) zkConfigPath() string {
247+ return r.zkPath() + "/config"
248+}
249+
250 // RelationUnit holds information about a single unit in a relation, and
251 // allows clients to conveniently access unit-specific functionality.
252 type RelationUnit struct {
253
254=== modified file 'state/relation_internal_test.go'
255--- state/relation_internal_test.go 2012-07-26 18:40:19 +0000
256+++ state/relation_internal_test.go 2012-08-21 08:50:23 +0000
257@@ -18,7 +18,7 @@
258 RelationEndpoint{"jeff", "ifce", "group", RolePeer, charm.ScopeGlobal},
259 RelationEndpoint{"mike", "ifce", "group", RolePeer, charm.ScopeGlobal},
260 RelationEndpoint{"bill", "ifce", "group", RolePeer, charm.ScopeGlobal},
261- }}
262+ }, lifecycle{nil, ""}}
263 eps, err := r.RelatedEndpoints("mike")
264 c.Assert(err, IsNil)
265 c.Assert(eps, DeepEquals, []RelationEndpoint{
266
267=== modified file 'state/relation_test.go'
268--- state/relation_test.go 2012-08-17 09:13:31 +0000
269+++ state/relation_test.go 2012-08-21 08:50:23 +0000
270@@ -86,12 +86,16 @@
271 assertOneRelation(c, req, 0, reqep, proep)
272
273 // Remove the relation, and check it can't be removed again.
274+ err = rel.Kill()
275+ c.Assert(err, IsNil)
276+ err = rel.Die()
277+ c.Assert(err, IsNil)
278 err = s.State.RemoveRelation(rel)
279 c.Assert(err, IsNil)
280 assertNoRelations(c, pro)
281 assertNoRelations(c, req)
282 err = s.State.RemoveRelation(rel)
283- c.Assert(err, ErrorMatches, `cannot remove relation "pro:foo req:bar": not found`)
284+ c.Assert(err, ErrorMatches, `cannot remove relation "pro:foo req:bar": relation with key .* not found`)
285
286 // Check that we can add it again if we want to; but this time,
287 // give one of the endpoints container scope and check that both
288@@ -120,11 +124,15 @@
289 assertOneRelation(c, peer, 0, peerep)
290
291 // Remove the relation, and check it can't be removed again.
292+ err = rel.Kill()
293+ c.Assert(err, IsNil)
294+ err = rel.Die()
295+ c.Assert(err, IsNil)
296 err = s.State.RemoveRelation(rel)
297 c.Assert(err, IsNil)
298 assertNoRelations(c, peer)
299 err = s.State.RemoveRelation(rel)
300- c.Assert(err, ErrorMatches, `cannot remove relation "peer:baz": not found`)
301+ c.Assert(err, ErrorMatches, `cannot remove relation "peer:baz": relation with key .* not found`)
302 }
303
304 func (s *RelationSuite) TestRemoveServiceRemovesRelations(c *C) {
305@@ -133,6 +141,10 @@
306 peerep := state.RelationEndpoint{"peer", "ifce", "baz", state.RolePeer, charm.ScopeGlobal}
307 rel, err := s.State.AddRelation(peerep)
308 c.Assert(err, IsNil)
309+ err = rel.Kill()
310+ c.Assert(err, IsNil)
311+ err = rel.Die()
312+ c.Assert(err, IsNil)
313 err = s.State.RemoveService(peer)
314 c.Assert(err, IsNil)
315 _, err = s.State.Service("peer")
316@@ -140,7 +152,44 @@
317 _, err = s.State.Relation(peerep)
318 c.Assert(err, ErrorMatches, `cannot get relation "peer:baz": relation does not exist`)
319 err = s.State.RemoveRelation(rel)
320- c.Assert(err, ErrorMatches, `cannot remove relation "peer:baz": not found`)
321+ c.Assert(err, ErrorMatches, `cannot remove relation "peer:baz": relation with key .* not found`)
322+}
323+
324+func (s *RelationSuite) TestLifecycle(c *C) {
325+ peer, err := s.State.AddService("peer", s.charm)
326+ c.Assert(err, IsNil)
327+ peerep := state.RelationEndpoint{"peer", "ifce", "baz", state.RolePeer, charm.ScopeGlobal}
328+ assertNoRelations(c, peer)
329+
330+ rel, err := s.State.AddRelation(peerep)
331+ c.Assert(err, IsNil)
332+ life, err := rel.Life()
333+ c.Assert(err, IsNil)
334+ c.Assert(life, Equals, state.Alive)
335+
336+ // Check legal next state.
337+ err = rel.Kill()
338+ c.Assert(err, IsNil)
339+ life, err = rel.Life()
340+ c.Assert(err, IsNil)
341+ c.Assert(life, Equals, state.Dying)
342+
343+ // Check legal repeated state setting.
344+ err = rel.Kill()
345+ c.Assert(err, IsNil)
346+ life, err = rel.Life()
347+ c.Assert(err, IsNil)
348+ c.Assert(life, Equals, state.Dying)
349+
350+ // Check non-dead removal.
351+ c.Assert(func() { s.State.RemoveRelation(rel) }, PanicMatches, `relation .* is not dead`)
352+
353+ // Check final state.
354+ err = rel.Die()
355+ c.Assert(err, IsNil)
356+ life, err = rel.Life()
357+ c.Assert(err, IsNil)
358+ c.Assert(life, Equals, state.Dead)
359 }
360
361 func assertNoRelations(c *C, srv *state.Service) {
362@@ -247,6 +296,10 @@
363 assertSettings(ru1, map[string]interface{}{})
364
365 // Trash the relation and check we can't get anything any more.
366+ err = rel.Kill()
367+ c.Assert(err, IsNil)
368+ err = rel.Die()
369+ c.Assert(err, IsNil)
370 err = s.State.RemoveRelation(rel)
371 c.Assert(err, IsNil)
372 _, err = ru0.ReadSettings("peer/1")
373
374=== modified file 'state/service.go'
375--- state/service.go 2012-08-17 12:10:32 +0000
376+++ state/service.go 2012-08-21 08:50:23 +0000
377@@ -209,8 +209,7 @@
378 }
379 relations := []*Relation{}
380 for key, tr := range trs {
381- r := &Relation{s.st, key, make([]RelationEndpoint, len(tr.Endpoints))}
382- i := 0
383+ r := newRelation(s.st, key)
384 for _, tep := range tr.Endpoints {
385 sname := s.name
386 if tep.Service != s.key {
387@@ -218,10 +217,9 @@
388 return nil, err
389 }
390 }
391- r.endpoints[i] = RelationEndpoint{
392+ r.endpoints = append(r.endpoints, RelationEndpoint{
393 sname, tr.Interface, tep.RelationName, tep.RelationRole, tr.Scope,
394- }
395- i++
396+ })
397 }
398 relations = append(relations, r)
399 }
400
401=== modified file 'state/service_test.go'
402--- state/service_test.go 2012-07-25 12:25:39 +0000
403+++ state/service_test.go 2012-08-21 08:50:23 +0000
404@@ -440,6 +440,10 @@
405 assertNoChange()
406
407 // Remove one of the relations; check change.
408+ err = rel.Kill()
409+ c.Assert(err, IsNil)
410+ err = rel.Die()
411+ c.Assert(err, IsNil)
412 err = s.State.RemoveRelation(rel)
413 c.Assert(err, IsNil)
414 assertChange(nil, []int{0})
415
416=== modified file 'state/state.go'
417--- state/state.go 2012-08-17 12:10:32 +0000
418+++ state/state.go 2012-08-21 08:50:23 +0000
419@@ -346,6 +346,7 @@
420 return
421 }
422 relationKey = strings.Split(path, "/")[2]
423+ _, err = createRelation(s, relationKey)
424 return
425 }
426
427@@ -397,7 +398,7 @@
428 if err != nil {
429 return nil, err
430 }
431- r = &Relation{s, key, nil}
432+ r = newRelation(s, key)
433 for _, tep := range tr.Endpoints {
434 sname, err := t.ServiceName(tep.Service)
435 if err != nil {
436@@ -412,14 +413,20 @@
437
438 // RemoveRelation removes the supplied relation.
439 func (s *State) RemoveRelation(r *Relation) error {
440- err := retryTopologyChange(s.zk, func(t *topology) error {
441- if !t.HasRelation(r.key) {
442- return fmt.Errorf("not found")
443- }
444+ life, err := r.Life()
445+ if err != nil {
446+ return fmt.Errorf("cannot remove relation %q: %s", r, err)
447+ }
448+ if life != Dead {
449+ panic(fmt.Sprintf("relation %q is not dead", r))
450+ }
451+ err = retryTopologyChange(s.zk, func(t *topology) error {
452 return t.RemoveRelation(r.key)
453 })
454 if err != nil {
455 return fmt.Errorf("cannot remove relation %q: %s", r, err)
456 }
457+ // TODO(mue) Check removal of ZK tree for relation.
458+ // It effects the reusage of keys.
459 return nil
460 }

Subscribers

People subscribed via source and target branches