pyjuju

Code review comment for lp:~hazmat/pyjuju/relation-state

relation-state
Merge into trunk

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2010-10-26:

On Tue, 26 Oct 2010 11:32:44 -0400, Gustavo Niemeyer
<email address hidden> wrote:

> The overall feeling of the work so far is very good. Feels generally
> clean, and understandable.

Cool, thanks for the review, replies inline.

>
> Some specific points:
>
>
> [1]
>
> + def add_relation(self, relation_id):
> + """Add a relation with given id between the given services.
> (...)
> + def has_relation(self, relation_id):
> + """Return True if service_id was previously added to this
> topology.
>
> These docstrings aren't reflecting the implementation.

Yeah.. they where prior to some discussions we had, when we discussed
separating out add relation from associating a service to it. I'll update
them.

>
> [2]
>
> + def has_relation_service(self, relation_id, service_id):
>
> relation_has_service might read better here.

agreed.

>
> [3]
>
> + def unassign_service_to_relation(self, relation_id, service_id):
> + """Disassociate service to relation.
>
> Similarly, shouldn't that be "_from_relation" and "from relation"?
>

sounds good as well.

> [4]
>
> All of the date for the relation is missing from the topology (relation
> name, relation type, relation role). Without at least some of these in
> the topology, we can't do the mapping of a user provided name to the
> relation id without iterating through the nodes.
>

The formula author aka 'user' name for a relation is specific to a
service, its not global to the relation. We can lookup this meta
information when we retrieve the services for a relation, we don't have to
iterate all relation nodes, just the ones associated to a service.
Additionally, we don't want to bloat the limited space we have in the
topology node for non coordination data, all of this additional metadata
is effectively static, formula derived metadata.

> [5]
>
> + Type is used to distinguish the type of the relation, ie.
> + 'database.postgres', or 'http.cache.varnish'.
>
> We haven't discussed/specified the namespacing of the types. We have to
> talk about "http.cache.varnish" vs. "varnish".
>

agreed, i can remove the docstring namespaced type reference, as its not
germane to the implementation at this point.

> [6]
>
>
> + node_data = {"type": relation_type}
> + path = yield self._client.create("/relations/relation-",
> + yaml.safe_dump(node_data),
>
> I'm wondering a bit about where to store the information for the
> relation. It feels a bit clumsy to have the relation type inside the
> relation node itself, as a first gut feeling. Imagine what it'd take to
> implement the matching of relation types to resolve dependencies. We'd
> basically have to look at the topology and then iterate over every node
> we have to discover which of them is responsible for which relation
> type/name/role.
>

We haven't really discussed dependency resolution using existing services.
I think we can construct a secondary index (potentially multi-node) of the
relation data to do resolutions, but unless we want the index to be
primary storage, i think there is still value to having this 'identity'
information stored on the node.

relation name / relation role are service specific and stored directly on
their respective container nodes.

> [7]
>
> + internal_id = path.rsplit("/", 1)[1]
>
> FWIW, that's os.path.basename(path)

noted, fwiw i copied it from the service state impl ;-)

>
> [8]
>
> + # Not clear if we really need to update the topology till the
> relation
> + # is in use. Topology updates are broadcasting to most connected
> + # clients. perhaps a separate batch api would be useful to
> minimize
> + # broadcast events on the topology.
>
> Yeah, it's a good question. I guess it's alright for now, since the
> model feels good. Adding brand new relations should be an operation
> generally done after a user action, anyway, so this is not a fast path
> we'll have to worry about hopefully.

Atm. we'll be doing deploys of service and dependencies when at formula
publishing time, so it will effectively be a bulk update topology update,
where size is the number of relations of the deployed service, but without
the api its not something we can utilize there. Its not required atm.

>
> [9]
>
> + @param name - The service's name for this relation.
> + @param role - The service's role within this relation.
>
> I've mentioned this a few times, and we really have to define how to
> call these. I've been calling these "relation role" and "relation
> name". These terms are unique for the relation, and can be used by
> themselves. We cannot use "service role" and "service name" by itself,
> though. We'll always have to say "the service role in this relation"
> and "the service name in this relation" instead, due to the fact that a
> service has many relations and thus many names and roles.
>
> We have to make a decision regarding this nomenclature over UDS and stop
> using mixed terms to avoid confusion.

We should agree on terminology. In both cases we're referring to a
service's role within a relationship. As long as we define and use the
term consistently the ambiguity can be minimized. I'm okay with either.
relation name and relation role are fine, but we need to stress in
definition that while its target/context is a relation, its subject is a
service.

>
> [10]
>
> + # If it is assigned, its an error.
> + if topology.has_relation_service(
> + self._internal_id, service_state.internal_id):
> + raise ServiceRelationAlreadyAssigned(
> + service_state.service_name, name, role)
> +
> + # If its not, then update the node, and continue.
> + yield retry_change(
> + self._client, path, lambda content, stat: node_data)
>
> There's a small race here. Not sure if there are real problems that
> might surface out of it. It's worth a comment, at least.
>

okay.

> [11]
>
>
> + alive_path = "/relations/%s/%s/%s" % (
> + self._relation_id, self._service_id, unit_state.internal_id)
>
> Why is the service in there, and what about the role directory? Also,
> why is the ServiceRelationState necessary? This doesn't seem to reflect
> the specification we discussed

i used a shortcut so i could it to review, as the branch was getting
larged, as noted in the merge request. i'll fix that up.

cheers,

kapil

On Tue, 26 Oct 2010 11:32:44 -0400, Gustavo Niemeyer  
<gustavo@niemeyer.net> wrote:

> The overall feeling of the work so far is very good.  Feels generally  
> clean, and understandable.

Cool, thanks for the review, replies inline.

>
> Some specific points:
>
>
> [1]
>
> +    def add_relation(self, relation_id):
> +        """Add a relation with given id between the given services.
> (...)
> +    def has_relation(self, relation_id):
> +        """Return True if service_id was previously added to this  
> topology.
>
> These docstrings aren't reflecting the implementation.

Yeah.. they where prior to some discussions we had, when we discussed  
separating out add relation from associating a service to it. I'll update  
them.

>
> [2]
>
> +    def has_relation_service(self, relation_id, service_id):
>
> relation_has_service might read better here.

agreed.

>
> [3]
>
> +    def unassign_service_to_relation(self, relation_id, service_id):
> +        """Disassociate service to relation.
>
> Similarly, shouldn't that be "_from_relation" and "from relation"?
>

sounds good as well.

> [4]
>
> All of the date for the relation is missing from the topology (relation  
> name, relation type, relation role).  Without at least some of these in  
> the topology, we can't do the mapping of a user provided name to the  
> relation id without iterating through the nodes.
>

The formula author aka 'user' name for a relation is specific to a  
service, its not global to the relation. We can lookup this meta  
information when we retrieve the services for a relation, we don't have to  
iterate all relation nodes, just the ones associated to a service.  
Additionally, we don't want to bloat the limited space we have in the  
topology node for non coordination data, all of this additional metadata  
is effectively static, formula derived metadata.

> [5]
>
> +        Type is used to distinguish the type of the relation, ie.
> +        'database.postgres', or 'http.cache.varnish'.
>
> We haven't discussed/specified the namespacing of the types.  We have to  
> talk about "http.cache.varnish" vs. "varnish".
>

agreed, i can remove the docstring namespaced type reference, as its not  
germane to the implementation at this point.

> [6]
>
>
> +        node_data = {"type": relation_type}
> +        path = yield self._client.create("/relations/relation-",
> +                                         yaml.safe_dump(node_data),
>
> I'm wondering a bit about where to store the information for the  
> relation.  It feels a bit clumsy to have the relation type inside the  
> relation node itself, as a first gut feeling.  Imagine what it'd take to  
> implement the matching of relation types to resolve dependencies.  We'd  
> basically have to look at the topology and then iterate over every node  
> we have to discover which of them is responsible for which relation  
> type/name/role.
>

We haven't really discussed dependency resolution using existing services.  
I think we can construct a secondary index (potentially multi-node) of the  
relation data to do resolutions, but unless we want the index to be  
primary storage, i think there is still value to having this 'identity'  
information stored on the node.

relation name / relation role are service specific and stored directly on  
their respective container nodes.

> [7]
>
> +        internal_id = path.rsplit("/", 1)[1]
>
> FWIW, that's os.path.basename(path)

noted, fwiw i copied it from the service state impl ;-)

>
> [8]
>
> +        # Not clear if we really need to update the topology till the  
> relation
> +        # is in use. Topology updates are broadcasting to most connected
> +        # clients. perhaps a separate batch api would be useful to  
> minimize
> +        # broadcast events on the topology.
>
> Yeah, it's a good question.  I guess it's alright for now, since the  
> model feels good.  Adding brand new relations should be an operation  
> generally done after a user action, anyway, so this is not a fast path  
> we'll have to worry about hopefully.

Atm. we'll be doing deploys of service and dependencies when at formula  
publishing time, so it will effectively be a bulk update topology update,  
where size is the number of relations of the deployed service, but without  
the api its not something we can utilize there. Its not required atm.

>
> [9]
>
> +        @param name - The service's name for this relation.
> +        @param role - The service's role within this relation.
>
> I've mentioned this a few times, and we really have to define how to  
> call these.  I've been calling these "relation role" and "relation  
> name".  These terms are unique for the relation, and can be used by  
> themselves.  We cannot use "service role" and "service name" by itself,  
> though.  We'll always have to say "the service role in this relation"  
> and "the service name in this relation" instead, due to the fact that a  
> service has many relations and thus many names and roles.
>
> We have to make a decision regarding this nomenclature over UDS and stop  
> using mixed terms to avoid confusion.

We should agree on terminology. In both cases we're referring to a  
service's role within a relationship. As long as we define and use the  
term consistently the ambiguity can be minimized. I'm okay with either.  
relation name and relation role are fine, but we need to stress in  
definition that while its target/context is a relation, its subject is a  
service.

>
> [10]
>
> +            # If it is assigned, its an error.
> +            if topology.has_relation_service(
> +                self._internal_id, service_state.internal_id):
> +                raise ServiceRelationAlreadyAssigned(
> +                    service_state.service_name, name, role)
> +
> +            # If its not, then update the node, and continue.
> +            yield retry_change(
> +                self._client, path, lambda content, stat: node_data)
>
> There's a small race here.  Not sure if there are real problems that  
> might surface out of it.  It's worth a comment, at least.
>

okay.

> [11]
>
>
> +        alive_path = "/relations/%s/%s/%s" % (
> +            self._relation_id, self._service_id, unit_state.internal_id)
>
> Why is the service in there, and what about the role directory?  Also,  
> why is the ServiceRelationState necessary?  This doesn't seem to reflect  
> the specification we discussed

i used a shortcut so i could it to review, as the branch was getting  
larged, as noted in the merge request. i'll fix that up.

cheers,

kapil

« Back to merge proposal