pyjuju

Code review comment for lp:~jimbaker/pyjuju/expose-provider-ec2

expose-provider-ec2
Merge into trunk

Revision history for this message

Jim Baker (jimbaker) wrote on 2011-08-09:

On Fri, Jul 29, 2011 at 3:09 PM, Gustavo Niemeyer <email address hidden>wrote:

> Review: Needs Fixing
> Here are a few additional points Jim. Nothing huge, I think:
>
>
> [1]
>
> self._watched_services = {}
> + self._watched_machines = set()
>
>
> Bogus line. It's duplicated.
>

Fixed

>
>
> [2]
>
> machine = yield self.provider.get_machine(instance_id)
> machine.machine_id = machine_id
> (...)
> current_ports = yield self.provider.get_opened_ports(machine)
>
> We really need to fix this. You ask the provider for a machine, it
> gives you the machine with a field it has no idea about how to
> retrieve, you stuff an id into that object, and then give it _back_
> to the provider because it will need it.
>
> Result:
>
> - The provider is the one responsible for its machines, but can't
> create one with the needed data.
> - The provider expects a machine_id in arbitrary moments, and doesn't
> tell when.
> - Arbitrary locations set the id, because they magically know the dance.
>
> No one but you understand that. We need a more consistent API to
> handle this.
>
> If get_open_ports and friends need unique identifier, let's provide it
> explicitly. This should be the external id, not the internal id used
> within zookeeper.
>
> Unless the providers can set the machine_id attribute, please take it
> off from the machine.
>

Changed accordingly to a model where machine_id is passed explicitly.
Confirmed with William that this is OK.

>
>
> [3]
>
> + def __init__(self, instance_id, dns_name=None,
> + private_dns_name=None, launch_time=None,
> machine_id=None):
>
> Per above, please drop this.
>

Reverted

>
> [4]
>
> - launch_deferred = self.get_machine_variables(machine_data)
> - launch_deferred.addCallback(self.start_machine)
> + machine_variables = yield self.get_machine_variables(machine_data)
> + provider_machines = yield self.start_machine(
> + machine_variables, machine_data)
>
> The idea of having a data bag like machine_data was a bad one. Now we
> have logic all around which depends on arbitrary fields inside it, and
> do not say so. Let's not increase the problem.
>
> If start_machine needs the machine_id, let's just provide the machine_id
> rather than a bag.
>

Changed so that start_machine takes machine_id. Note there are two
start_machine functions, which is probably not ideal. I only refactored this
one.

>
> [5]
>
> + return open_provider_port(self, machine, port, protocol)
>
> Much better, thanks!
>

Cool!

>
> [6]
>
> + except EC2Error:
> + # AWS may return an error when the instance is not found
> + # (but presumably eventually will be), eg,
> + # txaws.ec2.exception.EC2Error: Error Message:
> + # The instance ID 'i-afe113ce' does not exist
> + raise ProviderInteractionError(
> + "Machine (id: %r) was not found" % provider_id)
>
> This looks too generic. What other errors can EC2Error represent?
>

Well, I have certainly seen them, but I don't know of a complete list. I
will look. I have made this and the similar usage in
ensemble.providers.ec2.accessor, more generic.

>
> Also, should we perhaps try again a couple of times before giving
> up if the provider_id was indeed not found, to compensate for the
> boredom of eventual consistency?
>
> For the latter, free to just add a comment in case you don't want
> to do now. The former looks like an issue we'll want to address
> before merging.
>

Hmm, I think the best way to do this is take advantage of the periodic
machine check (and make that more robust per bug 639888). I added some
corresponding comments.

A scenario where a simple retry might have issues is where a service is
mistakenly exposed then quickly unexposed. It's possible that a retry
associated with exposed might stamp on that of unexposed.

> [7]
>
> - # TODO: For now authorize external traffic to the instance,
> - # except for the ZooKeeper port. The expose functionality
>
> Good to see that stuff going away.
>

Definitely

>
> [8]
>
> + if ensemble_machine_group in group_ids:
> + yield self._provider.ec2.delete_security_group(
> + ensemble_machine_group)
>
> Has this logic been tested against EC2? Won't this command explode if
> the group does not exist?
>

Tested numerous times, but I added tests around scenarios of security groups
existing, including getting deleted in a race (which would trigger the
above).

>
> [9]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting to open %s/%s" % (
>
> Again, is that the only error that can occur? EC2Error feels more generic
> than that.
>

Generalized

>
> [10]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting to close %s/%s" %
> (
>
> Ditto.
>

Likewise

>
> [11]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting "
> + "to get opened ports" % machine.instance_id)
>
> Ditto.
>

Same

>
> [12]
>
> + for port in xrange(int(ip_permission.from_port),
> + int(ip_permission.to_port) + 1):
> + opened_ports.add((port, ip_permission.ip_protocol))
>
> Feels like we may have issues around this. I can easily see someone
> adding a huge range of ports, and Ensemble exploding while trying to
> handle them individually.
>
> We only open individual ports, right? If so, I suggest skipping
> ranges, since they must necessarily have been opened by hand.
>

Now skips ranges, and of course tests that case.

On Fri, Jul 29, 2011 at 3:09 PM, Gustavo Niemeyer <gustavo@niemeyer.net>wrote:

> Review: Needs Fixing
> Here are a few additional points Jim.  Nothing huge, I think:
>
>
> [1]
>
>         self._watched_services = {}
> +        self._watched_machines = set()
>
>
> Bogus line.  It's duplicated.
>

Fixed

>
>
> [2]
>
>             machine = yield self.provider.get_machine(instance_id)
>             machine.machine_id = machine_id
>             (...)
>             current_ports = yield self.provider.get_opened_ports(machine)
>
> We really need to fix this.  You ask the provider for a machine, it
> gives you the machine with a field it has no idea about how to
> retrieve, you stuff an id into that object, and then give it _back_
> to the provider because it will need it.
>
> Result:
>
> - The provider is the one responsible for its machines, but can't
>  create one with the needed data.
> - The provider expects a machine_id in arbitrary moments, and doesn't
>  tell when.
> - Arbitrary locations set the id, because they magically know the dance.
>
> No one but you understand that.  We need a more consistent API to
> handle this.
>
> If get_open_ports and friends need unique identifier, let's provide it
> explicitly.  This should be the external id, not the internal id used
> within zookeeper.
>
> Unless the providers can set the machine_id attribute, please take it
> off from the machine.
>

Changed accordingly to a model where machine_id is passed explicitly.
Confirmed with William that this is OK.

>
>
> [3]
>
> +    def __init__(self, instance_id, dns_name=None,
> +                 private_dns_name=None, launch_time=None,
> machine_id=None):
>
> Per above, please drop this.
>

Reverted

>
> [4]
>
> -        launch_deferred = self.get_machine_variables(machine_data)
> -        launch_deferred.addCallback(self.start_machine)
> +        machine_variables = yield self.get_machine_variables(machine_data)
> +        provider_machines = yield self.start_machine(
> +            machine_variables, machine_data)
>
> The idea of having a data bag like machine_data was a bad one.  Now we
> have logic all around which depends on arbitrary fields inside it, and
> do not say so.  Let's not increase the problem.
>
> If start_machine needs the machine_id, let's just provide the machine_id
> rather than a bag.
>

Changed so that start_machine takes machine_id. Note there are two
start_machine functions, which is probably not ideal. I only refactored this
one.

>
> [5]
>
> +        return open_provider_port(self, machine, port, protocol)
>
> Much better, thanks!
>

Cool!

>
> [6]
>
> +        except EC2Error:
> +            # AWS may return an error when the instance is not found
> +            # (but presumably eventually will be), eg,
> +            # txaws.ec2.exception.EC2Error: Error Message:
> +            # The instance ID 'i-afe113ce' does not exist
> +            raise ProviderInteractionError(
> +                "Machine (id: %r) was not found" % provider_id)
>
> This looks too generic.  What other errors can EC2Error represent?
>

Well, I have certainly seen them, but I don't know of a complete list. I
will look. I have made this and the similar usage in
ensemble.providers.ec2.accessor, more generic.

>
> Also, should we perhaps try again a couple of times before giving
> up if the provider_id was indeed not found, to compensate for the
> boredom of eventual consistency?
>
> For the latter, free to just add a comment in case you don't want
> to do now.  The former looks like an issue we'll want to address
> before merging.
>

Hmm, I think the best way to do this is take advantage of the periodic
machine check (and make that more robust per bug 639888). I added some
corresponding comments.

A scenario where a simple retry might have issues is where a service is
mistakenly exposed then quickly unexposed. It's possible that a retry
associated with exposed might stamp on that of unexposed.

> [7]
>
> -            # TODO: For now authorize external traffic to the instance,
> -            # except for the ZooKeeper port.  The expose functionality
>
> Good to see that stuff going away.
>

Definitely

>
> [8]
>
> +        if ensemble_machine_group in group_ids:
> +            yield self._provider.ec2.delete_security_group(
> +                ensemble_machine_group)
>
> Has this logic been tested against EC2?  Won't this command explode if
> the group does not exist?
>

Tested numerous times, but I added tests around scenarios of security groups
existing, including getting deleted in a race (which would trigger the
above).

>
> [9]
>
> +    except EC2Error:
> +        raise ProviderInteractionError(
> +            "Machine (id: %r) not found when attempting to open %s/%s" % (
>
> Again, is that the only error that can occur?  EC2Error feels more generic
> than that.
>

Generalized

>
> [10]
>
> +    except EC2Error:
> +        raise ProviderInteractionError(
> +            "Machine (id: %r) not found when attempting to close %s/%s" %
> (
>
> Ditto.
>

Likewise

>
> [11]
>
> +    except EC2Error:
> +        raise ProviderInteractionError(
> +            "Machine (id: %r) not found when attempting "
> +            "to get opened ports" % machine.instance_id)
>
> Ditto.
>

Same

>
> [12]
>
> +        for port in xrange(int(ip_permission.from_port),
> +                           int(ip_permission.to_port) + 1):
> +            opened_ports.add((port, ip_permission.ip_protocol))
>
> Feels like we may have issues around this.  I can easily see someone
> adding a huge range of ports, and Ensemble exploding while trying to
> handle them individually.
>
> We only open individual ports, right?  If so, I suggest skipping
> ranges, since they must necessarily have been opened by hand.
>

Now skips ranges, and of course tests that case.

« Back to merge proposal