On Fri, Jul 29, 2011 at 3:09 PM, Gustavo Niemeyer <email address hidden>wrote:
> Review: Needs Fixing
> Here are a few additional points Jim. Nothing huge, I think:
>
>
> [1]
>
> self._watched_services = {}
> + self._watched_machines = set()
>
>
> Bogus line. It's duplicated.
>
Fixed
>
>
> [2]
>
> machine = yield self.provider.get_machine(instance_id)
> machine.machine_id = machine_id
> (...)
> current_ports = yield self.provider.get_opened_ports(machine)
>
> We really need to fix this. You ask the provider for a machine, it
> gives you the machine with a field it has no idea about how to
> retrieve, you stuff an id into that object, and then give it _back_
> to the provider because it will need it.
>
> Result:
>
> - The provider is the one responsible for its machines, but can't
> create one with the needed data.
> - The provider expects a machine_id in arbitrary moments, and doesn't
> tell when.
> - Arbitrary locations set the id, because they magically know the dance.
>
> No one but you understand that. We need a more consistent API to
> handle this.
>
> If get_open_ports and friends need unique identifier, let's provide it
> explicitly. This should be the external id, not the internal id used
> within zookeeper.
>
> Unless the providers can set the machine_id attribute, please take it
> off from the machine.
>
Changed accordingly to a model where machine_id is passed explicitly.
Confirmed with William that this is OK.
>
> [4]
>
> - launch_deferred = self.get_machine_variables(machine_data)
> - launch_deferred.addCallback(self.start_machine)
> + machine_variables = yield self.get_machine_variables(machine_data)
> + provider_machines = yield self.start_machine(
> + machine_variables, machine_data)
>
> The idea of having a data bag like machine_data was a bad one. Now we
> have logic all around which depends on arbitrary fields inside it, and
> do not say so. Let's not increase the problem.
>
> If start_machine needs the machine_id, let's just provide the machine_id
> rather than a bag.
>
Changed so that start_machine takes machine_id. Note there are two
start_machine functions, which is probably not ideal. I only refactored this
one.
>
> [6]
>
> + except EC2Error:
> + # AWS may return an error when the instance is not found
> + # (but presumably eventually will be), eg,
> + # txaws.ec2.exception.EC2Error: Error Message:
> + # The instance ID 'i-afe113ce' does not exist
> + raise ProviderInteractionError(
> + "Machine (id: %r) was not found" % provider_id)
>
> This looks too generic. What other errors can EC2Error represent?
>
Well, I have certainly seen them, but I don't know of a complete list. I
will look. I have made this and the similar usage in
ensemble.providers.ec2.accessor, more generic.
>
> Also, should we perhaps try again a couple of times before giving
> up if the provider_id was indeed not found, to compensate for the
> boredom of eventual consistency?
>
> For the latter, free to just add a comment in case you don't want
> to do now. The former looks like an issue we'll want to address
> before merging.
>
Hmm, I think the best way to do this is take advantage of the periodic
machine check (and make that more robust per bug 639888). I added some
corresponding comments.
A scenario where a simple retry might have issues is where a service is
mistakenly exposed then quickly unexposed. It's possible that a retry
associated with exposed might stamp on that of unexposed.
> [7]
>
> - # TODO: For now authorize external traffic to the instance,
> - # except for the ZooKeeper port. The expose functionality
>
> Good to see that stuff going away.
>
Definitely
>
> [8]
>
> + if ensemble_machine_group in group_ids:
> + yield self._provider.ec2.delete_security_group(
> + ensemble_machine_group)
>
> Has this logic been tested against EC2? Won't this command explode if
> the group does not exist?
>
Tested numerous times, but I added tests around scenarios of security groups
existing, including getting deleted in a race (which would trigger the
above).
>
> [9]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting to open %s/%s" % (
>
> Again, is that the only error that can occur? EC2Error feels more generic
> than that.
>
Generalized
>
> [10]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting to close %s/%s" %
> (
>
> Ditto.
>
Likewise
>
> [11]
>
> + except EC2Error:
> + raise ProviderInteractionError(
> + "Machine (id: %r) not found when attempting "
> + "to get opened ports" % machine.instance_id)
>
> Ditto.
>
Same
>
> [12]
>
> + for port in xrange(int(ip_permission.from_port),
> + int(ip_permission.to_port) + 1):
> + opened_ports.add((port, ip_permission.ip_protocol))
>
> Feels like we may have issues around this. I can easily see someone
> adding a huge range of ports, and Ensemble exploding while trying to
> handle them individually.
>
> We only open individual ports, right? If so, I suggest skipping
> ranges, since they must necessarily have been opened by hand.
>
On Fri, Jul 29, 2011 at 3:09 PM, Gustavo Niemeyer <email address hidden>wrote:
> Review: Needs Fixing services = {} machines = set()
> Here are a few additional points Jim. Nothing huge, I think:
>
>
> [1]
>
> self._watched_
> + self._watched_
>
>
> Bogus line. It's duplicated.
>
Fixed
> get_machine( instance_ id) get_opened_ ports(machine)
>
> [2]
>
> machine = yield self.provider.
> machine.machine_id = machine_id
> (...)
> current_ports = yield self.provider.
>
> We really need to fix this. You ask the provider for a machine, it
> gives you the machine with a field it has no idea about how to
> retrieve, you stuff an id into that object, and then give it _back_
> to the provider because it will need it.
>
> Result:
>
> - The provider is the one responsible for its machines, but can't
> create one with the needed data.
> - The provider expects a machine_id in arbitrary moments, and doesn't
> tell when.
> - Arbitrary locations set the id, because they magically know the dance.
>
> No one but you understand that. We need a more consistent API to
> handle this.
>
> If get_open_ports and friends need unique identifier, let's provide it
> explicitly. This should be the external id, not the internal id used
> within zookeeper.
>
> Unless the providers can set the machine_id attribute, please take it
> off from the machine.
>
Changed accordingly to a model where machine_id is passed explicitly.
Confirmed with William that this is OK.
> dns_name= None, launch_time=None,
>
> [3]
>
> + def __init__(self, instance_id, dns_name=None,
> + private_
> machine_id=None):
>
> Per above, please drop this.
>
Reverted
> machine_ variables( machine_ data) deferred. addCallback( self.start_ machine) machine_ variables( machine_ data)
> [4]
>
> - launch_deferred = self.get_
> - launch_
> + machine_variables = yield self.get_
> + provider_machines = yield self.start_machine(
> + machine_variables, machine_data)
>
> The idea of having a data bag like machine_data was a bad one. Now we
> have logic all around which depends on arbitrary fields inside it, and
> do not say so. Let's not increase the problem.
>
> If start_machine needs the machine_id, let's just provide the machine_id
> rather than a bag.
>
Changed so that start_machine takes machine_id. Note there are two
start_machine functions, which is probably not ideal. I only refactored this
one.
> port(self, machine, port, protocol)
> [5]
>
> + return open_provider_
>
> Much better, thanks!
>
Cool!
> exception. EC2Error: Error Message: tionError(
> [6]
>
> + except EC2Error:
> + # AWS may return an error when the instance is not found
> + # (but presumably eventually will be), eg,
> + # txaws.ec2.
> + # The instance ID 'i-afe113ce' does not exist
> + raise ProviderInterac
> + "Machine (id: %r) was not found" % provider_id)
>
> This looks too generic. What other errors can EC2Error represent?
>
Well, I have certainly seen them, but I don't know of a complete list. I providers. ec2.accessor, more generic.
will look. I have made this and the similar usage in
ensemble.
>
> Also, should we perhaps try again a couple of times before giving
> up if the provider_id was indeed not found, to compensate for the
> boredom of eventual consistency?
>
> For the latter, free to just add a comment in case you don't want
> to do now. The former looks like an issue we'll want to address
> before merging.
>
Hmm, I think the best way to do this is take advantage of the periodic
machine check (and make that more robust per bug 639888). I added some
corresponding comments.
A scenario where a simple retry might have issues is where a service is
mistakenly exposed then quickly unexposed. It's possible that a retry
associated with exposed might stamp on that of unexposed.
> [7]
>
> - # TODO: For now authorize external traffic to the instance,
> - # except for the ZooKeeper port. The expose functionality
>
> Good to see that stuff going away.
>
Definitely
> machine_ group in group_ids: ec2.delete_ security_ group( machine_ group)
> [8]
>
> + if ensemble_
> + yield self._provider.
> + ensemble_
>
> Has this logic been tested against EC2? Won't this command explode if
> the group does not exist?
>
Tested numerous times, but I added tests around scenarios of security groups
existing, including getting deleted in a race (which would trigger the
above).
> tionError(
> [9]
>
> + except EC2Error:
> + raise ProviderInterac
> + "Machine (id: %r) not found when attempting to open %s/%s" % (
>
> Again, is that the only error that can occur? EC2Error feels more generic
> than that.
>
Generalized
> tionError(
> [10]
>
> + except EC2Error:
> + raise ProviderInterac
> + "Machine (id: %r) not found when attempting to close %s/%s" %
> (
>
> Ditto.
>
Likewise
> tionError( instance_ id)
> [11]
>
> + except EC2Error:
> + raise ProviderInterac
> + "Machine (id: %r) not found when attempting "
> + "to get opened ports" % machine.
>
> Ditto.
>
Same
> int(ip_ permission. from_port) , permission. to_port) + 1): ports.add( (port, ip_permission. ip_protocol) )
> [12]
>
> + for port in xrange(
> + int(ip_
> + opened_
>
> Feels like we may have issues around this. I can easily see someone
> adding a huge range of ports, and Ensemble exploding while trying to
> handle them individually.
>
> We only open individual ports, right? If so, I suggest skipping
> ranges, since they must necessarily have been opened by hand.
>
Now skips ranges, and of course tests that case.