Merge lp:~hazmat/txzookeeper/distributed-queue into lp:txzookeeper

Proposed by Kapil Thangavelu
Status: Merged
Merged at revision: 24
Proposed branch: lp:~hazmat/txzookeeper/distributed-queue
Merge into: lp:txzookeeper
Diff against target: 989 lines (+901/-13)
7 files modified
txzookeeper/lock.py (+12/-7)
txzookeeper/queue.py (+451/-0)
txzookeeper/tests/__init__.py (+0/-4)
txzookeeper/tests/test_client.py (+27/-0)
txzookeeper/tests/test_lock.py (+17/-0)
txzookeeper/tests/test_queue.py (+392/-0)
txzookeeper/todo.txt (+2/-2)
To merge this branch: bzr merge lp:~hazmat/txzookeeper/distributed-queue
Reviewer: Gustavo Niemeyer (status: Approve)
Review via email: mp+25712@code.launchpad.net

Description of the change

Distributed multi-producer / multi-consumer queue implementation.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

[1]

+ self.path = path
+ self.client = client
+ self.persistent = persistent

I tend to prefer keeping things private, unless they are actually required by external clients, or would be obviously important to know about. This way it's easy to tell what's the "trusted" API clients are supposed to rely on, and also makes it more comfortable when changing the interface (anything private can be removed/renamed/replaced).

How do you feel about this in general?
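A minimal sketch of the convention the branch settled on later in this thread (revision 27): internals live behind a single underscore, and external clients get read-only properties. The names mirror the queue's constructor, but the class body here is illustrative, not the real Queue.

```python
class Queue(object):
    """Illustrative sketch of the single-underscore/read-only-property style."""

    def __init__(self, path, client, persistent=False):
        self._path = path              # private; free to rename/replace later
        self._client = client
        self._persistent = persistent

    @property
    def path(self):
        """Path to the queue (read-only)."""
        return self._path

    @property
    def persistent(self):
        """True if queue items are persistent (read-only)."""
        return self._persistent
```

With this layout, anything underscored is clearly outside the trusted API, and attempts to assign to the public accessors fail rather than silently corrupting internal state.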

[2]

+ self, path, client, acl=[ZOO_OPEN_ACL_UNSAFE], persistent=False):

Also a pretty general point which I'm making mostly to synchronize our thinking, rather than as a *required* change here:

I tend to prefer using the style of acl=None in the initialization of default parameters, and then process it internally in the constructor (if acl is None, acl = ... or similar).

While the result is obviously the same here, the main distinction is that it enables code using the function or class to say "I don't have anything special in this parameter.. just do your default.", which gets pretty tricky when the default value is in the keyword argument constructor itself.
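A short sketch of the acl=None style described above. `_DEFAULT_ACL` is a stand-in for `[ZOO_OPEN_ACL_UNSAFE]`; the point is the pattern, not the value.

```python
# Stand-in for [ZOO_OPEN_ACL_UNSAFE], for illustration only.
_DEFAULT_ACL = [{"perms": 0x1f, "scheme": "world", "id": "anyone"}]


class Queue(object):

    def __init__(self, path, client, acl=None, persistent=False):
        # Resolve the default inside the constructor: callers can pass
        # None to mean "just do your default", and the mutable default
        # list is never shared between instances.
        if acl is None:
            acl = list(_DEFAULT_ACL)
        self._path = path
        self._client = client
        self._acl = acl
        self._persistent = persistent
```

This also sidesteps the classic mutable-default-argument bug: a list in the signature would be created once and shared by every instance that relies on the default.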

[3]

+ return self._get(wait=True)
+
+ get_wait = get

Do we need all these alternatives? Part of the beauty of Twisted is that we don't really have to care about what "waiting" means.

I suggest we have a single interface for getting, and it will always return a deferred which will fire once an item is available, no matter what. This will also simplify a bit the logic elsewhere (in _refill, _get_item, etc).
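The single-interface idea can be illustrated with a toy in-memory stand-in. `PendingGet` below is a hypothetical minimal substitute for a Twisted Deferred, and `ToyQueue` is not the real queue; the point is only that get() always returns something that fires once an item is available, whether or not one was already there.

```python
class PendingGet(object):
    """Toy stand-in for a Deferred (illustration only)."""

    def __init__(self):
        self._callbacks = []
        self.result = None
        self.called = False

    def addCallback(self, fn):
        # Fire immediately if a result already arrived, else queue.
        if self.called:
            fn(self.result)
        else:
            self._callbacks.append(fn)

    def callback(self, value):
        self.called = True
        self.result = value
        for fn in self._callbacks:
            fn(value)


class ToyQueue(object):
    """In-memory sketch of the single get() interface."""

    def __init__(self):
        self._items = []
        self._waiting = []

    def get(self):
        # Always return a pending result; never block, never raise Empty.
        d = PendingGet()
        if self._items:
            d.callback(self._items.pop(0))
        else:
            self._waiting.append(d)
        return d

    def put(self, item):
        # Hand the item straight to a waiting consumer if there is one.
        if self._waiting:
            self._waiting.pop(0).callback(item)
        else:
            self._items.append(item)
```

The caller's code is identical in both the "item ready" and "item pending" cases, which is exactly why the wait/nowait split is unnecessary in an event-driven API.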

[4]

+ d = self.client.create(
+ "/".join((self.path, self.prefix)), item, self._acl, flags)
+ return d

Nice. It feels pretty cool to be able to wait on a put this way.

[5]

+ d = self.client.exists(self.path)
+
+ def on_success(stat):
+ return stat["numChildren"]

Oh, interesting trick! I would imagine that getting the full list would be required, but this is of course a lot better.

[6]

+ Fetch the node data in the queue directory for the given node name. If
+ wait is
(...)
+ # tests. Instead we process our entire our node cache before
+ # proceeding.

Couple of comment details: "is ..." and "our entire our".

[7]

+ Refetch the queue children, setting a watch as needed, and invalidating
+ any previous children entries queue.

It would be nice to have a higher level description of what the algorithm is actually doing. E.g. what is the children entries queue about, what happens when it's empty, or when two different consumers have a partially overlapping queue, how are items consumed, etc.

[8]

In on_queue_items_changed():

+ self._cached_entries = []
+ d = self._refill(wait=wait)

Why is the cache being reset right after we're told changes have happened? Shouldn't this happen once we actually get the new list of children?

[9]

+ d = self._refill(wait=wait)
(...)
+ d = self.client.get_children(
+ self.path, on_queue_items_changed)

It'd be good to have more descriptive names for these variables, since one of th...


25. By Kapil Thangavelu

address some review comments (6, 8, 10)

26. By Kapil Thangavelu

remove method alias get_wait for get, review #3

27. By Kapil Thangavelu

use private variables for internals, introduce read only properties for accessors to public attributes, per review comment #1

28. By Kapil Thangavelu

remove acl mutable default arg, force queue prefix to private

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

On Fri, 28 May 2010 16:30:42 -0400, Gustavo Niemeyer
<email address hidden> wrote:

> [1]
>
> + self.path = path
> + self.client = client
> + self.persistent = persistent
>
> I tend to prefer keeping things private, unless they are actually
> required by external clients, or would be obviously important to know
> about. This way it's easy to tell what's the "trusted" API clients are
> supposed to rely on, and also makes it more comfortable when changing
> the interface (anything private can be removed/renamed/replaced).
>
> How do you feel about this in general?

In general I'm fine with it. I have mixed feelings about how private is spelled
in some cases; I've dealt with libraries that take private (double-underscore
style) to levels that made clean reuse or debugging difficult. In this
particular case, I'm fine with client being private (single underscore), and
with path and persistent being read-only properties, and I have made those
changes. In general it's okay as long as it's not a double underscore and there
isn't a legitimate reason for its use by a consumer; if there is, I'd tend to
prefer at least read-only property access when direct attribute access would be
dangerous.

>
>
> [2]
>
> + self, path, client, acl=[ZOO_OPEN_ACL_UNSAFE],
> persistent=False):
>
> Also a pretty general point which I'm making mostly to synchronize our
> thinking, rather than as a *required* change here:
>
> I tend to prefer using the style of acl=None in the initialization of
> default parameters, and then process it internally in the constructor
> (if acl is None, acl = ... or similar).
>
> While the result is obviously the same here, the main distinction is
> that it enables code using the function or class to say "I don't have
> anything special in this parameter.. just do your default.", which gets
> pretty tricky when the default value is in the keyword argument
> constructor itself.
>

I'm fine with that. To me it's a minor tradeoff against quick reading of the
function signature, but then again there's a whole class of bugs around
mutable default args. Changed.

>
> [3]
>
> + return self._get(wait=True)
> +
> + get_wait = get
>
> Do we need all these alternatives? Part of the beauty of Twisted is
> that we don't really have to care about what "waiting" means.

Right, originally I was trying to match the signature provided by the Queue.Queue
implementation. I think this alias is an artifact of when I changed the
default behavior of get (it used to raise an Empty error). It's removed now.

>
> I suggest we have a single interface for getting, and it will always
> return a deferred which will fire once an item is available, no matter
> what. This will also simplify a bit the logic elsewhere (in _refill,
> _get_item, etc).
>

Hmmm... I like having the option of both APIs and the consistency with the
Queue.Queue API. Alternatively, perhaps a timeout option on the get API
would also suffice (also in the Queue.Queue API). The logic simplification
from removing the method is pretty minor, maybe four lines of code total
in the methods mentioned.

>
>
> [6]
>
> + Fetch the node data in the queue...


29. By Kapil Thangavelu

remove get_nowait api

30. By Kapil Thangavelu

use trap for no node error handler, remove unused import queue.empty

31. By Kapil Thangavelu

remove old get_nowait tests.

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

[3] I went ahead and removed the get_nowait api as per our discussion. In the future, the use case can be addressed with a timeout.

[9] no more scope name conflict on deferreds.

32. By Kapil Thangavelu

refactor distributed queue implementation

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

I refactored the implementation; it no longer uses any cached values, and it is more efficient about establishing watches on the queue.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Cool, nice improvements indeed!

I'm going to restart the review, since it's a brand new implementation. I'll start the sequence at 12 just to avoid conflicts.

[12]

+ def __init__(self, deferred, watcher):
+ self.deferred = deferred
+ self.watcher = watcher
+ self.processing = False
+ self.refetch = False

I'm having a slightly hard time reading the code because of the generality of the names here. E.g. "refetch" what? "watcher" for what? "processing" what? Why is "processing" only turned on inside _get_item?

[13]

+ def on_queue_items_changed(*args):
+ """Event watcher on queue node child events."""
+ if request.complete or not self._client.connected:
+ return
+
+ if request.processing:
+ request.refetch = True
+ else:
+ self._get(request)

I think I might be missing some detail about the implementation, perhaps due to [12].

Let's say that things happen in the following order:

1) User calls get()
2) get() calls _get()
3) _get() hooks _get_item on the result, and the watcher above on changes
4) Result is returned, and _get_item() is queued for being called
5) watcher from (2) fires, and calls _get() again
6) Repeat (3)
7) Repeat (4)

In other words, it looks like _get_item might end up being queued multiple times even before it started running the first time.
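The guard adopted in response to this point (and visible in the final Preview Diff) can be sketched in isolation: processing_children is raised in _get, before the get_children call, so a watch that fires while a fetch is in flight only records a refetch instead of queuing _get_item a second time. `FakeQueue` below is a stand-in for the real queue.

```python
class GetRequest(object):
    """Pared-down request state, mirroring the fields in the branch."""

    def __init__(self):
        self.processing_children = False
        self.refetch_children = False
        self.complete = False


class FakeQueue(object):
    """Stand-in for the queue; records _get calls instead of hitting ZK."""

    def __init__(self):
        self.get_calls = 0

    def _get(self, request):
        # Mark the request busy *before* the (simulated) client call, so
        # an early watch cannot re-enter this path.
        self.get_calls += 1
        request.processing_children = True


def on_queue_items_changed(queue, request):
    """The child-watch logic under discussion."""
    if request.complete:
        return
    if request.processing_children:
        # A fetch is already in flight; defer to a single refetch.
        request.refetch_children = True
    else:
        queue._get(request)
```

Replaying the 1-7 scenario above against this sketch, step 5's watch finds processing_children already True and only flips refetch_children, so _get_item is never queued twice for one request.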

[14]

I find it a bit suspect that we're killing the item from the queue even before the user has had a chance to know about its existence. This will make the queue somewhat prone to items being eaten by crashes.

[15]

+ # Refetching deferred until we process all the children from
+ # from a get children call.
+ if request.refetch:
+ request.processing = False
+ return self._get(request)

Again, going back to [12], I find the nomenclature a bit non-obvious to follow. Why is it not "processing" anymore on a refetch? Looks like there's a very tight relationship with the get children watcher in this logic, even though that's not being made explicit by names nor comments.

Also, I believe there's an issue here. Let's say things happen in this order:

1) children is [], refetch is False, request.processing is True
2) on_no_node_error is called on a NoNodeException
3) Due to the state, the exception is swallowed and nothing really changes
4) on_queue_items_changed is called
5) Given the state, refetch is made True
6) and... nothing else ever happens?

I think making names more explicit might help avoiding issues like these, since it'll be easier to reason about the logic there and know out of gut feeling when things aren't happening the way they should.

review: Needs Fixing
33. By Kapil Thangavelu

request.processing should be false anytime there are no children, not just on refetch.

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

On Tue, 08 Jun 2010 10:28:24 -0400, Gustavo Niemeyer
<email address hidden> wrote:

> Review: Needs Fixing
> Cool, nice improvements indeed!
>
> I'm going to restart the review again, since it's a brand new
> implementation. I'll start the sequence on 12 just to avoid conflicts.
>
> [12]
>
> + def __init__(self, deferred, watcher):
> + self.deferred = deferred
> + self.watcher = watcher
> + self.processing = False
> + self.refetch = False
>
> I'm having a slightly hard time reading the code because of the
> generality of the names here. E.g. "refetch" what? "watcher" for what?
> "processing" what? Why is "processing" only turned on inside _get_item?
>

Thanks for the review. Noted on the names; I'll update them. FWIW: refetch ->
refetch_children, watcher -> child_watcher, processing -> processing_children.

Re processing inside of _get_item: I've moved it to _get.

> [13]
>
> + def on_queue_items_changed(*args):
> + """Event watcher on queue node child events."""
> + if request.complete or not self._client.connected:
> + return
> +
> + if request.processing:
> + request.refetch = True
> + else:
> + self._get(request)
>
> I think I might be missing some detail about the implementation, perhaps
> due to [12].
>
> Let's say that things happen in the following order:
>
> 1) User calls get()
> 2) get() calls _get()
> 3) _get() hooks _get_item on the result, and the watcher above on changes
> 4) Result is returned, and _get_item() is queued for being called
> 5) watcher from (2) fires, and calls _get() again
> 6) Repeat (3)
> 7) Repeat (4)
>
> In other words, it looks like _get_item might end up being queued
> multiple times even before it started running the first time.

Indeed, it's possible. I've updated the code to set processing = True in
_get before the client call.

> [14]
>
> I find a bit suspect that we're killing the item from the queue even
> before the user had a chance to know about its existence. This will
> make the queue somewhat prone to items being eaten by crashes.

As it is right now, the queue semantics enforce isolation and minimal
communication with the zookeeper server. I noted the requirement regarding
error handling for queue consumers in the module docstring. If we want
reliable message delivery/processing and want the queue to enforce that
semantic for us, then we'll likely need some aggregate/composite node
structures, cooperative semantics, or data introspection/modification on
queue item nodes, all of which will incur additional communication
overhead to enforce that semantic. I think those are effectively different
data structures; a queue isn't necessarily a message queue.

> [15]
>
>
> + # Refetching deferred until we process all the children from
> + # from a get children call.
> + if request.refetch:
> + request.processing = False
> + return self._get(request)
>
> Again, going back to [12], I find the nomenclature a bit non-obvious to
> follow. Why is it not "processing" anymore on a refetch? Looks like...


34. By Kapil Thangavelu

set request processing in _get, such that watches that fire before callbacks are processed correctly.

35. By Kapil Thangavelu

reliable queue implementation

36. By Kapil Thangavelu

serialized queue impl

37. By Kapil Thangavelu

refactor tests to abstract queue class, reset refetch flag when refetching children.

38. By Kapil Thangavelu

additional tests for serialized and reliable queues.

39. By Kapil Thangavelu

add test to validate that unexpected errors propagate to the get deferred.

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

[12] I added some docstrings to the get request which document all the variables; if this is insufficient for clarity, let me know and I'll do a round of variable renames.

[14] Per our discussion on reliable processing/consumption of queue items, I've added the reliable queue and serialized queue implementations. A new serialized queue implementation utilizing a lock is something I'll address in a future branch (added bug:592266); the implementation as is suffers from a lot of pointless contention among consumers waking up from watches without any work they can proceed with, but it is nonetheless functional.

40. By Kapil Thangavelu

additional doc string on persistent v. transient reliable queues

41. By Kapil Thangavelu

rename GetRequest variables for clarity.

42. By Kapil Thangavelu

document each branch of the conditional within the queue child watch.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Thanks Kapil. Queue looks great. Some items about the new stuff:

[15]

+ and processed in the order they where placed in the queue. (TODO) An
+ implementation with less contention between consumers might instead utilize
+ a reliable queue with a lock.

Even without the lock, a better implementation would wait for the removal of the specific -processing file which is holding further action back, rather than any change in the queue. E.g. if the item itself is removed, but not the -processing control file, it will fire off, and fail again. Also (and most importantly), any new items added to the queue will also fire the watch and cause the consumer to refetch the full child list only to find out that it's still being processed by someone else, and it can do nothing about the new item appended at the back of the queue.

[16]

+class SerializedQueueTests(QueueTests):
+
+ queue_factory = SerializedQueue

Don't we need something here? :-)

[17]

Queue should probably use QueueItems too (without delete). That said, since it's just a reference implementation and we'll likely need the reliable version, feel free to ignore this.

[18]

It feels like there's potential for more reuse in the _get_item implementation, but I don't think we should hold up the merge for this. It's just a good refactoring for a boring moment.

review: Needs Fixing
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Oh, I forgot to mention one thing:

[19]

For rich documentation of parameters, return values, etc, we've been using the epydoc format:

    http://epydoc.sourceforge.net/manual-fields.html
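A small example of those epydoc fields in use. The helper function below is hypothetical (its body just echoes its arguments); only the docstring format is the point, using the @param/@return/@raise fields from the manual linked above.

```python
def put(queue_path, item):
    """
    Put an item into the queue (hypothetical helper, for illustration).

    @param queue_path: Path of the queue node in the zookeeper hierarchy.
    @param item: String data to be put on the queue.
    @return: The C{(queue_path, item)} pair, standing in for a deferred
        here so the example stays self-contained.
    @raise ValueError: If C{item} is not a string.
    """
    if not isinstance(item, str):
        raise ValueError("queue items must be strings")
    return (queue_path, item)
```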

43. By Kapil Thangavelu

doc string cleanup

44. By Kapil Thangavelu

merge trunk

45. By Kapil Thangavelu

if lock acquisition fails, then all instance state associated with the attempt should be cleared, so subsequent acquire attempts can proceed.

46. By Kapil Thangavelu

serialized lock impl aggregating a reliable queue and distributed lock.

47. By Kapil Thangavelu

make queue prefix public, utilize reliable queue tests for serialized queue.

48. By Kapil Thangavelu

extra epydoc docstrings

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

On Fri, 11 Jun 2010 10:57:30 -0400, Gustavo Niemeyer
<email address hidden> wrote:

> Review: Needs Fixing
> Thanks Kapil. Queue looks great. Some items about the new stuff:
>
> [15]
>
> + and processed in the order they where placed in the queue. (TODO) An
> + implementation with less contention between consumers might instead
> utilize
> + a reliable queue with a lock.
>
> Even without the lock, a better implementation would wait for the
> removal of the specific -processing file which is holding further action
> back, rather than any change in the queue. E.g. if the item itself is
> removed, but not the -processing control file, it will fire off, and
> fail again. Also (and most importantly), any new items added to the
> queue will also fire the watch and cause the consumer to refetch the
> full child list only to find out that it's still being processed by
> someone else, and it can do nothing about the new item appended at the
> back of the queue.

That would definitely be a better implementation than what was there
previously with regard to contention. However, it would effectively be
reimplementing parts of the lock logic (i.e. things like checking the exists
return value on the processing node is equivalent to checking exists on
the previous lock candidate node). I went ahead and reimplemented the
serialized queue using a subclass of reliable queue that utilizes an
exclusive lock to serialize access.

>
> [16]
>
> +class SerializedQueueTests(QueueTests):
> +
> + queue_factory = SerializedQueue
>
> Don't we need something here? :-)

Yeah... I'm not sure what, offhand. The default implementations of all the
queues are ordered. Perhaps multiple clients, one of which gets an item and
then sleeps or closes.

>
> [17]
>
> Queue should probably use QueueItems too (without delete). That said,
> since it's just a reference implementation and we'll likely need the
> reliable version, feel free to ignore this.
>

Cool, I'm just going to leave it as is. I tried to be clear in the
docstring that it's mostly meant as an example implementation of the common
zookeeper recipe.

> [18]
>
> It feels like there's potential for more reuse in the _get_item
> implementation, but I don't think we should wait this for merging. It's
> just a good refactoring for a boring moment.
>

Definitely; the closures are convenient, but they make things a bit more
difficult for clean reuse. I took a stab at this yesterday, but shelved it
for now. Effectively we'd stuff the extra args into the callbacks and
errbacks instead of relying on the closure.

49. By Kapil Thangavelu

module docstring

50. By Kapil Thangavelu

serialized queue behavior test

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

The design of SerializedQueue looks pretty nice. Thank you!

A few additional (last?) comments below:

[20]

In SerializedQueue:

            if name.endswith(suffix):
                children[:] = []

Putting in some debugging information, I see that this logic is actually used, but I'm a bit confused regarding when it's necessary. The tests which reach this logic also make me a bit worried: test_staged_multiproducer_multiconsumer, which theoretically should never see items being processed, due to the lock.

[21]

Something that occurred to me while I was reading the code is that the line:

            request.processing_children = False

May still not get executed on certain exits from the logic in the item processing. Should we guarantee that it will necessarily be reset on any exit, so that we never get an unusable object (with an inconsistent state)?

With these details addressed, +1!

review: Approve
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

> The design of SerializedQueue looks pretty nice. Thank you!
>
> A few additional (last?) comments below:
>
> [20]
>
> In SerializedQueue:
>
> if name.endswith(suffix):
> children[:] = []
>
> Putting some debugging information, I see that this logic is actually used,
> but I'm a bit confused regarding when it's necessary. The tests which reach
> this logic also make me a bit worried:
> test_staged_multiproducer_multiconsumer, which theoretically should be a case
> which should never see items being processed due to the lock.

The lock is only held when fetching an item, i.e. the lock guards access to the queue items. However, the consumer might still be processing the item, and until it's done the processing node isn't released. It might be a better semantic (less contention) for serialization if the lock were held for the duration of the item processing. I'll make that change.

>
> [21]
>
> Something that occurred to me while I was reading the code is that the line:
>
> request.processing_children = False
>
> May still not get executed on certain exists from the logic in the item
> processing. Should we guarantee that it will necessarily be reset on any
> exit, so that we never get an unusable object (with an inconsistent state)?
>

Is there a scenario where this would be the case? AFAICS all the failure mode exits are covered by error handlers that will set this flag to false.

51. By Kapil Thangavelu

serializable queue consumer holds lock till item is processed.

52. By Kapil Thangavelu

fix a race condition in reliable queue: if two consumers had fetched an item, and one of the consumers finished processing it before the other created a reservation, the slow consumer would return an item that was deleted. Now the cycle switches from get->reserve->return to exists->reserve->get->return.

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

On Thu, 08 Jul 2010 12:27:42 -0400, Kapil Thangavelu
<email address hidden> wrote:

>> The design of SerializedQueue looks pretty nice. Thank you!
>>
>> A few additional (last?) comments below:
>>
>> [20]
>>
>> In SerializedQueue:
>>
>> if name.endswith(suffix):
>> children[:] = []
>>
>> Putting some debugging information, I see that this logic is actually
>> used,
>> but I'm a bit confused regarding when it's necessary. The tests which
>> reach
>> this logic also make me a bit worried:
>> test_staged_multiproducer_multiconsumer, which theoretically should be
>> a case
>> which should never see items being processed due to the lock.
>
> the lock is only held when fetching an item. ie. the lock guards access
> to the queue items. However the consumer might still be processing the
> item, and until it does the processing node isn't released. it might be
> a better semantic (less contention) for serialization if the lock was
> held for the duration of the item processing. i'll make that change.
>

This is implemented now; the serialized queue works much more smoothly now
with less contention.
I've removed the children[:] filtering logic that was in use previously.

>
>> [21]
>>
>> Something that occurred to me while I was reading the code is that the
>> line:
>>
>> request.processing_children = False
>>
>> May still not get executed on certain exists from the logic in the item
>> processing. Should we guarantee that it will necessarily be reset on
>> any
>> exit, so that we never get an unusable object (with an inconsistent
>> state)?
>>
>
> is there a scenario where this would be the case? afaics all the failure
> mode exists are covered by error handlers that will set this flag to
> false.
>
After our discussion on IRC, I took a look at putting the
request.processing_children reset into an errback/callback handler, but it's
not clear that the same semantic would be achieved. In one of the scenarios,
a consumer is waiting on a producer to put items in the queue;
request.processing_children should be false here, but no callbacks would
have been fired. I.e., a consumer attempts to fetch an item from a queue,
there are some items in the queue, and the consumer is processing children
but competing with other consumers; if the queue empties before an item is
fetched, the consumer should wait on the watch with processing_children =
False, and with a callback/errback on the get, the flag wouldn't be set
appropriately.

cheers,

Kapil

53. By Kapil Thangavelu

propagate unexpected errors when retrieving a node

54. By Kapil Thangavelu

refactor serialized queue to avoid extraneous existence checks and processing nodes.

Preview Diff

1=== modified file 'txzookeeper/lock.py'
2--- txzookeeper/lock.py 2010-06-10 15:44:40 +0000
3+++ txzookeeper/lock.py 2010-07-09 19:52:57 +0000
4@@ -51,17 +51,22 @@
5 "/".join((self.path, self.prefix)),
6 flags=zookeeper.EPHEMERAL|zookeeper.SEQUENCE)
7
8- def on_candidate_create(path):
9- self._candidate_path = path
10- return self._acquire()
11-
12- d.addCallback(on_candidate_create)
13-
14+ d.addCallback(self._on_candidate_create)
15+ d.addErrback(self._on_no_queue_error)
16 return d
17
18- def _acquire(self):
19+ def _on_candidate_create(self, path):
20+ self._candidate_path = path
21+ return self._acquire()
22+
23+ def _on_no_queue_error(self, failure):
24+ self._candidate_path = None
25+ return failure
26+
27+ def _acquire(self, *args):
28 d = self._client.get_children(self.path)
29 d.addCallback(self._check_candidate_nodes)
30+ d.addErrback(self._on_no_queue_error)
31 return d
32
33 def _check_candidate_nodes(self, children):
34
35=== added file 'txzookeeper/queue.py'
36--- txzookeeper/queue.py 1970-01-01 00:00:00 +0000
37+++ txzookeeper/queue.py 2010-07-09 19:52:57 +0000
38@@ -0,0 +1,451 @@
39+"""
40+Several distributed multiprocess queue implementations.
41+
42+The C{Queue} implementation closely follows the apache zookeeper recipe; it
43+provides no guarantees beyond isolation and concurrency of retrieval of items.
44+
45+The C{ReliableQueue} implementation provides isolation and concurrency, as
46+well as guarantees that if a consumer dies before processing an item, that item
47+made available to another consumer.
48+
49+The C{SerializedQueue} implementation provides for strict in order processing
50+of items within a queue.
51+"""
52+
53+import zookeeper
54+
55+from twisted.internet.defer import Deferred, fail
56+from twisted.python.failure import Failure
57+from txzookeeper.lock import Lock
58+from txzookeeper.client import ZOO_OPEN_ACL_UNSAFE
59+
60+
61+class Queue(object):
62+ """
63+ Implementation is based off the apache zookeeper Queue recipe.
64+
65+ There are some things to keep in mind when using this queue implementation.
66+ Its primary purpose is to enforce isolation and concurrent access;
67+ however, it does not provide for reliable consumption. An error
68+ condition in a queue consumer must requeue the item, else it's lost, as
69+ it's removed from zookeeper on retrieval in this implementation. This
70+ implementation closely mirrors the behavior and api of the python
71+ standard library Queue, or multiprocessing.Queue, albeit with the
72+ caveat of only strings for queue items.
73+ """
74+
75+ prefix = "entry-"
76+
77+ def __init__(self, path, client, acl=None, persistent=False):
78+ """
79+ @param client: A connected C{ZookeeperClient} instance.
80+ @param path: The path to the queue in the zookeeper hierarchy.
81+ @param acl: An acl to be used for queue items.
82+ @param persistent: Boolean flag which denotes if items in the queue are
83+ persistent.
84+ """
85+ self._path = path
86+ self._client = client
87+ self._persistent = persistent
88+ if acl is None:
89+ acl = [ZOO_OPEN_ACL_UNSAFE]
90+ self._acl = acl
91+
92+ @property
93+ def path(self):
94+ """Path to the queue."""
95+ return self._path
96+
97+ @property
98+ def persistent(self):
99+ """If the queue is persistent returns True."""
100+ return self._persistent
101+
102+ def get(self):
103+ """
104+ Get and remove an item from the queue. If no item is available
105+ at the moment, a deferred is returned that will fire when an item
106+ is available.
107+ """
108+
109+ def on_queue_items_changed(*args):
110+ """Event watcher on queue node child events."""
111+ if request.complete or not self._client.connected:
112+ return # pragma: no cover
113+
114+ if request.processing_children:
115+ # If deferred stack is currently processing a set of children
116+ # defer refetching the children till its done.
117+ request.refetch_children = True
118+ else:
119+ # Else the item get request is just waiting for a watch,
120+ # restart the get.
121+ self._get(request)
122+
123+ request = GetRequest(Deferred(), on_queue_items_changed)
124+ self._get(request)
125+ return request.deferred
126+
127+ def put(self, item):
128+ """
129+ Put an item into the queue.
130+
131+ @param item: String data to be put on the queue.
132+ """
133+ if not isinstance(item, str):
134+ return fail(ValueError("queue items must be strings"))
135+
136+ flags = zookeeper.SEQUENCE
137+ if not self._persistent:
138+ flags = flags|zookeeper.EPHEMERAL
139+
140+ d = self._client.create(
141+ "/".join((self._path, self.prefix)), item, self._acl, flags)
142+ return d
143+
144+ def qsize(self):
145+ """
146+ Return the approximate size of the queue. This value is always
147+ effectively a snapshot. Returns a deferred returning an integer.
148+ """
149+ d = self._client.exists(self._path)
150+
151+ def on_success(stat):
152+ return stat["numChildren"]
153+
154+ d.addCallback(on_success)
155+ return d
156+
157+ def _get(self, request):
158+ request.processing_children = True
159+ d = self._client.get_children(self._path, request.child_watcher)
160+ d.addCallback(self._get_item, request)
161+ return d
162+
163+ def _get_item(self, children, request):
164+
165+ def fetch_node(name):
166+ path = "/".join((self._path, name))
167+ d = self._client.get(path)
168+ d.addCallback(on_get_node_success)
169+ d.addErrback(on_no_node)
170+ return d
171+
172+ def on_get_node_success((data, stat)):
173+ d = self._client.delete("/".join((self._path, name)))
174+ d.addCallback(on_delete_node_success, data)
175+ d.addErrback(on_no_node)
176+ return d
177+
178+ def on_delete_node_success(result_code, data):
179+ request.processing_children = False
180+ request.callback(data)
181+
182+ def on_no_node(failure=None):
183+ if failure and not failure.check(zookeeper.NoNodeException):
184+ request.errback(failure)
185+ return
186+ if children:
187+ name = children.pop(0)
188+ return fetch_node(name)
189+
190+ # Refetching is deferred until we process all the children
191+ # from a get_children call.
192+ request.processing_children = False
193+ if request.refetch_children:
194+ request.refetch_children = False
195+ return self._get(request)
196+
197+ if not children:
198+ return on_no_node()
199+
200+ children.sort()
201+ name = children.pop(0)
202+ return fetch_node(name)
203+
204+
205+class GetRequest(object):
206+ """
207+ An encapsulation of a consumer request to fetch an item from the queue.
208+
209+ @refetch_children - boolean field, when true signals that children should
210+ be refetched after processing the current set of children.
211+
212+ @child_watcher - The queue child/item watcher.
213+
214+ @processing_children - Boolean flag, set to true when the last known
215+ children of the queue are being processed. If a watch fires while the
216+ children are being processed it sets the refetch_children flag to true
217+ instead of getting the children immediately.
218+
219+ @deferred - The deferred representing retrieving an item from the queue.
220+ """
221+
222+ def __init__(self, deferred, watcher):
223+ self.deferred = deferred
224+ self.child_watcher = watcher
225+ self.processing_children = False
226+ self.refetch_children = False
227+
228+ @property
229+ def complete(self):
230+ return self.deferred.called
231+
232+ def callback(self, data):
233+ self.deferred.callback(data)
234+
235+ def errback(self, error):
236+ self.deferred.errback(error)
237+
238+
239+class QueueItem(object):
240+ """
241+ An encapsulation of a work item put into a queue. The work item data is
242+ accessible via the data attribute. When the item has been processed by
243+ the consumer, the delete method can be invoked to remove the item
244+ permanently from the queue.
245+
246+ An optional processed callback may be passed to the constructor that will
247+ be invoked after the node has been processed.
248+ """
249+
250+ def __init__(self, path, data, client, processed_callback=None):
251+ self._path = path
252+ self._data = data
253+ self._client = client
254+ self._processed_callback = processed_callback
255+
256+ @property
257+ def data(self):
258+ return self._data
259+
260+ @property
261+ def path(self):
262+ return self._path
263+
264+ def delete(self):
265+ """
266+ Delete the item node and the item processing node in the queue.
267+ Typically invoked by a queue consumer, to signal successful processing
268+ of the queue item.
269+ """
270+ d = self._client.delete(self.path)
271+
272+ if self._processed_callback:
273+ d.addCallback(self._processed_callback, self.path)
274+ return d
275+
276+
277+class ReliableQueue(Queue):
278+ """
279+ A distributed queue. It varies from a C{Queue} in that it ensures any
280+ item consumed from the queue is explicitly acked by the consumer. If
281+ the consumer dies after retrieving an item but before acking it, the
282+ item will be made available to another consumer. To encapsulate the
283+ acking behavior the queue item data is returned in a C{QueueItem} instance,
284+ with a delete method that will remove it from the queue after processing.
285+
286+ Reliable queues may be persistent or transient. If the queue is persistent,
287+ then any item added to the queue must be processed in order to be removed.
288+ If the queue is transient, then any jobs placed in the queue by a client
289+ are removed when the client is closed, regardless of whether the job
290+ has been processed or not.
291+ """
292+
293+ def _item_processed_callback(self, result_code, item_path):
294+ return self._client.delete(item_path+"-processing")
295+
296+ def _filter_children(self, children, suffix="-processing"):
297+ """
298+ Filter out any children currently being processed; modifies the list in place.
299+ """
300+ children.sort()
301+ for name in list(children):
302+ # remove any processing nodes and their associated queue item.
303+ if name.endswith(suffix):
304+ children.remove(name)
305+ item_name = name[:-len(suffix)]
306+ if item_name in children:
307+ children.remove(item_name)
308+
309+ def _get_item(self, children, request):
310+
311+ def check_node(name):
312+ """Check the node still exists."""
313+ path = "/".join((self._path, name))
314+ d = self._client.exists(path)
315+ d.addCallback(on_node_exists, path)
316+ d.addErrback(on_reservation_failed)
317+ return d
318+
319+ def on_node_exists(stat, path):
320+ """Reserve the node for consumer processing."""
321+ d = self._client.create(path+"-processing",
322+ flags=zookeeper.EPHEMERAL)
323+ d.addCallback(on_reservation_success, path)
324+ d.addErrback(on_reservation_failed)
325+ return d
326+
327+ def on_reservation_success(processing_path, path):
328+ """Fetch the node data to return"""
329+ d = self._client.get(path)
330+ d.addCallback(on_get_node_success, path)
331+ d.addErrback(on_get_node_failed, path)
332+ return d
333+
334+ def on_get_node_failed(failure, path):
335+ """If we can't fetch the node, delete the processing node."""
336+ d = self._client.delete(path+"-processing")
337+
338+ # propagate unexpected errors appropriately
339+ if not failure.check(zookeeper.NoNodeException):
340+ d.addCallback(lambda x: request.errback(failure))
341+ else:
342+ d.addCallback(on_reservation_failed)
343+ return d
344+
345+ def on_get_node_success((data, stat), path):
346+ """If we got the node, we're done."""
347+ request.processing_children = False
348+ request.callback(
349+ QueueItem(
350+ path, data, self._client, self._item_processed_callback))
351+
352+ def on_reservation_failed(failure=None):
353+ """If we can't get the node or reserve, continue processing
354+ the children."""
355+ if failure and not failure.check(
356+ zookeeper.NodeExistsException, zookeeper.NoNodeException):
357+ request.processing_children = True
358+ request.errback(failure)
359+ return
360+
361+ if children:
362+ name = children.pop(0)
363+ return check_node(name)
364+
365+ # If a watch fired while processing children, process it
366+ # after the children list is exhausted.
367+ request.processing_children = False
368+ if request.refetch_children:
369+ request.refetch_children = False
370+ return self._get(request)
371+
372+ self._filter_children(children)
373+
374+ if not children:
375+ return on_reservation_failed()
376+
377+ name = children.pop(0)
378+ return check_node(name)
379+
380+
381+class SerializedQueue(Queue):
382+ """
383+ A serialized queue ensures that, even with multiple consumers, items are
384+ retrieved and processed in the order they were placed in the queue.
385+
386+ This implementation aggregates a reliable queue, with a lock to provide
387+ for serialized consumer access. The lock is released only when a queue item
388+ has been processed.
389+ """
390+
391+ def __init__(self, path, client, acl=None, persistent=False):
392+ super(SerializedQueue, self).__init__(path, client, acl, persistent)
393+ self._lock = Lock("%s/%s"%(self.path, "_lock"), client)
394+
395+ def _item_processed_callback(self, result_code, item_path):
396+ return self._lock.release()
397+
398+ def _filter_children(self, children, suffix="-processing"):
399+ """
400+ Filter the lock from consideration as an item to be processed.
401+ """
402+ children.sort()
403+ for name in list(children):
404+ if name.startswith('_'):
405+ children.remove(name)
406+
407+ def _on_lock_directory_does_not_exist(self, failure):
408+ """
409+ If the lock directory does not exist, go ahead and create it and
410+ attempt to acquire the lock.
411+ """
412+ failure.trap(zookeeper.NoNodeException)
413+ d = self._client.create(self._lock.path)
414+ d.addBoth(self._on_lock_created_or_exists)
415+ return d
416+
417+ def _on_lock_created_or_exists(self, failure):
418+ """
419+ The lock node creation will either result in success or node exists
420+ error, if a concurrent client created the node first. In either case
421+ we proceed with attempting to acquire the lock.
422+ """
423+ if isinstance(failure, Failure):
424+ failure.trap(zookeeper.NodeExistsException)
425+ d = self._lock.acquire()
426+ return d
427+
428+ def _on_lock_acquired(self, lock):
429+ """
430+ After the exclusive queue lock is acquired, we proceed with an attempt
431+ to fetch an item from the queue.
432+ """
433+ d = super(SerializedQueue, self).get()
434+ return d
435+
436+ def get(self):
437+ """
438+ Get and remove an item from the queue. If no item is available
439+ at the moment, a deferred is returned that will fire when an item
440+ is available.
441+ """
442+ d = self._lock.acquire()
443+
444+ d.addErrback(self._on_lock_directory_does_not_exist)
445+ d.addCallback(self._on_lock_acquired)
446+ return d
447+
448+ def _get_item(self, children, request):
449+
450+ def fetch_node(name):
451+ path = "/".join((self._path, name))
452+ d = self._client.get(path)
453+ d.addCallback(on_node_retrieved, path)
454+ d.addErrback(on_reservation_failed)
455+ return d
456+
457+ def on_node_retrieved((data, stat), path):
458+ request.processing_children = False
459+ request.callback(
460+ QueueItem(
461+ path, data, self._client, self._item_processed_callback))
462+
463+ def on_reservation_failed(failure=None):
464+ """If we can't get the node or reserve, continue processing
465+ the children."""
466+ if failure and not failure.check(
467+ zookeeper.NodeExistsException, zookeeper.NoNodeException):
468+ request.processing_children = True
469+ request.errback(failure)
470+ return
471+
472+ if children:
473+ name = children.pop(0)
474+ return fetch_node(name)
475+
476+ # If a watch fired while processing children, process it
477+ # after the children list is exhausted.
478+ request.processing_children = False
479+ if request.refetch_children:
480+ request.refetch_children = False
481+ return self._get(request)
482+
483+ self._filter_children(children)
484+
485+ if not children:
486+ return on_reservation_failed()
487+
488+ name = children.pop(0)
489+ return fetch_node(name)
490
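
A hypothetical standalone sketch (not part of the branch, runs without ZooKeeper) of two details the queue code above relies on: ZooKeeper SEQUENCE nodes carry zero-padded numeric suffixes, so a plain lexicographic sort of children yields FIFO order; and ReliableQueue marks an in-flight item with an ephemeral "<name>-processing" sibling, which `_filter_children` strips along with the item it reserves:

```python
def filter_children(children, suffix="-processing"):
    """Drop reservation markers and the items they reserve, in place."""
    children.sort()
    for name in list(children):
        if name.endswith(suffix):
            children.remove(name)
            item_name = name[:-len(suffix)]
            if item_name in children:
                children.remove(item_name)
    return children

# Zero-padded sequence suffixes sort correctly as plain strings, and the
# reserved entry-0000000001 is hidden from other consumers.
children = ["entry-0000000002", "entry-0000000010",
            "entry-0000000001", "entry-0000000001-processing"]
print(filter_children(children))
# -> ['entry-0000000002', 'entry-0000000010']
```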
491=== modified file 'txzookeeper/tests/__init__.py'
492--- txzookeeper/tests/__init__.py 2010-06-10 15:46:02 +0000
493+++ txzookeeper/tests/__init__.py 2010-07-09 19:52:57 +0000
494@@ -4,10 +4,6 @@
495 from twisted.trial.unittest import TestCase
496 from mocker import MockerTestCase
497
498-#from txzookeeper.client import Wrapper
499-
500-#zookeeper = Wrapper(zookeeper)
501-
502 class ZookeeperTestCase(TestCase, MockerTestCase):
503
504 def setUp(self):
505
506=== modified file 'txzookeeper/tests/test_client.py'
507--- txzookeeper/tests/test_client.py 2010-06-01 14:26:45 +0000
508+++ txzookeeper/tests/test_client.py 2010-07-09 19:52:57 +0000
509@@ -342,6 +342,33 @@
510 d.addCallback(verify_exists)
511 return d
512
513+ def test_exists_with_watcher_and_close(self):
514+ """
515+ Closing a connection with a watch outstanding behaves correctly.
516+ """
517+ d = self.client.connect()
518+ zookeeper.set_debug_level(zookeeper.LOG_LEVEL_DEBUG)
519+
520+ def node_watcher(event_type, state, path):
521+ client = getattr(self, "client", None)
522+ if client is not None and client.connected:
523+ self.fail("Client should be disconnected")
524+
525+ def create_node(client):
526+ return client.create("/syracuse")
527+
528+ def check_exists(path):
529+ # shouldn't fire till unit test cleanup
530+ return self.client.exists(path, node_watcher)
531+
532+ def verify_exists(result):
533+ self.assertTrue(result)
534+
535+ d.addCallback(create_node)
536+ d.addCallback(check_exists)
537+ d.addCallback(verify_exists)
538+ return d
539+
540 def test_exists_with_nonexistant_watcher(self):
541 """
542 The exists method can also be used to set an optional watcher on a
543
544=== modified file 'txzookeeper/tests/test_lock.py'
545--- txzookeeper/tests/test_lock.py 2010-06-10 15:44:40 +0000
546+++ txzookeeper/tests/test_lock.py 2010-07-09 19:52:57 +0000
547@@ -1,4 +1,5 @@
548
549+from zookeeper import NoNodeException
550 from mocker import ANY
551 from twisted.internet.defer import (
552 inlineCallbacks, returnValue, Deferred, succeed)
553@@ -81,6 +82,22 @@
554 yield self.failUnlessFailure(lock.acquire(), LockError)
555
556 @inlineCallbacks
557+ def test_acquire_after_error(self):
558+ """
559+ Any instance state associated with a failed acquire should be cleared
560+ on error, allowing subsequent attempts to succeed.
561+ """
562+ client = yield self.open_client()
563+ path = "/lock-test-acquire-after-error"
564+ lock = Lock(path, client)
565+ d = lock.acquire()
566+ self.failUnlessFailure(d, NoNodeException)
567+ yield d
568+ yield client.create(path)
569+ yield lock.acquire()
570+ self.assertEqual(lock.acquired, True)
571+
572+ @inlineCallbacks
573 def test_error_on_acquire_acquiring(self):
574 """
575 Attempting to acquire the lock while an attempt is already in progress,
576
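
A hypothetical synchronous sketch (all names invented; not part of this branch) of the create-then-acquire pattern SerializedQueue.get() implements with deferreds: a missing lock directory triggers creation on demand, and losing the creation race to a concurrent client is tolerated before retrying the acquire:

```python
class NoNodeException(Exception):
    """Stands in for zookeeper.NoNodeException."""

class NodeExistsException(Exception):
    """Stands in for zookeeper.NodeExistsException."""

class FakeLock(object):
    """A minimal in-memory stand-in for txzookeeper's Lock."""

    def __init__(self):
        self.exists = False    # does the lock directory exist?
        self.acquired = False

    def acquire(self):
        if not self.exists:
            raise NoNodeException()
        self.acquired = True
        return self

def acquire_with_retry(lock, create_node):
    try:
        return lock.acquire()
    except NoNodeException:
        pass                       # lock directory missing: create it
    try:
        create_node()              # may race with another client
    except NodeExistsException:
        pass                       # another client won the race; fine
    return lock.acquire()

lock = FakeLock()
acquire_with_retry(lock, lambda: setattr(lock, "exists", True))
print(lock.acquired)  # -> True
```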
577=== added file 'txzookeeper/tests/test_queue.py'
578--- txzookeeper/tests/test_queue.py 1970-01-01 00:00:00 +0000
579+++ txzookeeper/tests/test_queue.py 2010-07-09 19:52:57 +0000
580@@ -0,0 +1,392 @@
581+
582+from zookeeper import NoNodeException
583+from twisted.internet.defer import (
584+ inlineCallbacks, returnValue, DeferredList, Deferred, succeed, fail)
585+
586+from txzookeeper import ZookeeperClient
587+from txzookeeper.client import NotConnectedException
588+from txzookeeper.queue import Queue, ReliableQueue, SerializedQueue, QueueItem
589+from txzookeeper.tests import ZookeeperTestCase, utils
590+
591+from mocker import ANY
592+
593+
594+class QueueTests(ZookeeperTestCase):
595+
596+ queue_factory = Queue
597+
598+ def setUp(self):
599+ super(QueueTests, self).setUp()
600+ self.clients = []
601+
602+ def tearDown(self):
603+ cleanup = False
604+
605+ for client in self.clients:
606+ if not cleanup and client.connected:
607+ utils.deleteTree(handle=client.handle)
608+ cleanup = True
609+ if client.connected:
610+ client.close()
611+ super(QueueTests, self).tearDown()
612+
613+ def compare_data(self, data, item):
614+ if isinstance(item, QueueItem):
615+ self.assertEqual(data, item.data)
616+ else:
617+ self.assertEqual(data, item)
618+
619+ def consume_item(self, item):
620+ if isinstance(item, QueueItem):
621+ return item.delete(), item.data
622+ return None, item
623+
624+ @inlineCallbacks
625+ def open_client(self, credentials=None):
626+ """
627+ Open a zookeeper client, optionally authenticating with the
628+ credentials if given.
629+ """
630+ client = ZookeeperClient("127.0.0.1:2181")
631+ self.clients.append(client)
632+ yield client.connect()
633+ if credentials:
634+ d = client.add_auth("digest", credentials)
635+ # hack to keep auth fast
636+ yield client.exists("/")
637+ yield d
638+ returnValue(client)
639+
640+ def test_path_property(self):
641+ """
642+ The queue has a property that can be used to introspect its
643+ path in read only manner.
644+ """
645+ q = self.queue_factory("/moon", None)
646+ self.assertEqual(q.path, "/moon")
647+
648+ def test_persistent_property(self):
649+ """
650+ The queue has a property that can be used to introspect
651+ whether or not the queue entries are persistent.
652+ """
653+ q = self.queue_factory("/moon", None, persistent=True)
654+ self.assertEqual(q.persistent, True)
655+
656+ @inlineCallbacks
657+ def test_put_item(self):
658+ """
659+ An item can be put on the queue, and is stored in a node in
660+ queue's directory.
661+ """
662+ client = yield self.open_client()
663+ path = yield client.create("/queue-test")
664+ queue = self.queue_factory(path, client)
665+ item = "transform image bluemarble.jpg"
666+ yield queue.put(item)
667+ children = yield client.get_children(path)
668+ self.assertEqual(len(children), 1)
669+ data, stat = yield client.get("/".join((path, children[0])))
670+ self.compare_data(data, item)
671+
672+ @inlineCallbacks
673+ def test_qsize(self):
674+ """
675+ The client implements a method which returns an unreliable
676+ approximation of the number of items in the queue (mirroring the
677+ api of Queue.Queue); it's unreliable only in that the value
678+ represents a snapshot of the queue size at the time it was
679+ requested, not its current value.
680+ """
681+ client = yield self.open_client()
682+ path = yield client.create("/test-qsize")
683+ queue = self.queue_factory(path, client)
684+
685+ yield queue.put("abc")
686+ size = yield queue.qsize()
687+ self.assertTrue(size >= 1)
688+
689+ yield queue.put("bcd")
690+ size = yield queue.qsize()
691+ self.assertTrue(size >= 2)
692+
693+ yield queue.get()
694+ size = yield queue.qsize()
695+ self.assertTrue(size >= 1)
696+
697+ @inlineCallbacks
698+ def test_invalid_put_item(self):
699+ """
700+ The queue only accepts string items.
701+ """
702+ client = yield self.open_client()
703+ queue = self.queue_factory("/unused", client)
704+ self.failUnlessFailure(queue.put(123), ValueError)
705+
706+ @inlineCallbacks
707+ def test_get_with_invalid_queue(self):
708+ """
709+ If the queue hasn't been created, a no-node exception is raised
710+ on get.
711+ """
712+ client = yield self.open_client()
713+ queue = self.queue_factory("/unused", client)
714+ yield self.failUnlessFailure(queue.get(), NoNodeException)
715+
716+ @inlineCallbacks
717+ def test_put_with_invalid_queue(self):
718+ """
719+ If the queue hasn't been created an unknown node exception is raised
720+ on put.
721+ """
722+ client = yield self.open_client()
723+ queue = self.queue_factory("/unused", client)
724+ yield self.failUnlessFailure(queue.put("abc"), NoNodeException)
725+
726+ @inlineCallbacks
727+ def test_unexpected_error_during_item_retrieval(self):
728+ """
729+ If an unexpected error occurs when reserving an item, the error is
730+ passed up to the get deferred's errback method.
731+ """
732+ test_client = yield self.open_client()
733+ path = yield test_client.create("/reliable-queue-test")
734+
735+ # setup the test scenario
736+ mock_client = self.mocker.patch(test_client)
737+ mock_client.get_children(path, ANY)
738+ self.mocker.result(succeed(["entry-000000"]))
739+
740+ item_path = "%s/%s"%(path, "entry-000000")
741+ mock_client.get(item_path)
742+ self.mocker.result(fail(SyntaxError("x")))
743+ self.mocker.replay()
744+
745+ # odd behavior, this should return a failure, as above, but it returns
746+ # None
747+ d = self.queue_factory(path, mock_client).get()
748+ assert d
749+ self.failUnlessFailure(d, SyntaxError)
750+ yield d
751+
752+ @inlineCallbacks
753+ def test_get_and_put(self):
754+ """
755+ Get can also be used on empty queues and returns a deferred that fires
756+ whenever an item has been retrieved from the queue.
757+ """
758+ client = yield self.open_client()
759+ path = yield client.create("/queue-wait-test")
760+ data = "zebra moon"
761+ queue = self.queue_factory(path, client)
762+ d = queue.get()
763+
764+ @inlineCallbacks
765+ def push_item():
766+ queue = self.queue_factory(path, client)
767+ yield queue.put(data)
768+
769+ from twisted.internet import reactor
770+ reactor.callLater(0.1, push_item)
771+
772+ item = yield d
773+ self.compare_data(data, item)
774+
775+ @inlineCallbacks
776+ def test_interleaved_multiple_consumers_wait(self):
777+ """
778+ Multiple consumers and a producer adding and removing items on the
779+ queue concurrently.
780+ """
781+ test_client = yield self.open_client()
782+ path = yield test_client.create("/multi-consumer-wait-test")
783+ results = []
784+
785+ @inlineCallbacks
786+ def producer(item_count):
787+ from twisted.internet import reactor
788+ client = yield self.open_client()
789+ queue = self.queue_factory(path, client)
790+
791+ items = []
792+ producer_done = Deferred()
793+
794+ def iteration(i):
795+ if len(items) == (item_count-1):
796+ return producer_done.callback(None)
797+ items.append(i)
798+ queue.put(str(i))
799+
800+ for i in range(item_count):
801+ reactor.callLater(i*0.05, iteration, i)
802+ yield producer_done
803+ returnValue(items)
804+
805+ @inlineCallbacks
806+ def consumer(item_count):
807+ client = yield self.open_client()
808+ queue = self.queue_factory(path, client)
809+ for i in range(item_count):
810+ try:
811+ data = yield queue.get()
812+ d, data = self.consume_item(data)
813+ if d:
814+ yield d
815+ except NotConnectedException:
816+ # when the test closes, we need to catch this
817+ # as one of the consumers will likely hang.
818+ returnValue(len(results))
819+ results.append((client.handle, data))
820+
821+ returnValue(len(results))
822+
823+ yield DeferredList(
824+ [DeferredList([consumer(3), consumer(2)], fireOnOneCallback=1),
825+ producer(6)])
826+ # as soon as the producer and either consumer is complete, the test
827+ # is done. Thus the only assertion we can make is that the result is
828+ # at least the size of the smallest consumer.
829+ self.assertTrue(len(results) >= 2)
830+
831+ @inlineCallbacks
832+ def test_staged_multiproducer_multiconsumer(self):
833+ """
834+ A real world scenario test: a set of producers filling a queue with
835+ items, and then a set of concurrent consumers pulling from the queue
836+ till it's empty. The consumers use a non-blocking get (the deferred
837+ raises an exception on empty).
838+ """
839+ test_client = yield self.open_client()
840+ path = yield test_client.create("/multi-prod-cons")
841+
842+ consume_results = []
843+ produce_results = []
844+
845+ @inlineCallbacks
846+ def producer(start, offset):
847+ client = yield self.open_client()
848+ q = self.queue_factory(path, client)
849+ for i in range(start, start+offset):
850+ yield q.put(str(i))
851+ produce_results.append(str(i))
852+
853+ @inlineCallbacks
854+ def consumer(max):
855+ client = yield self.open_client()
856+ q = self.queue_factory(path, client)
857+ attempts = range(max)
858+ for el in attempts:
859+ value = yield q.get()
860+ d, value = self.consume_item(value)
861+ if d:
862+ yield d
863+ consume_results.append(value)
864+ returnValue(True)
865+
866+ # two producers 20 items total
867+ yield DeferredList(
868+ [producer(0, 10), producer(10, 10)])
869+
870+ children = yield test_client.get_children(path)
871+ self.assertEqual(len(children), 20)
872+
873+ yield DeferredList(
874+ [consumer(8), consumer(8), consumer(4)])
875+
876+ err = set(produce_results)-set(consume_results)
877+ self.assertFalse(err)
878+
879+ self.assertEqual(len(consume_results), len(produce_results))
880+
881+
882+class ReliableQueueTests(QueueTests):
883+
884+ queue_factory = ReliableQueue
885+
886+ @inlineCallbacks
887+ def test_unprocessed_item_reappears(self):
888+ """
889+ If a queue consumer exits before processing an item, then
890+ the item will become visible to other queue consumers.
891+ """
892+ test_client = yield self.open_client()
893+ path = yield test_client.create("/reliable-queue-test")
894+
895+ data = "rabbit stew"
896+ queue = self.queue_factory(path, test_client)
897+ yield queue.put(data)
898+
899+ test_client2 = yield self.open_client()
900+ queue2 = self.queue_factory(path, test_client2)
901+ item = yield queue2.get()
902+ self.compare_data(data, item)
903+
904+ d = queue.get()
905+ yield test_client2.close()
906+
907+ item = yield d
908+ self.compare_data(data, item)
909+
910+ @inlineCallbacks
911+ def test_processed_item_removed(self):
912+ """
913+ If a client processes an item, then that item is removed from the queue
914+ permanently.
915+ """
916+ test_client = yield self.open_client()
917+ path = yield test_client.create("/reliable-queue-test")
918+
919+ data = "rabbit stew"
920+ queue = self.queue_factory(path, test_client)
921+ yield queue.put(data)
922+ item = yield queue.get()
923+ self.compare_data(data, item)
924+ yield item.delete()
925+ yield test_client.close()
926+
927+ test_client2 = yield self.open_client()
928+ children = yield test_client2.get_children(path)
929+ children = [c for c in children if c.startswith(queue.prefix)]
930+ self.assertFalse(bool(children))
931+
932+
933+class SerializedQueueTests(ReliableQueueTests):
934+
935+ queue_factory = SerializedQueue
936+
937+ @inlineCallbacks
938+ def test_serialized_behavior(self):
939+ """
940+ The serialized queue behavior is such that even with multiple
941+ consumers, items are processed in order.
942+ """
943+ test_client = yield self.open_client()
944+ path = yield test_client.create("/serialized-queue-test")
945+
946+ queue = self.queue_factory(path, test_client, persistent=True)
947+
948+ yield queue.put("a")
949+ yield queue.put("b")
950+
951+ test_client2 = yield self.open_client()
952+ queue2 = self.queue_factory(path, test_client2, persistent=True)
953+
954+ d = queue2.get()
955+
956+ def on_get_item_sleep_and_close(item):
957+ """Close the connection after we have the item."""
958+ from twisted.internet import reactor
959+ reactor.callLater(0.1, test_client2.close)
960+ return item
961+
962+ d.addCallback(on_get_item_sleep_and_close)
963+
964+ # fetch the item from queue2
965+ item1 = yield d
966+ # fetch the item from queue1, this will not get "b", because client2 is
967+ # still processing "a". When client2 closes its connection, client1
968+ # will get item "a"
969+ item2 = yield queue.get()
970+
971+ self.compare_data("a", item2)
972+ self.assertEqual(item1.data, item2.data)
973
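
A hypothetical standalone sketch (names mirror GetRequest above, but this is illustrative only) of the watch-coalescing behavior the consumers in these tests depend on: while a cached children list is being worked through, a firing child watch only sets `refetch_children`; the refetch happens once, after the list drains, so any number of watch events collapses into a single additional get_children call:

```python
class GetState(object):
    """Minimal model of GetRequest's processing/refetch flags."""

    def __init__(self):
        self.processing_children = False
        self.refetch_children = False
        self.refetches = 0

    def watch_fired(self):
        # Mid-processing, only flag the need to refetch later.
        if self.processing_children:
            self.refetch_children = True
        else:
            self.refetch()

    def children_exhausted(self):
        # The cached list is drained; honor any deferred watch events once.
        self.processing_children = False
        if self.refetch_children:
            self.refetch_children = False
            self.refetch()

    def refetch(self):
        # Stands in for a get_children call on the queue node.
        self.refetches += 1
        self.processing_children = True

state = GetState()
state.refetch()             # initial get_children
state.watch_fired()         # fires mid-processing: only sets the flag
state.watch_fired()         # a second event coalesces into the same flag
state.children_exhausted()  # one refetch covers both events
print(state.refetches)      # -> 2
```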
974=== modified file 'txzookeeper/todo.txt'
975--- txzookeeper/todo.txt 2010-05-08 12:14:56 +0000
976+++ txzookeeper/todo.txt 2010-07-09 19:52:57 +0000
977@@ -2,10 +2,10 @@
978 bugs to file upstream
979
980 - you can set acl on a non existant node.
981- - memory leak every api invocation.
982+ - memory leak every api invocation. [really? need some measurements here]
983
984 observed while trying xtest_get_children_with_watcher
985
986- - async get children with watcher seems broken.
987+ - async get children with watcher seems broken. [772 - fixed upstream]
988 - segfault if close during completion.
989 - getting a watch notification when closing a connection, segfaults.

Subscribers

People subscribed via source and target branches

to all changes: