Code review comment for lp:~hazmat/txzookeeper/distributed-queue

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Cool, nice improvements indeed!

I'm going to restart the review again, since it's a brand new implementation. I'll start the sequence on 12 just to avoid conflicts.

[12]

+ def __init__(self, deferred, watcher):
+ self.deferred = deferred
+ self.watcher = watcher
+ self.processing = False
+ self.refetch = False

I'm having a slightly hard time reading the code because of the generality of the names here. E.g. "refetch" what? "watcher" for what? "processing" what? Why is "processing" only turned on inside _get_item?

[13]

+ def on_queue_items_changed(*args):
+ """Event watcher on queue node child events."""
+ if request.complete or not self._client.connected:
+ return
+
+ if request.processing:
+ request.refetch = True
+ else:
+ self._get(request)

I think I might be missing some detail about the implementation, perhaps due to [12].

Let's say that things happen in the following order:

1) User calls get()
2) get() calls _get()
3) _get() hooks _get_item on the result, and the watcher above on changes
4) Result is returned, and _get_item() is queued for being called
5) watcher from (2) fires, and calls _get() again
6) Repeat (3)
7) Repeat (4)

In other words, it looks like _get_item might end up being queued multiple times even before it started running the first time.

[14]

I find a bit suspect that we're killing the item from the queue even before the user had a chance to know about its existence. This will make the queue somewhat prone to items being eaten by crashes.

[15]

+ # Refetching deferred until we process all the children from
+ # from a get children call.
+ if request.refetch:
+ request.processing = False
+ return self._get(request)

Again, going back to [12], I find the nomenclature a bit non-obvious to follow. Why is it not "processing" anymore on a refetch? Looks like there's a very tight relationship with the get children watcher in this logic, even though that's not being made explicit by names nor comments.

Also, I believe there's an issue here. Let's say things happen in this order:

1) children is [], refetch is False, request.processing is True
2) on_no_node_error is called on a NoNodeException
3) Due to the state, the exception is swallowed and nothing really changes
4) on_queue_items_changed is called
5) Given the state, refetch is made True
6) and... nothing else ever happens?

I think making names more explicit might help avoiding issues like these, since it'll be easier to reason about the logic there and know out of gut feeling when things aren't happening the way they should.

review: Needs Fixing

« Back to merge proposal