Hi Gavin,

This took me a while to review.  By and large it looks nice, but I have a few cosmetic issues to rant on.  Not all of them are vital, and some will seem more trouble than they're worth — unless they can become habits.  It may seem petty of me; but I could give a more effective review without these things in the way.


=== modified file 'lib/lp/bugs/doc/bugsubscription.txt'
--- lib/lp/bugs/doc/bugsubscription.txt	2011-12-24 17:49:30 +0000
+++ lib/lp/bugs/doc/bugsubscription.txt	2012-01-03 17:52:00 +0000
@@ -99,7 +99,7 @@
>  getAlsoNotifiedSubscribers() and getSubscribersFromDuplicates().
>  
>      >>> linux_source_bug.getAlsoNotifiedSubscribers()
> -    [<Person at ...>]
> +    (<Person at ...>,)
>      >>> linux_source_bug.getSubscribersFromDuplicates()
>      ()
>  
@@ -109,7 +109,7 @@
>      >>> from lp.bugs.model.bug import get_also_notified_subscribers
>      >>> res = get_also_notified_subscribers(linux_source_bug.bugtasks[0])
>      >>> res
> -    [<Person at ...>]
> +    (<Person at ...>,)
 
These two original tests were brittle and a bit obscure.  They seemed to establish that (1) a list is returned and (2) the list contains one or more Persons.  I suspect (1) is of no significance.

Your new tests establish that (1) a tuple is returned and (2) the tuple contains exactly one Person, as implied by the trailing comma.  So you test more, but it's still brittle and still a bit obscure.

It's not worth spending too much time on any given instance of these problems, but it's good to have fixes ready when you come across them: listify so that it won't matter what kind of iterable you get, or print just the length, or use a “[foo] = give_me_a_list_or_tuple()” assignment.  The big question of course is what the test means to prove and what is incidental.  Stay close to the former and away from the latter.


=== modified file 'lib/lp/bugs/model/bug.py'
--- lib/lp/bugs/model/bug.py	2011-12-30 06:14:56 +0000
+++ lib/lp/bugs/model/bug.py	2012-01-03 17:52:00 +0000

@@ -948,9 +949,10 @@
>              BugSubscription.bug_id == self.id).order_by(BugSubscription.id)
>          return DecoratedResultSet(results, operator.itemgetter(1))
>  
> -    def getSubscriptionInfo(self, level=BugNotificationLevel.LIFECYCLE):
> +    def getSubscriptionInfo(self, level=None):
>          """See `IBug`."""
> -        return BugSubscriptionInfo(self, level)
> +        return BugSubscriptionInfo(
> +            self, BugNotificationLevel.LIFECYCLE if level is None else level)

Found the "if/else" slightly hard to read here.  Maybe I'm just not used to the syntax yet, but it made me double-check that nothing else goes on in this line.  Please consider putting the if/else on a separate line.


@@ -1002,6 +1004,8 @@
>          # the regular proxied object.
 >         return sorted(
 >             indirect_subscribers,
> +            # XXX: GavinPanella 2011-12-12 bug=???: Should probably use
> +            # person_sort_key.
>              key=lambda x: removeSecurityProxy(x).displayname)

Don't forget to file that bug!


@@ -2389,48 +2392,44 @@

> -    # Direct subscriptions always take precedence over indirect
> -    # subscriptions.
> -    direct_subscribers = set(bug.getDirectSubscribers())
> -
> -    also_notified_subscribers = set()
> -
> -    for bugtask in bugtasks:
> -        if (bugtask.assignee and
> -            bugtask.assignee not in direct_subscribers):
> -            # We have an assignee that is not a direct subscriber.
> -            also_notified_subscribers.add(bugtask.assignee)
> -            if recipients is not None:
> +    # Subscribers to exclude.
> +    exclude_subscribers = frozenset().union(
> +        info.all_direct_subscribers, info.muted_subscribers)

That's much better.  Makes me wonder though why direct subscribers are excluded.


> +    # Get also notified subscribers at the given level for the given tasks.
> +    also_notified_subscribers = info.also_notified_subscribers

The comment confused me for a moment: the “also” looks like a badly placed part of the sentence.  Maybe hyphenate as “also-notified subscribers”?


>      # This structural subscribers code omits direct subscribers itself.
> -    also_notified_subscribers.update(
> -        get_structural_subscribers(
> -            bug_or_bugtask, recipients, level, direct_subscribers))
> +    # TODO: Pass the info object into get_structural_subscribers for
> +    # efficiency... or do the recipients stuff here.
> +    structural_subscribers = get_structural_subscribers(
> +        bug_or_bugtask, recipients, level, exclude_subscribers)
> +    assert also_notified_subscribers.issuperset(structural_subscribers)

Unfinished business?
 
Any idea how much efficiency the change would gain you?  If you know it's significant, I'd file a bug and make this a proper XXX.  If not, human intuition about performance being what it is, I'd leave out the comment but make a note to have a look during Q/A.

 
@@ -2573,13 +2573,49 @@
 
     def __init__(self, bug, level):
         self.bug = bug
+        self.bugtask = None  # Implies all.
         assert level is not None
         self.level = level
+        self.cache = {self.cache_key: self}

Wait — what is this cache?  What will it store?  How does it work?


+        # This is often used in event handlers, many of which block implicit
+        # flushes. However, the data needs to be in the database for the
+        # queries herein to give correct answers.
+        Store.of(bug).flush()

Ouch, ouch, ouch.  Well done figuring that out and even more so documenting it.


+    @property
+    def cache_key(self):
+        bugtask_id = None if self.bugtask is None else self.bugtask.id
+        return self.bug.id, bugtask_id, self.level

This could do with a docstring, especially since its usage is also undocumented.

Again I find the if/else hard to read.  Does “getattr(self.bugtask, 'id', None)” cause problems with Storm?  Since this is a generic problem, personally I'd move this out into a helper so I could write "get_id(self.bugtask)."  Keeps the distractions out of the logic.

> +    def forTask(self, bugtask):
> +        """Create a new `BugSubscriptionInfo` limited to `bugtask`.

Our usual naming scheme for methods is to start with a verb, although in this case (and with forLevel) I think it's clear enough that there's no problem in deviating from it.


> +    @property
> +    def all_direct_subscriptions(self):
> +        """The bug's direct subscriptions at all levels.
> +
> +        Excludes muted subscriptions.
> +        """
> +        return self.forLevel(
> +            BugNotificationLevel.LIFECYCLE).direct_subscriptions

This is a problem I often have with the word “all” in identifiers: it often comes with invalidating fineprint.

Which is only to be expected, when you think about how the word got into the identifier in the first place.  We'd be in right trouble if any collection needed the word “all” in the name to stop arbitrary items from falling out!  The “all” is usually there to say “there's some other item with a shorter name which actually had an additional restriction that we neglected to mention.”

Usually the proper solution is to go back to the other item and make the restriction explicit.  Sometimes “all” is an attractive quick fix, but it doesn't apply recursively.

So: how do direct_subscriptions and all_direct_subscriptions differ in purpose?  It's usually just a matter of finding the right term and sticking to it.  We often get to regret cases where we don't make the effort.


>      @cachedproperty
>      def duplicate_subscriptions_and_subscribers(self):
> -        """Subscriptions to duplicates of the bug."""
>          if self.bug.private:
>              return ((), ())
>          else:

That docstring was useful to me, assuming that it's correct: it tells me it's the bugs, not the subscriptions, that are duplicates.


>      @cachedproperty
>      def also_notified_subscribers(self):
> -        """All subscribers except direct and dupe subscribers."""
> +        """All subscribers except direct and dupe subscribers.
> +
> +        Excludes muted subscribers.
> +        """

So it's "all" subscribers, except direct ones, ones that are only for duplicates of this bug, and muted subscribers?  I guess that leaves the non-direct, non-muted, structured subscribers to this precise bug, or is there something else?
 

=== modified file 'lib/lp/bugs/model/structuralsubscription.py'
--- lib/lp/bugs/model/structuralsubscription.py	2011-12-30 06:14:56 +0000
+++ lib/lp/bugs/model/structuralsubscription.py	2012-01-03 17:52:00 +0000

@@ -588,54 +589,93 @@

> +def query_structural_subscriptions(
> +    what, bug, bugtasks, level, exclude=None):

We have a different formatting rule for function definitions than we do for invocations:

 def foo(param1, param2,
               param3, param4):

Note: no line break after the opening parenthesis, and continuation lines aligned with the starting column of param1.  Can't say I'm delighted with the choice, but at least it gives us something consistent.


> +    """Query into structural subscriptions for a given bug.
> +
> +    :param what: The fields to fetch. Choose from `Person`,
> +        `StructuralSubscription`, `BugSubscriptionFilter`, or a combo.
> +    :param bug: An `IBug`
> +    :param bugtasks: An iterable of `IBugTask`.
> +    :param level: A level from `BugNotificationLevel`.

Will this level be matched exactly, or is it a boundary on a set of levels?


> +    :param exclude: `Person`s to exclude (e.g. direct subscribers).
> +    """
>      filter_id_query = (
>          _get_structural_subscription_filter_id_query(
> -            bug, bugtasks, level, direct_subscribers))
> +            bug, bugtasks, level, exclude))
>      if not filter_id_query:
>          return EmptyResultSet()

Just to check that this is deliberate: you're forcing an execution of filter_id_query here to check for matching rows, and if it finds any, you'll execute it again later.

Not saying it has to, but it _could_ be more efficient to listify the results here and use that in the bigger query.  It's not just that it saves a query; it also may give the planner slightly more reliable costing information for the big query.


> -    # This is here because of a circular import.
> -    from lp.registry.model.person import Person
> +    from lp.registry.model.person import Person  # Circular.

We normally put these at the top of the function, right below the docstring.


+def resolve_bug_and_bugtasks(bug_or_bugtask):
+    """Return a bug and a list of bugtasks given a bug or a bugtask.

Isn't "resolve" a bit of a reserved word in the domain of bug trackers?


+    :param bug_or_bugtask: An `IBug` or `IBugTask`.
+    :raises ValueError: If `bug_or_bugtask` does not provide `IBug` or
+        `IBugTask`.
+    """

Will that error be handled specially somewhere?  If so, it may be safer to define a custom exception class.  Makes the "except:" clause that much more specific.  Or if not, maybe this should be an assertion?


+    if IBug.providedBy(bug_or_bugtask):
+        return bug_or_bugtask, bug_or_bugtask.bugtasks
+    elif IBugTask.providedBy(bug_or_bugtask):
+        return bug_or_bugtask.bug, [bug_or_bugtask]
+    else:
+        raise ValueError(
+            "Expected bug or bugtask, got %r" % (bug_or_bugtask,))

What does tuple-izing bug_or_bugtask do here?


=== modified file 'lib/lp/bugs/model/tests/test_bug.py'
--- lib/lp/bugs/model/tests/test_bug.py	2012-01-01 02:58:52 +0000
+++ lib/lp/bugs/model/tests/test_bug.py	2012-01-03 17:52:00 +0000

@@ -476,6 +478,75 @@
>          self.assertContentEqual(public_branches, linked_branches)
>          self.assertNotIn(private_branch, linked_branches)
>  
> +    def test_get_direct_subscribers_with_recipients_query_count(self):
> +        # getDirectSubscribers() uses a constant number of queries when given
> +        # a recipients argument regardless of the number of subscribers.

Bit confusing to spell the method's name differently in the name of the test method.  I believe our usual way of doing it is:

    def test_getDirectSubscribers_with_recipients_query_count(self):


> +        bug = self.factory.makeBug()
> +
> +        def create_subscriber():
> +            subscriber = self.factory.makePerson()
> +            with person_logged_in(subscriber):
> +                bug.subscribe(subscriber, subscriber)

This looks very generic, and you're repeating it later.  Maybe make it a method?


> +        def get_subscribers():
> +            recipients = BugNotificationRecipients()
> +            bug.getDirectSubscribers(recipients=recipients)

This looks unsafe.  I see that record_two_runs flushes the database.  But imagine getDirectSubscribers returning a SELECT result set… is that guaranteed to be executed?  It might be safter to listify the result.


> +        recorder1, recorder2 = record_two_runs(
> +            get_subscribers, create_subscriber, 3)
> +        self.assertThat(
> +            recorder2, HasQueryCount(Equals(recorder1.count)))

By the way, I wasn't aware of record_two_runs.  Gives me a lot more confidence than our old attempts at testing exact query counts!


=== modified file 'lib/lp/bugs/model/tests/test_bugsubscriptioninfo.py'
--- lib/lp/bugs/model/tests/test_bugsubscriptioninfo.py	2012-01-01 02:58:52 +0000
+++ lib/lp/bugs/model/tests/test_bugsubscriptioninfo.py	2012-01-03 17:52:00 +0000

@@ -142,16 +141,75 @@
 
> +    def test_forTask(self):
> +        # `forTask` returns a new `BugSubscriptionInfo` narrowed to the given
> +        # bugtask.
> +        info = self.getInfo()
> +        self.assertIs(None, info.bugtask)
> +        # If called with the current bugtask the same `BugSubscriptionInfo`
> +        # instance is returned.
> +        self.assertIs(info, info.forTask(info.bugtask))
> +        # If called with a different bugtask a new `BugSubscriptionInfo` is
> +        # created.
> +        bugtask = self.bug.default_bugtask
> +        info_for_task = info.forTask(bugtask)
> +        self.assertIs(bugtask, info_for_task.bugtask)
> +        self.assertIsNot(info, info_for_task)
> +        # The instances share a cache of `BugSubscriptionInfo` instances.
> +        expected_cache = {
> +            info.cache_key: info,
> +            info_for_task.cache_key: info_for_task,
> +            }
> +        self.assertEqual(expected_cache, info.cache)
> +        self.assertIs(info.cache, info_for_task.cache)
> +        # Calling `forTask` again looks in the cache first.
> +        self.assertIs(info, info_for_task.forTask(info.bugtask))
> +        self.assertIs(info_for_task, info.forTask(info_for_task.bugtask))
> +        # The level is the same.
> +        self.assertEqual(info.level, info_for_task.level)

This looks like multiple tests rolled into one, proving different facts and exercising different scenarios: “self.getInfo() returns an info without a bugtask,” “forTask for the current bugtask returns self,” “forTask for a bugtask other than the current one creates a new BugSubscriptionInfo,” “A new BugSubscriptionInfo created by forTask shares its cache with the original,” “A newly constructed BugSubscriptionInfo is in its own cache,” “A BugSubscriptionInfo created by forTask is in its own cache,” “forTask uses its cache.”

Unless getInfo is particularly expensive, I'd split this up more so that you exercise forTask only in one place per test.  It's also nice to see which properties break, or get fixed, or get implemented, during development.

And finally there's “A BugSubscriptionInfo returned from cache has the same level."  Imagine this property wasn't implemented, so that you just always got the default level.  Wouldn't this test still pass regardless?  This is one reason why in TDD you should watch the test fail before you make it pass.


> +    def test_forLevel(self):
> +        # `forLevel` returns a new `BugSubscriptionInfo` narrowed to the given
> +        # subscription level.
> +        info = self.getInfo()
> +        # If called with the current level the same `BugSubscriptionInfo`
> +        # instance is returned.
> +        self.assertIs(info, info.forLevel(info.level))
> +        # If called with a different level a new `BugSubscriptionInfo` is
> +        # created.
> +        level = BugNotificationLevel.METADATA
> +        info_for_level = info.forLevel(level)
> +        self.assertEqual(level, info_for_level.level)
> +        self.assertIsNot(info, info_for_level)

This does test the right thing.  A very small superficial note though: when you're contrasting a default value (in this case LIFECYCLE) to an explicit different value (in this case METADATA), better make both explicit in the test.  If you pass LIFECYCLE to self.getInfo(), it'll be clearer that info.level != METADATA in that first test.


> +    def test_muted(self):
> +        # The set of muted subscribers for the bug.
> +        subscribers, subscriptions = self._create_direct_subscriptions()
> +        sub1, sub2 = subscribers
> +        with person_logged_in(sub1):
> +            self.bug.mute(sub1, sub1)
> +        self.assertEqual(set([sub1]), self.getInfo().muted_subscribers)

“There's an app for that.” :-)

    self.assertContentEqual([sub1], self.getInfo().muted_subscribers)


@@ -159,6 +217,17 @@
         found_subscriptions = self.getInfo().direct_subscriptions
         self.assertEqual(set([subscriptions[1]]), found_subscriptions)
 
+    def test_all_direct(self):
+        # The set of all direct subscribers, regardless of level.
+        subscribers, subscriptions = self._create_direct_subscriptions()
+        # Change the first subscription to be for comments only.
+        sub1, sub2 = subscriptions
+        with person_logged_in(sub1.person):
+            sub1.bug_notification_level = BugNotificationLevel.LIFECYCLE
+        info = self.getInfo(BugNotificationLevel.COMMENTS)
+        self.assertEqual(set((sub2,)), info.direct_subscriptions)
+        self.assertEqual(set((sub1, sub2)), info.all_direct_subscriptions)

Here too, I think self.assertContentEqual could come in handy.  "set((sub2,))" is not very easy on the eyes!

One reason for that on top of the obvious ones is: too many (parentheses, with multiple values), and (trailing commas after one value,) and set() for zero items, when what you mean is basically [a list].  A list is more consistent and more recognizable.

Lists are also more idiomatic if you just want an ordered set.  Think of tuples not as containers, but as juxtapositions of items with different roles.  In python it's okay to use them as immutable lists sometimes, e.g. if you really need to save memory; but when in doubt, optimize for clarity.


>      def test_all_pillar_owners_without_bug_supervisors(self):
>          # The set of owners of pillars for which no bug supervisor is
> -        # configured.
> +        # configured and which use Launchpad for bug tracking.
>          [bugtask] = self.bug.bugtasks
>          found_owners = (
>              self.getInfo().all_pillar_owners_without_bug_supervisors)
>          self.assertEqual(set(), found_owners)
> -        self.assertEqual((), found_owners.sorted)
>          # Clear the supervisor for the first bugtask's target.
>          with person_logged_in(bugtask.target.owner):
>              bugtask.target.setBugSupervisor(None, bugtask.owner)
> +            bugtask.pillar.official_malone = False
> +        # The pillar does not use Launchpad, so the collection is still empty.
> +        found_owners = (
> +            self.getInfo().all_pillar_owners_without_bug_supervisors)
> +        self.assertEqual(set(), found_owners)
> +        # When a pillar does use Launchpad the collection includes the
> +        # pillar's owner.
> +        with person_logged_in(bugtask.target.owner):
> +            bugtask.pillar.official_malone = True
>          found_owners = (
>              self.getInfo().all_pillar_owners_without_bug_supervisors)
>          self.assertEqual(set([bugtask.pillar.owner]), found_owners)
> -        self.assertEqual((bugtask.pillar.owner,), found_owners.sorted)
> -        # Add another bugtask with a bug supervisor.
> -        target2 = self.factory.makeProduct(bug_supervisor=None)
> +        # Add another bugtask for a pillar that uses Launchpad but without a
> +        # bug supervisor.
> +        target2 = self.factory.makeProduct(
> +            bug_supervisor=None, official_malone=True)
>          bugtask2 = self.factory.makeBugTask(bug=self.bug, target=target2)
>          found_owners = (
>              self.getInfo().all_pillar_owners_without_bug_supervisors)
>          self.assertEqual(
>              set([bugtask.pillar.owner, bugtask2.pillar.owner]),
>              found_owners)
> +        # Getting subscription info for just a specific bugtask will yield
> +        # owners for only the pillar associated with that bugtask.
> +        info_for_bugtask2 = self.getInfo().forTask(bugtask2)
>          self.assertEqual(
> -            (bugtask.pillar.owner, bugtask2.pillar.owner),
> -            found_owners.sorted)
> +            set([bugtask2.pillar.owner]),
> +            info_for_bugtask2.all_pillar_owners_without_bug_supervisors)
 
Don't do this!

You're inserting new scenarios into an ongoing test story.  That's the stuff maintenance nightmares are made of.  It makes the test hard to read, and data flows hard to track.

I'm perfectly willing to take it on faith that you're conscientiously not introducing any changes that reverberate through the rest of the test, and that no part of the test relies on anything that happened more than 2 lines or so before.  But the next person to work on this code won't know who wrote this or when.

Since they will have more urgent things on their mind than rewriting the test, they'll either read the whole thing for very little gain, or break it and then fix it up as an afterthought.  Neither is good for the project in the long run.


@@ -444,27 +538,28 @@
 
> -    def exercise_subscription_set(self, set_name):
> +    def exercise_subscription_set(self, set_name, counts=(1, 1, 0)):
>          # Looking up subscriptions takes a single query.
> -        with self.exactly_x_queries(1):
> +        with self.exactly_x_queries(counts[0]):

Could you make the meaning of "counts" explicit in some way?


=== modified file 'lib/lp/bugs/tests/test_structuralsubscription.py'
--- lib/lp/bugs/tests/test_structuralsubscription.py	2012-01-01 02:58:52 +0000
+++ lib/lp/bugs/tests/test_structuralsubscription.py	2012-01-03 17:52:00 +0000

@@ -634,6 +639,99 @@
 
> +class TestGetStructuralSubscriptions(TestCaseWithFactory):
> +
> +    layer = DatabaseFunctionalLayer
>
> +    def make_product_with_bug(self):
> +        product = self.factory.makeProduct()
> +        bug = self.factory.makeBug(product=product)
> +        return product, bug
> +
> +    def test_get_structural_subscriptions_no_subscriptions(self):
> +        # If there are no subscriptions for any of the bug's targets then no
> +        # subscriptions will be returned by get_structural_subscriptions().
> +        product, bug = self.make_product_with_bug()
> +        subscriptions = get_structural_subscriptions(bug, None)
> +        self.assertIsInstance(subscriptions, RESULT_SETS)
> +        self.assertEqual([], list(subscriptions))

Much nicer tests here!  Why is that?


Jeroen