Merge lp:~flacoste/launchpad/ppr-merge into lp:launchpad

Proposed by Francis J. Lacoste
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 11872
Proposed branch: lp:~flacoste/launchpad/ppr-merge
Merge into: lp:launchpad
Prerequisite: lp:~flacoste/launchpad/ppr-constant-memory
Diff against target: 421 lines (+208/-20)
2 files modified
lib/lp/scripts/utilities/pageperformancereport.py (+124/-16)
lib/lp/scripts/utilities/tests/test_pageperformancereport.py (+84/-4)
To merge this branch: bzr merge lp:~flacoste/launchpad/ppr-merge
Reviewer: Gavin Panella (community)
Status: Approve
Review via email: mp+40014@code.launchpad.net

Commit message

Add a --merge option that can be used to aggregate stats saved from individual runs.

Description of the change

Hello,

This is a follow-up to my previous constant-memory branch. It adds the option
of generating a report by aggregating the RequestTimes data structures pickled
in previous runs.

This will allow us to generate weekly, monthly and even yearly page
performance reports with minimal fuss.
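
For illustration, here is a minimal sketch of that aggregation, mirroring the
loop that --merge adds to main() (the filenames here are made up):

    import bz2
    import cPickle

    # Combine several pickled RequestTimes instances into one report input.
    merged = None
    for filename in ['ppr-2010-11-01.bz2', 'ppr-2010-11-02.bz2']:
        f = bz2.BZ2File(filename, 'r')
        day_times = cPickle.load(f)  # a RequestTimes saved by a daily run
        f.close()
        merged = day_times if merged is None else merged + day_times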

I did this by adding an __add__ method to all the data structures involved.
There are only two tricky algorithms. The first is merging the variances: for
that I used the algorithm pointed at by the Wikipedia page
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

The transcription on that page didn't actually produce the expected results,
so I went back to the paper and used the original formula.
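
Concretely, the merge combines two running (count, sum, M2) triples. Here is
a self-contained sketch of the formula as used in
OnlineStatsCalculator.__add__ (the helper name merge_stats is mine):

    def merge_stats(count_a, sum_a, m2_a, count_b, sum_b, m2_b):
        # Formula 2.1b from Chan, Golub & LeVeque (1979): merge two
        # running (count, sum, M2) triples without revisiting the data.
        count = count_a + count_b
        total = sum_a + sum_b
        if count_a > 0 and count_b > 0:
            m2 = m2_a + m2_b + (
                (float(count_a) / (count_b * count)) *
                ((float(count_b) / count_a) * sum_a - sum_b)**2)
        else:
            m2 = m2_a + m2_b  # One of the sides is empty.
        return count, total, m2

    # [3, 6, 9] has the triple (3, 18, 18) and [1, 2, 3] has (3, 6, 2);
    # merging gives M2 == 44, the value checked by the new tests.
    print merge_stats(3, 18, 18, 3, 6, 2)  # (6, 24, 44.0)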

The way to merge two approximate medians is home-made. I went with what
seemed to make the most sense, and it seems to produce OK results.
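
In miniature, with the values used in the new tests (bucket size 3): every
surviving candidate from the other estimator re-enters the selection at the
round it had already reached, and full buckets promote their median to the
next level.

    from lp.scripts.utilities.pageperformancereport import (
        OnlineApproximateMedian)

    median1 = OnlineApproximateMedian(3)
    median1.buckets = [[1, 3], [4, 5], [6, 3]]
    median2 = OnlineApproximateMedian(3)
    median2.buckets = [[], [3, 6], [3, 7]]
    # Re-feeding 3 at level 1 fills bucket [4, 5]; its median (4) then
    # cascades upwards, ending up in a new fourth-level bucket.
    print (median1 + median2).buckets  # [[1, 3], [6], [3, 7], [4]]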

Revision history for this message
Gavin Panella (allenap) wrote :

[1]

+            (float(self.count)/(other.count*results.count)) *
+            ((float(other.count)/self.count)*self.sum - other.sum)**2)
...
+        results.mean = float(results.sum)/results.count
...
+            for n in range((i+1)-len(self.buckets)):

PEP8 suggests "Use spaces around arithmetic operators", but also says
"A Foolish Consistency is the Hobgoblin of Little Minds" :)

[2]

+            found = False
+            for i, (category, stats) in enumerate(self.category_times):
+                if category.title == other_category.title:
+                    results.category_times[i] = (
+                        category, stats + other_stats)
+                    found = True
+                    break
+            if not found:
+                results.category_times.append(
+                    (other_category, copy.deepcopy(other_stats)))

I think this is what for...else is for:

            for i, (category, stats) in enumerate(self.category_times):
                if category.title == other_category.title:
                    results.category_times[i] = (
                        category, stats + other_stats)
                    break
            else:
                results.category_times.append(
                    (other_category, copy.deepcopy(other_stats)))

[3]

+                sorted(url_times.items(),
+                    key=lambda x: x[1].total_time,
+                    reverse=True)[:self.top_urls_cache_size])

Indentation, and the key function might be clearer with some
unwrapping:

                sorted(url_times.items(),
                       key=lambda (category, stats): stats.total_time,
                       reverse=True)[:self.top_urls_cache_size])

sorted() is also happy with an iterator - i.e. iteritems() - but I
don't think that url_times will ever be large enough to make much of a
difference.

[4]

+            f = bz2.BZ2File(filename, 'r')
+            times += cPickle.load(f)
+            f.close()

BZ2File doesn't provide the context protocol (or whatever it's called)
but files returned by codecs.open(...) do, so this is possible:

            with codecs.open(filename, 'r', 'bz2') as f:
                times += cPickle.load(f)

Makes bugger all difference though :)

review: Approve
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

On November 4, 2010, Gavin Panella wrote:
> Review: Approve
> [1]
>
> +            (float(self.count)/(other.count*results.count)) *
> +            ((float(other.count)/self.count)*self.sum - other.sum)**2)
> ...
> +        results.mean = float(results.sum)/results.count
> ...
> +            for n in range((i+1)-len(self.buckets)):
>
> PEP8 suggests "Use spaces around arithmetic operators", but also says
> "A Foolish Consistency is the Hobgoblin of Little Minds" :)
>
>

I added spaces, except for the **2 which would make the line go over 80!

> [2]
>
> +            found = False
> +            for i, (category, stats) in enumerate(self.category_times):
> +                if category.title == other_category.title:
> +                    results.category_times[i] = (
> +                        category, stats + other_stats)
> +                    found = True
> +                    break
> +            if not found:
> +                results.category_times.append(
> +                    (other_category, copy.deepcopy(other_stats)))
>
> I think this is what for...else is for:
>
>             for i, (category, stats) in enumerate(self.category_times):
>                 if category.title == other_category.title:
>                     results.category_times[i] = (
>                         category, stats + other_stats)
>                     break
>             else:
>                 results.category_times.append(
>                     (other_category, copy.deepcopy(other_stats)))
>

Indeed, I'm never sure of that one. Thanks for pointing this out. I changed
another instance of that idiom in the same file.

>
> [3]
>
> +                sorted(url_times.items(),
> +                    key=lambda x: x[1].total_time,
> +                    reverse=True)[:self.top_urls_cache_size])
>
> Indentation, and the key function might be clearer with some
> unwrapping:
>
>                 sorted(url_times.items(),
>                        key=lambda (category, stats): stats.total_time,
>                        reverse=True)[:self.top_urls_cache_size])
>
> sorted() is also happy with an iterator - i.e. iteritems() - but I
> don't think that url_times will ever be large enough to make much of a
> difference.
>

Nice suggestion.

>
> [4]
>
> +            f = bz2.BZ2File(filename, 'r')
> +            times += cPickle.load(f)
> +            f.close()
>
> BZ2File doesn't provide the context protocol (or whatever it's called)
> but files returned by codecs.open(...) do, so this is possible:
>
>             with codecs.open(filename, 'r', 'bz2') as f:
>                 times += cPickle.load(f)
>
> Makes bugger all difference though :)

Hmm, I already import bz2, so I kept the less 3733t idiom instead of adding
an import :-p

Thanks for the review!

--
Francis J. Lacoste
<email address hidden>

Preview Diff

=== modified file 'lib/lp/scripts/utilities/pageperformancereport.py'
--- lib/lp/scripts/utilities/pageperformancereport.py 2010-11-05 15:24:12 +0000
+++ lib/lp/scripts/utilities/pageperformancereport.py 2010-11-05 15:24:31 +0000
@@ -9,6 +9,7 @@
 import bz2
 import cPickle
 from cgi import escape as html_quote
+import copy
 from ConfigParser import RawConfigParser
 import csv
 from datetime import datetime
@@ -69,6 +70,11 @@
     def __cmp__(self, other):
         return cmp(self.title.lower(), other.title.lower())
 
+    def __deepcopy__(self, memo):
+        # We provide __deepcopy__ because the module doesn't handle
+        # compiled regular expression by default.
+        return Category(self.title, self.regexp)
+
 
 class OnlineStatsCalculator:
     """Object that can compute count, sum, mean, variance and median.
@@ -113,6 +119,30 @@
         else:
             return math.sqrt(self.variance)
 
+    def __add__(self, other):
+        """Adds this and another OnlineStatsCalculator.
+
+        The result combines the stats of the two objects.
+        """
+        results = OnlineStatsCalculator()
+        results.count = self.count + other.count
+        results.sum = self.sum + other.sum
+        if self.count > 0 and other.count > 0:
+            # This is 2.1b in Chan, Tony F.; Golub, Gene H.; LeVeque,
+            # Randall J. (1979), "Updating Formulae and a Pairwise Algorithm
+            # for Computing Sample Variances.",
+            # Technical Report STAN-CS-79-773,
+            # Department of Computer Science, Stanford University,
+            # ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf .
+            results.M2 = self.M2 + other.M2 + (
+                (float(self.count) / (other.count * results.count)) *
+                ((float(other.count) / self.count) * self.sum - other.sum)**2)
+        else:
+            results.M2 = self.M2 + other.M2  # One of them is 0.
+        if results.count > 0:
+            results.mean = float(results.sum) / results.count
+        return results
+
 
 class OnlineApproximateMedian:
     """Approximate the median of a set of elements.
@@ -139,23 +169,22 @@
 
         The bucket size should be a low odd-integer.
        """
-        self.count = 0
         self.bucket_size = bucket_size
         # Index of the median in a completed bucket.
         self.median_idx = (bucket_size-1)//2
         self.buckets = []
 
-    def update(self, x):
+    def update(self, x, order=0):
         """Update with x."""
         if x is None:
             return
 
-        self.count += 1
-        i = 0
+        i = order
         while True:
             # Create bucket on demand.
-            if i == len(self.buckets):
-                self.buckets.append([])
+            if i >= len(self.buckets):
+                for n in range((i+1)-len(self.buckets)):
+                    self.buckets.append([])
             bucket = self.buckets[i]
             bucket.append(x)
             if len(bucket) == self.bucket_size:
@@ -171,9 +200,6 @@
     @property
     def median(self):
         """Return the median."""
-        if self.count == 0:
-            return 0
-
         # Find the 'weighted' median by assigning a weight to each
         # element proportional to how far they have been selected.
         candidates = []
@@ -183,6 +209,9 @@
             for x in bucket:
                 total_weight += weight
                 candidates.append([x, weight])
+        if len(candidates) == 0:
+            return 0
+
         # Each weight is the equivalent of having the candidates appear
         # that number of times in the array.
         # So buckets like [[1, 2], [2, 3], [4, 2]] would be expanded to
@@ -196,6 +225,21 @@
             if weighted_idx > median:
                 return x
 
+    def __add__(self, other):
+        """Merge two approximators together.
+
+        All candidates from the other are merged through the standard
+        algorithm, starting at the same level. So an item that went through
+        two rounds of selection, will be compared with other items having
+        gone through the same number of rounds.
+        """
+        results = OnlineApproximateMedian(self.bucket_size)
+        results.buckets = copy.deepcopy(self.buckets)
+        for i, bucket in enumerate(other.buckets):
+            for x in bucket:
+                results.update(x, i)
+        return results
+
 
 class Stats:
     """Bag to hold and compute request statistics.
@@ -341,6 +385,21 @@
         idx = int(min(len(self.histogram)-1, request.app_seconds))
         self.histogram[idx][1] += 1
 
+    def __add__(self, other):
+        """Merge another OnlineStats with this one."""
+        results = copy.deepcopy(self)
+        results.time_stats += other.time_stats
+        results.time_median_approximate += other.time_median_approximate
+        results.sql_time_stats += other.sql_time_stats
+        results.sql_time_median_approximate += (
+            other.sql_time_median_approximate)
+        results.sql_statements_stats += other.sql_statements_stats
+        results.sql_statements_median_approximate += (
+            other.sql_statements_median_approximate)
+        for i, (n, f) in enumerate(other._histogram):
+            results._histogram[i][1] += f
+        return results
+
 
 class RequestTimes:
     """Collect statistics from requests.
@@ -402,10 +461,9 @@
             cutoff = int(self.top_urls_cache_size*0.90)
             self.url_times = dict(
                 sorted(self.url_times.items(),
-                       key=lambda x: x[1].total_time,
+                       key=lambda (url, stats): stats.total_time,
                        reverse=True)[:cutoff])
 
-
     def get_category_times(self):
         """Return the times for each category."""
         return self.category_times
@@ -415,13 +473,50 @@
         # Sort the result by total time
         return sorted(
             self.url_times.items(),
-            key=lambda x: x[1].total_time, reverse=True)[:self.top_urls]
+            key=lambda (url, stats): stats.total_time,
+            reverse=True)[:self.top_urls]
 
     def get_pageid_times(self):
         """Return the times for the pageids."""
         # Sort the result by pageid
         return sorted(self.pageid_times.items())
 
+    def __add__(self, other):
+        """Merge two RequestTimes together."""
+        results = copy.deepcopy(self)
+        for other_category, other_stats in other.category_times:
+            for i, (category, stats) in enumerate(self.category_times):
+                if category.title == other_category.title:
+                    results.category_times[i] = (
+                        category, stats + other_stats)
+                    break
+            else:
+                results.category_times.append(
+                    (other_category, copy.deepcopy(other_stats)))
+
+        url_times = results.url_times
+        for url, stats in other.url_times.items():
+            if url in url_times:
+                url_times[url] += stats
+            else:
+                url_times[url] = copy.deepcopy(stats)
+        # Only keep top_urls_cache_size entries.
+        if len(self.url_times) > self.top_urls_cache_size:
+            self.url_times = dict(
+                sorted(
+                    url_times.items(),
+                    key=lambda (url, stats): stats.total_time,
+                    reverse=True)[:self.top_urls_cache_size])
+
+        pageid_times = results.pageid_times
+        for pageid, stats in other.pageid_times.items():
+            if pageid in pageid_times:
+                pageid_times[pageid] += stats
+            else:
+                pageid_times[pageid] = copy.deepcopy(stats)
+
+        return results
+
 
 def main():
     parser = LPOptionParser("%prog [args] tracelog [...]")
@@ -459,6 +554,11 @@
         # Default to 12: the staging timeout.
         default=12, type="int",
         help="The configured timeout value : determines high risk page ids.")
+    parser.add_option(
+        "--merge", dest="merge",
+        default=False, action='store_true',
+        help="Files are interpreted as pickled stats and are aggregated for" +
+             "the report.")
 
     options, args = parser.parse_args()
 
@@ -473,6 +573,9 @@
         parser.error(
             "--from timestamp %s is before --until timestamp %s"
             % (options.from_ts, options.until_ts))
+    if options.from_ts is not None or options.until_ts is not None:
+        if options.merge:
+            parser.error('--from and --until cannot be used with --merge')
 
     for filename in args:
         if not os.path.exists(filename):
@@ -501,7 +604,14 @@
 
     times = RequestTimes(categories, options)
 
-    parse(args, times, options)
+    if options.merge:
+        for filename in args:
+            log.info('Merging %s...' % filename)
+            f = bz2.BZ2File(filename, 'r')
+            times += cPickle.load(f)
+            f.close()
+    else:
+        parse(args, times, options)
 
     category_times = times.get_category_times()
 
@@ -564,16 +674,14 @@
 
     for option in script_config.options('metrics'):
         name = script_config.get('metrics', option)
-        found = False
         for category, stats in category_times:
             if category.title == name:
                 writer.writerows([
                     ("%s_99" % option, "%f@%d" % (
                         stats.ninetyninth_percentile_time, date)),
                     ("%s_mean" % option, "%f@%d" % (stats.mean, date))])
-                found = True
                 break
-        if not found:
+        else:
             log.warning("Can't find category %s for metric %s" % (
                 option, name))
     metrics_file.close()
 
=== modified file 'lib/lp/scripts/utilities/tests/test_pageperformancereport.py'
--- lib/lp/scripts/utilities/tests/test_pageperformancereport.py 2010-11-05 15:24:12 +0000
+++ lib/lp/scripts/utilities/tests/test_pageperformancereport.py 2010-11-05 15:24:31 +0000
@@ -12,6 +12,7 @@
 from lp.scripts.utilities.pageperformancereport import (
     Category,
     OnlineApproximateMedian,
+    OnlineStats,
     OnlineStatsCalculator,
     RequestTimes,
     Stats,
@@ -30,6 +31,7 @@
 
 
 class FakeRequest:
+
     def __init__(self, url, app_seconds, sql_statements=None,
                  sql_seconds=None, pageid=None):
         self.url = url
@@ -40,6 +42,7 @@
 
 
 class FakeStats(Stats):
+
     def __init__(self, **kwargs):
         # Override the constructor to just store the values.
         self.__dict__.update(kwargs)
@@ -196,9 +199,28 @@
         pageid_times = self.db.get_pageid_times()
         self.assertStatsAreEquals(PAGEID_STATS, pageid_times)
 
+    def test___add__(self):
+        # Ensure that adding two RequestTimes together result in
+        # a merge of their constituencies.
+        db1 = self.db
+        db2 = RequestTimes(self.categories, FakeOptions())
+        db1.add_request(FakeRequest('/', 1.5, 5, 1.0, '+root'))
+        db1.add_request(FakeRequest('/bugs', 3.5, 15, 1.0, '+bugs'))
+        db2.add_request(FakeRequest('/bugs/1', 5.0, 30, 4.0, '+bug'))
+        results = db1 + db2
+        self.assertEquals(3, results.category_times[0][1].total_hits)
+        self.assertEquals(0, results.category_times[1][1].total_hits)
+        self.assertEquals(2, results.category_times[2][1].total_hits)
+        self.assertEquals(1, results.pageid_times['+root'].total_hits)
+        self.assertEquals(1, results.pageid_times['+bugs'].total_hits)
+        self.assertEquals(1, results.pageid_times['+bug'].total_hits)
+        self.assertEquals(1, results.url_times['/'].total_hits)
+        self.assertEquals(1, results.url_times['/bugs'].total_hits)
+        self.assertEquals(1, results.url_times['/bugs/1'].total_hits)
+
 
 class TestStats(TestCase):
-    """Tests for the stats class."""
+    """Tests for the Stats class."""
 
     def test_relative_histogram(self):
         # Test that relative histogram gives an histogram using
@@ -211,6 +233,26 @@
             stats.relative_histogram)
 
 
+class TestOnlineStats(TestCase):
+    """Tests for the OnlineStats class."""
+
+    def test___add__(self):
+        # Ensure that adding two OnlineStats merge all their constituencies.
+        stats1 = OnlineStats(4)
+        stats1.update(FakeRequest('/', 2.0, 5, 1.5))
+        stats2 = OnlineStats(4)
+        stats2.update(FakeRequest('/', 1.5, 2, 3.0))
+        stats2.update(FakeRequest('/', 5.0, 2, 2.0))
+        results = stats1 + stats2
+        self.assertEquals(3, results.total_hits)
+        self.assertEquals(2, results.median)
+        self.assertEquals(9, results.total_sqlstatements)
+        self.assertEquals(2, results.median_sqlstatements)
+        self.assertEquals(6.5, results.total_sqltime)
+        self.assertEquals(2.0, results.median_sqltime)
+        self.assertEquals([[0, 0], [1, 1], [2, 1], [3, 1]], results.histogram)
+
+
 class TestOnlineStatsCalculator(TestCase):
     """Tests for the online stats calculator."""
 
@@ -248,14 +290,44 @@
         self.assertEquals(6, self.stats.variance)
         self.assertEquals("2.45", "%.2f" % self.stats.std)
 
+    def test___add___two_empty_together(self):
+        stats2 = OnlineStatsCalculator()
+        results = self.stats + stats2
+        self.assertEquals(0, results.count)
+        self.assertEquals(0, results.sum)
+        self.assertEquals(0, results.mean)
+        self.assertEquals(0, results.variance)
+
+    def test___add___one_empty(self):
+        stats2 = OnlineStatsCalculator()
+        for x in [1, 2, 3]:
+            self.stats.update(x)
+        results = self.stats + stats2
+        self.assertEquals(3, results.count)
+        self.assertEquals(6, results.sum)
+        self.assertEquals(2, results.mean)
+        self.assertEquals(2, results.M2)
+
+    def test___add__(self):
+        stats2 = OnlineStatsCalculator()
+        for x in [3, 6, 9]:
+            self.stats.update(x)
+        for x in [1, 2, 3]:
+            stats2.update(x)
+        results = self.stats + stats2
+        self.assertEquals(6, results.count)
+        self.assertEquals(24, results.sum)
+        self.assertEquals(4, results.mean)
+        self.assertEquals(44, results.M2)
+
 
 SHUFFLE_RANGE_100 = [
     25, 79, 99, 76, 60, 63, 87, 77, 51, 82, 42, 96, 93, 58, 32, 66, 75,
-     2, 26, 22, 11, 73, 61, 83, 65, 68, 44, 81, 64, 3, 33, 34, 15, 1,
+    2, 26, 22, 11, 73, 61, 83, 65, 68, 44, 81, 64, 3, 33, 34, 15, 1,
     92, 27, 90, 74, 46, 57, 59, 31, 13, 19, 89, 29, 56, 94, 50, 49, 62,
     37, 21, 35, 5, 84, 88, 16, 8, 23, 40, 6, 48, 10, 97, 0, 53, 17, 30,
     18, 43, 86, 12, 71, 38, 78, 36, 7, 45, 47, 80, 54, 39, 91, 98, 24,
-    55, 14, 52, 20, 69, 85, 95, 28, 4, 9, 67, 70, 41, 72
+    55, 14, 52, 20, 69, 85, 95, 28, 4, 9, 67, 70, 41, 72,
 ]
 
 
@@ -279,12 +351,20 @@
         self.estimator.update(None)
         self.assertEquals(1, self.estimator.median)
 
-    def test_approximage_median_is_good_enough(self):
+    def test_approximate_median_is_good_enough(self):
         for x in SHUFFLE_RANGE_100:
             self.estimator.update(x)
         # True median is 50, 49 is good enough :-)
         self.assertIn(self.estimator.median, range(49,52))
 
+    def test___add__(self):
+        median1 = OnlineApproximateMedian(3)
+        median1.buckets = [[1, 3], [4, 5], [6, 3]]
+        median2 = OnlineApproximateMedian(3)
+        median2.buckets = [[], [3, 6], [3, 7]]
+        results = median1 + median2
+        self.assertEquals([[1, 3], [6], [3, 7], [4]], results.buckets)
+
 
 def test_suite():
     return unittest.TestLoader().loadTestsFromName(__name__)