txStatsD

Merge lp:~lucio.torre/txstatsd/add-distinct-count into lp:txstatsd

add-distinct-count
Merge into trunk

Proposed by Lucio Torre on 2011-10-25

Status:	Merged
Approved by:	Sidnei da Silva on 2011-10-26
Approved revision:	40
Merged at revision:	48
Proposed branch:	lp:~lucio.torre/txstatsd/add-distinct-count
Merge into:	lp:txstatsd
Diff against target:	400 lines (+303/-0), 6 files modified:
	txstatsd/metrics/distinctmetric.py (+153/-0)
	txstatsd/metrics/metrics.py (+8/-0)
	txstatsd/server/configurableprocessor.py (+7/-0)
	txstatsd/server/processor.py (+29/-0)
	txstatsd/tests/metrics/test_distinct.py (+80/-0)
	txstatsd/tests/test_processor.py (+26/-0)
To merge this branch:	bzr merge lp:~lucio.torre/txstatsd/add-distinct-count

Reviewer	Review Type	Date Requested	Status
Sidnei da Silva		2011-10-25	Approve on 2011-10-26
Review via email: mp+80391@code.launchpad.net

Commit message

add probabilistic distinct counter

Description of the change

Implements a probabilistic distinct counter with sliding windows.

See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.7100 for the idea behind the algorithm.

It uses custom hash functions. A chi-square test checks them for uniform distribution, but it is currently skipped because it takes too long to run.
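
As a rough illustration of the description above, here is a minimal Python 3 sketch of a windowed distinct estimator. The bucket rule and the 0.77351 correction constant follow the `SlidingDistinctCounter` in this branch's diff. The class name `WindowedDistinctSketch` is hypothetical, and salted MD5 is an assumption standing in for the branch's custom hash functions.

```python
import hashlib


def trailing_zeros(n, width=32):
    """Number of zero bits at the low end of n (width if n is 0)."""
    if n == 0:
        return width
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count


class WindowedDistinctSketch:
    """Illustrative only: one row of timestamped buckets per hash function."""

    def __init__(self, n_hashes=32, n_buckets=32):
        self.n_hashes = n_hashes
        self.n_buckets = n_buckets
        self.rows = [[0] * n_buckets for _ in range(n_hashes)]

    def _hash(self, row, item):
        # Assumption: salted MD5 replaces the branch's custom hashes.
        data = ("%d:%s" % (row, item)).encode("utf-8")
        return int.from_bytes(hashlib.md5(data).digest()[:4], "big")

    def add(self, when, item):
        # Each row remembers when a hash with k trailing zeros was last seen.
        for row in range(self.n_hashes):
            zeros = trailing_zeros(self._hash(row, item))
            self.rows[row][min(self.n_buckets - 1, zeros)] = when

    def distinct(self, since=0):
        # Per row, count the leading buckets refreshed after `since`,
        # then average the run lengths and apply the correction constant.
        total = 0
        for row in self.rows:
            run = 0
            for stamp in row:
                if stamp <= since:
                    break
                run += 1
            total += run
        return int(2 ** (total / self.n_hashes) / 0.77351)


sketch = WindowedDistinctSketch()
for i in range(1000):
    sketch.add(10, str(i))
print(sketch.distinct())          # estimate over all time
print(sketch.distinct(since=10))  # nothing is newer than t=10
```

Storing a timestamp instead of a single bit in each bucket is what makes the window possible: a query only counts buckets that were refreshed after the start of the window.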

Revision history for this message

Sidnei da Silva (sidnei) wrote on 2011-10-25:

1. Why not use something like 'd' for the postfix instead of 'distinct'?

2. There are some copy-n-paste issues (gauge -> distinct in docstrings)

3. TestDistinc -> TestDistinct

4. Maybe 'add' can be simplified a bit? Instead of looping 2x256 (list comp + enumerate) do a generator expression inside enumerate?

5. Missing blank line between docstring and __init__

6. (unrelated to the branch) it seems pretty obvious that adding a new type of metric needs to touch a lot of code. I particularly dislike constructs like this:

    if not name in self._metrics:
        metric = DistinctMetric(self.connection, name)
        self._metrics[name] = metric

Maybe something can be done to simplify it in the future.

review: Needs Fixing
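
One possible direction for point 6, sketched under assumptions: a small get-or-create helper that each metric type could share instead of repeating the membership check. `MetricRegistry`, `get_or_create` and `FakeMetric` are hypothetical names and are not part of txstatsd.

```python
class MetricRegistry:
    """Hypothetical helper; not part of txstatsd."""

    def __init__(self):
        self._metrics = {}

    def get_or_create(self, name, factory):
        """Return the metric stored under name, building it on first use."""
        try:
            return self._metrics[name]
        except KeyError:
            metric = self._metrics[name] = factory(name)
            return metric


class FakeMetric:
    """Stand-in for a metric class such as DistinctMetric."""

    def __init__(self, name):
        self.name = name
        self.items = []

    def mark(self, item):
        self.items.append(item)


registry = MetricRegistry()
registry.get_or_create("gorets", FakeMetric).mark("one")
registry.get_or_create("gorets", FakeMetric).mark("two")
print(registry.get_or_create("gorets", FakeMetric).items)
```

With a helper like this, a method such as `distinct` would only need to supply the factory for its own metric type.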

Revision history for this message

Lucio Torre (lucio.torre) wrote on 2011-10-26:

1- Because it won't matter much in the packet size and it's clearer. I can change it if you want; it just seemed better. I don't like it when we start growing code letters that we repeat in various parts of the code and that carry no meaning.

2- fixed

3- fixed

4- Turned the hash loop into a generator, so it's just one pass of size n_hashes.

5- fixed.

6- There is a lot to fix there. I don't think it is that bad, as the metric will be created in the entry point for those measurements. Still, all the changes that need to be added for process, flush and the configurable processor seem a bit repetitive. But yes, it needs some work there; we should discuss it. I always saw this work as part of leaving behind statsd compatibility.
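
The change described in point 4 can be illustrated roughly as follows. This is a simplified sketch, not the branch's code: `make_hasher` is a hypothetical stand-in for the per-row hash objects, and the bucket index is reduced to a modulo for brevity.

```python
import zlib


def make_hasher(salt):
    """Hypothetical stand-in for one per-row hash function."""
    def hasher(item):
        return zlib.crc32(("%d:%s" % (salt, item)).encode("utf-8"))
    return hasher


def add_two_pass(buckets, hashers, when, item):
    # First pass builds a full list of hash values, second pass consumes it.
    values = [h(item) for h in hashers]
    for i, value in enumerate(values):
        buckets[i][value % len(buckets[i])] = when


def add_one_pass(buckets, hashers, when, item):
    # The generator yields each hash value as enumerate asks for it,
    # so no intermediate list is built.
    for i, value in enumerate(h(item) for h in hashers):
        buckets[i][value % len(buckets[i])] = when


hashers = [make_hasher(k) for k in range(4)]
first = [[0] * 8 for _ in hashers]
second = [[0] * 8 for _ in hashers]
add_two_pass(first, hashers, 1, "item")
add_one_pass(second, hashers, 1, "item")
print(first == second)
```

Both versions write the same buckets; the generator form only avoids materialising the list of hash values.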

Revision history for this message

Sidnei da Silva (sidnei) wrote on 2011-10-26:

Re: 1, please change it for the sake of consistency. While I agree it's not a lot relative to the whole packet, I think it's worth saving every byte we can.

The remaining changes look good. +1

Note you can set a commit message and flip the MP to approved, and tarmac will pick it up.

review: Approve

Revision history for this message

Ubuntu One Server Tarmac Bot (ubuntuone-server-tarmac) wrote on 2011-10-26:

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Revision history for this message

Sidnei da Silva (sidnei) on 2011-10-26:

review: Approve

Revision history for this message

Ubuntu One Server Tarmac Bot (ubuntuone-server-tarmac) wrote on 2011-10-26:

The attempt to merge lp:~lucio.torre/txstatsd/add-distinct-count into lp:txstatsd failed. Below is the output from the failed tests.

[ERROR]
txstatsd.tests.metrics.test_histogrammetric
  TestHistogramReporterMetric
    test_histogram_of_numbers_1_through_10000 ...                          [OK]
    test_histogram_with_zero_recorded_values ...                           [OK]
txstatsd.tests.metrics.test_timermetric
  TestBlankTimerMetric
    test_count ...                                                         [OK]
    test_fifteen_minute_rate ...                                           [OK]
    test_five_minute_rate ...                                              [OK]
    test_max ...                                                           [OK]
    test_mean ...                                                          [OK]
    test_mean_rate ...                                                     [OK]
    test_min ...                                                           [OK]
    test_no_values ...                                                     [OK]
    test_one_minute_rate ...                                               [OK]
    test_percentiles ...                                                   [OK]
    test_std_dev ...                                                       [OK]
  TestTimingSeriesEvents
    test_count ...                                                         [OK]
    test_max ...                                                           [OK]
    test_mean ...                                                          [OK]
    test_min ...                                                           [OK]
    test_percentiles ...                                                   [OK]
    test_std_dev ...                                                       [OK]
    test_values ...                                                        [OK]
txstatsd.tests.stats.test_ewma
  TestEwmaFifteenMinute
    test_eight_minutes ...                                                 [OK]
    test_eleven_minutes ...                                                [OK]
    test_fifteen_minutes ...                                               [OK]
    test_first_tick ...                                                    [OK]
    test_five_minutes ...                                                  [OK]
    test_four_minutes ...                                                  [OK]
    test_fourteen_minutes ...                                              [OK]
    test_nine_minutes ...                                                  [OK]
    test_one_minute ...                                                    [OK]
    test_seven_minutes ...                                                 [OK]
    test_six_minutes ...                                                   [OK]
    test_ten_minutes ...                                                   [OK]
    test_thirteen_minutes ...                                              [OK]
    test_three_minutes ...                                                 [OK]
    test_twelve_minutes ...                                                [OK]
    test_two_minutes ...                                                   [OK]
  TestEwmaFiveMinute
    test_eight_minutes ...                                                 [OK]
    test_eleven_minutes ...                                                [OK]
    test_fifteen_minutes ...                                               [OK]
    test_first_tick ...                                                    [OK]
    test_five_minutes ...                                                  [OK]
    test_four_minutes ...                                                  [OK]
    test_fourteen_minutes ...                                              [OK]
    test_nine_minutes ...                                                  [OK]
    test_one_minute ...                                                    [OK]
    test_seven_minutes ...                                                 [OK]
    test_six_minutes ...                                                   [OK]
    test_ten_minutes ...                                                   [OK]
    test_thirteen_minutes ...                                              [OK]
    test_three_minutes ...                                                 [OK]
    test_twelve_minutes ...                                                [OK]
    test_two_minutes ...                                                   [OK]
  TestEwmaOneMinute
    test_eight_minutes ...                                                 [OK]
    test_eleven_minutes ...                                                [OK]
    test_fifteen_minutes ...                                               [OK]
    test_first_tick ...                                                    [OK]
    test_five_minutes ...                                                  [OK]
    test_four_minutes ...                                                  [OK]
    test_fourteen_minutes ...                                              [OK]
    test_nine_minutes ...                                                  [OK]
    test_one_minute ...                                                    [OK]
    test_seven_minutes ...                                                 [OK]
    test_six_minutes ...                                                   [OK]
    test_ten_minutes ...                                                   [OK]
    test_thirteen_minutes ...                                              [OK]
    test_three_minutes ...                                                 [OK]
    test_twelve_minutes ...                                                [OK]
    test_two_minutes ...                                                   [OK]
txstatsd.tests.stats.test_exponentiallydecayingsample
  TestExponentiallyDecayingSample
    test_100_out_of_1000_elements ...                                      [OK]
    test_100_out_of_10_elements ...                                        [OK]
    test_heavily_biased_100_out_of_1000_elements ...                       [OK]
txstatsd.tests.stats.test_uniformsample
  TestUniformSample
    test_100_out_of_1000_elements ...                                      [OK]
    test_100_out_of_10_elements ...                                        [OK]
txstatsd.tests.test_client
  TestClient
    test_twistedstatsd_write_with_malformed_address ...                    [OK]
    test_twistedstatsd_write_with_wellformed_address ...                   [OK]
    test_udpstatsd_malformed_address ...                                   [OK]
    test_udpstatsd_socket_nonblocking ...                                  [OK]
    test_udpstatsd_wellformed_address ...                                  [OK]
txstatsd.tests.test_configurableprocessor
  FlushMessagesTest
    test_flush_counter_with_empty_prefix ...                               [OK]
    test_flush_counter_with_prefix ...                                     [OK]
    test_flush_single_timer_multiple_times ...                             [OK]
    test_flush_single_timer_single_time ...                                [OK]
  FlushMeterMetricMessagesTest
    test_flush_meter_metric_with_prefix ...                                [OK]
txstatsd.tests.test_loggingprocessor
  TestLoggingMessageProcessor
    test_logger ...                                                        [OK]
    test_logger_with_no_info ...                                           [OK]
    test_logger_with_non_callable_info ...                                 [OK]
txstatsd.tests.test_metrics
  TestMetrics
    test_counter ...                                                       [OK]
    test_empty_namespace ...                                               [OK]
    test_gauge ...                                                         [OK]
    test_meter ...                                                         [OK]
    test_timing ...                                                        [OK]
txstatsd.tests.test_process
  TestSystemPerformance
    test_ioinfo ...                                                        [OK]
    test_ioinfo_with_get_io_counters ...                                   [OK]
    test_loadinfo ...                                                      [OK]
    test_meminfo ...                                                       [OK]
    test_netdev ...                                                        [OK]
    test_netinfo_no_get_connections ...                                    [OK]
    test_netinfo_with_get_connections ...                                  [OK]
    test_reactor_stats ...                                                 [OK]
    test_read ...                                                          [OK]
    test_self_statinfo ...                                                 [OK]
    test_self_statinfo_with_num_threads ...                                [OK]
    test_statinfo ...                                                      [OK]
    test_threadpool_stats ...                                              [OK]
txstatsd.tests.test_processor
  FlushMessagesTest
    test_flush_counter ...                                                 [OK]
    test_flush_counter_one_second_interval ...                             [OK]
    test_flush_distinct_metric ...                                         [OK]
    test_flush_gauge_metric ...                                            [OK]
    test_flush_no_stats ...                                                [OK]
    test_flush_single_timer_50th_percentile ...                            [OK]
    test_flush_single_timer_multiple_times ...                             [OK]
    test_flush_single_timer_single_time ...                                [OK]
  FlushMeterMetricMessagesTest
    test_flush_meter_metric ...                                            [OK]
  ProcessMessagesTest
    test_receive_counter ...                                               [OK]
    test_receive_counter_no_value ...                                      [OK]
    test_receive_counter_rate ...                                          [OK]
    test_receive_distinct_metric ...                                       [OK]
    test_receive_gauge_metric ...                                          [OK]
    test_receive_message_no_fields ...                                     [OK]
    test_receive_not_enough_fields ...                                     [OK]
    test_receive_timer ...                                                 [OK]
    test_receive_timer_no_value ...                                        [OK]
    test_receive_too_many_fields ...                                       [OK]
txstatsd.tests.test_protocol
  TestGraphiteProtocol
    test_paused_producer_discards_everything_until_resumed ...             [OK]
    test_stopped_producer_discards_everything ...                          [OK]
    test_write_unless_paused ...                                           [OK]
  TestPausedMessagingLogging
    test_paused_producer_logging ...                                       [OK]
    test_resumed_producer_logging ...                                      [OK]
  TestProducerRegistration
    test_register_producer ...                                             [OK]
txstatsd.tests.test_service
  GlueOptionsTestCase
    test_cmdline_overrides_config ...                                      [OK]
    test_defaults ...                                                      [OK]
    test_ensure_config_values_coerced ...                                  [OK]
    test_no_config_option ...                                              [OK]
    test_reads_from_config ...                                             [OK]
    test_set_parameter ...                                                 [OK]
    test_support_default_not_in_config ...                                 [OK]
  ServiceTestsBuilder_EPollReactor
    test_monitor_response ...                                              [OK]
    test_service ...                                                       [OK]
  ServiceTestsBuilder_Glib2Reactor
    test_monitor_response ...                                              [OK]
    test_service ...                                                       [OK]
  ServiceTestsBuilder_Gtk2Reactor
    test_monitor_response ...                                              [OK]
/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
    test_service ...                                                       [OK]
  ServiceTestsBuilder_IOCPReactor
    test_monitor_response ...                                         [SKIPPED]
    test_service ...                                                  [SKIPPED]
  ServiceTestsBuilder_KQueueReactor
    test_monitor_response ...                                         [SKIPPED]
    test_service ...                                                  [SKIPPED]
  ServiceTestsBuilder_PollReactor
    test_monitor_response ...                                              [OK]
    test_service ...                                                       [OK]
  ServiceTestsBuilder_SelectReactor
    test_monitor_response ...                                              [OK]
    test_service ...                                                       [OK]
  ServiceTestsBuilder_Win32Reactor
    test_monitor_response ...                                         [SKIPPED]
    test_service ...                                                  [SKIPPED]

===============================================================================
[SKIPPED]: txstatsd.tests.test_service.ServiceTestsBuilder_IOCPReactor.test_monitor_response

No module named win32api
===============================================================================
[SKIPPED]: txstatsd.tests.test_service.ServiceTestsBuilder_IOCPReactor.test_service

No module named win32api
===============================================================================
[SKIPPED]: txstatsd.tests.test_service.ServiceTestsBuilder_KQueueReactor.test_monitor_response

'module' object has no attribute 'kqueuereactor'
===============================================================================
[SKIPPED]: txstatsd.tests.test_service.ServiceTestsBuilder_KQueueReactor.test_service

No module named win32file
===============================================================================
[SKIPPED]: txstatsd.tests.test_service.ServiceTestsBuilder_Win32Reactor.test_service

No module named win32file
===============================================================================
[ERROR]: txstatsd.tests.metrics.test_distinct

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/trial/runner.py", line 563, in loadPackage
    module = modinfo.load()
  File "/usr/lib/python2.6/dist-packages/twisted/python/modules.py", line 381, in load
    return self.pathEntry.pythonPath.moduleLoader(self.name)
  File "/usr/lib/python2.6/dist-packages/twisted/python/reflect.py", line 464, in namedAny
    topLevelPackage = _importAndCheckStack(trialname)
  File "/home/tarmac/cache/txstatsd/trunk/txstatsd/tests/metrics/test_distinct.py", line 5, in <module>
    from scipy.stats import chi2
exceptions.ImportError: No module named scipy.stats
-------------------------------------------------------------------------------
Ran 152 tests in 1.429s

FAILED (skips=6, errors=1, successes=146)
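
The error above comes from the module-level `from scipy.stats import chi2`, which fails at import time even though the only test using it is skipped. One common way to avoid that, sketched here with the standard library's `unittest` rather than `twisted.trial` (an assumption made to keep the example self-contained), is to guard the optional import and skip the dependent test when it is missing. `TestChiSquare` is a hypothetical test case.

```python
import unittest

try:
    from scipy.stats import chi2
except ImportError:
    # scipy is optional: the module must still import without it.
    chi2 = None


class TestChiSquare(unittest.TestCase):

    @unittest.skipIf(chi2 is None, "scipy is not installed")
    def test_cdf_at_zero(self):
        # The chi-square CDF is 0 at x = 0 for any degrees of freedom.
        self.assertAlmostEqual(chi2.cdf(0.0, 10), 0.0)


if __name__ == "__main__":
    unittest.main()
```

Importing `chi2` inside the skipped test method would be another option with the same effect on module loading.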

Preview Diff

 === added file 'txstatsd/metrics/distinctmetric.py'
 --- txstatsd/metrics/distinctmetric.py	1970-01-01 00:00:00 +0000
 +++ txstatsd/metrics/distinctmetric.py	2011-10-26 16:55:26 +0000
@@ -0,0 +1,153 @@
++# Copyright (C) 2011 Canonical
++# All Rights Reserved
++"""
++Implements a probabilistic distinct counter with sliding windows.
++
++Based on:
++http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.7100
++
++And extended for sliding windows.
++"""
++import random
++import time
++
++from string import Template
++
++from txstatsd.metrics.metric import Metric
++
++
++class SBoxHash(object):
++    """A very fast hash.
++
++    This class creates a random hash function that is very fast.
++    Based on SBOXes. Not Crypto Strong.
++
++    Two instances of this class will hash differently.
++    """
++
++    def __init__(self):
++        self.table = [random.randint(0, 0xFFFFFFFF - 1) for i in range(256)]
++
++    def hash(self, data):
++        value = 0
++        for c in data:
++            value = value ^ self.table[ord(c)]
++            value = value * 3
++            value = value & 0xFFFFFFFF
++        return value
++
++
++def hash(data):
++    """Hash data using a random hasher."""
++    p = SBoxHash()
++    return p.hash(data)
++
++
++def zeros(n):
++    """Count the zeros to the right of the binary representation of n."""
++    count = 0
++    i = 0
++    while True:
++        v = (n >> i)
++        if v <= 0:
++            return count
++        if v & 1:
++            return count
++        count += 1
++        i += 1
++    return count
++
++
++class SlidingDistinctCounter(object):
++    """A probabilistic distinct counter with sliding windows."""
++
++    def __init__(self, n_hashes, n_buckets):
++        self.n_hashes = n_hashes
++        self.n_buckets = n_buckets
++
++        self.hashes = [SBoxHash() for i in range(n_hashes)]
++        self.buckets = [[0] * n_buckets for i in range(n_hashes)]
++
++    def add(self, when, item):
++        hashes = (h.hash(item) for h in self.hashes)
++        for i, value in enumerate(hashes):
++            self.buckets[i][min(self.n_buckets - 1, zeros(value))] = when
++
++    def distinct(self, since=0):
++        total = 0.0
++        for i in range(self.n_hashes):
++            least0 = 0
++            for b in range(self.n_buckets):
++                if self.buckets[i][b] <= since:
++                    break
++                least0 += 1
++            total += least0
++        v = total / self.n_hashes
++        return int((2 ** v) / 0.77351)
++
++
++class DistinctMetric(Metric):
++    """
++    Keeps an estimate of the distinct numbers of items seen on various
++    sliding windows of time.
++    """
++
++    def mark(self, item):
++        """Report this item was seen."""
++        self.send("%s|d" % item)
++
++
++class DistinctMetricReporter(object):
++    """
++    Keeps an estimate of the distinct numbers of items seen on various
++    sliding windows of time.
++    """
++
++    MESSAGE = (
++        "$prefix%(key)s.count_1min %(count_1min)s %(timestamp)s\n"
++        "$prefix%(key)s.count_1hour %(count_1hour)s %(timestamp)s\n"
++        "$prefix%(key)s.count_1day %(count_1day)s %(timestamp)s\n"
++        "$prefix%(key)s.count %(count)s %(timestamp)s\n"
++    )
++
++    def __init__(self, name, wall_time_func=time.time, prefix=""):
++        """Construct a metric we expect to be periodically updated.
++
++        @param name: Indicates what is being instrumented.
++        @param wall_time_func: Function for obtaining wall time.
++        @param prefix: If present, a string to prepend to the message
++            composed when C{report} is called.
++        """
++        self.name = name
++        self.wall_time_func = wall_time_func
++        self.counter = SlidingDistinctCounter(32, 32)
++        if prefix:
++            prefix += '.'
++        self.message = Template(DistinctMetricReporter.MESSAGE).substitute(
++            prefix=prefix)
++
++    def count(self):
++        return self.counter.distinct()
++
++    def count_1min(self, now):
++        return self.counter.distinct(now - 60)
++
++    def count_1hour(self, now):
++        return self.counter.distinct(now - 60 * 60)
++
++    def count_1day(self, now):
++        return self.counter.distinct(now - 60 * 60 * 24)
++
++    def update(self, item):
++        """Adds a seen item."""
++        self.counter.add(self.wall_time_func(), item)
++
++    def report(self, timestamp):
++        now = self.wall_time_func()
++        return self.message % {
++            "key": self.name,
++            "count": self.count(),
++            "count_1min": self.count_1min(now),
++            "count_1hour": self.count_1hour(now),
++            "count_1day": self.count_1day(now),
++            "timestamp": timestamp}
 === modified file 'txstatsd/metrics/metrics.py'
 --- txstatsd/metrics/metrics.py	2011-08-25 17:41:37 +0000
 +++ txstatsd/metrics/metrics.py	2011-10-26 16:55:26 +0000
@@ -1,6 +1,7 @@
  from txstatsd.metrics.gaugemetric import GaugeMetric
  from txstatsd.metrics.metermetric import MeterMetric
++from txstatsd.metrics.distinctmetric import DistinctMetric
  from txstatsd.metrics.metric import Metric
@@ -69,6 +70,13 @@
              self._metrics[name] = metric
          self._metrics[name].send("%s|ms" % duration)
++    def distinct(self, name, item):
++        name = self.fully_qualify_name(name)
++        if not name in self._metrics:
++            metric = DistinctMetric(self.connection, name)
++            self._metrics[name] = metric
++        self._metrics[name].mark(item)
++
      def clear(self, name):
          """Allow the metric to re-initialize its internal state."""
          name = self.fully_qualify_name(name)
 === modified file 'txstatsd/server/configurableprocessor.py'
 --- txstatsd/server/configurableprocessor.py	2011-10-10 16:08:12 +0000
 +++ txstatsd/server/configurableprocessor.py	2011-10-26 16:55:26 +0000
@@ -71,6 +71,13 @@
              self.meter_metrics[key] = metric
          self.meter_metrics[key].mark(value)
++    def compose_distinct_metric(self, key, item):
++        if not key in self.distinct_metrics:
++            metric = DistinctMetricReporter(key, self.time_function,
++                                         prefix=self.message_prefix)
++            self.distinct_metrics[key] = metric
++        self.distinct_metrics[key].update(item)
++
      def flush_counter_metrics(self, interval, timestamp):
          metrics = []
          events = 0
 === modified file 'txstatsd/server/processor.py'
 --- txstatsd/server/processor.py	2011-10-10 16:08:12 +0000
 +++ txstatsd/server/processor.py	2011-10-26 16:55:26 +0000
@@ -6,6 +6,7 @@
  from twisted.python import log
  from txstatsd.metrics.metermetric import MeterMetricReporter
++from txstatsd.metrics.distinctmetric import DistinctMetricReporter
  SPACES = re.compile("\s+")
@@ -60,6 +61,7 @@
          self.counter_metrics = {}
          self.gauge_metrics = deque()
          self.meter_metrics = {}
++        self.distinct_metrics = {}
      def fail(self, message):
          """Log and discard malformed message."""
@@ -91,9 +93,21 @@
              self.process_gauge_metric(key, fields[0], message)
          elif fields[1] == "m":
              self.process_meter_metric(key, fields[0], message)
++        elif fields[1] == "d":
++            self.process_distinct_metric(key, fields[0], message)
          else:
              return self.fail(message)
++    def process_distinct_metric(self, key, item, message):
++        self.compose_distinct_metric(key, str(item))
++
++    def compose_distinct_metric(self, key, item):
++        if not key in self.distinct_metrics:
++            metric = DistinctMetricReporter(key, self.time_function,
++                                         prefix="stats.distinct")
++            self.distinct_metrics[key] = metric
++        self.distinct_metrics[key].update(item)
++
      def process_timer_metric(self, key, duration, message):
          try:
              duration = float(duration)
@@ -192,6 +206,11 @@
              messages.extend(meter_metrics)
              num_stats += events
++        distinct_metrics, events = self.flush_distinct_metrics(timestamp)
++        if events > 0:
++            messages.extend(distinct_metrics)
++            num_stats += events
++
          self.flush_metrics_summary(messages, num_stats, timestamp)
          return messages
@@ -278,6 +297,16 @@
          return (metrics, events)
++    def flush_distinct_metrics(self, timestamp):
++        metrics = []
++        events = 0
++        for metric in self.distinct_metrics.itervalues():
++            message = metric.report(timestamp)
++            metrics.append(message)
++            events += 1
++
++        return (metrics, events)
++
      def flush_metrics_summary(self, messages, num_stats, timestamp):
          messages.append("statsd.numStats %s %s\n" % (num_stats, timestamp))
 === added file 'txstatsd/tests/metrics/test_distinct.py'
 --- txstatsd/tests/metrics/test_distinct.py	1970-01-01 00:00:00 +0000
 +++ txstatsd/tests/metrics/test_distinct.py	2011-10-26 16:55:26 +0000
@@ -0,0 +1,80 @@
++# Copyright (C) 2011 Canonical
++# All Rights Reserved
++import random
++
++from scipy.stats import chi2
++
++from twisted.trial.unittest import TestCase
++
++import txstatsd.metrics.distinctmetric as distinct
++
++
++class TestHash(TestCase):
++    def test_hash_chars(self):
++        "For one table, all chars map to different chars"
++        results = set()
++        for c in range(256):
++            random.seed(1)
++            h = distinct.hash(chr(c))
++            results.add(h)
++        self.assertEquals(len(results), 256)
++
++    def test_chi_square(self):
++        N = 10000
++
++        for (bits, buckets) in [(-1, 1024), (24, 256),
++                                (16, 256), (8, 256), (0, 256)]:
++            bins = [0] * buckets
++            for i in range(N):
++                v = distinct.hash(str(i))
++                if bits < 0:
++                    bin = v / (0xFFFFFFFF / buckets)
++                else:
++                    bin = (v >> bits) & 0xFF
++                bins[bin] += 1
++            value = sum(((x - N / buckets) ** 2) / (N / buckets) for x in bins)
++            pval = chi2.cdf(value, N)
++            if pval > 0.5:
++                print bins, pval
++            self.assertTrue(pval < 0.5, "bits %s, pval == %s" % (bits, pval))
++    test_chi_square.skip = "Takes too long to run every time."
++
++
++class TestZeros(TestCase):
++    def test_zeros(self):
++        self.assertEquals(distinct.zeros(1), 0)
++        self.assertEquals(distinct.zeros(2), 1)
++        self.assertEquals(distinct.zeros(4), 2)
++        self.assertEquals(distinct.zeros(5), 0)
++        self.assertEquals(distinct.zeros(8), 3)
++        self.assertEquals(distinct.zeros(9), 0)
++
++class TestDistinct(TestCase):
++    def test_all(self):
++        random.seed(1)
++
++        for r in [1000, 10000]:
++            cd = distinct.SlidingDistinctCounter(32, 32)
++            for i in range(r):
++                cd.add(1, str(i))
++            error = abs(cd.distinct() - r)
++            self.assertTrue(error < 0.15 * r)
++
++
++class TestDistinctMetricReporter(TestCase):
++    def test_reports(self):
++        random.seed(1)
++        _wall_time = [0]
++        def _time():
++            return _wall_time[0]
++
++        dmr = distinct.DistinctMetricReporter("test", wall_time_func=_time)
++        for i in range(3000):
++            _wall_time[0] = i * 50
++            dmr.update(str(i))
++        now = _time()
++        self.assertTrue(abs(dmr.count() - 3000) < 600)
++        self.assertTrue(abs(dmr.count_1min(now) - 1) < 2)
++        self.assertTrue(abs(dmr.count_1hour(now) - 72) < 15)
++        self.assertTrue(abs(dmr.count_1day(now) - 1728) < 500)
++        self.assertTrue("count_1hour" in dmr.report(now))
 === modified file 'txstatsd/tests/test_processor.py'
 --- txstatsd/tests/test_processor.py	2011-10-07 19:49:25 +0000
 +++ txstatsd/tests/test_processor.py	2011-10-26 16:55:26 +0000
@@ -63,6 +63,16 @@
              [9.6, 'gorets'],
              self.processor.gauge_metrics.pop())
++    def test_receive_distinct_metric(self):
++        """
++        A distinct metric message takes the form:
++        '<name>:<item>|d'.
++        'd' indicates this is a distinct metric message.
++        """
++        self.processor.process("gorets:one|d")
++        self.assertEqual(1, len(self.processor.distinct_metrics))
++        self.assertTrue(self.processor.distinct_metrics["gorets"].count() > 0)
++
      def test_receive_message_no_fields(self):
          """
          If a timer message has no fields, it is logged and discarded.
@@ -227,6 +237,22 @@
              "statsd.numStats 1 42", messages[1].splitlines()[0])
          self.assertEqual(0, len(self.processor.gauge_metrics))
++    def test_flush_distinct_metric(self):
++        """
++        Test the correct rendering of the Graphite report for
++        a distinct metric.
++        """
++
++        self.processor.process("gorets:item|d")
++
++        messages = self.processor.flush()
++        self.assertEqual(2, len(messages))
++        metrics = messages[0]
++        self.assertTrue("stats.distinct.gorets.count " in metrics)
++        self.assertTrue("stats.distinct.gorets.count_1hour" in metrics)
++        self.assertTrue("stats.distinct.gorets.count_1min" in metrics)
++        self.assertTrue("stats.distinct.gorets.count_1day" in metrics)
++
  class FlushMeterMetricMessagesTest(TestCase):
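
The `<name>:<item>|d` message form documented in `test_receive_distinct_metric` can be parsed along these lines. This is a simplified sketch, not the processor's actual code: `parse_distinct` is a hypothetical function, and it rejects anything that is not a two-field message ending in `d`.

```python
def parse_distinct(message):
    """Split '<name>:<item>|d' into (name, item); return None if malformed."""
    if ":" not in message:
        return None
    key, data = message.strip().split(":", 1)
    fields = data.split("|")
    if len(fields) != 2 or fields[1] != "d":
        return None
    return key, fields[0]


print(parse_distinct("gorets:one|d"))
print(parse_distinct("gorets:9.6|g"))
```

The item is kept as a string, matching the `str(item)` conversion in `process_distinct_metric`.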