Merge lp:~theiw/txstatsd/txstatsd-coda-hale-timer into lp:txstatsd
- txstatsd-coda-hale-timer
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Ian Wilkinson |
Approved revision: | 35 |
Merged at revision: | 35 |
Proposed branch: | lp:~theiw/txstatsd/txstatsd-coda-hale-timer |
Merge into: | lp:txstatsd |
Diff against target: |
1286 lines (+1015/-94) 16 files modified
txstatsd/metrics/countermetric.py (+1/-1)
txstatsd/metrics/extendedmetrics.py (+11/-0)
txstatsd/metrics/histogrammetric.py (+195/-0)
txstatsd/metrics/timermetric.py (+153/-0)
txstatsd/server/configurableprocessor.py (+28/-17)
txstatsd/server/processor.py (+7/-3)
txstatsd/stats/exponentiallydecayingsample.py (+124/-0)
txstatsd/stats/uniformsample.py (+45/-0)
txstatsd/tests/metrics/__init__.py (+2/-0)
txstatsd/tests/metrics/test_histogrammetric.py (+89/-0)
txstatsd/tests/metrics/test_timermetric.py (+139/-0)
txstatsd/tests/stats/test_exponentiallydecayingsample.py (+47/-0)
txstatsd/tests/stats/test_uniformsample.py (+32/-0)
txstatsd/tests/test_configurableprocessor.py (+141/-0)
txstatsd/tests/test_processor.py (+0/-72)
txstatsd/version.py (+1/-1)
To merge this branch: | bzr merge lp:~theiw/txstatsd/txstatsd-coda-hale-timer |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Lucio Torre (community) | Approve | ||
Sidnei da Silva | Approve | ||
Review via email: mp+75333@code.launchpad.net |
Commit message
Completes the support for the Coda Hale metrics. The inclusion of the timer metric brings a number of dependencies: the histogram metric, and some stats support.
This also fixes an issue in updating the counter metric reporter.
Description of the change
Completes the support for the Coda Hale metrics. The inclusion of the timer metric brings a number of dependencies: the histogram metric, and some stats support.
This also fixes an issue in updating the counter metric reporter.
Lucio Torre (lucio.torre) wrote:
Just for the record:
1)
674 + def update(self, value):
675 + self._count += 1
676 + if self._count <= len(self._values):
677 + self._values[self._count - 1] = value
678 + else:
679 + r = random.randint(1, sys.maxint) % self._count
680 + if r < len(self._values):
681 + self._values[r] = value
If we received 1 million values and reservoir_size == 100, then we will only update the values once every 10k values. And when we get to 10M values, we will only update the values once every 100k values.
This seems to be a uniform sampling of the whole history of the values. The problem here is that we don't keep the whole history, just the data since the last time we reset txstatsd. And given a systemic change in the observed system, which could affect the values significantly, we would only change the derived stat in a way inversely proportional to the size of the history. So this stat would end up showing more information about when we last restarted txstatsd than the real value we are trying to track.
So, the question is: what's the use case for uniformly sampled histograms?
Same goes for the mean and all the other metrics implemented in the HistogramMetric.
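Lucio's arithmetic can be checked with a short simulation. The sketch below is illustrative, not the branch's code: it applies the same acceptance rule as UniformSample (Vitter's Algorithm R, where the n-th value enters a reservoir of size k with probability k/n) and counts reservoir updates in each window of the stream.

```python
import random

def reservoir_updates_per_window(stream_len, k, windows=10, seed=42):
    """Count Algorithm R reservoir updates in each window of a stream."""
    rng = random.Random(seed)
    window = stream_len // windows
    counts, hits = [], 0
    for n in range(1, stream_len + 1):
        # The n-th value enters the reservoir with probability k/n
        # (the first k values always enter).
        if n <= k or rng.randrange(n) < k:
            hits += 1
        if n % window == 0:
            counts.append(hits)
            hits = 0
    return counts

# With k=100 over 1M values, the last 100k-value window sees roughly
# k * ln(10/9), i.e. about 10 updates: one update per ~10k values,
# matching the estimate in the comment above.
counts = reservoir_updates_per_window(1_000_000, 100)
```

At n = 1M the per-value update probability is 1/10,000, so a level shift in the measured system takes a very long time to become visible in the reservoir.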
2) Maybe minor: with percentiles we approximate the CDF of the random variable, not the PDF. And from the term "histogram" I imagine we get to see the PDF.
Examples:
http://
http://
Which one of the two reminds you more of a histogram for a normal distribution?
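To make the CDF/PDF distinction concrete without the links: from the same normally distributed sample you can report percentiles (points on the empirical CDF, which only increase) or bin counts (an empirical PDF, which rises and falls around the mean). The percentile rule below mirrors the interpolation in the branch's HistogramMetricReporter; the binning is purely illustrative.

```python
import math
import random

random.seed(7)
data = sorted(random.gauss(0.0, 1.0) for _ in range(10_000))

def percentile(sorted_values, p):
    """Linear-interpolation percentile, as in HistogramMetricReporter."""
    pos = p * (len(sorted_values) + 1)
    if pos < 1:
        return sorted_values[0]
    if pos >= len(sorted_values):
        return sorted_values[-1]
    lower = sorted_values[int(pos) - 1]
    upper = sorted_values[int(pos)]
    return lower + (pos - math.floor(pos)) * (upper - lower)

# CDF view: an increasing sequence of points.
cdf_points = [percentile(data, p) for p in (0.5, 0.75, 0.95, 0.99)]

# PDF view: unit-width bins over [-4, 4), bell-shaped for normal data.
bins = [0] * 8
for x in data:
    bins[min(7, max(0, int(x + 4.0)))] += 1
```

Plot `bins` and you get the familiar bell; plot percentiles against their probabilities and you get an S-shaped curve. Only the first looks like a "histogram" of a normal distribution.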
Ian Wilkinson (theiw) wrote:
This is the response I received from a friend more knowledgeable about stats than myself:
"1) Using historical data can be fine but it depends on the process generating the data. The buzzword here is stationarity. If the data generating process doesn't change over time then using historical data is OK. If the process has changed then using that data is a bad idea. Basically, if there is some sort of trend then you have problems.
2) Looking at the code the fact that it's giving you the percentiles would imply it's a cumulative histogram...which is still a type of histogram! As for it being an estimate of the CDF, it will be under some conditions on the data generating process but this is non-parametrics and I don't know an awful lot about them."
I have also asked Coda Hale about the historical data concern.
On Monday, 26 September 2011 at 15:41, Lucio Torre wrote:
> Just for the record:
> 1)
>
> 674 + def update(self, value):
> 675 + self._count += 1
> 676 + if self._count <= len(self._values):
> 677 + self._values[self._count - 1] = value
> 678 + else:
> 679 + r = random.randint(1, sys.maxint) % self._count
> 680 + if r < len(self._values):
> 681 + self._values[r] = value
>
> If we received 1 million values and reservoir_size == 100, then we will only update the values once every 10k values. And when we get to 10M values, we will only update the values once every 100k values.
>
> This seems to be a uniform sampling of the whole history of the values. The problem here is that we don't keep the whole history, just the data since the last time we reset txstatsd. And given a systemic change in the observed system, which could affect the values significantly, we would only change the derived stat in a way inversely proportional to the size of the history. So this stat would end up showing more information about when we last restarted txstatsd than the real value we are trying to track.
>
> So, the question is: what's the use case for uniformly sampled histograms?
>
> Same goes for the mean and all the other metrics implemented in the HistogramMetric.
>
> 2) Maybe minor: with percentiles we approximate the CDF of the random variable, not the PDF. And from the term "histogram" I imagine we get to see the PDF.
>
> Examples:
> http://
> http://
> Which one of the two reminds you more of a histogram for a normal distribution?
> --
> https:/
> You are the owner of lp:~theiw/txstatsd/txstatsd-coda-hale-timer.
Lucio Torre (lucio.torre) wrote:
I think we can land this so we can unblock clients using this functionality.
Still, we should continue investigating this and make any changes necessary to support the most valuable stats.
I think that for each stat we present, we *must* be able to understand what it means and what data is taken into account to produce it (and how it's merged, too).
We should have a page explaining, in easy-to-understand terms, what each metric is.
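The stationarity concern raised in the review can be made concrete with a toy comparison. This is illustrative only: the code below uses a plain exponentially weighted moving average, not the branch's decaying reservoir, and `alpha` here is a hypothetical smoothing factor, unrelated to the 0.015 decay parameter in the diff. After a level shift in the measured values, a whole-history mean barely moves, while an exponentially weighted estimate converges to the new level quickly.

```python
def track_level_shift(n_before=10_000, n_after=1_000,
                      old=100.0, new=500.0, alpha=0.01):
    """Compare a whole-history mean with an EWMA across a level shift."""
    total = 0.0
    ewma = old
    for _ in range(n_before):
        total += old
        ewma += alpha * (old - ewma)   # decays toward the current value
    for _ in range(n_after):
        total += new
        ewma += alpha * (new - ewma)
    return total / (n_before + n_after), ewma

whole_history_mean, ewma = track_level_shift()
# whole_history_mean stays near 136 even though the process now emits 500;
# the EWMA ends up within a fraction of a unit of 500.
```

This is exactly the "stat shows when we last restarted txstatsd" effect: the whole-history estimate is dominated by the 10,000 pre-shift values.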
Preview Diff
1 | === modified file 'txstatsd/metrics/countermetric.py' |
2 | --- txstatsd/metrics/countermetric.py 2011-09-09 15:49:05 +0000 |
3 | +++ txstatsd/metrics/countermetric.py 2011-09-14 12:06:24 +0000 |
4 | @@ -67,7 +67,7 @@ |
5 | self.count = 0 |
6 | |
7 | def mark(self, value): |
8 | - self.count += value |
9 | + self.count = value |
10 | |
11 | def report(self, timestamp): |
12 | return self.message % { |
13 | |
14 | === modified file 'txstatsd/metrics/extendedmetrics.py' |
15 | --- txstatsd/metrics/extendedmetrics.py 2011-09-09 15:17:09 +0000 |
16 | +++ txstatsd/metrics/extendedmetrics.py 2011-09-14 12:06:24 +0000 |
17 | @@ -1,5 +1,6 @@ |
18 | |
19 | from txstatsd.metrics.countermetric import CounterMetric |
20 | +from txstatsd.metrics.timermetric import TimerMetric |
21 | from txstatsd.metrics.metrics import Metrics |
22 | |
23 | |
24 | @@ -37,3 +38,13 @@ |
25 | sample_rate) |
26 | self._metrics[name] = metric |
27 | self._metrics[name].decrement(value) |
28 | + |
29 | + def timing(self, name, duration, sample_rate=1): |
30 | + """Report this sample performed in duration ms.""" |
31 | + name = self.fully_qualify_name(name) |
32 | + if not name in self._metrics: |
33 | + metric = TimerMetric(self.connection, |
34 | + name, |
35 | + sample_rate) |
36 | + self._metrics[name] = metric |
37 | + self._metrics[name].mark(duration) |
38 | |
39 | === added file 'txstatsd/metrics/histogrammetric.py' |
40 | --- txstatsd/metrics/histogrammetric.py 1970-01-01 00:00:00 +0000 |
41 | +++ txstatsd/metrics/histogrammetric.py 2011-09-14 12:06:24 +0000 |
42 | @@ -0,0 +1,195 @@ |
43 | + |
44 | +import math |
45 | +import sys |
46 | + |
47 | +from string import Template |
48 | + |
49 | +from txstatsd.stats.exponentiallydecayingsample \ |
50 | + import ExponentiallyDecayingSample |
51 | +from txstatsd.stats.uniformsample import UniformSample |
52 | + |
53 | + |
54 | +class HistogramMetricReporter(object): |
55 | + """ |
56 | + A metric which calculates the distribution of a value. |
57 | + |
58 | + See: |
59 | + - U{Accurately computing running variance |
60 | + <http://www.johndcook.com/standard_deviation.html>} |
61 | + """ |
62 | + |
63 | + MESSAGE = ( |
64 | + "$prefix%(key)s.min %(min)s %(timestamp)s\n" |
65 | + "$prefix%(key)s.max %(max)s %(timestamp)s\n" |
66 | + "$prefix%(key)s.mean %(mean)s %(timestamp)s\n" |
67 | + "$prefix%(key)s.stddev %(stddev)s %(timestamp)s\n" |
68 | + "$prefix%(key)s.median %(median)s %(timestamp)s\n" |
69 | + "$prefix%(key)s.75percentile %(75percentile)s %(timestamp)s\n" |
70 | + "$prefix%(key)s.95percentile %(95percentile)s %(timestamp)s\n" |
71 | + "$prefix%(key)s.98percentile %(98percentile)s %(timestamp)s\n" |
72 | + "$prefix%(key)s.99percentile %(99percentile)s %(timestamp)s\n" |
73 | + "$prefix%(key)s.999percentile %(999percentile)s %(timestamp)s\n") |
74 | + |
75 | + @classmethod |
76 | + def using_uniform_sample(cls, prefix=""): |
77 | + """ |
78 | + Uses a uniform sample of 1028 elements, which offers a 99.9% |
79 | + confidence level with a 5% margin of error assuming a normal |
80 | + distribution. |
81 | + """ |
82 | + sample = UniformSample(1028) |
83 | + return HistogramMetricReporter(sample, prefix=prefix) |
84 | + |
85 | + @classmethod |
86 | + def using_exponentially_decaying_sample(cls, prefix=""): |
87 | + """ |
88 | + Uses an exponentially decaying sample of 1028 elements, which offers |
89 | + a 99.9% confidence level with a 5% margin of error assuming a normal |
90 | + distribution, and an alpha factor of 0.015, which heavily biases |
91 | + the sample to the past 5 minutes of measurements. |
92 | + """ |
93 | + sample = ExponentiallyDecayingSample(1028, 0.015) |
94 | + return HistogramMetricReporter(sample, prefix=prefix) |
95 | + |
96 | + def __init__(self, sample, prefix=""): |
97 | + """Creates a new HistogramMetric with the given sample. |
98 | + |
99 | + @param sample: The sample to create a histogram from. |
100 | + """ |
101 | + self.sample = sample |
102 | + |
103 | + if prefix: |
104 | + prefix += '.' |
105 | + self.message = Template(HistogramMetricReporter.MESSAGE).substitute( |
106 | + prefix=prefix) |
107 | + |
108 | + self._min = 0 |
109 | + self._max = 0 |
110 | + self._sum = 0 |
111 | + |
112 | + # These are for the Welford algorithm for calculating running |
113 | + # variance without floating-point doom. |
114 | + self.variance = [-1.0, 0.0] # M, S |
115 | + self.count = 0 |
116 | + |
117 | + self.clear() |
118 | + |
119 | + def clear(self): |
120 | + """Clears all recorded values.""" |
121 | + self.sample.clear() |
122 | + self.count = 0 |
123 | + self._max = None |
124 | + self._min = None |
125 | + self._sum = 0 |
126 | + self.variance = [-1.0, 0.0] |
127 | + |
128 | + def update(self, value, name=""): |
129 | + """Adds a recorded value. |
130 | + |
131 | + @param value: The length of the value. |
132 | + """ |
133 | + self.count += 1 |
134 | + self.sample.update(value) |
135 | + self.set_max(value) |
136 | + self.set_min(value) |
137 | + self._sum += value |
138 | + self.update_variance(value) |
139 | + |
140 | + def report(self, timestamp): |
141 | + # median, 75, 95, 98, 99, 99.9 percentile |
142 | + percentiles = self.percentiles(0.5, 0.75, 0.95, 0.98, 0.99, 0.999) |
143 | + |
144 | + return self.message % { |
145 | + "key": self.name, |
146 | + "min": self.min(), |
147 | + "max": self.max(), |
148 | + "mean": self.mean(), |
149 | + "stddev": self.std_dev(), |
150 | + "median": percentiles[0], |
151 | + "75percentile": percentiles[1], |
152 | + "95percentile": percentiles[2], |
153 | + "98percentile": percentiles[3], |
154 | + "99percentile": percentiles[4], |
155 | + "999percentile": percentiles[5], |
156 | + "timestamp": timestamp} |
157 | + |
158 | + def min(self): |
159 | + """Returns the smallest recorded value.""" |
160 | + return self._min if self.count > 0 else 0.0 |
161 | + |
162 | + def max(self): |
163 | + """Returns the largest recorded value.""" |
164 | + return self._max if self.count > 0 else 0.0 |
165 | + |
166 | + def mean(self): |
167 | + """Returns the arithmetic mean of all recorded values.""" |
168 | + return float(self._sum) / self.count if self.count > 0 else 0.0 |
169 | + |
170 | + def std_dev(self): |
171 | + """Returns the standard deviation of all recorded values.""" |
172 | + return math.sqrt(self.get_variance()) if self.count > 0 else 0.0 |
173 | + |
174 | + def percentiles(self, *percentiles): |
175 | + """Returns a list of values at the given percentiles. |
176 | + |
177 | + @param percentiles one or more percentiles |
178 | + """ |
179 | + |
180 | + scores = [0.0 for i in range(len(percentiles))] |
181 | + if self.count > 0: |
182 | + values = self.sample.get_values() |
183 | + values.sort() |
184 | + |
185 | + for i in range(len(percentiles)): |
186 | + p = percentiles[i] |
187 | + pos = p * (len(values) + 1) |
188 | + if pos < 1: |
189 | + scores[i] = values[0] |
190 | + elif pos >= len(values): |
191 | + scores[i] = values[-1] |
192 | + else: |
193 | + lower = values[int(pos) - 1] |
194 | + upper = values[int(pos)] |
195 | + scores[i] = lower + (pos - math.floor(pos)) * ( |
196 | + upper - lower) |
197 | + |
198 | + return scores |
199 | + |
200 | + def get_values(self): |
201 | + """Returns a list of all values in the histogram's sample.""" |
202 | + return self.sample.get_values() |
203 | + |
204 | + def get_variance(self): |
205 | + if self.count <= 1: |
206 | + return 0.0 |
207 | + return self.variance[1] / (self.count - 1) |
208 | + |
209 | + def set_max(self, potential_max): |
210 | + if self._max is None: |
211 | + self._max = potential_max |
212 | + else: |
213 | + self._max = max(self.max(), potential_max) |
214 | + |
215 | + def set_min(self, potential_min): |
216 | + if self._min is None: |
217 | + self._min = potential_min |
218 | + else: |
219 | + self._min = min(self.min(), potential_min) |
220 | + |
221 | + def update_variance(self, value): |
222 | + old_values = self.variance |
223 | + new_values = [0.0, 0.0] |
224 | + if old_values[0] == -1: |
225 | + new_values[0] = value |
226 | + new_values[1] = 0.0 |
227 | + else: |
228 | + old_m = old_values[0] |
229 | + old_s = old_values[1] |
230 | + |
231 | + new_m = old_m + (float(value - old_m) / self.count) |
232 | + new_s = old_s + (float(value - old_m) * (value - new_m)) |
233 | + |
234 | + new_values[0] = new_m |
235 | + new_values[1] = new_s |
236 | + |
237 | + self.variance = new_values |
238 | \ No newline at end of file |
239 | |
240 | === added file 'txstatsd/metrics/timermetric.py' |
241 | --- txstatsd/metrics/timermetric.py 1970-01-01 00:00:00 +0000 |
242 | +++ txstatsd/metrics/timermetric.py 2011-09-14 12:06:24 +0000 |
243 | @@ -0,0 +1,153 @@ |
244 | + |
245 | +from string import Template |
246 | + |
247 | +import time |
248 | + |
249 | +from txstatsd.metrics.histogrammetric import HistogramMetricReporter |
250 | +from txstatsd.metrics.metermetric import MeterMetricReporter |
251 | +from txstatsd.metrics.metric import Metric |
252 | +from txstatsd.stats.exponentiallydecayingsample \ |
253 | + import ExponentiallyDecayingSample |
254 | + |
255 | + |
256 | +class TimerMetric(Metric): |
257 | + """ |
258 | + A timer metric which aggregates timing durations and provides duration |
259 | + statistics, plus throughput statistics via L{MeterMetric}. |
260 | + """ |
261 | + |
262 | + def __init__(self, connection, name, sample_rate=1): |
263 | + """Construct a metric that reports samples to the supplied |
264 | + C{connection}. |
265 | + |
266 | + @param connection: The connection endpoint representing |
267 | + the StatsD server. |
268 | + @param name: Indicates what is being instrumented. |
269 | + @param sample_rate: Restrict the number of samples sent |
270 | + to the StatsD server based on the supplied C{sample_rate}. |
271 | + """ |
272 | + Metric.__init__(self, connection, name, sample_rate=sample_rate) |
273 | + |
274 | + def mark(self, duration): |
275 | + """Report this sample performed in duration (measured in seconds).""" |
276 | + self.send("%s|ms" % duration) |
277 | + |
278 | + |
279 | +class TimerMetricReporter(object): |
280 | + """ |
281 | + A timer metric which aggregates timing durations and provides duration |
282 | + statistics, plus throughput statistics via L{MeterMetricReporter}. |
283 | + """ |
284 | + |
285 | + MESSAGE = ( |
286 | + "$prefix%(key)s.min %(min)s %(timestamp)s\n" |
287 | + "$prefix%(key)s.max %(max)s %(timestamp)s\n" |
288 | + "$prefix%(key)s.mean %(mean)s %(timestamp)s\n" |
289 | + "$prefix%(key)s.stddev %(stddev)s %(timestamp)s\n" |
290 | + "$prefix%(key)s.median %(median)s %(timestamp)s\n" |
291 | + "$prefix%(key)s.75percentile %(75percentile)s %(timestamp)s\n" |
292 | + "$prefix%(key)s.95percentile %(95percentile)s %(timestamp)s\n" |
293 | + "$prefix%(key)s.98percentile %(98percentile)s %(timestamp)s\n" |
294 | + "$prefix%(key)s.99percentile %(99percentile)s %(timestamp)s\n" |
295 | + "$prefix%(key)s.999percentile %(999percentile)s %(timestamp)s\n") |
296 | + |
297 | + def __init__(self, name, wall_time_func=time.time, prefix=""): |
298 | + """Construct a metric we expect to be periodically updated. |
299 | + |
300 | + @param name: Indicates what is being instrumented. |
301 | + @param wall_time_func: Function for obtaining wall time. |
302 | + @param prefix: If present, a string to prepend to the message |
303 | + composed when C{report} is called. |
304 | + """ |
305 | + self.name = name |
306 | + self.wall_time_func = wall_time_func |
307 | + |
308 | + if prefix: |
309 | + prefix += '.' |
310 | + self.message = Template(TimerMetricReporter.MESSAGE).substitute( |
311 | + prefix=prefix) |
312 | + |
313 | + sample = ExponentiallyDecayingSample(1028, 0.015) |
314 | + self.histogram = HistogramMetricReporter(sample) |
315 | + self.meter = MeterMetricReporter( |
316 | + 'calls', wall_time_func=self.wall_time_func) |
317 | + self.clear() |
318 | + |
319 | + def clear(self): |
320 | + """Clears all recorded durations.""" |
321 | + self.histogram.clear() |
322 | + |
323 | + def count(self): |
324 | + return self.histogram.count |
325 | + |
326 | + def fifteen_minute_rate(self): |
327 | + return self.meter.fifteen_minute_rate() |
328 | + |
329 | + def five_minute_rate(self): |
330 | + return self.meter.five_minute_rate() |
331 | + |
332 | + def mean_rate(self): |
333 | + return self.meter.mean_rate() |
334 | + |
335 | + def one_minute_rate(self): |
336 | + return self.meter.one_minute_rate() |
337 | + |
338 | + def max(self): |
339 | + """Returns the longest recorded duration.""" |
340 | + return self.histogram.max() |
341 | + |
342 | + def min(self): |
343 | + """Returns the shortest recorded duration.""" |
344 | + return self.histogram.min() |
345 | + |
346 | + def mean(self): |
347 | + """Returns the arithmetic mean of all recorded durations.""" |
348 | + return self.histogram.mean() |
349 | + |
350 | + def std_dev(self): |
351 | + """Returns the standard deviation of all recorded durations.""" |
352 | + return self.histogram.std_dev() |
353 | + |
354 | + def percentiles(self, *percentiles): |
355 | + """ |
356 | + Returns an array of durations at the given percentiles. |
357 | + |
358 | + @param percentiles: One or more percentiles. |
359 | + """ |
360 | + return [percentile for percentile in |
361 | + self.histogram.percentiles(*percentiles)] |
362 | + |
363 | + def get_values(self): |
364 | + """Returns a list of all recorded durations in the timer's sample.""" |
365 | + return [value for value in self.histogram.get_values()] |
366 | + |
367 | + def update(self, duration): |
368 | + """Adds a recorded duration. |
369 | + |
370 | + @param duration: The length of the duration in seconds. |
371 | + """ |
372 | + if duration >= 0: |
373 | + self.histogram.update(duration) |
374 | + self.meter.mark() |
375 | + |
376 | + def tick(self): |
377 | + """Updates the moving averages.""" |
378 | + self.meter.tick() |
379 | + |
380 | + def report(self, timestamp): |
381 | + # median, 75, 95, 98, 99, 99.9 percentile |
382 | + percentiles = self.percentiles(0.5, 0.75, 0.95, 0.98, 0.99, 0.999) |
383 | + |
384 | + return self.message % { |
385 | + "key": self.name, |
386 | + "min": self.min(), |
387 | + "max": self.max(), |
388 | + "mean": self.mean(), |
389 | + "stddev": self.std_dev(), |
390 | + "median": percentiles[0], |
391 | + "75percentile": percentiles[1], |
392 | + "95percentile": percentiles[2], |
393 | + "98percentile": percentiles[3], |
394 | + "99percentile": percentiles[4], |
395 | + "999percentile": percentiles[5], |
396 | + "timestamp": timestamp} |
397 | |
398 | === modified file 'txstatsd/server/configurableprocessor.py' |
399 | --- txstatsd/server/configurableprocessor.py 2011-09-09 15:17:09 +0000 |
400 | +++ txstatsd/server/configurableprocessor.py 2011-09-14 12:06:24 +0000 |
401 | @@ -5,6 +5,7 @@ |
402 | |
403 | from txstatsd.metrics.countermetric import CounterMetricReporter |
404 | from txstatsd.metrics.metermetric import MeterMetricReporter |
405 | +from txstatsd.metrics.timermetric import TimerMetricReporter |
406 | from txstatsd.server.processor import MessageProcessor |
407 | |
408 | |
409 | @@ -12,25 +13,17 @@ |
410 | """ |
411 | This specialised C{MessageProcessor} supports behaviour |
412 | that is not StatsD-compliant. |
413 | + |
414 | Currently, this extends to: |
415 | - Allow a prefix to be added to the composed messages sent |
416 | to the Graphite server. |
417 | - Report an incrementing and decrementing counter metric. |
418 | + - Report a timer metric which aggregates timing durations and provides |
419 | + duration statistics, plus throughput statistics. |
420 | """ |
421 | |
422 | - # Notice: These messages replicate those seen in the |
423 | - # MessageProcessor (excepting the prefix identifier). |
424 | - # In a future release they will be placed in their |
425 | - # respective metric reporter class. |
426 | - # See MeterMetricReporter. |
427 | - TIMERS_MESSAGE = ( |
428 | - "$prefix%(key)s.mean %(mean)s %(timestamp)s\n" |
429 | - "$prefix%(key)s.upper %(upper)s %(timestamp)s\n" |
430 | - "$prefix%(key)s.upper_%(percent)s %(threshold_upper)s" |
431 | - " %(timestamp)s\n" |
432 | - "$prefix%(key)s.lower %(lower)s %(timestamp)s\n" |
433 | - "$prefix%(key)s.count %(count)s %(timestamp)s\n") |
434 | - |
435 | + #TODO Provide a GaugeMetricReporter (in the manner of the other |
436 | + # Coda Hale metrics). |
437 | GAUGE_METRIC_MESSAGE = ( |
438 | "$prefix%(key)s.value %(value)s %(timestamp)s\n") |
439 | |
440 | @@ -41,14 +34,16 @@ |
441 | if message_prefix: |
442 | message_prefix += '.' |
443 | |
444 | - message = Template(ConfigurableMessageProcessor.TIMERS_MESSAGE) |
445 | - self.timers_message = message.substitute( |
446 | - prefix=message_prefix) |
447 | - |
448 | message = Template(ConfigurableMessageProcessor.GAUGE_METRIC_MESSAGE) |
449 | self.gauge_metric_message = message.substitute( |
450 | prefix=message_prefix) |
451 | |
452 | + def compose_timer_metric(self, key, duration): |
453 | + if not key in self.timer_metrics: |
454 | + metric = TimerMetricReporter(key, prefix=self.message_prefix) |
455 | + self.timer_metrics[key] = metric |
456 | + self.timer_metrics[key].update(duration) |
457 | + |
458 | def process_counter_metric(self, key, composite, message): |
459 | try: |
460 | value = float(composite[0]) |
461 | @@ -79,3 +74,19 @@ |
462 | events += 1 |
463 | |
464 | return (metrics, events) |
465 | + |
466 | + def flush_timer_metrics(self, percent, timestamp): |
467 | + metrics = [] |
468 | + events = 0 |
469 | + for metric in self.timer_metrics.itervalues(): |
470 | + message = metric.report(timestamp) |
471 | + metrics.append(message) |
472 | + events += 1 |
473 | + |
474 | + return (metrics, events) |
475 | + |
476 | + def update_metrics(self): |
477 | + super(ConfigurableMessageProcessor, self).update_metrics() |
478 | + |
479 | + for metric in self.timer_metrics.itervalues(): |
480 | + metric.tick() |
481 | |
482 | === modified file 'txstatsd/server/processor.py' |
483 | --- txstatsd/server/processor.py 2011-09-09 15:17:09 +0000 |
484 | +++ txstatsd/server/processor.py 2011-09-14 12:06:24 +0000 |
485 | @@ -94,14 +94,18 @@ |
486 | else: |
487 | return self.fail(message) |
488 | |
489 | - def process_timer_metric(self, key, value, message): |
490 | + def process_timer_metric(self, key, duration, message): |
491 | try: |
492 | - value = float(value) |
493 | + duration = float(duration) |
494 | except (TypeError, ValueError): |
495 | return self.fail(message) |
496 | + |
497 | + self.compose_timer_metric(key, duration) |
498 | + |
499 | + def compose_timer_metric(self, key, duration): |
500 | if key not in self.timer_metrics: |
501 | self.timer_metrics[key] = [] |
502 | - self.timer_metrics[key].append(value) |
503 | + self.timer_metrics[key].append(duration) |
504 | |
505 | def process_counter_metric(self, key, composite, message): |
506 | try: |
507 | |
508 | === added file 'txstatsd/stats/exponentiallydecayingsample.py' |
509 | --- txstatsd/stats/exponentiallydecayingsample.py 1970-01-01 00:00:00 +0000 |
510 | +++ txstatsd/stats/exponentiallydecayingsample.py 2011-09-14 12:06:24 +0000 |
511 | @@ -0,0 +1,124 @@ |
512 | + |
513 | +import math |
514 | +import random |
515 | +import time |
516 | + |
517 | + |
518 | +class ExponentiallyDecayingSample(object): |
519 | + """ |
520 | + An exponentially-decaying random sample of values. Uses Cormode et |
521 | + al's forward-decaying priority reservoir sampling method to produce a |
522 | + statistically representative sample, exponentially biased towards newer |
523 | + entries. |
524 | + |
525 | + See: |
526 | + - U{Cormode et al. Forward Decay: A Practical Time Decay Model for |
527 | + Streaming Systems. ICDE '09: Proceedings of the 2009 IEEE International |
528 | + Conference on Data Engineering (2009) |
529 | + <http://www.research.att.com/people/Cormode_Graham/ |
530 | + library/publications/CormodeShkapenyukSrivastavaXu09.pdf>} |
531 | + """ |
532 | + |
533 | + # 1 hour (in seconds) |
534 | + RESCALE_THRESHOLD = 60 * 60 |
535 | + |
536 | + def __init__(self, reservoir_size, alpha): |
537 | + """Creates a new C{ExponentiallyDecayingSample}. |
538 | + |
539 | + @param reservoir_size: The number of samples to keep in the sampling |
540 | + reservoir. |
541 | + @param alpha: The exponential decay factor; the higher this is, |
542 | + the more biased the sample will be towards newer values. |
543 | + """ |
544 | + self._values = dict() |
545 | + self.alpha = alpha |
546 | + self.reservoir_size = reservoir_size |
547 | + |
548 | + self.count = 0 |
549 | + self.start_time = 0 |
550 | + self.next_scale_time = 0 |
551 | + |
552 | + self.clear() |
553 | + |
554 | + def clear(self): |
555 | + self._values.clear() |
556 | + self.count = 0 |
557 | + self.start_time = self.tick() |
558 | + self.next_scale_time = ( |
559 | + time.time() + ExponentiallyDecayingSample.RESCALE_THRESHOLD) |
560 | + |
561 | + def size(self): |
562 | + return min(self.reservoir_size, self.count) |
563 | + |
564 | + def update(self, value, timestamp=None): |
565 | + """Adds an old value with a fixed timestamp to the sample. |
566 | + |
567 | + @param value: The value to be added. |
568 | + @param timestamp: The epoch timestamp of *value* in seconds. |
569 | + """ |
570 | + |
571 | + if timestamp is None: |
572 | + timestamp = self.tick() |
573 | + |
574 | + priority = self.weight(timestamp - self.start_time) / random.random() |
575 | + self.count += 1 |
576 | + new_count = self.count |
577 | + if new_count <= self.reservoir_size: |
578 | + self._values[priority] = value |
579 | + else: |
580 | + keys = sorted(self._values.keys()) |
581 | + first = keys[0] |
582 | + |
583 | + if first < priority: |
584 | + if priority not in self._values: |
585 | + self._values[priority] = value |
586 | + del self._values[first] |
587 | + |
588 | + now = time.time() |
589 | + next = self.next_scale_time |
590 | + if now >= next: |
591 | + self.rescale(now, next) |
592 | + |
593 | + def get_values(self): |
594 | + keys = sorted(self._values.keys()) |
595 | + return [self._values[k] for k in keys] |
596 | + |
597 | + def tick(self): |
598 | + return time.time() |
599 | + |
600 | + def weight(self, t): |
601 | + return math.exp(self.alpha * t) |
602 | + |
603 | + def rescale(self, now, next): |
604 | + """ |
605 | + A common feature of the above techniques - indeed, the key technique |
606 | + that allows us to track the decayed weights efficiently - is that they |
607 | + maintain counts and other quantities based on g(ti - L), and only |
608 | + scale by g(t - L) at query time. But while g(ti - L)/g(t-L) is |
609 | + guaranteed to lie between zero and one, the intermediate values of |
610 | + g(ti - L) could become very large. For polynomial functions, these |
611 | + values should not grow too large, and should be effectively |
612 | + represented in practice by floating point values without loss of |
613 | + precision. For exponential functions, these values could grow quite |
614 | + large as new values of (ti - L) become large, and potentially exceed |
615 | + the capacity of common floating point types. However, since the values |
616 | + stored by the algorithms are linear combinations of g values (scaled |
617 | + sums), they can be rescaled relative to a new landmark. That is, by |
618 | + the analysis of exponential decay in Section III-A, the choice of L |
619 | + does not affect the final result. We can therefore multiply each value |
620 | + based on L by a factor of exp(-alpha(L' - L)), and obtain the correct |
621 | + value as if we had instead computed relative to a new landmark L' (and |
622 | + then use this new L' at query time). This can be done with a linear |
623 | + pass over whatever data structure is being used. |
624 | + """ |
625 | + |
626 | + self.next_scale_time = ( |
627 | + now + ExponentiallyDecayingSample.RESCALE_THRESHOLD) |
628 | + old_start_time = self.start_time |
629 | + self.start_time = self.tick() |
630 | + keys = sorted(self._values.keys()) |
631 | + for k in keys: |
632 | + v = self._values[k] |
633 | + del self._values[k] |
634 | + self._values[k * math.exp(-self.alpha * |
635 | + (self.start_time - old_start_time))] = v |
636 | |
637 | === added file 'txstatsd/stats/uniformsample.py' |
638 | --- txstatsd/stats/uniformsample.py 1970-01-01 00:00:00 +0000 |
639 | +++ txstatsd/stats/uniformsample.py 2011-09-14 12:06:24 +0000 |
640 | @@ -0,0 +1,45 @@ |
641 | + |
642 | +import random |
643 | +import sys |
644 | + |
645 | + |
646 | +class UniformSample(object): |
647 | + """ |
648 | + A random sample of a stream of values. Uses Vitter's Algorithm R to |
649 | + produce a statistically representative sample. |
650 | + |
651 | + See: |
652 | + - U{Random Sampling with a Reservoir |
653 | + <http://www.cs.umd.edu/~samir/498/vitter.pdf>} |
654 | + """ |
655 | + |
656 | + def __init__(self, reservoir_size): |
657 | + """Creates a new C{UniformSample}. |
658 | + |
659 | + @param reservoir_size: The number of samples to keep in the sampling |
660 | + reservoir. |
661 | + """ |
662 | + self._values = [0 for i in range(reservoir_size)] |
663 | + self._count = 0 |
664 | + self.clear() |
665 | + |
666 | + def clear(self): |
667 | + self._values = [0 for i in range(len(self._values))] |
668 | + self._count = 0 |
669 | + |
670 | + def size(self): |
671 | + c = self._count |
672 | + return len(self._values) if c > len(self._values) else c |
673 | + |
674 | + def update(self, value): |
675 | + self._count += 1 |
676 | + if self._count <= len(self._values): |
677 | + self._values[self._count - 1] = value |
678 | + else: |
679 | + r = random.randint(1, sys.maxint) % self._count |
680 | + if r < len(self._values): |
681 | + self._values[r] = value |
682 | + |
683 | + def get_values(self): |
684 | + s = self.size() |
685 | + return [self._values[i] for i in range(0, s)] |
686 | |
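The reservoir logic in `UniformSample.update` above is Vitter's Algorithm R: the first `k` values fill the reservoir, and the i-th value thereafter replaces a random slot with probability `k/i`. A standalone sketch of the same idea (using `random.randrange` instead of the module's `randint`-modulo construction):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream (Algorithm R)."""
    reservoir = []
    for i, value in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(value)      # fill phase: keep everything
        else:
            r = random.randrange(i)      # uniform in [0, i)
            if r < k:                    # happens with probability k/i
                reservoir[r] = value
    return reservoir

sample = reservoir_sample(range(1000), 100)
assert len(sample) == 100
assert all(0 <= v < 1000 for v in sample)
```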
687 | === added directory 'txstatsd/tests/metrics' |
688 | === added file 'txstatsd/tests/metrics/__init__.py' |
689 | --- txstatsd/tests/metrics/__init__.py 1970-01-01 00:00:00 +0000 |
690 | +++ txstatsd/tests/metrics/__init__.py 2011-09-14 12:06:24 +0000 |
691 | @@ -0,0 +1,2 @@ |
692 | + |
693 | + |
694 | |
695 | === added file 'txstatsd/tests/metrics/test_histogrammetric.py' |
696 | --- txstatsd/tests/metrics/test_histogrammetric.py 1970-01-01 00:00:00 +0000 |
697 | +++ txstatsd/tests/metrics/test_histogrammetric.py 2011-09-14 12:06:24 +0000 |
698 | @@ -0,0 +1,89 @@ |
699 | + |
700 | +import math |
701 | +from unittest import TestCase |
702 | + |
703 | +from txstatsd.metrics.histogrammetric import HistogramMetricReporter |
704 | +from txstatsd.stats.uniformsample import UniformSample |
705 | + |
706 | + |
707 | +class TestHistogramReporterMetric(TestCase): |
708 | + def test_histogram_with_zero_recorded_values(self): |
709 | + sample = UniformSample(100) |
710 | + histogram = HistogramMetricReporter(sample) |
711 | + |
712 | + self.assertEqual(histogram.count, 0, 'Should have a count of 0') |
713 | + self.assertEqual(histogram.max(), 0, |
714 | + 'Should have a max of 0') |
715 | + self.assertEqual(histogram.min(), 0, |
716 | + 'Should have a min of 0') |
717 | + |
718 | + self.assertEqual(histogram.mean(), 0, |
719 | + 'Should have a mean of 0') |
720 | + |
721 | + self.assertEqual(histogram.std_dev(), 0, |
722 | + 'Should have a standard deviation of 0') |
723 | + |
724 | + percentiles = histogram.percentiles(0.5, 0.75, 0.99) |
725 | + self.assertTrue( |
726 | + (math.fabs(percentiles[0] - 0) < 0.01), |
727 | + 'Should calculate percentiles') |
728 | + self.assertTrue( |
729 | + (math.fabs(percentiles[1] - 0) < 0.01), |
730 | + 'Should calculate percentiles') |
731 | + self.assertTrue( |
732 | + (math.fabs(percentiles[2] - 0) < 0.01), |
733 | + 'Should calculate percentiles') |
734 | + |
735 | + self.assertEqual(len(histogram.get_values()), 0, |
736 | + 'Should have no values') |
737 | + |
738 | + def test_histogram_min_and_max_of_numbers_1_through_10000(self): |
739 | + sample = UniformSample(100000) |
740 | + histogram = HistogramMetricReporter(sample) |
741 | + for i in range(1, 10001): |
742 | + histogram.update(i) |
743 | + |
744 | + self.assertEqual(histogram.count, 10000, |
745 | + 'Should have a count of 10000') |
746 | + |
747 | + self.assertEqual(histogram.max(), 10000, |
748 | + 'Should have a max of 10000') |
749 | + self.assertEqual(histogram.min(), 1, |
750 | + 'Should have a min of 1') |
751 | + |
752 | + def test_histogram_of_numbers_1_through_10000(self): |
753 | + sample = UniformSample(100000) |
754 | + histogram = HistogramMetricReporter(sample) |
755 | + for i in range(1, 10001): |
756 | + histogram.update(i) |
757 | + |
758 | + self.assertEqual(histogram.count, 10000, |
759 | + 'Should have a count of 10000') |
760 | + |
761 | + self.assertEqual(histogram.max(), 10000, |
762 | + 'Should have a max of 10000') |
763 | + self.assertEqual(histogram.min(), 1, |
764 | + 'Should have a min of 1') |
765 | + |
766 | + self.assertTrue( |
767 | + (math.fabs(histogram.mean() - 5000.5) < 0.01), |
768 | + 'Should have a mean value of 5000.5') |
769 | + |
770 | + self.assertTrue( |
771 | + (math.fabs(histogram.std_dev() - 2886.89) < 0.01), |
772 | + 'Should have a standard deviation of 2886.89') |
773 | + |
774 | + percentiles = histogram.percentiles(0.5, 0.75, 0.99) |
775 | + self.assertTrue( |
776 | + (math.fabs(percentiles[0] - 5000.5) < 0.01), |
777 | + 'Should calculate percentiles') |
778 | + self.assertTrue( |
779 | + (math.fabs(percentiles[1] - 7500.75) < 0.01), |
780 | + 'Should calculate percentiles') |
781 | + self.assertTrue( |
782 | + (math.fabs(percentiles[2] - 9900.99) < 0.01), |
783 | + 'Should calculate percentiles') |
784 | + |
785 | + values = [i for i in range(1, 10001)] |
786 | + self.assertEqual(histogram.get_values(), values, |
787 | + 'Should have 10000 values') |
788 | \ No newline at end of file |
789 | |
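The expected values in the test above (median 5000.5, p75 7500.75, p99 9900.99) are consistent with a linear-interpolation percentile over the sorted sample, as in the Coda Hale histogram design. A sketch of that computation (illustrative, not the branch's exact code):

```python
def percentile(sorted_values, p):
    """Percentile by linear interpolation over a sorted list."""
    if not sorted_values:
        return 0.0
    pos = p * (len(sorted_values) + 1)
    if pos < 1:
        return float(sorted_values[0])
    if pos >= len(sorted_values):
        return float(sorted_values[-1])
    lower = sorted_values[int(pos) - 1]
    upper = sorted_values[int(pos)]
    return lower + (pos - int(pos)) * (upper - lower)

values = list(range(1, 10001))
assert abs(percentile(values, 0.5) - 5000.5) < 0.01
assert abs(percentile(values, 0.75) - 7500.75) < 0.01
assert abs(percentile(values, 0.99) - 9900.99) < 0.01
```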
790 | === added file 'txstatsd/tests/metrics/test_timermetric.py' |
791 | --- txstatsd/tests/metrics/test_timermetric.py 1970-01-01 00:00:00 +0000 |
792 | +++ txstatsd/tests/metrics/test_timermetric.py 2011-09-14 12:06:24 +0000 |
793 | @@ -0,0 +1,139 @@ |
794 | + |
795 | +import math |
796 | + |
797 | +from twisted.trial.unittest import TestCase |
798 | + |
799 | +from txstatsd.metrics.timermetric import TimerMetricReporter |
800 | + |
801 | + |
802 | +class TestBlankTimerMetric(TestCase): |
803 | + def setUp(self): |
804 | + self.timer = TimerMetricReporter('test') |
805 | + self.timer.tick() |
806 | + |
807 | + def test_max(self): |
808 | + self.assertEqual( |
809 | + self.timer.max(), 0, |
810 | + 'Should have a max of zero') |
811 | + |
812 | + def test_min(self): |
813 | + self.assertEqual( |
814 | + self.timer.min(), 0, |
815 | + 'Should have a min of zero') |
816 | + |
817 | + def test_mean(self): |
818 | + self.assertEqual( |
819 | + self.timer.mean(), 0, |
820 | + 'Should have a mean of zero') |
821 | + |
822 | + def test_count(self): |
823 | + self.assertEqual( |
824 | + self.timer.count(), 0, |
825 | + 'Should have a count of zero') |
826 | + |
827 | + def test_std_dev(self): |
828 | + self.assertEqual( |
829 | + self.timer.std_dev(), 0, |
830 | + 'Should have a standard deviation of zero') |
831 | + |
832 | + def test_percentiles(self): |
833 | + percentiles = self.timer.percentiles(0.5, 0.95, 0.98, 0.99, 0.999) |
834 | + self.assertEqual( |
835 | + percentiles[0], 0, |
836 | + 'Should have median of zero') |
837 | + self.assertEqual( |
838 | + percentiles[1], 0, |
839 | + 'Should have p95 of zero') |
840 | + self.assertEqual( |
841 | + percentiles[2], 0, |
842 | + 'Should have p98 of zero') |
843 | + self.assertEqual( |
844 | + percentiles[3], 0, |
845 | + 'Should have p99 of zero') |
846 | + self.assertEqual( |
847 | + percentiles[4], 0, |
848 | + 'Should have p99.9 of zero') |
849 | + |
850 | + def test_mean_rate(self): |
851 | + self.assertEqual( |
852 | + self.timer.mean_rate(), 0, |
853 | + 'Should have a mean rate of zero') |
854 | + |
855 | + def test_one_minute_rate(self): |
856 | + self.assertEqual( |
857 | + self.timer.one_minute_rate(), 0, |
858 | + 'Should have a one-minute rate of zero') |
859 | + |
860 | + def test_five_minute_rate(self): |
861 | + self.assertEqual( |
862 | + self.timer.five_minute_rate(), 0, |
863 | + 'Should have a five-minute rate of zero') |
864 | + |
865 | + def test_fifteen_minute_rate(self): |
866 | + self.assertEqual( |
867 | + self.timer.fifteen_minute_rate(), 0, |
868 | + 'Should have a fifteen-minute rate of zero') |
869 | + |
870 | + def test_no_values(self): |
871 | + self.assertEqual( |
872 | + len(self.timer.get_values()), 0, |
873 | + 'Should have no values') |
874 | + |
875 | + |
876 | +class TestTimingSeriesEvents(TestCase): |
877 | + def setUp(self): |
878 | + self.timer = TimerMetricReporter('test') |
879 | + self.timer.tick() |
880 | + self.timer.update(10) |
881 | + self.timer.update(20) |
882 | + self.timer.update(20) |
883 | + self.timer.update(30) |
884 | + self.timer.update(40) |
885 | + |
886 | + def test_count(self): |
887 | + self.assertEqual( |
888 | + self.timer.count(), 5, |
889 | + 'Should record the count') |
890 | + |
891 | + def test_min(self): |
892 | + self.assertTrue( |
893 | + (math.fabs(self.timer.min() - 10.0) < 0.001), |
894 | + 'Should calculate the minimum duration') |
895 | + |
896 | + def test_max(self): |
897 | + self.assertTrue( |
898 | + (math.fabs(self.timer.max() - 40.0) < 0.001), |
899 | + 'Should calculate the maximum duration') |
900 | + |
901 | + def test_mean(self): |
902 | + self.assertTrue( |
903 | + (math.fabs(self.timer.mean() - 24.0) < 0.001), |
904 | + 'Should calculate the mean duration') |
905 | + |
906 | + def test_std_dev(self): |
907 | + self.assertTrue( |
908 | + (math.fabs(self.timer.std_dev() - 11.401) < 0.001), |
909 | + 'Should calculate the standard deviation') |
910 | + |
911 | + def test_percentiles(self): |
912 | + percentiles = self.timer.percentiles(0.5, 0.95, 0.98, 0.99, 0.999) |
913 | + self.assertTrue( |
914 | + (math.fabs(percentiles[0] - 20.0) < 0.001), |
915 | + 'Should calculate the median') |
916 | + self.assertTrue( |
917 | + (math.fabs(percentiles[1] - 40.0) < 0.001), |
918 | + 'Should calculate the p95') |
919 | + self.assertTrue( |
920 | + (math.fabs(percentiles[2] - 40.0) < 0.001), |
921 | + 'Should calculate the p98') |
922 | + self.assertTrue( |
923 | + (math.fabs(percentiles[3] - 40.0) < 0.001), |
924 | + 'Should calculate the p99') |
925 | + self.assertTrue( |
926 | + (math.fabs(percentiles[4] - 40.0) < 0.001), |
927 | + 'Should calculate the p999') |
928 | + |
929 | + def test_values(self): |
930 | + self.assertEqual( |
931 | + set(self.timer.get_values()), set([10, 20, 20, 30, 40]), |
932 | + 'Should have a series of values') |
933 | |
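The expected standard deviation in `test_std_dev` above (11.401) is the sample (Bessel-corrected, n-1 denominator) standard deviation of the five recorded durations, which matches what the histogram backing the timer reports. Checking by hand:

```python
import math

durations = [10, 20, 20, 30, 40]
n = len(durations)
mean = sum(durations) / float(n)  # 24.0

# Bessel-corrected sample variance: divide by (n - 1), not n.
variance = sum((d - mean) ** 2 for d in durations) / (n - 1)
std_dev = math.sqrt(variance)

assert abs(mean - 24.0) < 0.001
assert abs(std_dev - 11.4018) < 0.001
```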
934 | === added file 'txstatsd/tests/stats/test_exponentiallydecayingsample.py' |
935 | --- txstatsd/tests/stats/test_exponentiallydecayingsample.py 1970-01-01 00:00:00 +0000 |
936 | +++ txstatsd/tests/stats/test_exponentiallydecayingsample.py 2011-09-14 12:06:24 +0000 |
937 | @@ -0,0 +1,47 @@ |
938 | + |
939 | +from unittest import TestCase |
940 | + |
941 | +from txstatsd.stats.exponentiallydecayingsample import ( |
942 | + ExponentiallyDecayingSample) |
943 | + |
944 | + |
945 | +class TestExponentiallyDecayingSample(TestCase): |
946 | + |
947 | + def test_100_out_of_1000_elements(self): |
948 | + population = [i for i in range(0, 100)] |
949 | + sample = ExponentiallyDecayingSample(1000, 0.99) |
950 | + for i in population: |
951 | + sample.update(i) |
952 | + |
953 | + self.assertEqual(sample.size(), 100, 'Should have 100 elements') |
954 | + self.assertEqual(len(sample.get_values()), 100, |
955 | + 'Should have 100 elements') |
956 | + self.assertEqual( |
957 | + len(set(sample.get_values()).difference(set(population))), 0, |
958 | + 'Should only have elements from the population') |
959 | + |
960 | + def test_100_out_of_10_elements(self): |
961 | + population = [i for i in range(0, 10)] |
962 | + sample = ExponentiallyDecayingSample(100, 0.99) |
963 | + for i in population: |
964 | + sample.update(i) |
965 | + |
966 | + self.assertEqual(sample.size(), 10, 'Should have 10 elements') |
967 | + self.assertEqual(len(sample.get_values()), 10, |
968 | + 'Should have 10 elements') |
969 | + self.assertEqual( |
970 | + len(set(sample.get_values()).difference(set(population))), 0, |
971 | + 'Should only have elements from the population') |
972 | + |
973 | + def test_heavily_biased_100_out_of_1000_elements(self): |
974 | + population = [i for i in range(0, 100)] |
975 | + sample = ExponentiallyDecayingSample(1000, 0.01) |
976 | + for i in population: |
977 | + sample.update(i) |
978 | + |
979 | + self.assertEqual(sample.size(), 100, 'Should have 100 elements') |
980 | + self.assertEqual(len(sample.get_values()), 100, |
981 | + 'Should have 100 elements') |
982 | + self.assertEqual( |
983 | + len(set(sample.get_values()).difference(set(population))), 0, |
984 | + 'Should only have elements from the population') |
985 | \ No newline at end of file |
986 | |
987 | === added file 'txstatsd/tests/stats/test_uniformsample.py' |
988 | --- txstatsd/tests/stats/test_uniformsample.py 1970-01-01 00:00:00 +0000 |
989 | +++ txstatsd/tests/stats/test_uniformsample.py 2011-09-14 12:06:24 +0000 |
990 | @@ -0,0 +1,32 @@ |
991 | + |
992 | +from unittest import TestCase |
993 | + |
994 | +from txstatsd.stats.uniformsample import UniformSample |
995 | + |
996 | + |
997 | +class TestUniformSample(TestCase): |
998 | + def test_100_out_of_1000_elements(self): |
999 | + population = [i for i in range(0, 1000)] |
1000 | + sample = UniformSample(100) |
1001 | + for i in population: |
1002 | + sample.update(i) |
1003 | + |
1004 | + self.assertEqual(sample.size(), 100, 'Should have 100 elements') |
1005 | + self.assertEqual(len(sample.get_values()), 100, |
1006 | + 'Should have 100 elements') |
1007 | + self.assertEqual( |
1008 | + len(set(sample.get_values()).difference(set(population))), 0, |
1009 | + 'Should only have elements from the population') |
1010 | + |
1011 | + def test_100_out_of_10_elements(self): |
1012 | + population = [i for i in range(0, 10)] |
1013 | + sample = UniformSample(100) |
1014 | + for i in population: |
1015 | + sample.update(i) |
1016 | + |
1017 | + self.assertEqual(sample.size(), 10, 'Should have 10 elements') |
1018 | + self.assertEqual(len(sample.get_values()), 10, |
1019 | + 'Should have 10 elements') |
1020 | + self.assertEqual( |
1021 | + len(set(sample.get_values()).difference(set(population))), 0, |
1022 | + 'Should only have elements from the population') |
1023 | |
1024 | === added file 'txstatsd/tests/test_configurableprocessor.py' |
1025 | --- txstatsd/tests/test_configurableprocessor.py 1970-01-01 00:00:00 +0000 |
1026 | +++ txstatsd/tests/test_configurableprocessor.py 2011-09-14 12:06:24 +0000 |
1027 | @@ -0,0 +1,141 @@ |
1028 | +import time |
1029 | + |
1030 | +from unittest import TestCase |
1031 | + |
1032 | +from txstatsd.server.configurableprocessor import ConfigurableMessageProcessor |
1033 | + |
1034 | + |
1035 | +class FlushMessagesTest(TestCase): |
1036 | + |
1037 | + def test_flush_counter_with_empty_prefix(self): |
1038 | + """ |
1039 | + Ensure no prefix features if none is supplied. |
1040 | + B{Note}: The C{ConfigurableMessageProcessor} reports |
1041 | + the counter value, and not the normalized version as |
1042 | + seen in the StatsD-compliant C{Processor}. |
1043 | + """ |
1044 | + configurable_processor = ConfigurableMessageProcessor( |
1045 | + time_function=lambda: 42) |
1046 | + configurable_processor.process("gorets:17|c") |
1047 | + messages = configurable_processor.flush() |
1048 | + self.assertEqual(2, len(messages)) |
1049 | + counters = messages[0].splitlines() |
1050 | + self.assertEqual("gorets.count 17 42", counters[0]) |
1051 | + self.assertEqual("statsd.numStats 1 42", messages[1]) |
1052 | + |
1053 | + def test_flush_counter_with_prefix(self): |
1054 | + """ |
1055 | + Ensure the prefix features if one is supplied. |
1056 | + """ |
1057 | + configurable_processor = ConfigurableMessageProcessor( |
1058 | + time_function=lambda: 42, message_prefix="test.metric") |
1059 | + configurable_processor.process("gorets:17|c") |
1060 | + messages = configurable_processor.flush() |
1061 | + self.assertEqual(2, len(messages)) |
1062 | + counters = messages[0].splitlines() |
1063 | + self.assertEqual("test.metric.gorets.count 17 42", counters[0]) |
1064 | + self.assertEqual("statsd.numStats 1 42", messages[1]) |
1065 | + |
1066 | + def test_flush_single_timer_single_time(self): |
1067 | + """ |
1068 | + If a single timer with a single data point is present, all |
1069 | + percentiles will be set to the same value. |
1070 | + """ |
1071 | + configurable_processor = ConfigurableMessageProcessor( |
1072 | + time_function=lambda: 42) |
1073 | + |
1074 | + configurable_processor.process("glork:24|ms") |
1075 | + messages = configurable_processor.flush() |
1076 | + |
1077 | + self.assertEqual(2, len(messages)) |
1078 | + timers = messages[0].splitlines() |
1079 | + self.assertEqual("glork.min 24.0 42", timers[0]) |
1080 | + self.assertEqual("glork.max 24.0 42", timers[1]) |
1081 | + self.assertEqual("glork.mean 24.0 42", timers[2]) |
1082 | + self.assertEqual("glork.stddev 0.0 42", timers[3]) |
1083 | + self.assertEqual("glork.median 24.0 42", timers[4]) |
1084 | + self.assertEqual("glork.75percentile 24.0 42", timers[5]) |
1085 | + self.assertEqual("glork.95percentile 24.0 42", timers[6]) |
1086 | + self.assertEqual("glork.98percentile 24.0 42", timers[7]) |
1087 | + self.assertEqual("glork.99percentile 24.0 42", timers[8]) |
1088 | + self.assertEqual("glork.999percentile 24.0 42", timers[9]) |
1089 | + self.assertEqual("statsd.numStats 1 42", messages[1]) |
1090 | + |
1091 | + def test_flush_single_timer_multiple_times(self): |
1092 | + """ |
1093 | + Test reporting of multiple timer metric samples. |
1094 | + """ |
1095 | + configurable_processor = ConfigurableMessageProcessor( |
1096 | + time_function=lambda: 42) |
1097 | + |
1098 | + configurable_processor.process("glork:4|ms") |
1099 | + configurable_processor.update_metrics() |
1100 | + configurable_processor.process("glork:8|ms") |
1101 | + configurable_processor.update_metrics() |
1102 | + configurable_processor.process("glork:15|ms") |
1103 | + configurable_processor.update_metrics() |
1104 | + configurable_processor.process("glork:16|ms") |
1105 | + configurable_processor.update_metrics() |
1106 | + configurable_processor.process("glork:23|ms") |
1107 | + configurable_processor.update_metrics() |
1108 | + configurable_processor.process("glork:42|ms") |
1109 | + configurable_processor.update_metrics() |
1110 | + |
1111 | + messages = configurable_processor.flush() |
1112 | + self.assertEqual(2, len(messages)) |
1113 | + timers = messages[0].splitlines() |
1114 | + self.assertEqual("glork.min 4.0 42", timers[0]) |
1115 | + self.assertEqual("glork.max 42.0 42", timers[1]) |
1116 | + self.assertEqual("glork.mean 18.0 42", timers[2]) |
1117 | + self.assertTrue(timers[3].startswith("glork.stddev 13.4907")) |
1118 | + self.assertEqual("glork.median 15.5 42", timers[4]) |
1119 | + self.assertEqual("glork.75percentile 27.75 42", timers[5]) |
1120 | + self.assertEqual("glork.95percentile 42.0 42", timers[6]) |
1121 | + self.assertEqual("glork.98percentile 42.0 42", timers[7]) |
1122 | + self.assertEqual("glork.99percentile 42.0 42", timers[8]) |
1123 | + self.assertEqual("glork.999percentile 42.0 42", timers[9]) |
1124 | + self.assertEqual("statsd.numStats 1 42", messages[1]) |
1125 | + |
1126 | + |
1127 | +class FlushMeterMetricMessagesTest(TestCase): |
1128 | + |
1129 | + def setUp(self): |
1130 | + self.configurable_processor = ConfigurableMessageProcessor( |
1131 | + time_function=self.wall_clock_time, message_prefix="test.metric") |
1132 | + self.time_now = int(time.time()) |
1133 | + |
1134 | + def wall_clock_time(self): |
1135 | + return self.time_now |
1136 | + |
1137 | + def mark_minutes(self, minutes): |
1138 | + for i in range(1, minutes * 60, 5): |
1139 | + self.configurable_processor.update_metrics() |
1140 | + |
1141 | + def test_flush_meter_metric_with_prefix(self): |
1142 | + """ |
1143 | + Test the correct rendering of the Graphite report for |
1144 | + a meter metric when a prefix is supplied. |
1145 | + """ |
1146 | + self.configurable_processor.process("gorets:3.0|m") |
1147 | + |
1148 | + self.time_now += 1 |
1149 | + messages = self.configurable_processor.flush() |
1150 | + self.assertEqual(2, len(messages)) |
1151 | + meter_metric = messages[0].splitlines() |
1152 | + self.assertEqual( |
1153 | + "test.metric.gorets.count 3.0 %s" % self.time_now, |
1154 | + meter_metric[0]) |
1155 | + self.assertEqual( |
1156 | + "test.metric.gorets.mean_rate 3.0 %s" % self.time_now, |
1157 | + meter_metric[1]) |
1158 | + self.assertEqual( |
1159 | + "test.metric.gorets.1min_rate 0.0 %s" % self.time_now, |
1160 | + meter_metric[2]) |
1161 | + self.assertEqual( |
1162 | + "test.metric.gorets.5min_rate 0.0 %s" % self.time_now, |
1163 | + meter_metric[3]) |
1164 | + self.assertEqual( |
1165 | + "test.metric.gorets.15min_rate 0.0 %s" % self.time_now, |
1166 | + meter_metric[4]) |
1167 | + self.assertEqual( |
1168 | + "statsd.numStats 1 %s" % self.time_now, messages[1]) |
1169 | |
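The assertions in these tests follow Graphite's plaintext protocol, where each report line is `<metric path> <value> <epoch timestamp>`, with an optional prefix joined onto the path. A small helper showing how such lines are assembled (the function name and signature are illustrative, not from the branch):

```python
def render_graphite_line(prefix, name, suffix, value, timestamp):
    """Render one line of Graphite's plaintext protocol."""
    path = ".".join(part for part in (prefix, name, suffix) if part)
    return "%s %s %s" % (path, value, timestamp)

# With a message prefix, as in test_flush_counter_with_prefix:
assert (render_graphite_line("test.metric", "gorets", "count", 17, 42)
        == "test.metric.gorets.count 17 42")

# With an empty prefix, as in test_flush_counter_with_empty_prefix:
assert (render_graphite_line("", "gorets", "count", 17, 42)
        == "gorets.count 17 42")
```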
1170 | === modified file 'txstatsd/tests/test_processor.py' |
1171 | --- txstatsd/tests/test_processor.py 2011-09-09 15:17:09 +0000 |
1172 | +++ txstatsd/tests/test_processor.py 2011-09-14 12:06:24 +0000 |
1173 | @@ -3,7 +3,6 @@ |
1174 | from unittest import TestCase |
1175 | |
1176 | from txstatsd.server.processor import MessageProcessor |
1177 | -from txstatsd.server.configurableprocessor import ConfigurableMessageProcessor |
1178 | |
1179 | |
1180 | class TestMessageProcessor(MessageProcessor): |
1181 | @@ -16,17 +15,6 @@ |
1182 | self.failures.append(message) |
1183 | |
1184 | |
1185 | -class TestConfigurableMessageProcessor(ConfigurableMessageProcessor): |
1186 | - |
1187 | - def __init__(self, time_function=time.time, message_prefix=""): |
1188 | - super(TestConfigurableMessageProcessor, self).__init__( |
1189 | - time_function, |
1190 | - message_prefix) |
1191 | - |
1192 | - def fail(self, message): |
1193 | - self.failures.append(message) |
1194 | - |
1195 | - |
1196 | class ProcessMessagesTest(TestCase): |
1197 | |
1198 | def setUp(self): |
1199 | @@ -145,35 +133,6 @@ |
1200 | self.assertEqual("statsd.numStats 1 42", messages[1]) |
1201 | self.assertEqual(0, self.processor.counter_metrics["gorets"]) |
1202 | |
1203 | - def test_flush_counter_with_empty_prefix(self): |
1204 | - """ |
1205 | - Ensure no prefix features if none is supplied. |
1206 | - B{Note}: The C{ConfigurableMessageProcessor} reports |
1207 | - the counter value, and not the normalized version as |
1208 | - seen in the StatsD-compliant C{Processor}. |
1209 | - """ |
1210 | - configurable_processor = ConfigurableMessageProcessor( |
1211 | - time_function=lambda: 42) |
1212 | - configurable_processor.process("gorets:17|c") |
1213 | - messages = configurable_processor.flush() |
1214 | - self.assertEqual(2, len(messages)) |
1215 | - counters = messages[0].splitlines() |
1216 | - self.assertEqual("gorets.count 17 42", counters[0]) |
1217 | - self.assertEqual("statsd.numStats 1 42", messages[1]) |
1218 | - |
1219 | - def test_flush_counter_with_prefix(self): |
1220 | - """ |
1221 | - Ensure the prefix features if one is supplied. |
1222 | - """ |
1223 | - configurable_processor = ConfigurableMessageProcessor( |
1224 | - time_function=lambda: 42, message_prefix="test.metric") |
1225 | - configurable_processor.process("gorets:17|c") |
1226 | - messages = configurable_processor.flush() |
1227 | - self.assertEqual(2, len(messages)) |
1228 | - counters = messages[0].splitlines() |
1229 | - self.assertEqual("test.metric.gorets.count 17 42", counters[0]) |
1230 | - self.assertEqual("statsd.numStats 1 42", messages[1]) |
1231 | - |
1232 | def test_flush_counter_one_second_interval(self): |
1233 | """ |
1234 | It is possible to flush counters with a one-second interval, in which |
1235 | @@ -273,8 +232,6 @@ |
1236 | |
1237 | def setUp(self): |
1238 | self.processor = MessageProcessor(time_function=self.wall_clock_time) |
1239 | - self.configurable_processor = ConfigurableMessageProcessor( |
1240 | - time_function=self.wall_clock_time, message_prefix="test.metric") |
1241 | self.time_now = int(time.time()) |
1242 | |
1243 | def wall_clock_time(self): |
1244 | @@ -284,35 +241,6 @@ |
1245 | for i in range(1, minutes * 60, 5): |
1246 | self.processor.update_metrics() |
1247 | |
1248 | - def test_flush_meter_metric_with_prefix(self): |
1249 | - """ |
1250 | - Test the correct rendering of the Graphite report for |
1251 | - a meter metric when a prefix is supplied. |
1252 | - """ |
1253 | - self.configurable_processor.process("gorets:3.0|m") |
1254 | - |
1255 | - self.time_now += 1 |
1256 | - messages = self.configurable_processor.flush() |
1257 | - self.assertEqual(2, len(messages)) |
1258 | - meter_metric = messages[0].splitlines() |
1259 | - self.assertEqual( |
1260 | - "test.metric.gorets.count 3.0 %s" % self.time_now, |
1261 | - meter_metric[0]) |
1262 | - self.assertEqual( |
1263 | - "test.metric.gorets.mean_rate 3.0 %s" % self.time_now, |
1264 | - meter_metric[1]) |
1265 | - self.assertEqual( |
1266 | - "test.metric.gorets.1min_rate 0.0 %s" % self.time_now, |
1267 | - meter_metric[2]) |
1268 | - self.assertEqual( |
1269 | - "test.metric.gorets.5min_rate 0.0 %s" % self.time_now, |
1270 | - meter_metric[3]) |
1271 | - self.assertEqual( |
1272 | - "test.metric.gorets.15min_rate 0.0 %s" % self.time_now, |
1273 | - meter_metric[4]) |
1274 | - self.assertEqual( |
1275 | - "statsd.numStats 1 %s" % self.time_now, messages[1]) |
1276 | - |
1277 | def test_flush_meter_metric(self): |
1278 | """ |
1279 | Test the correct rendering of the Graphite report for |
1280 | |
1281 | === modified file 'txstatsd/version.py' |
1282 | --- txstatsd/version.py 2011-09-09 15:17:09 +0000 |
1283 | +++ txstatsd/version.py 2011-09-14 12:06:24 +0000 |
1284 | @@ -1,1 +1,1 @@ |
1285 | -txstatsd = "0.4.0" |
1286 | +txstatsd = "0.5.0" |
My understanding is that Lucio had an issue with the histogram, so I'm deferring to him on the stats side, which I don't have much background in. Code-wise, everything looks fine and the tests pass. +1!