Merge lp:~flacoste/launchpad/ppr-constant-memory into lp:launchpad
| Status: | Rejected |
|---|---|
| Rejected by: | Francis J. Lacoste on 2010-10-29 |
| Proposed branch: | lp:~flacoste/launchpad/ppr-constant-memory |
| Merge into: | lp:launchpad |
| Diff against target: | 992 lines (+605/-225), 5 files modified: lib/lp/scripts/utilities/pageperformancereport.py (+308/-223), lib/lp/scripts/utilities/tests/test_pageperformancereport.py (+290/-0), setup.py (+0/-1), utilities/page-performance-report.ini (+7/-0), versions.cfg (+0/-1) |
| To merge this branch: | bzr merge lp:~flacoste/launchpad/ppr-constant-memory |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Robert Collins (community) | | 2010-10-28 | Approve on 2010-10-28 |
Review via email:
Commit Message
Description of the Change
This is my second iteration on making the page-performance report run in
constant memory.
This time around, I managed to get it to run in constant memory.
I dropped numpy and now compute all statistics using SQL.
I also added some automated tests. They are very coarse, making sure that the
computed stats for a set of requests match. The expected results were
hand-calculated in a spreadsheet.
Another drive-by is that I increased the cache_size on Robert's suggestion.
Local testing seems to show that it makes a very small difference (the main
effect is a reduction in sys time, but that's not the bulk of the time).
The whole report now seems at least twice as slow as with the previous
algorithm. The previous version took ~1m52 to generate a report for 300k
records; the new one takes ~4m. But the biggest advantage is that it runs in a
constant amount of memory. This means that we'll be able to generate weekly and
monthly reports (once we have more space on sodium; space is currently too
tight to generate them).
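
To make that concrete, the sketch below shows the general shape of pushing per-category aggregates into the database so that Python only ever holds one summary row per category. It is an illustration, not this branch's actual code: the queries the proposal used are not shown on this page, the schema is copied from the category_request table visible in the preview diff, and the sample rows reuse the /bugs timings from the new test data.

```python
# Illustrative sketch only: the exact SQL used by this branch is not shown
# on this page.  Schema copied from the category_request table in the
# SQLite version visible in the preview diff; sample rows reuse the /bugs
# request times from the new test data.
import math
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('''
    CREATE TABLE category_request (
        category INTEGER,
        time REAL,
        sql_statements INTEGER,
        sql_time REAL)
    ''')
con.executemany(
    "INSERT INTO category_request VALUES (?,?,?,?)",
    [(0, 4.5, 56, 3.0), (0, 4.2, 56, 2.2), (0, 5.5, 76, 4.0)])

# One aggregate row per category comes back from the database, so Python
# memory use stays constant no matter how many requests were logged.
query = '''
    SELECT category, COUNT(*), SUM(time), AVG(time), AVG(time * time)
    FROM category_request GROUP BY category
    '''
for category, hits, total, mean, mean_sq in con.execute(query):
    std = math.sqrt(mean_sq - mean * mean)  # population variance trick
    print("%d %d %.1f %.2f %.2f" % (category, hits, total, mean, std))
    # -> 0 3 14.2 4.73 0.56 (same figures as the /bugs entry in the tests)
```

Median and the 99th percentile need more than a single GROUP BY (the report estimates the latter as mean + 3*std), but the same principle applies.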
Robert Collins (lifeless) wrote:
I suggest this because the current implementation running on devpad is -thrashing- machine performance.
Francis J. Lacoste (flacoste) wrote:
This was replaced by an online in-memory version.
Preview Diff
| 1 | === modified file 'lib/lp/scripts/utilities/pageperformancereport.py' |
| 2 | --- lib/lp/scripts/utilities/pageperformancereport.py 2010-10-25 21:47:16 +0000 |
| 3 | +++ lib/lp/scripts/utilities/pageperformancereport.py 2010-10-29 22:02:04 +0000 |
| 4 | @@ -6,19 +6,19 @@ |
| 5 | __metaclass__ = type |
| 6 | __all__ = ['main'] |
| 7 | |
| 8 | +import bz2 |
| 9 | from cgi import escape as html_quote |
| 10 | from ConfigParser import RawConfigParser |
| 11 | +import csv |
| 12 | from datetime import datetime |
| 13 | +import gzip |
| 14 | +import math |
| 15 | import os.path |
| 16 | import re |
| 17 | -import subprocess |
| 18 | from textwrap import dedent |
| 19 | -import sqlite3 |
| 20 | -import tempfile |
| 21 | +import textwrap |
| 22 | import time |
| 23 | -import warnings |
| 24 | |
| 25 | -import numpy |
| 26 | import simplejson as json |
| 27 | import sre_constants |
| 28 | import zc.zservertracelog.tracereport |
| 29 | @@ -27,9 +27,6 @@ |
| 30 | from canonical.launchpad.scripts.logger import log |
| 31 | from lp.scripts.helpers import LPOptionParser |
| 32 | |
| 33 | -# We don't care about conversion to nan, they are expected. |
| 34 | -warnings.filterwarnings( |
| 35 | - 'ignore', '.*converting a masked element to nan.', UserWarning) |
| 36 | |
| 37 | class Request(zc.zservertracelog.tracereport.Request): |
| 38 | url = None |
| 39 | @@ -58,6 +55,7 @@ |
| 40 | |
| 41 | Requests belong to a Category if the URL matches a regular expression. |
| 42 | """ |
| 43 | + |
| 44 | def __init__(self, title, regexp): |
| 45 | self.title = title |
| 46 | self.regexp = regexp |
| 47 | @@ -71,8 +69,128 @@ |
| 48 | return cmp(self.title.lower(), other.title.lower()) |
| 49 | |
| 50 | |
| 51 | +class OnlineStatsCalculator: |
| 52 | + """Object that can compute count, sum, mean, variance and median. |
| 53 | + |
| 54 | + It computes these value incrementally and using minimal storage |
| 55 | + using the Welford / Knuth algorithm described at |
| 56 | + http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm |
| 57 | + """ |
| 58 | + |
| 59 | + def __init__(self): |
| 60 | + self.count = 0 |
| 61 | + self.sum = 0 |
| 62 | + self.M2 = 0.0 # Sum of square difference |
| 63 | + self.mean = 0.0 |
| 64 | + |
| 65 | + def update(self, x): |
| 66 | + """Incrementally update the stats when adding x to the set. |
| 67 | + |
| 68 | + None values are ignored. |
| 69 | + """ |
| 70 | + if x is None: |
| 71 | + return |
| 72 | + self.count += 1 |
| 73 | + self.sum += x |
| 74 | + delta = x - self.mean |
| 75 | + self.mean = float(self.sum)/self.count |
| 76 | + self.M2 += delta*(x - self.mean) |
| 77 | + |
| 78 | + @property |
| 79 | + def variance(self): |
| 80 | + """Return the population variance.""" |
| 81 | + if self.count == 0: |
| 82 | + return 0 |
| 83 | + else: |
| 84 | + return self.M2/self.count |
| 85 | + |
| 86 | + @property |
| 87 | + def std(self): |
| 88 | + """Return the standard deviation.""" |
| 89 | + if self.count == 0: |
| 90 | + return 0 |
| 91 | + else: |
| 92 | + return math.sqrt(self.variance) |
| 93 | + |
| 94 | + |
| 95 | +class OnlineApproximateMedian: |
| 96 | + """Approximate the median of a set of elements. |
| 97 | + |
| 98 | + This implements a space-efficient algorithm which only sees each value |
| 99 | + once. (It will hold in memory log B of n elements.) |
| 100 | + |
| 101 | + It was described and analysed in |
| 102 | + D. Cantone and M.Hofri, |
| 103 | + "Analysis of An Approximate Median Selection Algorithm" |
| 104 | + ftp://ftp.cs.wpi.edu/pub/techreports/pdf/06-17.pdf |
| 105 | + |
| 106 | + This algorithm is similar to Tukey's median of medians technique. |
| 107 | + It will compute the median among bucket_size values. And the median among |
| 108 | + those. |
| 109 | + """ |
| 110 | + |
| 111 | + def __init__(self, bucket_size=9): |
| 112 | + """Creates a new estimator. |
| 113 | + |
| 114 | + It approximates the median by finding the median among each |
| 115 | + successive bucket_size element. And then using these medians for other |
| 116 | + round of selection. |
| 117 | + |
| 118 | + The bucket size should be a low odd-integer. |
| 119 | + """ |
| 120 | + self.count = 0 |
| 121 | + self.bucket_size = bucket_size |
| 122 | + # Index of the median in a completed bucket. |
| 123 | + self.median_idx = bucket_size/2 |
| 124 | + self.buckets = [] |
| 125 | + |
| 126 | + def update(self, x): |
| 127 | + """Update with x.""" |
| 128 | + if x is None: |
| 129 | + return |
| 130 | + |
| 131 | + self.count += 1 |
| 132 | + i = 0 |
| 133 | + while True: |
| 134 | + # Create bucket on demand. |
| 135 | + if i == len(self.buckets): |
| 136 | + self.buckets.append([]) |
| 137 | + bucket = self.buckets[i] |
| 138 | + bucket.append(x) |
| 139 | + if len(bucket) == self.bucket_size: |
| 140 | + # Select the median in this bucket, and promote it. |
| 141 | + x = sorted(bucket)[self.median_idx] |
| 142 | + # Free the bucket for the next round. |
| 143 | + del bucket[:] |
| 144 | + i += 1 |
| 145 | + continue |
| 146 | + else: |
| 147 | + break |
| 148 | + |
| 149 | + @property |
| 150 | + def median(self): |
| 151 | + """Return the median.""" |
| 152 | + if self.count == 0: |
| 153 | + return 0 |
| 154 | + |
| 155 | + # Find the 'weighted' median by assigning a weight to each |
| 156 | + # element proportional to how far they have been selected. |
| 157 | + candidates = [] |
| 158 | + total_weight = 0 |
| 159 | + for i, bucket in enumerate(self.buckets): |
| 160 | + weight = self.bucket_size ** i |
| 161 | + for x in bucket: |
| 162 | + total_weight += weight |
| 163 | + candidates.append([weight, x]) |
| 164 | + # Make weight relative. |
| 165 | + candidates = [ |
| 166 | + ((float(weight)/total_weight)*x, x) |
| 167 | + for weight, x in candidates] |
| 168 | + return sorted(candidates)[(len(candidates)-1)/2][1] |
| 169 | + |
| 170 | + |
| 171 | class Stats: |
| 172 | - """Bag to hold request statistics. |
| 173 | + """Bag to hold and compute request statistics. |
| 174 | |
| 175 | All times are in seconds. |
| 176 | """ |
| 177 | @@ -82,7 +200,6 @@ |
| 178 | mean = 0 # Mean time per hit. |
| 179 | median = 0 # Median time per hit. |
| 180 | std = 0 # Standard deviation per hit. |
| 181 | - ninetyninth_percentile_time = 0 |
| 182 | histogram = None # # Request times histogram. |
| 183 | |
| 184 | total_sqltime = 0 # Total time spent waiting for SQL to process. |
| 185 | @@ -95,212 +212,175 @@ |
| 186 | median_sqlstatements = 0 |
| 187 | std_sqlstatements = 0 |
| 188 | |
| 189 | - def __init__(self, times, timeout): |
| 190 | - """Compute the stats based on times. |
| 191 | - |
| 192 | - Times is a list of (app_time, sql_statements, sql_times). |
| 193 | - |
| 194 | - The histogram is a list of request counts per 1 second bucket. |
| 195 | - ie. histogram[0] contains the number of requests taking between 0 and |
| 196 | - 1 second, histogram[1] contains the number of requests taking between |
| 197 | - 1 and 2 seconds etc. histogram is None if there are no requests in |
| 198 | - this Category. |
| 199 | + @property |
| 200 | + def ninetyninth_percentile_time(self): |
| 201 | + """Time under which 99% of requests are rendered. |
| 202 | + |
| 203 | + This is estimated as 3 std deviations from the mean. Given that |
| 204 | + in a daily report, many URLs or PageIds won't have 100 requests, it's |
| 205 | + more useful to use this estimator. |
| 206 | """ |
| 207 | - if not times: |
| 208 | - return |
| 209 | - |
| 210 | - self.total_hits = len(times) |
| 211 | - |
| 212 | - # Ignore missing values (-1) in computation. |
| 213 | - times_array = numpy.ma.masked_values( |
| 214 | - numpy.asarray(times, dtype=numpy.float32), -1.) |
| 215 | - |
| 216 | - self.total_time, self.total_sqlstatements, self.total_sqltime = ( |
| 217 | - times_array.sum(axis=0)) |
| 218 | - |
| 219 | - self.mean, self.mean_sqlstatements, self.mean_sqltime = ( |
| 220 | - times_array.mean(axis=0)) |
| 221 | - |
| 222 | - self.median, self.median_sqlstatements, self.median_sqltime = ( |
| 223 | - numpy.median(times_array, axis=0)) |
| 224 | - |
| 225 | - self.std, self.std_sqlstatements, self.std_sqltime = ( |
| 226 | - numpy.std(times_array, axis=0)) |
| 227 | - |
| 228 | - # This is an approximation which may not be true: we don't know if we |
| 229 | - # have a std distribution or not. We could just find the 99th |
| 230 | - # percentile by counting. Shock. Horror; however this appears pretty |
| 231 | - # good based on eyeballing things so far - once we're down in the 2-3 |
| 232 | - # second range for everything we may want to revisit. |
| 233 | - self.ninetyninth_percentile_time = self.mean + self.std*3 |
| 234 | - |
| 235 | - histogram_width = int(timeout*1.5) |
| 236 | - histogram_times = numpy.clip(times_array[:,0], 0, histogram_width) |
| 237 | - histogram = numpy.histogram( |
| 238 | - histogram_times, normed=True, range=(0, histogram_width), |
| 239 | - bins=histogram_width) |
| 240 | - self.histogram = zip(histogram[1], histogram[0]) |
| 241 | - |
| 242 | - |
| 243 | -class SQLiteRequestTimes: |
| 244 | - """SQLite-based request times computation.""" |
| 245 | + return self.mean + 3*self.std |
| 246 | + |
| 247 | + @property |
| 248 | + def relative_histogram(self): |
| 249 | + """Return an histogram where the frequency is relative.""" |
| 250 | + if self.histogram: |
| 251 | + return [[x, float(f)/self.total_hits] for x, f in self.histogram] |
| 252 | + else: |
| 253 | + return None |
| 254 | + |
| 255 | + def text(self): |
| 256 | + """Return a textual version of the stats.""" |
| 257 | + return textwrap.dedent(""" |
| 258 | + <Stats for %d requests: |
| 259 | + Time: total=%.2f; mean=%.2f; median=%.2f; std=%.2f |
| 260 | + SQL time: total=%.2f; mean=%.2f; median=%.2f; std=%.2f |
| 261 | + SQL stmt: total=%.f; mean=%.2f; median=%.f; std=%.2f |
| 262 | + >""" % ( |
| 263 | + self.total_hits, self.total_time, self.mean, self.median, |
| 264 | + self.std, self.total_sqltime, self.mean_sqltime, |
| 265 | + self.median_sqltime, self.std_sqltime, |
| 266 | + self.total_sqlstatements, self.mean_sqlstatements, |
| 267 | + self.median_sqlstatements, self.std_sqlstatements)) |
| 268 | + |
| 269 | + |
| 270 | +class OnlineStats(Stats): |
| 271 | + """Implementation of stats that can be computed online. |
| 272 | + |
| 273 | + You call update() for each request and the stats are updated incrementally |
| 274 | + with minimum storage space. |
| 275 | + """ |
| 276 | + |
| 277 | + def __init__(self, histogram_width): |
| 278 | + self.time_stats = OnlineStatsCalculator() |
| 279 | + self.time_median_approximate = OnlineApproximateMedian() |
| 280 | + self.sql_time_stats = OnlineStatsCalculator() |
| 281 | + self.sql_time_median_approximate = OnlineApproximateMedian() |
| 282 | + self.sql_statements_stats = OnlineStatsCalculator() |
| 283 | + self.sql_statements_median_approximate = OnlineApproximateMedian() |
| 284 | + self._histogram = [ |
| 285 | + [x, 0] for x in range(histogram_width)] |
| 286 | + |
| 287 | + @property |
| 288 | + def total_hits(self): |
| 289 | + return self.time_stats.count |
| 290 | + |
| 291 | + @property |
| 292 | + def total_time(self): |
| 293 | + return self.time_stats.sum |
| 294 | + |
| 295 | + @property |
| 296 | + def mean(self): |
| 297 | + return self.time_stats.mean |
| 298 | + |
| 299 | + @property |
| 300 | + def median(self): |
| 301 | + return self.time_median_approximate.median |
| 302 | + |
| 303 | + @property |
| 304 | + def std(self): |
| 305 | + return self.time_stats.std |
| 306 | + |
| 307 | + @property |
| 308 | + def total_sqltime(self): |
| 309 | + return self.sql_time_stats.sum |
| 310 | + |
| 311 | + @property |
| 312 | + def mean_sqltime(self): |
| 313 | + return self.sql_time_stats.mean |
| 314 | + |
| 315 | + @property |
| 316 | + def median_sqltime(self): |
| 317 | + return self.sql_time_median_approximate.median |
| 318 | + |
| 319 | + @property |
| 320 | + def std_sqltime(self): |
| 321 | + return self.sql_time_stats.std |
| 322 | + |
| 323 | + @property |
| 324 | + def total_sqlstatements(self): |
| 325 | + return self.sql_statements_stats.sum |
| 326 | + |
| 327 | + @property |
| 328 | + def mean_sqlstatements(self): |
| 329 | + return self.sql_statements_stats.mean |
| 330 | + |
| 331 | + @property |
| 332 | + def median_sqlstatements(self): |
| 333 | + return self.sql_statements_median_approximate.median |
| 334 | + |
| 335 | + @property |
| 336 | + def std_sqlstatements(self): |
| 337 | + return self.sql_statements_stats.std |
| 338 | + |
| 339 | + @property |
| 340 | + def histogram(self): |
| 341 | + if self.time_stats.count: |
| 342 | + return self._histogram |
| 343 | + else: |
| 344 | + return None |
| 345 | + |
| 346 | + def update(self, request): |
| 347 | + """Update the stats based on request.""" |
| 348 | + self.time_stats.update(request.app_seconds) |
| 349 | + self.time_median_approximate.update(request.app_seconds) |
| 350 | + self.sql_time_stats.update(request.sql_seconds) |
| 351 | + self.sql_time_median_approximate.update(request.sql_seconds) |
| 352 | + self.sql_statements_stats.update(request.sql_statements) |
| 353 | + self.sql_statements_median_approximate.update(request.sql_statements) |
| 354 | + |
| 355 | + idx = int(min(len(self.histogram)-1, request.app_seconds)) |
| 356 | + self.histogram[idx][1] += 1 |
| 357 | + |
| 358 | + |
| 359 | +class RequestTimes: |
| 360 | + """Collect the """ |
| 361 | |
| 362 | def __init__(self, categories, options): |
| 363 | - if options.db_file is None: |
| 364 | - fd, self.filename = tempfile.mkstemp(suffix='.db', prefix='ppr') |
| 365 | - os.close(fd) |
| 366 | - else: |
| 367 | - self.filename = options.db_file |
| 368 | - self.con = sqlite3.connect(self.filename, isolation_level='EXCLUSIVE') |
| 369 | - log.debug('Using request database %s' % self.filename) |
| 370 | - # Some speed optimization. |
| 371 | - self.con.execute('PRAGMA synchronous = off') |
| 372 | - self.con.execute('PRAGMA journal_mode = off') |
| 373 | - |
| 374 | - self.categories = categories |
| 375 | - self.store_all_request = options.pageids or options.top_urls |
| 376 | - self.timeout = options.timeout |
| 377 | - self.cur = self.con.cursor() |
| 378 | - |
| 379 | - # Create the tables, ignore errors about them being already present. |
| 380 | - try: |
| 381 | - self.cur.execute(''' |
| 382 | - CREATE TABLE category_request ( |
| 383 | - category INTEGER, |
| 384 | - time REAL, |
| 385 | - sql_statements INTEGER, |
| 386 | - sql_time REAL) |
| 387 | - '''); |
| 388 | - except sqlite3.OperationalError, e: |
| 389 | - if 'already exists' in str(e): |
| 390 | - pass |
| 391 | - else: |
| 392 | - raise |
| 393 | - |
| 394 | - if self.store_all_request: |
| 395 | - try: |
| 396 | - self.cur.execute(''' |
| 397 | - CREATE TABLE request ( |
| 398 | - pageid TEXT, |
| 399 | - url TEXT, |
| 400 | - time REAL, |
| 401 | - sql_statements INTEGER, |
| 402 | - sql_time REAL) |
| 403 | - '''); |
| 404 | - except sqlite3.OperationalError, e: |
| 405 | - if 'already exists' in str(e): |
| 406 | - pass |
| 407 | - else: |
| 408 | - raise |
| 409 | + self.by_pageids = options.pageids |
| 410 | + self.by_urls = options.top_urls |
| 411 | + |
| 412 | + # Histogram has a bin per second up to 1.5 our timeout. |
| 413 | + self.histogram_width = int(options.timeout*1.5) |
| 414 | + self.category_times = [ |
| 415 | + (category, OnlineStats(self.histogram_width)) |
| 416 | + for category in categories] |
| 417 | + self.url_times = {} |
| 418 | + self.pageid_times = {} |
| 419 | |
| 420 | def add_request(self, request): |
| 421 | - """Add a request to the cache.""" |
| 422 | - sql_statements = request.sql_statements |
| 423 | - sql_seconds = request.sql_seconds |
| 424 | - |
| 425 | - # Store missing value as -1, as it makes dealing with those |
| 426 | - # easier with numpy. |
| 427 | - if sql_statements is None: |
| 428 | - sql_statements = -1 |
| 429 | - if sql_seconds is None: |
| 430 | - sql_seconds = -1 |
| 431 | - for idx, category in enumerate(self.categories): |
| 432 | + """Add a request to the .""" |
| 433 | + for category, stats in self.category_times: |
| 434 | if category.match(request): |
| 435 | - self.con.execute( |
| 436 | - "INSERT INTO category_request VALUES (?,?,?,?)", |
| 437 | - (idx, request.app_seconds, sql_statements, sql_seconds)) |
| 438 | + stats.update(request) |
| 439 | |
| 440 | - if self.store_all_request: |
| 441 | + if self.by_pageids: |
| 442 | pageid = request.pageid or 'Unknown' |
| 443 | - self.con.execute( |
| 444 | - "INSERT INTO request VALUES (?,?,?,?,?)", |
| 445 | - (pageid, request.url, request.app_seconds, sql_statements, |
| 446 | - sql_seconds)) |
| 447 | + stats = self.pageid_times.setdefault( |
| 448 | + pageid, OnlineStats(self.histogram_width)) |
| 449 | + stats.update(request) |
| 450 | |
| 451 | - def commit(self): |
| 452 | - """Call commit on the underlying connection.""" |
| 453 | - self.con.commit() |
| 454 | + if self.by_urls: |
| 455 | + stats = self.url_times.setdefault( |
| 456 | + request.url, OnlineStats(self.histogram_width)) |
| 457 | + stats.update(request) |
| 458 | |
| 459 | def get_category_times(self): |
| 460 | """Return the times for each category.""" |
| 461 | - category_query = 'SELECT * FROM category_request ORDER BY category' |
| 462 | - |
| 463 | - empty_stats = Stats([], 0) |
| 464 | - categories = dict(self.get_times(category_query)) |
| 465 | - return [ |
| 466 | - (category, categories.get(idx, empty_stats)) |
| 467 | - for idx, category in enumerate(self.categories)] |
| 468 | + return self.category_times |
| 469 | |
| 470 | def get_top_urls_times(self, top_n): |
| 471 | """Return the times for the Top URL by total time""" |
| 472 | - top_url_query = ''' |
| 473 | - SELECT url, time, sql_statements, sql_time |
| 474 | - FROM request WHERE url IN ( |
| 475 | - SELECT url FROM (SELECT url, sum(time) FROM request |
| 476 | - GROUP BY url |
| 477 | - ORDER BY sum(time) DESC |
| 478 | - LIMIT %d)) |
| 479 | - ORDER BY url |
| 480 | - ''' % top_n |
| 481 | # Sort the result by total time |
| 482 | return sorted( |
| 483 | - self.get_times(top_url_query), key=lambda x: x[1].total_time, |
| 484 | - reverse=True) |
| 485 | + self.url_times.items(), |
| 486 | + key=lambda x: x[1].total_time, reverse=True)[:top_n] |
| 487 | |
| 488 | def get_pageid_times(self): |
| 489 | """Return the times for the pageids.""" |
| 490 | - pageid_query = ''' |
| 491 | - SELECT pageid, time, sql_statements, sql_time |
| 492 | - FROM request |
| 493 | - ORDER BY pageid |
| 494 | - ''' |
| 495 | - return self.get_times(pageid_query) |
| 496 | - |
| 497 | - def get_times(self, query): |
| 498 | - """Return a list of key, stats based on the query. |
| 499 | - |
| 500 | - The query should return rows of the form: |
| 501 | - [key, app_time, sql_statements, sql_times] |
| 502 | - |
| 503 | - And should be sorted on key. |
| 504 | - """ |
| 505 | - times = [] |
| 506 | - current_key = None |
| 507 | - results = [] |
| 508 | - self.cur.execute(query) |
| 509 | - while True: |
| 510 | - rows = self.cur.fetchmany() |
| 511 | - if len(rows) == 0: |
| 512 | - break |
| 513 | - for row in rows: |
| 514 | - # We are encountering a new group... |
| 515 | - if row[0] != current_key: |
| 516 | - # Compute the stats of the previous group |
| 517 | - if current_key != None: |
| 518 | - results.append( |
| 519 | - (current_key, Stats(times, self.timeout))) |
| 520 | - # Initialize the new group. |
| 521 | - current_key = row[0] |
| 522 | - times = [] |
| 523 | - |
| 524 | - times.append(row[1:]) |
| 525 | - # Compute the stats of the last group |
| 526 | - if current_key != None: |
| 527 | - results.append((current_key, Stats(times, self.timeout))) |
| 528 | - |
| 529 | - return results |
| 530 | - |
| 531 | - def close(self, remove=False): |
| 532 | - """Close the SQLite connection. |
| 533 | - |
| 534 | - :param remove: If true, the DB file will be removed. |
| 535 | - """ |
| 536 | - self.con.close() |
| 537 | - if remove: |
| 538 | - log.debug('Deleting request database.') |
| 539 | - os.unlink(self.filename) |
| 540 | - else: |
| 541 | - log.debug('Keeping request database %s.' % self.filename) |
| 542 | + # Sort the result by pageid |
| 543 | + return sorted(self.pageid_times.items()) |
| 544 | |
| 545 | |
| 546 | def main(): |
| 547 | @@ -339,17 +419,13 @@ |
| 548 | # Default to 12: the staging timeout. |
| 549 | default=12, type="int", |
| 550 | help="The configured timeout value : determines high risk page ids.") |
| 551 | - parser.add_option( |
| 552 | - "--db-file", dest="db_file", |
| 553 | - default=None, metavar="FILE", |
| 554 | - help="Do not parse the records, generate reports from the DB file.") |
| 555 | |
| 556 | options, args = parser.parse_args() |
| 557 | |
| 558 | if not os.path.isdir(options.directory): |
| 559 | parser.error("Directory %s does not exist" % options.directory) |
| 560 | |
| 561 | - if len(args) == 0 and options.db_file is None: |
| 562 | + if len(args) == 0: |
| 563 | parser.error("At least one zserver tracelog file must be provided") |
| 564 | |
| 565 | if options.from_ts is not None and options.until_ts is not None: |
| 566 | @@ -383,22 +459,17 @@ |
| 567 | if len(categories) == 0: |
| 568 | parser.error("No data in [categories] section of configuration.") |
| 569 | |
| 570 | - times = SQLiteRequestTimes(categories, options) |
| 571 | - |
| 572 | - if len(args) > 0: |
| 573 | - parse(args, times, options) |
| 574 | - times.commit() |
| 575 | - |
| 576 | - log.debug('Generating category statistics...') |
| 577 | + times = RequestTimes(categories, options) |
| 578 | + |
| 579 | + parse(args, times, options) |
| 580 | + |
| 581 | category_times = times.get_category_times() |
| 582 | |
| 583 | pageid_times = [] |
| 584 | url_times= [] |
| 585 | if options.top_urls: |
| 586 | - log.debug('Generating top %d urls statistics...' % options.top_urls) |
| 587 | url_times = times.get_top_urls_times(options.top_urls) |
| 588 | if options.pageids: |
| 589 | - log.debug('Generating pageid statistics...') |
| 590 | pageid_times = times.get_pageid_times() |
| 591 | |
| 592 | def _report_filename(filename): |
| 593 | @@ -436,7 +507,30 @@ |
| 594 | open(report_filename, 'w'), None, pageid_times, None, |
| 595 | options.timeout - 2) |
| 596 | |
| 597 | - times.close(options.db_file is None) |
| 598 | + # Output metrics for selected categories. |
| 599 | + report_filename = _report_filename('metrics.dat') |
| 600 | + log.info('Saving category_metrics %s', report_filename) |
| 601 | + metrics_file = open(report_filename, 'w') |
| 602 | + writer = csv.writer(metrics_file, delimiter=':') |
| 603 | + date = options.until_ts or options.from_ts or datetime.utcnow() |
| 604 | + date = time.mktime(date.timetuple()) |
| 605 | + |
| 606 | + for option in script_config.options('metrics'): |
| 607 | + name = script_config.get('metrics', option) |
| 608 | + found = False |
| 609 | + for category, stats in category_times: |
| 610 | + if category.title == name: |
| 611 | + writer.writerows([ |
| 612 | + ("%s_99" % option, "%f@%d" % ( |
| 613 | + stats.ninetyninth_percentile_time, date)), |
| 614 | + ("%s_mean" % option, "%f@%d" % (stats.mean, date))]) |
| 615 | + found = True |
| 616 | + break |
| 617 | + if not found: |
| 618 | + log.warning("Can't find category %s for metric %s" % ( |
| 619 | + option, name)) |
| 620 | + metrics_file.close() |
| 621 | + |
| 622 | return 0 |
| 623 | |
| 624 | |
| 625 | @@ -447,17 +541,9 @@ |
| 626 | """ |
| 627 | ext = os.path.splitext(filename)[1] |
| 628 | if ext == '.bz2': |
| 629 | - p = subprocess.Popen( |
| 630 | - ['bunzip2', '-c', filename], |
| 631 | - stdout=subprocess.PIPE, stdin=subprocess.PIPE) |
| 632 | - p.stdin.close() |
| 633 | - return p.stdout |
| 634 | + return bz2.BZ2File(filename, 'r') |
| 635 | elif ext == '.gz': |
| 636 | - p = subprocess.Popen( |
| 637 | - ['gunzip', '-c', filename], |
| 638 | - stdout=subprocess.PIPE, stdin=subprocess.PIPE) |
| 639 | - p.stdin.close() |
| 640 | - return p.stdout |
| 641 | + return gzip.GzipFile(filename, 'r') |
| 642 | else: |
| 643 | return open(filename, mode) |
| 644 | |
| 645 | @@ -684,7 +770,7 @@ |
| 646 | histograms = [] |
| 647 | |
| 648 | def handle_times(html_title, stats): |
| 649 | - histograms.append(stats.histogram) |
| 650 | + histograms.append(stats.relative_histogram) |
| 651 | print >> outf, dedent("""\ |
| 652 | <tr> |
| 653 | <th class="category-title">%s</th> |
| 654 | @@ -810,4 +896,3 @@ |
| 655 | </body> |
| 656 | </html> |
| 657 | """) |
| 658 | - |
| 659 | |
| 660 | === added file 'lib/lp/scripts/utilities/tests/test_pageperformancereport.py' |
| 661 | --- lib/lp/scripts/utilities/tests/test_pageperformancereport.py 1970-01-01 00:00:00 +0000 |
| 662 | +++ lib/lp/scripts/utilities/tests/test_pageperformancereport.py 2010-10-29 22:02:04 +0000 |
| 663 | @@ -0,0 +1,290 @@ |
| 664 | +# Copyright 2010 Canonical Ltd. This software is licensed under the |
| 665 | +# GNU Affero General Public License version 3 (see the file LICENSE). |
| 666 | + |
| 667 | +"""Test the pageperformancereport script.""" |
| 668 | + |
| 669 | +__metaclass__ = type |
| 670 | + |
| 671 | +import unittest |
| 672 | + |
| 673 | +from lp.testing import TestCase |
| 674 | + |
| 675 | +from lp.scripts.utilities.pageperformancereport import ( |
| 676 | + Category, |
| 677 | + OnlineApproximateMedian, |
| 678 | + OnlineStatsCalculator, |
| 679 | + RequestTimes, |
| 680 | + Stats, |
| 681 | + ) |
| 682 | + |
| 683 | + |
| 684 | +class FakeOptions: |
| 685 | + timeout = 4 |
| 686 | + db_file = None |
| 687 | + pageids = True |
| 688 | + top_urls = True |
| 689 | + |
| 690 | + def __init__(self, **kwargs): |
| 691 | + """Assign all arguments as attributes.""" |
| 692 | + self.__dict__.update(kwargs) |
| 693 | + |
| 694 | + |
| 695 | +class FakeRequest: |
| 696 | + def __init__(self, url, app_seconds, sql_statements=None, |
| 697 | + sql_seconds=None, pageid=None): |
| 698 | + self.url = url |
| 699 | + self.pageid = pageid |
| 700 | + self.app_seconds = app_seconds |
| 701 | + self.sql_statements = sql_statements |
| 702 | + self.sql_seconds = sql_seconds |
| 703 | + |
| 704 | + |
| 705 | +class FakeStats(Stats): |
| 706 | + def __init__(self, **kwargs): |
| 707 | + # Override the constructor to just store the values. |
| 708 | + self.__dict__.update(kwargs) |
| 709 | + |
| 710 | + |
| 711 | +FAKE_REQUESTS = [ |
| 712 | + FakeRequest('/', 0.5, pageid='+root'), |
| 713 | + FakeRequest('/bugs', 4.5, 56, 3.0, pageid='+bugs'), |
| 714 | + FakeRequest('/bugs', 4.2, 56, 2.2, pageid='+bugs'), |
| 715 | + FakeRequest('/bugs', 5.5, 76, 4.0, pageid='+bugs'), |
| 716 | + FakeRequest('/ubuntu', 2.5, 6, 2.0, pageid='+distribution'), |
| 717 | + FakeRequest('/launchpad', 3.5, 3, 3.0, pageid='+project'), |
| 718 | + FakeRequest('/bzr', 2.5, 4, 2.0, pageid='+project'), |
| 719 | + FakeRequest('/bugs/1', 20.5, 567, 14.0, pageid='+bug'), |
| 720 | + FakeRequest('/bugs/1', 15.5, 567, 9.0, pageid='+bug'), |
| 721 | + FakeRequest('/bugs/5', 1.5, 30, 1.2, pageid='+bug'), |
| 722 | + FakeRequest('/lazr', 1.0, 16, 0.3, pageid='+project'), |
| 723 | + FakeRequest('/drizzle', 0.9, 11, 1.3, pageid='+project'), |
| 724 | + ] |
| 725 | + |
| 726 | + |
| 727 | +# The category stats computed for the above 12 requests. |
| 728 | +CATEGORY_STATS = [ |
| 729 | + # Median is an approximation. |
| 730 | + # Real values are: 2.50, 2.20, 30 |
| 731 | + (Category('All', ''), FakeStats( |
| 732 | + total_hits=12, total_time=62.60, mean=5.22, median=1.0, std=5.99, |
| 733 | + total_sqltime=42, mean_sqltime=3.82, median_sqltime=1.3, |
| 734 | + std_sqltime=3.89, |
| 735 | + total_sqlstatements=1392, mean_sqlstatements=126.55, |
| 736 | + median_sqlstatements=16, std_sqlstatements=208.94, |
| 737 | + histogram=[[0, 2], [1, 2], [2, 2], [3, 1], [4, 2], [5, 3]], |
| 738 | + )), |
| 739 | + (Category('Test', ''), FakeStats()), |
| 740 | + (Category('Bugs', ''), FakeStats( |
| 741 | + total_hits=6, total_time=51.70, mean=8.62, median=4.5, std=6.90, |
| 742 | + total_sqltime=33.40, mean_sqltime=5.57, median_sqltime=3, |
| 743 | + std_sqltime=4.52, |
| 744 | + total_sqlstatements=1352, mean_sqlstatements=225.33, |
| 745 | + median_sqlstatements=56, std_sqlstatements=241.96, |
| 746 | + histogram=[[0, 0], [1, 1], [2, 0], [3, 0], [4, 2], [5, 3]], |
| 747 | + )), |
| 748 | + ] |
| 749 | + |
| 750 | + |
| 751 | +# The top 3 URL stats computed for the above 12 requests. |
| 752 | +TOP_3_URL_STATS = [ |
| 753 | + ('/bugs/1', FakeStats( |
| 754 | + total_hits=2, total_time=36.0, mean=18.0, median=15.5, std=2.50, |
| 755 | + total_sqltime=23.0, mean_sqltime=11.5, median_sqltime=9.0, |
| 756 | + std_sqltime=2.50, |
| 757 | + total_sqlstatements=1134, mean_sqlstatements=567.0, |
| 758 | + median_sqlstatements=567, std_statements=0, |
| 759 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 2]], |
| 760 | + )), |
| 761 | + ('/bugs', FakeStats( |
| 762 | + total_hits=3, total_time=14.2, mean=4.73, median=4.5, std=0.56, |
| 763 | + total_sqltime=9.2, mean_sqltime=3.07, median_sqltime=3, |
| 764 | + std_sqltime=0.74, |
| 765 | + total_sqlstatements=188, mean_sqlstatements=62.67, |
| 766 | + median_sqlstatements=56, std_sqlstatements=9.43, |
| 767 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 2], [5, 1]], |
| 768 | + )), |
| 769 | + ('/launchpad', FakeStats( |
| 770 | + total_hits=1, total_time=3.5, mean=3.5, median=3.5, std=0, |
| 771 | + total_sqltime=3.0, mean_sqltime=3, median_sqltime=3, std_sqltime=0, |
| 772 | + total_sqlstatements=3, mean_sqlstatements=3, |
| 773 | + median_sqlstatements=3, std_sqlstatements=0, |
| 774 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 1], [4, 0], [5, 0]], |
| 775 | + )), |
| 776 | + ] |
| 777 | + |
| 778 | + |
| 779 | +# The pageid stats computed for the above 12 requests. |
| 780 | +PAGEID_STATS = [ |
| 781 | + ('+bug', FakeStats( |
| 782 | + total_hits=3, total_time=37.5, mean=12.5, median=15.5, std=8.04, |
| 783 | + total_sqltime=24.2, mean_sqltime=8.07, median_sqltime=9, |
| 784 | + std_sqltime=5.27, |
| 785 | + total_sqlstatements=1164, mean_sqlstatements=388, |
| 786 | + median_sqlstatements=567, std_sqlstatements=253.14, |
| 787 | + histogram=[[0, 0], [1, 1], [2, 0], [3, 0], [4, 0], [5, 2]], |
| 788 | + )), |
| 789 | + ('+bugs', FakeStats( |
| 790 | + total_hits=3, total_time=14.2, mean=4.73, median=4.5, std=0.56, |
| 791 | + total_sqltime=9.2, mean_sqltime=3.07, median_sqltime=3, |
| 792 | + std_sqltime=0.74, |
| 793 | + total_sqlstatements=188, mean_sqlstatements=62.67, |
| 794 | + median_sqlstatements=56, std_sqlstatements=9.43, |
| 795 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 2], [5, 1]], |
| 796 | + )), |
| 797 | + ('+distribution', FakeStats( |
| 798 | + total_hits=1, total_time=2.5, mean=2.5, median=2.5, std=0, |
| 799 | + total_sqltime=2.0, mean_sqltime=2, median_sqltime=2, std_sqltime=0, |
| 800 | + total_sqlstatements=6, mean_sqlstatements=6, |
| 801 | + median_sqlstatements=6, std_sqlstatements=0, |
| 802 | + histogram=[[0, 0], [1, 0], [2, 1], [3, 0], [4, 0], [5, 0]], |
| 803 | + )), |
| 804 | + ('+project', FakeStats( |
| 805 | + total_hits=4, total_time=7.9, mean=1.98, median=1, std=1.08, |
| 806 | + total_sqltime=6.6, mean_sqltime=1.65, median_sqltime=1.3, |
| 807 | + std_sqltime=0.99, |
| 808 | + total_sqlstatements=34, mean_sqlstatements=8.5, |
| 809 | + median_sqlstatements=4, std_sqlstatements=5.32, |
| 810 | + histogram=[[0, 1], [1, 1], [2, 1], [3, 1], [4, 0], [5, 0]], |
| 811 | + )), |
| 812 | + ('+root', FakeStats( |
| 813 | + total_hits=1, total_time=0.5, mean=0.5, median=0.5, std=0, |
| 814 | + histogram=[[0, 1], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]], |
| 815 | + )), |
| 816 | + ] |
| 817 | + |
| 818 | + |
| 819 | +class TestRequestTimes(TestCase): |
| 820 | + """Tests the RequestTimes backend.""" |
| 821 | + |
| 822 | + def setUp(self): |
| 823 | + TestCase.setUp(self) |
| 824 | + self.categories = [ |
| 825 | + Category('All', '.*'), Category('Test', '.*test.*'), |
| 826 | + Category('Bugs', '.*bugs.*')] |
| 827 | + self.db = RequestTimes(self.categories, FakeOptions()) |
| 828 | + |
| 829 | + def setUpRequests(self): |
| 830 | + """Insert some requests into the db.""" |
| 831 | + for r in FAKE_REQUESTS: |
| 832 | + self.db.add_request(r) |
| 833 | + |
| 834 | + def assertStatsAreEquals(self, expected, results): |
| 835 | + self.assertEquals( |
| 836 | + len(expected), len(results), 'Wrong number of results') |
| 837 | + for idx in range(len(results)): |
| 838 | + self.assertEquals(expected[idx][0], results[idx][0], |
| 839 | + "Wrong key for results %d" % idx) |
| 840 | + key = results[idx][0] |
| 841 | + self.assertEquals(expected[idx][1].text(), results[idx][1].text(), |
| 842 | + "Wrong stats for results %d (%s)" % (idx, key)) |
| 843 | + self.assertEquals( |
| 844 | + expected[idx][1].histogram, results[idx][1].histogram, |
| 845 | + "Wrong histogram for results %d (%s)" % (idx, key)) |
| 846 | + |
| 847 | + def test_get_category_times(self): |
| 848 | + self.setUpRequests() |
| 849 | + category_times = self.db.get_category_times() |
| 850 | + self.assertStatsAreEquals(CATEGORY_STATS, category_times) |
| 851 | + |
| 852 | + def test_get_url_times(self): |
| 853 | + self.setUpRequests() |
| 854 | + url_times = self.db.get_top_urls_times(3) |
| 855 | + self.assertStatsAreEquals(TOP_3_URL_STATS, url_times) |
| 856 | + |
| 857 | + def test_get_pageid_times(self): |
| 858 | + self.setUpRequests() |
| 859 | + pageid_times = self.db.get_pageid_times() |
| 860 | + self.assertStatsAreEquals(PAGEID_STATS, pageid_times) |
| 861 | + |
| 862 | + |
| 863 | +class TestStats(TestCase): |
| 864 | + """Tests for the stats class.""" |
| 865 | + |
| 866 | + def test_relative_histogram(self): |
| 867 | + # Test that relative histogram gives an histogram using |
| 868 | + # relative frequency. |
| 869 | + stats = Stats() |
| 870 | + stats.total_hits = 100 |
| 871 | + stats.histogram = [[0, 50], [1, 10], [2, 33], [3, 0], [4, 0], [5, 7]] |
| 872 | + self.assertEquals( |
| 873 | + [[0, 0.5], [1, .1], [2, .33], [3, 0], [4, 0], [5, .07]], |
| 874 | + stats.relative_histogram) |
| 875 | + |
| 876 | + |
| 877 | +class TestOnlineStatsCalculator(TestCase): |
| 878 | + """Tests for the online stats calculator.""" |
| 879 | + |
| 880 | + def setUp(self): |
| 881 | + TestCase.setUp(self) |
| 882 | + self.stats = OnlineStatsCalculator() |
| 883 | + |
| 884 | + def test_stats_for_empty_set(self): |
| 885 | + # Test the stats when there is no input. |
| 886 | + self.assertEquals(0, self.stats.count) |
| 887 | + self.assertEquals(0, self.stats.sum) |
| 888 | + self.assertEquals(0, self.stats.mean) |
| 889 | + self.assertEquals(0, self.stats.variance) |
| 890 | + self.assertEquals(0, self.stats.std) |
| 891 | + |
| 892 | + def test_stats_for_one_value(self): |
| 893 | + # Test the stats when adding one element. |
| 894 | + self.stats.update(5) |
| 895 | + self.assertEquals(1, self.stats.count) |
| 896 | + self.assertEquals(5, self.stats.sum) |
| 897 | + self.assertEquals(5, self.stats.mean) |
| 898 | + self.assertEquals(0, self.stats.variance) |
| 899 | + self.assertEquals(0, self.stats.std) |
| 900 | + |
| 901 | + def test_None_are_ignored(self): |
| 902 | + self.stats.update(None) |
| 903 | + self.assertEquals(0, self.stats.count) |
| 904 | + |
| 905 | + def test_stats_for_3_values(self): |
| 906 | + for x in [3, 6, 9]: |
| 907 | + self.stats.update(x) |
| 908 | + self.assertEquals(3, self.stats.count) |
| 909 | + self.assertEquals(18, self.stats.sum) |
| 910 | + self.assertEquals(6, self.stats.mean) |
| 911 | + self.assertEquals(6, self.stats.variance) |
| 912 | + self.assertEquals("2.45", "%.2f" % self.stats.std) |
| 913 | + |
| 914 | + |
| 915 | +SHUFFLE_RANGE_100 = [ |
| 916 | + 25, 79, 99, 76, 60, 63, 87, 77, 51, 82, 42, 96, 93, 58, 32, 66, 75, |
| 917 | + 2, 26, 22, 11, 73, 61, 83, 65, 68, 44, 81, 64, 3, 33, 34, 15, 1, |
| 918 | + 92, 27, 90, 74, 46, 57, 59, 31, 13, 19, 89, 29, 56, 94, 50, 49, 62, |
| 919 | + 37, 21, 35, 5, 84, 88, 16, 8, 23, 40, 6, 48, 10, 97, 0, 53, 17, 30, |
| 920 | + 18, 43, 86, 12, 71, 38, 78, 36, 7, 45, 47, 80, 54, 39, 91, 98, 24, |
| 921 | + 55, 14, 52, 20, 69, 85, 95, 28, 4, 9, 67, 70, 41, 72 |
| 922 | + ] |
| 923 | + |
| 924 | + |
| 925 | +class TestOnlineApproximateMedian(TestCase): |
| 926 | + """Tests for the approximate median computation.""" |
| 927 | + |
| 928 | + def setUp(self): |
| 929 | + TestCase.setUp(self) |
| 930 | + self.estimator = OnlineApproximateMedian() |
| 931 | + |
| 932 | + def test_median_is_0_when_no_input(self): |
| 933 | + self.assertEquals(0, self.estimator.median) |
| 934 | + |
| 935 | + def test_median_is_true_median_for_n_lower_than_bucket_size(self): |
| 936 | + for x in range(9): |
| 937 | + self.estimator.update(x) |
| 938 | + self.assertEquals(4, self.estimator.median) |
| 939 | + |
| 940 | + def test_None_input_is_ignored(self): |
| 941 | + self.estimator.update(1) |
| 942 | + self.estimator.update(None) |
| 943 | + self.assertEquals(1, self.estimator.median) |
| 944 | + |
| 945 | + def test_approximage_median_is_good_enough(self): |
| 946 | + for x in SHUFFLE_RANGE_100: |
| 947 | + self.estimator.update(x) |
| 948 | + # True median is 50, 52 is good enough :-) |
| 949 | + self.assertEquals(52, self.estimator.median) |
| 950 | + |
| 951 | + |
| 952 | +def test_suite(): |
| 953 | + return unittest.TestLoader().loadTestsFromName(__name__) |
| 954 | |
| 955 | === modified file 'setup.py' |
| 956 | --- setup.py 2010-09-18 08:00:27 +0000 |
| 957 | +++ setup.py 2010-10-29 22:02:04 +0000 |
| 958 | @@ -51,7 +51,6 @@ |
| 959 | 'meliae', |
| 960 | 'mercurial', |
| 961 | 'mocker', |
| 962 | - 'numpy', |
| 963 | 'oauth', |
| 964 | 'paramiko', |
| 965 | 'python-memcached', |
| 966 | |
| 967 | === modified file 'utilities/page-performance-report.ini' |
| 968 | --- utilities/page-performance-report.ini 2010-10-24 21:00:11 +0000 |
| 969 | +++ utilities/page-performance-report.ini 2010-10-29 22:02:04 +0000 |
| 970 | @@ -45,3 +45,10 @@ |
| 971 | Private XML-RPC=^https?://xmlrpc-private\. |
| 972 | Shipit=^https?://shipit\. |
| 973 | |
| 974 | +[metrics] |
| 975 | +ppr_all=All launchpad except opstats |
| 976 | +ppr_bugs=Bugs |
| 977 | +ppr_api=API |
| 978 | +ppr_code=Code |
| 979 | +ppr_translations=Translations |
| 980 | +ppr_registry=Registry |
| 981 | |
| 982 | === modified file 'versions.cfg' |
| 983 | --- versions.cfg 2010-10-27 04:23:52 +0000 |
| 984 | +++ versions.cfg 2010-10-29 22:02:04 +0000 |
| 985 | @@ -45,7 +45,6 @@ |
| 986 | mercurial = 1.6.2 |
| 987 | mocker = 0.10.1 |
| 988 | mozrunner = 1.3.4 |
| 989 | -numpy = 1.3.0 |
| 990 | oauth = 1.0 |
| 991 | paramiko = 1.7.4 |
| 992 | Paste = 1.7.2 |

175 """Return the times for each category.""" get_times( category_ query)) times(' category_ request' , 'category')
176 - category_query = 'SELECT * FROM category_request ORDER BY category'
177 -
178 - empty_stats = Stats([], 0)
179 - categories = dict(self.
180 +
181 + times = self.get_
182 +
183 return [
The VWS here weirds me out - it breaks up the function without being a useful separation; it's up to you but I'd rather read
"""return ..."
times = self.get_times...
return [
...
Similarly on some other functions.
This looks like it will work but I think we can make this massively more efficient by just doing it incrementally. E.g. see http://mathcentral.uregina.ca/QQ/database/QQ.09.04/carlos1.html (though I think they have a sign reversal, you'll want to derive it yourself).
We can do that as a third iteration though.
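
For reference, the incremental update being suggested here is the standard Welford recurrence, which is also what the OnlineStatsCalculator in the preview diff above ends up implementing. A minimal sketch with the signs written out (since the linked derivation reportedly has a sign reversal):

```python
# Welford recurrence: after each value x, keep only count, mean and M2.
#   mean_n = mean_{n-1} + (x - mean_{n-1}) / n
#   M2_n   = M2_{n-1}   + (x - mean_{n-1}) * (x - mean_n)
#   population variance = M2_n / n
count, mean, m2 = 0, 0.0, 0.0
for x in [3.0, 6.0, 9.0]:
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
print("%.1f %.1f" % (mean, m2 / count))  # 6.0 6.0 -- the values the new unit test expects
```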