Merge lp:~lifeless/lp-dev-utils/ppr into lp:lp-dev-utils
Proposed by: Robert Collins
Status: Merged
Approved by: Robert Collins
Approved revision: 124
Merged at revision: 124
Proposed branch: lp:~lifeless/lp-dev-utils/ppr
Merge into: lp:lp-dev-utils
Diff against target: 2543 lines (+2463/-2), 13 files modified
  .bzrignore (+7/-0), .testr.conf (+3/-2), Makefile (+13/-0), README (+7/-0),
  bootstrap.py (+259/-0), buildout.cfg (+38/-0),
  page-performance-report-daily.sh (+115/-0),
  page-performance-report.ini (+79/-0), page-performance-report.py (+18/-0),
  pageperformancereport.py (+1277/-0), setup.py (+50/-0),
  test_pageperformancereport.py (+486/-0), versions.cfg (+111/-0)
To merge this branch: bzr merge lp:~lifeless/lp-dev-utils/ppr
Related bugs:
Reviewer: William Grant (code), status: Approve
Review via email: mp+118870@code.launchpad.net
Commit message
Description of the change
This branch:
- updates .testr.conf to support parallel test runs.
- adds buildout to let us use zc packages (but keeps it optional).
- migrates the page performance report.
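The .testr.conf change is what enables the parallel runs: testr first asks the test command to enumerate test ids (test_list_option=--list), partitions that list, and then starts one worker per partition, feeding each worker its subset via --load-list $IDFILE. A minimal sketch of the partitioning step (illustrative only; the function name is hypothetical, and testrepository's real scheduler also balances partitions using stored timing data):

```python
def partition_tests(test_ids, workers):
    """Round-robin test ids across workers.

    A simple stand-in for testrepository's scheduler; each returned
    sublist corresponds to one $IDFILE passed via --load-list.
    """
    partitions = [[] for _ in range(workers)]
    for i, test_id in enumerate(test_ids):
        partitions[i % workers].append(test_id)
    return partitions


ids = ['test_a', 'test_b', 'test_c', 'test_d', 'test_e']
for n, part in enumerate(partition_tests(ids, 2)):
    print(n, part)
```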
Revision history for this message
Robert Collins (lifeless) wrote:
(For clarity, william doesn't condone it anywhere :P)
Preview Diff
1 | === modified file '.bzrignore' |
2 | --- .bzrignore 2012-04-13 15:33:03 +0000 |
3 | +++ .bzrignore 2012-08-09 04:56:19 +0000 |
4 | @@ -1,3 +1,10 @@ |
5 | .launchpadlib |
6 | _trial_temp |
7 | .testrepository |
8 | +.installed.cfg |
9 | +eggs |
10 | +download-cache |
11 | +lp_dev_utils.egg-info |
12 | +parts |
13 | +bin |
14 | +develop-eggs |
15 | |
16 | === modified file '.testr.conf' |
17 | --- .testr.conf 2012-04-13 15:08:57 +0000 |
18 | +++ .testr.conf 2012-08-09 04:56:19 +0000 |
19 | @@ -1,3 +1,4 @@ |
20 | [DEFAULT] |
21 | -test_command=PYTHONPATH=.:$PYTHONPATH python -m subunit.run discover $IDLIST |
22 | -test_id_list_default=ec2test |
23 | +test_command=${PYTHON:-python} -m subunit.run discover $LISTOPT $IDOPTION . |
24 | +test_id_option=--load-list $IDFILE |
25 | +test_list_option=--list |
26 | |
27 | === added file 'Makefile' |
28 | --- Makefile 1970-01-01 00:00:00 +0000 |
29 | +++ Makefile 2012-08-09 04:56:19 +0000 |
30 | @@ -0,0 +1,13 @@ |
31 | +all: |
32 | + |
33 | +bin/buildout: buildout.cfg versions.cfg setup.py download-cache eggs |
34 | + ./bootstrap.py \ |
35 | + --setup-source=download-cache/ez_setup.py \ |
36 | + --download-base=download-cache/dist --eggs=eggs |
37 | + |
38 | + |
39 | +download-cache: |
40 | + bzr checkout --lightweight lp:lp-source-dependencies download-cache |
41 | + |
42 | +eggs: |
43 | + mkdir eggs |
44 | |
45 | === modified file 'README' |
46 | --- README 2012-04-13 15:08:57 +0000 |
47 | +++ README 2012-08-09 04:56:19 +0000 |
48 | @@ -1,6 +1,7 @@ |
49 | ============== |
50 | lp-dev-utils |
51 | ============== |
52 | + |
53 | Tools for hacking on Launchpad |
54 | ============================== |
55 | |
56 | @@ -40,3 +41,9 @@ |
57 | Ran 84 (+84) tests in 51.723s (+51.651s) |
58 | FAILED (id=1) |
59 | |
60 | +To run the pageperformancereport tests, zc.zservertracelog is needed; this is |
61 | +best obtained via buildout:: |
62 | + |
63 | + $ make bin/buildout |
64 | + $ bin/buildout |
65 | + $ PYTHON=bin/py testr run |
66 | |
67 | === added file 'bootstrap.py' |
68 | --- bootstrap.py 1970-01-01 00:00:00 +0000 |
69 | +++ bootstrap.py 2012-08-09 04:56:19 +0000 |
70 | @@ -0,0 +1,259 @@ |
71 | +#!/usr/bin/env python |
72 | +############################################################################## |
73 | +# |
74 | +# Copyright (c) 2006 Zope Foundation and Contributors. |
75 | +# All Rights Reserved. |
76 | +# |
77 | +# This software is subject to the provisions of the Zope Public License, |
78 | +# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. |
79 | +# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED |
80 | +# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED |
81 | +# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS |
82 | +# FOR A PARTICULAR PURPOSE. |
83 | +# |
84 | +############################################################################## |
85 | +"""Bootstrap a buildout-based project |
86 | + |
87 | +Simply run this script in a directory containing a buildout.cfg. |
88 | +The script accepts buildout command-line options, so you can |
89 | +use the -c option to specify an alternate configuration file. |
90 | +""" |
91 | + |
92 | +import os, shutil, sys, tempfile, textwrap, urllib, urllib2, subprocess |
93 | +from optparse import OptionParser |
94 | + |
95 | +if sys.platform == 'win32': |
96 | + def quote(c): |
97 | + if ' ' in c: |
98 | + return '"%s"' % c # work around spawn lamosity on windows |
99 | + else: |
100 | + return c |
101 | +else: |
102 | + quote = str |
103 | + |
104 | +# See zc.buildout.easy_install._has_broken_dash_S for motivation and comments. |
105 | +stdout, stderr = subprocess.Popen( |
106 | + [sys.executable, '-Sc', |
107 | + 'try:\n' |
108 | + ' import ConfigParser\n' |
109 | + 'except ImportError:\n' |
110 | + ' print 1\n' |
111 | + 'else:\n' |
112 | + ' print 0\n'], |
113 | + stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() |
114 | +has_broken_dash_S = bool(int(stdout.strip())) |
115 | + |
116 | +# In order to be more robust in the face of system Pythons, we want to |
117 | +# run without site-packages loaded. This is somewhat tricky, in |
118 | +# particular because Python 2.6's distutils imports site, so starting |
119 | +# with the -S flag is not sufficient. However, we'll start with that: |
120 | +if not has_broken_dash_S and 'site' in sys.modules: |
121 | + # We will restart with python -S. |
122 | + args = sys.argv[:] |
123 | + args[0:0] = [sys.executable, '-S'] |
124 | + args = map(quote, args) |
125 | + os.execv(sys.executable, args) |
126 | +# Now we are running with -S. We'll get the clean sys.path, import site |
127 | +# because distutils will do it later, and then reset the path and clean |
128 | +# out any namespace packages from site-packages that might have been |
129 | +# loaded by .pth files. |
130 | +clean_path = sys.path[:] |
131 | +import site |
132 | +sys.path[:] = clean_path |
133 | +for k, v in sys.modules.items(): |
134 | + if (hasattr(v, '__path__') and |
135 | + len(v.__path__)==1 and |
136 | + not os.path.exists(os.path.join(v.__path__[0],'__init__.py'))): |
137 | + # This is a namespace package. Remove it. |
138 | + sys.modules.pop(k) |
139 | + |
140 | +is_jython = sys.platform.startswith('java') |
141 | + |
142 | +setuptools_source = 'http://peak.telecommunity.com/dist/ez_setup.py' |
143 | +distribute_source = 'http://python-distribute.org/distribute_setup.py' |
144 | + |
145 | +# parsing arguments |
146 | +def normalize_to_url(option, opt_str, value, parser): |
147 | + if value: |
148 | + if '://' not in value: # It doesn't smell like a URL. |
149 | + value = 'file://%s' % ( |
150 | + urllib.pathname2url( |
151 | + os.path.abspath(os.path.expanduser(value))),) |
152 | + if opt_str == '--download-base' and not value.endswith('/'): |
153 | + # Download base needs a trailing slash to make the world happy. |
154 | + value += '/' |
155 | + else: |
156 | + value = None |
157 | + name = opt_str[2:].replace('-', '_') |
158 | + setattr(parser.values, name, value) |
159 | + |
160 | +usage = '''\ |
161 | +[DESIRED PYTHON FOR BUILDOUT] bootstrap.py [options] |
162 | + |
163 | +Bootstraps a buildout-based project. |
164 | + |
165 | +Simply run this script in a directory containing a buildout.cfg, using the |
166 | +Python that you want bin/buildout to use. |
167 | + |
168 | +Note that by using --setup-source and --download-base to point to |
169 | +local resources, you can keep this script from going over the network. |
170 | +''' |
171 | + |
172 | +parser = OptionParser(usage=usage) |
173 | +parser.add_option("-v", "--version", dest="version", |
174 | + help="use a specific zc.buildout version") |
175 | +parser.add_option("-d", "--distribute", |
176 | + action="store_true", dest="use_distribute", default=False, |
177 | + help="Use Distribute rather than Setuptools.") |
178 | +parser.add_option("--setup-source", action="callback", dest="setup_source", |
179 | + callback=normalize_to_url, nargs=1, type="string", |
180 | + help=("Specify a URL or file location for the setup file. " |
181 | + "If you use Setuptools, this will default to " + |
182 | + setuptools_source + "; if you use Distribute, this " |
183 | + "will default to " + distribute_source +".")) |
184 | +parser.add_option("--download-base", action="callback", dest="download_base", |
185 | + callback=normalize_to_url, nargs=1, type="string", |
186 | + help=("Specify a URL or directory for downloading " |
187 | + "zc.buildout and either Setuptools or Distribute. " |
188 | + "Defaults to PyPI.")) |
189 | +parser.add_option("--eggs", |
190 | + help=("Specify a directory for storing eggs. Defaults to " |
191 | + "a temporary directory that is deleted when the " |
192 | + "bootstrap script completes.")) |
193 | +parser.add_option("-t", "--accept-buildout-test-releases", |
194 | + dest='accept_buildout_test_releases', |
195 | + action="store_true", default=False, |
196 | + help=("Normally, if you do not specify a --version, the " |
197 | + "bootstrap script and buildout gets the newest " |
198 | + "*final* versions of zc.buildout and its recipes and " |
199 | + "extensions for you. If you use this flag, " |
200 | + "bootstrap and buildout will get the newest releases " |
201 | + "even if they are alphas or betas.")) |
202 | +parser.add_option("-c", None, action="store", dest="config_file", |
203 | + help=("Specify the path to the buildout configuration " |
204 | + "file to be used.")) |
205 | + |
206 | +options, args = parser.parse_args() |
207 | + |
208 | +# if -c was provided, we push it back into args for buildout's main function |
209 | +if options.config_file is not None: |
210 | + args += ['-c', options.config_file] |
211 | + |
212 | +if options.eggs: |
213 | + eggs_dir = os.path.abspath(os.path.expanduser(options.eggs)) |
214 | +else: |
215 | + eggs_dir = tempfile.mkdtemp() |
216 | + |
217 | +if options.setup_source is None: |
218 | + if options.use_distribute: |
219 | + options.setup_source = distribute_source |
220 | + else: |
221 | + options.setup_source = setuptools_source |
222 | + |
223 | +if options.accept_buildout_test_releases: |
224 | + args.append('buildout:accept-buildout-test-releases=true') |
225 | +args.append('bootstrap') |
226 | + |
227 | +try: |
228 | + import pkg_resources |
229 | + import setuptools # A flag. Sometimes pkg_resources is installed alone. |
230 | + if not hasattr(pkg_resources, '_distribute'): |
231 | + raise ImportError |
232 | +except ImportError: |
233 | + ez_code = urllib2.urlopen( |
234 | + options.setup_source).read().replace('\r\n', '\n') |
235 | + ez = {} |
236 | + exec ez_code in ez |
237 | + setup_args = dict(to_dir=eggs_dir, download_delay=0) |
238 | + if options.download_base: |
239 | + setup_args['download_base'] = options.download_base |
240 | + if options.use_distribute: |
241 | + setup_args['no_fake'] = True |
242 | + ez['use_setuptools'](**setup_args) |
243 | + reload(sys.modules['pkg_resources']) |
244 | + import pkg_resources |
245 | + # This does not (always?) update the default working set. We will |
246 | + # do it. |
247 | + for path in sys.path: |
248 | + if path not in pkg_resources.working_set.entries: |
249 | + pkg_resources.working_set.add_entry(path) |
250 | + |
251 | +cmd = [quote(sys.executable), |
252 | + '-c', |
253 | + quote('from setuptools.command.easy_install import main; main()'), |
254 | + '-mqNxd', |
255 | + quote(eggs_dir)] |
256 | + |
257 | +if not has_broken_dash_S: |
258 | + cmd.insert(1, '-S') |
259 | + |
260 | +find_links = options.download_base |
261 | +if not find_links: |
262 | + find_links = os.environ.get('bootstrap-testing-find-links') |
263 | +if find_links: |
264 | + cmd.extend(['-f', quote(find_links)]) |
265 | + |
266 | +if options.use_distribute: |
267 | + setup_requirement = 'distribute' |
268 | +else: |
269 | + setup_requirement = 'setuptools' |
270 | +ws = pkg_resources.working_set |
271 | +setup_requirement_path = ws.find( |
272 | + pkg_resources.Requirement.parse(setup_requirement)).location |
273 | +env = dict( |
274 | + os.environ, |
275 | + PYTHONPATH=setup_requirement_path) |
276 | + |
277 | +requirement = 'zc.buildout' |
278 | +version = options.version |
279 | +if version is None and not options.accept_buildout_test_releases: |
280 | + # Figure out the most recent final version of zc.buildout. |
281 | + import setuptools.package_index |
282 | + _final_parts = '*final-', '*final' |
283 | + def _final_version(parsed_version): |
284 | + for part in parsed_version: |
285 | + if (part[:1] == '*') and (part not in _final_parts): |
286 | + return False |
287 | + return True |
288 | + index = setuptools.package_index.PackageIndex( |
289 | + search_path=[setup_requirement_path]) |
290 | + if find_links: |
291 | + index.add_find_links((find_links,)) |
292 | + req = pkg_resources.Requirement.parse(requirement) |
293 | + if index.obtain(req) is not None: |
294 | + best = [] |
295 | + bestv = None |
296 | + for dist in index[req.project_name]: |
297 | + distv = dist.parsed_version |
298 | + if _final_version(distv): |
299 | + if bestv is None or distv > bestv: |
300 | + best = [dist] |
301 | + bestv = distv |
302 | + elif distv == bestv: |
303 | + best.append(dist) |
304 | + if best: |
305 | + best.sort() |
306 | + version = best[-1].version |
307 | +if version: |
308 | + requirement = '=='.join((requirement, version)) |
309 | +cmd.append(requirement) |
310 | + |
311 | +if is_jython: |
312 | + import subprocess |
313 | + exitcode = subprocess.Popen(cmd, env=env).wait() |
314 | +else: # Windows prefers this, apparently; otherwise we would prefer subprocess |
315 | + exitcode = os.spawnle(*([os.P_WAIT, sys.executable] + cmd + [env])) |
316 | +if exitcode != 0: |
317 | + sys.stdout.flush() |
318 | + sys.stderr.flush() |
319 | + print ("An error occurred when trying to install zc.buildout. " |
320 | + "Look above this message for any errors that " |
321 | + "were output by easy_install.") |
322 | + sys.exit(exitcode) |
323 | + |
324 | +ws.add_entry(eggs_dir) |
325 | +ws.require(requirement) |
326 | +import zc.buildout.buildout |
327 | +zc.buildout.buildout.main(args) |
328 | +if not options.eggs: # clean up temporary egg directory |
329 | + shutil.rmtree(eggs_dir) |
330 | |
331 | === added file 'buildout.cfg' |
332 | --- buildout.cfg 1970-01-01 00:00:00 +0000 |
333 | +++ buildout.cfg 2012-08-09 04:56:19 +0000 |
334 | @@ -0,0 +1,38 @@ |
335 | +# Copyright 2011 Canonical Ltd. This software is licensed under the |
336 | +# GNU Lesser General Public License version 3 (see the file LICENSE). |
337 | + |
338 | +[buildout] |
339 | +parts = |
340 | + scripts |
341 | +unzip = true |
342 | +eggs-directory = eggs |
343 | +download-cache = download-cache |
344 | +relative-paths = true |
345 | + |
346 | +# Disable this option temporarily if you want buildout to find software |
347 | +# dependencies *other* than those in our download-cache. Once you have the |
348 | +# desired software, reenable this option (and check in the new software to |
349 | +# lp:lp-source-dependencies if this is going to be reviewed/merged/deployed.) |
350 | +install-from-cache = true |
351 | + |
352 | +# This also will need to be temporarily disabled or changed for package |
353 | +# upgrades. Newly-added packages should also add their desired version number |
354 | +# to versions.cfg. |
355 | +extends = versions.cfg |
356 | + |
357 | +allow-picked-versions = false |
358 | + |
359 | +prefer-final = true |
360 | + |
361 | +develop = . |
362 | + |
363 | +# [configuration] |
364 | +# instance_name = development |
365 | + |
366 | +[scripts] |
367 | +recipe = z3c.recipe.scripts |
368 | +eggs = lp-dev-utils [test] |
369 | +include-site-packages = true |
370 | +allowed-eggs-from-site-packages = |
371 | + subunit |
372 | +interpreter = py |
373 | |
374 | === added file 'page-performance-report-daily.sh' |
375 | --- page-performance-report-daily.sh 1970-01-01 00:00:00 +0000 |
376 | +++ page-performance-report-daily.sh 2012-08-09 04:56:19 +0000 |
377 | @@ -0,0 +1,115 @@ |
378 | +#!/bin/sh |
379 | + |
380 | +#TZ=UTC # trace logs are still BST - blech |
381 | + |
382 | +CATEGORY=lpnet |
383 | +LOGS_ROOTS="/srv/launchpad.net-logs/production /srv/launchpad.net-logs/edge" |
384 | +OUTPUT_ROOT=${HOME}/public_html/ppr/lpnet |
385 | +DAY_FMT="+%Y-%m-%d" |
386 | + |
387 | +find_logs() { |
388 | + from=$1 |
389 | + until=$2 |
390 | + |
391 | + end_mtime_switch= |
392 | + days_to_end="$(expr `date +%j` - `date -d $until +%j` - 1)" |
393 | + if [ $days_to_end -gt 0 ]; then |
394 | + end_mtime_switch="-daystart -mtime +$days_to_end" |
395 | + fi |
396 | + |
397 | + find ${LOGS_ROOTS} \ |
398 | + -maxdepth 2 -type f -newermt "$from - 1 day" $end_mtime_switch \ |
399 | + -name launchpad-trace\* \ |
400 | + | sort | xargs -x |
401 | +} |
402 | + |
403 | +# Find all the daily stats.pck.bz2 $from $until |
404 | +find_stats() { |
405 | + from=$1 |
406 | + until=$2 |
407 | + |
408 | + # Build a string of all the days within range. |
409 | + local dates |
410 | + local day |
411 | + day=$from |
412 | + while [ $day != $until ]; do |
413 | + dates="$dates $day" |
414 | + day=`date $DAY_FMT -d "$day + 1 day"` |
415 | + done |
416 | + |
417 | + # Use that to build a regex that will be used to select |
418 | + # the files to use. |
419 | + local regex |
420 | + regex="daily_(`echo $dates |sed -e 's/ /|/g'`)" |
421 | + |
422 | + find ${OUTPUT_ROOT} -name 'stats.pck.bz2' | egrep $regex |
423 | +} |
424 | + |
425 | +report() { |
426 | + type=$1 |
427 | + from=$2 |
428 | + until=$3 |
429 | + link=$4 |
430 | + |
431 | + local files |
432 | + local options |
433 | + if [ "$type" = "daily" ]; then |
434 | + files=`find_logs $from $until` |
435 | + options="--from=$from --until=$until" |
436 | + else |
437 | + files=`find_stats $from $until` |
438 | + options="--merge" |
439 | + fi |
440 | + |
441 | + local dir |
442 | + dir=${OUTPUT_ROOT}/`date -d $from +%Y-%m`/${type}_${from}_${until} |
443 | + mkdir -p ${dir} |
444 | + |
445 | + echo Generating report from $from until $until into $dir `date` |
446 | + |
447 | + ./page-performance-report.py -v --top-urls=200 --directory=${dir} \ |
448 | + $options $files |
449 | + |
450 | + # Only do the linking if requested. |
451 | + if [ "$link" = "link" ]; then |
452 | + ln -sf ${dir}/partition.html \ |
453 | + ${OUTPUT_ROOT}/latest-${type}-partition.html |
454 | + ln -sf ${dir}/categories.html \ |
455 | + ${OUTPUT_ROOT}/latest-${type}-categories.html |
456 | + ln -sf ${dir}/pageids.html \ |
457 | + ${OUTPUT_ROOT}/latest-${type}-pageids.html |
458 | + ln -sf ${dir}/combined.html \ |
459 | + ${OUTPUT_ROOT}/latest-${type}-combined.html |
460 | + ln -sf ${dir}/metrics.dat ${OUTPUT_ROOT}/latest-${type}-metrics.dat |
461 | + ln -sf ${dir}/top200.html ${OUTPUT_ROOT}/latest-${type}-top200.html |
462 | + ln -sf ${dir}/timeout-candidates.html \ |
463 | + ${OUTPUT_ROOT}/latest-${type}-timeout-candidates.html |
464 | + fi |
465 | + |
466 | + return 0 |
467 | +} |
468 | + |
469 | +link= |
470 | +if [ "$3" = "-l" ]; then |
471 | + link="link" |
472 | +fi |
473 | + |
474 | +if [ "$1" = '-d' ]; then |
475 | + report daily `date -d $2 $DAY_FMT` `date -d "$2 + 1 day" $DAY_FMT` $link |
476 | +elif [ "$1" = '-w' ]; then |
477 | + report weekly `date -d $2 $DAY_FMT` `date -d "$2 + 1 week" $DAY_FMT` $link |
478 | +elif [ "$1" = '-m' ]; then |
479 | + report monthly `date -d $2 $DAY_FMT` `date -d "$2 + 1 month" $DAY_FMT` $link |
480 | +else |
481 | + # Default invocation used from cron to generate latest one. |
482 | + now=`date $DAY_FMT` |
483 | + report daily `date -d yesterday $DAY_FMT` $now link |
484 | + |
485 | + if [ `date +%a` = 'Sun' ]; then |
486 | + report weekly `date -d 'last week' $DAY_FMT` $now link |
487 | + fi |
488 | + |
489 | + if [ `date +%d` = '01' ]; then |
490 | + report monthly `date -d 'last month' $DAY_FMT` $now link |
491 | + fi |
492 | +fi |
493 | |
494 | === added file 'page-performance-report.ini' |
495 | --- page-performance-report.ini 1970-01-01 00:00:00 +0000 |
496 | +++ page-performance-report.ini 2012-08-09 04:56:19 +0000 |
497 | @@ -0,0 +1,79 @@ |
498 | +[categories] |
499 | +# Category -> Python regular expression. |
3 | +# Remember to quote ?, ., + & ? characters to match literally. |
501 | +# 'kodos' is useful for interactively testing regular expressions. |
502 | +All Launchpad=. |
503 | +All Launchpad except operational pages=(?<!\+opstats|\+haproxy)$ |
504 | + |
505 | +API=(^https?://api\.|/\+access-token$) |
506 | +Operational=(\+opstats|\+haproxy)$ |
507 | +Web (Non API/non operational/non XML-RPC)=^https?://(?!api\.) |
508 | + [^/]+($|/ |
509 | + (?!\+haproxy|\+opstats|\+access-token |
510 | + |((authserver|bugs|bazaar|codehosting| |
511 | + codeimportscheduler|mailinglists|softwarecenteragent| |
512 | + featureflags)/\w+$))) |
513 | +Other=^/ |
514 | + |
515 | +Launchpad Frontpage=^https?://launchpad\.[^/]+(/index\.html)?$ |
516 | + |
517 | +# Note that the bug text dump is served on the main launchpad domain |
18 | +# and we need to exclude it from the registry stats. |
519 | +Registry=^https?://launchpad\..*(?<!/\+text)(?<!/\+access-token)$ |
520 | +Registry - Person Index=^https?://launchpad\.[^/]+/%7E[^/]+(/\+index)?$ |
521 | +Registry - Pillar Index=^https?://launchpad\.[^/]+/\w[^/]*(/\+index)?$ |
522 | + |
523 | +Answers=^https?://answers\. |
524 | +Answers - Front page=^https?://answers\.[^/]+(/questions/\+index)?$ |
525 | + |
526 | +Blueprints=^https?://blueprints\. |
527 | +Blueprints - Front page=^https?://blueprints\.[^/]+(/specs/\+index)?$ |
528 | + |
529 | +# Note that the bug text dump is not served on the bugs domain, |
530 | +# probably for hysterical reasons. This is why the bugs regexp is |
531 | +# confusing. |
532 | +Bugs=^https?://(bugs\.|.+/bugs/\d+/\+text$) |
533 | +Bugs - Front page=^https?://bugs\.[^/]+(/bugs/\+index)?$ |
534 | +Bugs - Bug Page=^https?://bugs\.[^/]+/.+/\+bug/\d+(/\+index)?$ |
535 | +Bugs - Pillar Index=^https?://bugs\.[^/]+/\w[^/]*(/\+bugs-index)?$ |
536 | +Bugs - Search=^https?://bugs\.[^/]+/.+/\+bugs$ |
537 | +Bugs - Text Dump=^https?://launchpad\..+/\+text$ |
538 | + |
539 | +Code=^https?://code\. |
540 | +Code - Front page=^https?://code\.[^/]+(/\+code/\+index)?$ |
541 | +Code - Pillar Branches=^https?://code\.[^/]+/\w[^/]*(/\+code-index)?$ |
542 | +Code - Branch Page=^https?://code\.[^/]+/%7E[^/]+/[^/]+/[^/]+(/\+index)?$ |
543 | +Code - Merge Proposal=^https?://code\.[^/]+/.+/\+merge/\d+(/\+index)$ |
544 | + |
545 | +Soyuz - PPA Index=^https?://launchpad\.[^/]+/.+/\+archive/[^/]+(/\+index)?$ |
546 | + |
547 | +Translations=^https?://translations\. |
548 | +Translations - Front page=^https?://translations\.[^/]+/translations/\+index$ |
549 | +Translations - Overview=^https?://translations\..*/\+lang/\w+(/\+index)?$ |
550 | + |
551 | +Public XML-RPC=^https://(launchpad|xmlrpc)[^/]+/bazaar/\w+$ |
552 | +Private XML-RPC=^https://(launchpad|xmlrpc)[^/]+/ |
553 | + (authserver|bugs|codehosting| |
554 | + codeimportscheduler|mailinglists| |
555 | + softwarecenteragent|featureflags)/\w+$ |
556 | + |
557 | +[metrics] |
558 | +ppr_all=All Launchpad except operational pages |
559 | +ppr_web=Web (Non API/non operational/non XML-RPC) |
560 | +ppr_operational=Operational |
561 | +ppr_bugs=Bugs |
562 | +ppr_api=API |
563 | +ppr_code=Code |
564 | +ppr_public_xmlrpc=Public XML-RPC |
565 | +ppr_private_xmlrpc=Private XML-RPC |
566 | +ppr_translations=Translations |
567 | +ppr_registry=Registry |
568 | +ppr_other=Other |
569 | + |
570 | +[partition] |
571 | +API= |
572 | +Operational= |
573 | +Private XML-RPC= |
574 | +Public XML-RPC= |
575 | +Web (Non API/non operational/non XML-RPC)= |
576 | +Other= |
577 | |
578 | === added file 'page-performance-report.py' |
579 | --- page-performance-report.py 1970-01-01 00:00:00 +0000 |
580 | +++ page-performance-report.py 2012-08-09 04:56:19 +0000 |
581 | @@ -0,0 +1,18 @@ |
582 | +#!/usr/bin/python -S |
583 | +# |
584 | +# Copyright 2010 Canonical Ltd. This software is licensed under the |
585 | +# GNU Affero General Public License version 3 (see the file LICENSE). |
586 | + |
587 | +"""Page performance report generated from zserver tracelogs.""" |
588 | + |
589 | +__metaclass__ = type |
590 | + |
591 | +import _pythonpath |
592 | + |
593 | +import sys |
594 | + |
595 | +from lp.scripts.utilities.pageperformancereport import main |
596 | + |
597 | + |
598 | +if __name__ == '__main__': |
599 | + sys.exit(main()) |
600 | |
601 | === added file 'pageperformancereport.py' |
602 | --- pageperformancereport.py 1970-01-01 00:00:00 +0000 |
603 | +++ pageperformancereport.py 2012-08-09 04:56:19 +0000 |
604 | @@ -0,0 +1,1277 @@ |
605 | +# Copyright 2010 Canonical Ltd. This software is licensed under the |
606 | +# GNU Affero General Public License version 3 (see the file LICENSE). |
607 | + |
608 | +"""Page performance report generated from zserver trace logs.""" |
609 | + |
610 | +__metaclass__ = type |
611 | +__all__ = ['main'] |
612 | + |
613 | +import bz2 |
614 | +from cgi import escape as html_quote |
615 | +from ConfigParser import RawConfigParser |
616 | +import copy |
617 | +import cPickle |
618 | +import csv |
619 | +from datetime import datetime |
620 | +import gzip |
621 | +import logging |
622 | +import math |
623 | +import optparse |
624 | +import os.path |
625 | +import re |
626 | +import textwrap |
627 | +from textwrap import dedent |
628 | +import time |
629 | + |
630 | +import simplejson as json |
631 | +import sre_constants |
632 | +import zc.zservertracelog.tracereport |
633 | + |
634 | +logging.basicConfig() |
635 | +log = logging |
636 | + |
637 | + |
638 | +def _check_datetime(option, opt, value): |
639 | + "Type checker for optparse datetime option type." |
640 | + # We support 5 valid ISO8601 formats. |
641 | + formats = [ |
642 | + '%Y-%m-%dT%H:%M:%S', |
643 | + '%Y-%m-%dT%H:%M', |
644 | + '%Y-%m-%d %H:%M:%S', |
645 | + '%Y-%m-%d %H:%M', |
646 | + '%Y-%m-%d', |
647 | + ] |
648 | + for format in formats: |
649 | + try: |
650 | + return datetime.strptime(value, format) |
651 | + except ValueError: |
652 | + pass |
653 | + raise optparse.OptionValueError( |
654 | + "option %s: invalid datetime value: %r" % (opt, value)) |
655 | + |
656 | + |
657 | +class Option(optparse.Option): |
658 | + """Extended optparse Option class. |
659 | + |
660 | + Adds a 'datetime' option type. |
661 | + """ |
662 | + TYPES = optparse.Option.TYPES + ("datetime", datetime) |
663 | + TYPE_CHECKER = copy.copy(optparse.Option.TYPE_CHECKER) |
664 | + TYPE_CHECKER["datetime"] = _check_datetime |
665 | + TYPE_CHECKER[datetime] = _check_datetime |
666 | + |
667 | + |
668 | +class OptionParser(optparse.OptionParser): |
669 | + """Extended optparse OptionParser. |
670 | + |
671 | + Adds a 'datetime' option type. |
672 | + """ |
673 | + |
674 | + def __init__(self, *args, **kw): |
675 | + kw.setdefault('option_class', Option) |
676 | + optparse.OptionParser.__init__(self, *args, **kw) |
677 | + |
678 | + |
679 | +class Request(zc.zservertracelog.tracereport.Request): |
680 | + url = None |
681 | + pageid = None |
682 | + ticks = None |
683 | + sql_statements = None |
684 | + sql_seconds = None |
685 | + |
686 | + # Override the broken version in our superclass that always |
687 | + # returns an integer. |
688 | + @property |
689 | + def app_seconds(self): |
690 | + interval = self.app_time - self.start_app_time |
691 | + return interval.seconds + interval.microseconds / 1000000.0 |
692 | + |
693 | + # Override the broken version in our superclass that always |
694 | + # returns an integer. |
695 | + @property |
696 | + def total_seconds(self): |
697 | + interval = self.end - self.start |
698 | + return interval.seconds + interval.microseconds / 1000000.0 |
699 | + |
700 | + |
701 | +class Category: |
702 | + """A Category in our report. |
703 | + |
704 | + Requests belong to a Category if the URL matches a regular expression. |
705 | + """ |
706 | + |
707 | + def __init__(self, title, regexp): |
708 | + self.title = title |
709 | + self.regexp = regexp |
710 | + self._compiled_regexp = re.compile(regexp, re.I | re.X) |
711 | + self.partition = False |
712 | + |
713 | + def match(self, request): |
14 | + """Return true when the request matches this category.""" |
715 | + return self._compiled_regexp.search(request.url) is not None |
716 | + |
717 | + def __cmp__(self, other): |
718 | + return cmp(self.title.lower(), other.title.lower()) |
719 | + |
720 | + def __deepcopy__(self, memo): |
721 | + # We provide __deepcopy__ because the module doesn't handle |
722 | + # compiled regular expression by default. |
723 | + return Category(self.title, self.regexp) |
724 | + |
725 | + |
726 | +class OnlineStatsCalculator: |
727 | + """Object that can compute count, sum, mean, variance and median. |
728 | + |
729 | + It computes these value incrementally and using minimal storage |
730 | + using the Welford / Knuth algorithm described at |
731 | + http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm |
732 | + """ |
733 | + |
734 | + def __init__(self): |
735 | + self.count = 0 |
736 | + self.sum = 0 |
737 | + self.M2 = 0.0 # Sum of square difference |
738 | + self.mean = 0.0 |
739 | + |
740 | + def update(self, x): |
741 | + """Incrementally update the stats when adding x to the set. |
742 | + |
743 | + None values are ignored. |
744 | + """ |
745 | + if x is None: |
746 | + return |
747 | + self.count += 1 |
748 | + self.sum += x |
749 | + delta = x - self.mean |
750 | + self.mean = float(self.sum)/self.count |
751 | + self.M2 += delta*(x - self.mean) |
752 | + |
753 | + @property |
754 | + def variance(self): |
755 | + """Return the population variance.""" |
756 | + if self.count == 0: |
757 | + return 0 |
758 | + else: |
759 | + return self.M2/self.count |
760 | + |
761 | + @property |
762 | + def std(self): |
763 | + """Return the standard deviation.""" |
764 | + if self.count == 0: |
765 | + return 0 |
766 | + else: |
767 | + return math.sqrt(self.variance) |
768 | + |
769 | + def __add__(self, other): |
770 | + """Adds this and another OnlineStatsCalculator. |
771 | + |
772 | + The result combines the stats of the two objects. |
773 | + """ |
774 | + results = OnlineStatsCalculator() |
775 | + results.count = self.count + other.count |
776 | + results.sum = self.sum + other.sum |
777 | + if self.count > 0 and other.count > 0: |
778 | + # This is 2.1b in Chan, Tony F.; Golub, Gene H.; LeVeque, |
779 | + # Randall J. (1979), "Updating Formulae and a Pairwise Algorithm |
780 | + # for Computing Sample Variances.", |
781 | + # Technical Report STAN-CS-79-773, |
782 | + # Department of Computer Science, Stanford University, |
783 | + # ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf . |
784 | + results.M2 = self.M2 + other.M2 + ( |
785 | + (float(self.count) / (other.count * results.count)) * |
786 | + ((float(other.count) / self.count) * self.sum - other.sum)**2) |
787 | + else: |
788 | + results.M2 = self.M2 + other.M2 # One of them is 0. |
789 | + if results.count > 0: |
790 | + results.mean = float(results.sum) / results.count |
791 | + return results |
792 | + |
793 | + |
794 | +class OnlineApproximateMedian: |
795 | + """Approximate the median of a set of elements. |
796 | + |
797 | + This implements a space-efficient algorithm which only sees each value |
798 | + once. (It will hold in memory log bucket_size of n elements.) |
799 | + |
800 | + It was described and analysed in |
801 | + D. Cantone and M.Hofri, |
802 | + "Analysis of An Approximate Median Selection Algorithm" |
803 | + ftp://ftp.cs.wpi.edu/pub/techreports/pdf/06-17.pdf |
804 | + |
805 | + This algorithm is similar to Tukey's median-of-medians technique: it |
806 | + computes the median among each group of bucket_size values, and then |
807 | + the median among those medians, recursively. |
808 | + """ |
809 | + |
810 | + def __init__(self, bucket_size=9): |
811 | + """Creates a new estimator. |
812 | + |
813 | + It approximates the median by finding the median among each |
814 | + successive group of bucket_size elements, and then using these |
815 | + medians for further rounds of selection. |
816 | + |
817 | + The bucket size should be a low odd-integer. |
818 | + """ |
819 | + self.bucket_size = bucket_size |
820 | + # Index of the median in a completed bucket. |
821 | + self.median_idx = (bucket_size-1)//2 |
822 | + self.buckets = [] |
823 | + |
824 | + def update(self, x, order=0): |
825 | + """Update with x.""" |
826 | + if x is None: |
827 | + return |
828 | + |
829 | + i = order |
830 | + while True: |
831 | + # Create bucket on demand. |
832 | + if i >= len(self.buckets): |
833 | + for n in range((i+1)-len(self.buckets)): |
834 | + self.buckets.append([]) |
835 | + bucket = self.buckets[i] |
836 | + bucket.append(x) |
837 | + if len(bucket) == self.bucket_size: |
838 | + # Select the median in this bucket, and promote it. |
839 | + x = sorted(bucket)[self.median_idx] |
840 | + # Free the bucket for the next round. |
841 | + del bucket[:] |
842 | + i += 1 |
843 | + continue |
844 | + else: |
845 | + break |
846 | + |
847 | + @property |
848 | + def median(self): |
849 | + """Return the median.""" |
850 | + # Find the 'weighted' median by assigning a weight to each |
851 | + # element proportional to how far they have been selected. |
852 | + candidates = [] |
853 | + total_weight = 0 |
854 | + for i, bucket in enumerate(self.buckets): |
855 | + weight = self.bucket_size ** i |
856 | + for x in bucket: |
857 | + total_weight += weight |
858 | + candidates.append([x, weight]) |
859 | + if len(candidates) == 0: |
860 | + return 0 |
861 | + |
862 | + # Each weight is the equivalent of having the candidates appear |
863 | + # that number of times in the array. |
864 | + # So buckets like [[1, 2], [2, 3], [4, 2]] would be expanded to |
865 | + # [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, |
866 | + # 4, 4, 4, 4, 4] and we find the median of that list (2). |
867 | + # We don't expand the items to conserve memory. |
868 | + median = (total_weight-1) / 2 |
869 | + weighted_idx = 0 |
870 | + for x, weight in sorted(candidates): |
871 | + weighted_idx += weight |
872 | + if weighted_idx > median: |
873 | + return x |
874 | + |
875 | + def __add__(self, other): |
876 | + """Merge two approximators together. |
877 | + |
878 | + All candidates from the other are merged through the standard |
879 | + algorithm, starting at the same level. So an item that went through |
880 | + two rounds of selection, will be compared with other items having |
881 | + gone through the same number of rounds. |
882 | + """ |
883 | + results = OnlineApproximateMedian(self.bucket_size) |
884 | + results.buckets = copy.deepcopy(self.buckets) |
885 | + for i, bucket in enumerate(other.buckets): |
886 | + for x in bucket: |
887 | + results.update(x, i) |
888 | + return results |
889 | + |
890 | + |
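The bucketing scheme above can be condensed into a short standalone sketch (illustrative only, not part of the diff; bucket_size of 3 rather than the default 9):

```python
# Median-of-medians bucketing: every full bucket promotes its median to
# the next level; leftovers are weighted by bucket_size ** level.
def approx_median(values, bucket_size=3):
    mid = (bucket_size - 1) // 2
    buckets = []
    for x in values:
        i = 0
        while True:
            if i >= len(buckets):
                buckets.append([])
            bucket = buckets[i]
            bucket.append(x)
            if len(bucket) < bucket_size:
                break
            x = sorted(bucket)[mid]  # promote this bucket's median
            del bucket[:]
            i += 1
    # Weighted median over the leftovers, as in the `median` property.
    candidates, total = [], 0
    for i, bucket in enumerate(buckets):
        for x in bucket:
            candidates.append((x, bucket_size ** i))
            total += bucket_size ** i
    half, acc = (total - 1) / 2.0, 0
    for x, weight in sorted(candidates):
        acc += weight
        if acc > half:
            return x

print(approx_median(range(1, 28)))  # 14 -- exactly the true median here
```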
891 | +class Stats: |
892 | + """Bag to hold and compute request statistics. |
893 | + |
894 | + All times are in seconds. |
895 | + """ |
896 | + total_hits = 0 # Total hits. |
897 | + |
898 | + total_time = 0 # Total time spent rendering. |
899 | + mean = 0 # Mean time per hit. |
900 | + median = 0 # Median time per hit. |
901 | + std = 0 # Standard deviation per hit. |
902 | + histogram = None # Request times histogram. |
903 | + |
904 | + total_sqltime = 0 # Total time spent waiting for SQL to process. |
905 | + mean_sqltime = 0 # Mean time spent waiting for SQL to process. |
906 | + median_sqltime = 0 # Median time spent waiting for SQL to process. |
907 | + std_sqltime = 0 # Standard deviation of SQL time. |
908 | + |
909 | + total_sqlstatements = 0 # Total number of SQL statements issued. |
910 | + mean_sqlstatements = 0 |
911 | + median_sqlstatements = 0 |
912 | + std_sqlstatements = 0 |
913 | + |
914 | + @property |
915 | + def ninetyninth_percentile_time(self): |
916 | + """Time under which 99% of requests are rendered. |
917 | + |
918 | + This is estimated as 3 std deviations from the mean. Given that |
919 | + in a daily report, many URLs or PageIds won't have 100 requests, it's |
920 | + more useful to use this estimator. |
921 | + """ |
922 | + return self.mean + 3*self.std |
923 | + |
924 | + @property |
925 | + def ninetyninth_percentile_sqltime(self): |
926 | + """SQL time under which 99% of requests are rendered. |
927 | + |
928 | + This is estimated as 3 std deviations from the mean. |
929 | + """ |
930 | + return self.mean_sqltime + 3*self.std_sqltime |
931 | + |
932 | + @property |
933 | + def ninetyninth_percentile_sqlstatements(self): |
934 | + """Number of SQL statements under which 99% of requests are rendered. |
935 | + |
936 | + This is estimated as 3 std deviations from the mean. |
937 | + """ |
938 | + return self.mean_sqlstatements + 3*self.std_sqlstatements |
939 | + |
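The mean + 3*std rule used by these properties is a normal-distribution approximation (about 99.7% of mass lies below it). A tiny illustration, with made-up timings (not from any real report):

```python
# Invented request times, in seconds.
times = [0.2, 0.3, 0.25, 0.9, 0.4]
n = len(times)
mean = sum(times) / n
variance = sum((t - mean) ** 2 for t in times) / n  # population variance
estimate = mean + 3 * variance ** 0.5               # "99th percentile"
print(round(estimate, 3))  # 1.171
```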
940 | + def text(self): |
941 | + """Return a textual version of the stats.""" |
942 | + return textwrap.dedent(""" |
943 | + <Stats for %d requests: |
944 | + Time: total=%.2f; mean=%.2f; median=%.2f; std=%.2f |
945 | + SQL time: total=%.2f; mean=%.2f; median=%.2f; std=%.2f |
946 | + SQL stmt: total=%.f; mean=%.2f; median=%.f; std=%.2f |
947 | + >""" % ( |
948 | + self.total_hits, self.total_time, self.mean, self.median, |
949 | + self.std, self.total_sqltime, self.mean_sqltime, |
950 | + self.median_sqltime, self.std_sqltime, |
951 | + self.total_sqlstatements, self.mean_sqlstatements, |
952 | + self.median_sqlstatements, self.std_sqlstatements)) |
953 | + |
954 | + |
955 | +class OnlineStats(Stats): |
956 | + """Implementation of stats that can be computed online. |
957 | + |
958 | + You call update() for each request and the stats are updated incrementally |
959 | + with minimum storage space. |
960 | + """ |
961 | + |
962 | + def __init__(self, histogram_width, histogram_resolution): |
963 | + self.time_stats = OnlineStatsCalculator() |
964 | + self.time_median_approximate = OnlineApproximateMedian() |
965 | + self.sql_time_stats = OnlineStatsCalculator() |
966 | + self.sql_time_median_approximate = OnlineApproximateMedian() |
967 | + self.sql_statements_stats = OnlineStatsCalculator() |
968 | + self.sql_statements_median_approximate = OnlineApproximateMedian() |
969 | + self.histogram = Histogram(histogram_width, histogram_resolution) |
970 | + |
971 | + @property |
972 | + def total_hits(self): |
973 | + return self.time_stats.count |
974 | + |
975 | + @property |
976 | + def total_time(self): |
977 | + return self.time_stats.sum |
978 | + |
979 | + @property |
980 | + def mean(self): |
981 | + return self.time_stats.mean |
982 | + |
983 | + @property |
984 | + def median(self): |
985 | + return self.time_median_approximate.median |
986 | + |
987 | + @property |
988 | + def std(self): |
989 | + return self.time_stats.std |
990 | + |
991 | + @property |
992 | + def total_sqltime(self): |
993 | + return self.sql_time_stats.sum |
994 | + |
995 | + @property |
996 | + def mean_sqltime(self): |
997 | + return self.sql_time_stats.mean |
998 | + |
999 | + @property |
1000 | + def median_sqltime(self): |
1001 | + return self.sql_time_median_approximate.median |
1002 | + |
1003 | + @property |
1004 | + def std_sqltime(self): |
1005 | + return self.sql_time_stats.std |
1006 | + |
1007 | + @property |
1008 | + def total_sqlstatements(self): |
1009 | + return self.sql_statements_stats.sum |
1010 | + |
1011 | + @property |
1012 | + def mean_sqlstatements(self): |
1013 | + return self.sql_statements_stats.mean |
1014 | + |
1015 | + @property |
1016 | + def median_sqlstatements(self): |
1017 | + return self.sql_statements_median_approximate.median |
1018 | + |
1019 | + @property |
1020 | + def std_sqlstatements(self): |
1021 | + return self.sql_statements_stats.std |
1022 | + |
1023 | + def update(self, request): |
1024 | + """Update the stats based on request.""" |
1025 | + self.time_stats.update(request.app_seconds) |
1026 | + self.time_median_approximate.update(request.app_seconds) |
1027 | + self.sql_time_stats.update(request.sql_seconds) |
1028 | + self.sql_time_median_approximate.update(request.sql_seconds) |
1029 | + self.sql_statements_stats.update(request.sql_statements) |
1030 | + self.sql_statements_median_approximate.update(request.sql_statements) |
1031 | + self.histogram.update(request.app_seconds) |
1032 | + |
1033 | + def __add__(self, other): |
1034 | + """Merge another OnlineStats with this one.""" |
1035 | + results = copy.deepcopy(self) |
1036 | + results.time_stats += other.time_stats |
1037 | + results.time_median_approximate += other.time_median_approximate |
1038 | + results.sql_time_stats += other.sql_time_stats |
1039 | + results.sql_time_median_approximate += ( |
1040 | + other.sql_time_median_approximate) |
1041 | + results.sql_statements_stats += other.sql_statements_stats |
1042 | + results.sql_statements_median_approximate += ( |
1043 | + other.sql_statements_median_approximate) |
1044 | + results.histogram = self.histogram + other.histogram |
1045 | + return results |
1046 | + |
1047 | + |
1048 | +class Histogram: |
1049 | + """A simple object to compute histogram of a value.""" |
1050 | + |
1051 | + @staticmethod |
1052 | + def from_bins_data(data): |
1053 | + """Create an histogram from existing bins data.""" |
1054 | + assert data[0][0] == 0, "First bin should start at zero." |
1055 | + |
1056 | + hist = Histogram(len(data), data[1][0]) |
1057 | + for idx, bin in enumerate(data): |
1058 | + hist.count += bin[1] |
1059 | + hist.bins[idx][1] = bin[1] |
1060 | + |
1061 | + return hist |
1062 | + |
1063 | + def __init__(self, bins_count, bins_size): |
1064 | + """Create a new histogram. |
1065 | + |
1066 | + The histogram will count the frequency of values in bins_count bins |
1067 | + of bins_size each. |
1068 | + """ |
1069 | + self.count = 0 |
1070 | + self.bins_count = bins_count |
1071 | + self.bins_size = bins_size |
1072 | + self.bins = [] |
1073 | + for x in range(bins_count): |
1074 | + self.bins.append([x*bins_size, 0]) |
1075 | + |
1076 | + @property |
1077 | + def bins_relative(self): |
1078 | + """Return the bins with the frequency expressed as a ratio.""" |
1079 | + return [[x, float(f)/self.count] for x, f in self.bins] |
1080 | + |
1081 | + def update(self, value): |
1082 | + """Update the histogram for this value. |
1083 | + |
1084 | + All values higher than the last bin minimum are counted in that last |
1085 | + bin. |
1086 | + """ |
1087 | + self.count += 1 |
1088 | + idx = int(min(self.bins_count-1, value / self.bins_size)) |
1089 | + self.bins[idx][1] += 1 |
1090 | + |
1091 | + def __repr__(self): |
1092 | + """A string representation of this histogram.""" |
1093 | + return "<Histogram %s>" % self.bins |
1094 | + |
1095 | + def __eq__(self, other): |
1096 | + """Two histogram are equals if they have the same bins content.""" |
1097 | + if not isinstance(other, Histogram): |
1098 | + return False |
1099 | + |
1100 | + if self.bins_count != other.bins_count: |
1101 | + return False |
1102 | + |
1103 | + if self.bins_size != other.bins_size: |
1104 | + return False |
1105 | + |
1106 | + for idx, other_bin in enumerate(other.bins): |
1107 | + if self.bins[idx][1] != other_bin[1]: |
1108 | + return False |
1109 | + |
1110 | + return True |
1111 | + |
1112 | + def __add__(self, other): |
1113 | + """Add the frequency of the other histogram to this one. |
1114 | + |
1115 | + The resulting histogram has the same bins_size as this one. |
1116 | + If the other one has a bigger bins_size, we'll assume an even |
1117 | + distribution and distribute the frequency across the smaller bins. If |
1118 | + it has a smaller bins_size, we'll aggregate its bins into the larger |
1119 | + ones. We only support different bins_size values when their ratio can |
1120 | + be expressed as the ratio between 1 and an integer. |
1121 | + |
1122 | + The resulting histogram is as wide as the widest one. |
1123 | + """ |
1124 | + ratio = float(other.bins_size) / self.bins_size |
1125 | + bins_count = max(self.bins_count, math.ceil(other.bins_count * ratio)) |
1126 | + total = Histogram(int(bins_count), self.bins_size) |
1127 | + total.count = self.count + other.count |
1128 | + |
1129 | + # Copy our bins into the total |
1130 | + for idx, bin in enumerate(self.bins): |
1131 | + total.bins[idx][1] = bin[1] |
1132 | + |
1133 | + assert int(ratio) == ratio or int(1/ratio) == 1/ratio, ( |
1134 | + "We only support different bins sizes when the ratio is an " |
1135 | + "integer to 1: %s" |
1136 | + % ratio) |
1137 | + |
1138 | + if ratio >= 1: |
1139 | + # We distribute the frequency across the bins. |
1140 | + # For example. if the ratio is 3:1, we'll add a third |
1141 | + # of the lower resolution bin to 3 of the higher one. |
1142 | + for other_idx, bin in enumerate(other.bins): |
1143 | + f = bin[1] / ratio |
1144 | + start = int(math.floor(other_idx * ratio)) |
1145 | + end = int(start + ratio) |
1146 | + for idx in range(start, end): |
1147 | + total.bins[idx][1] += f |
1148 | + else: |
1149 | + # We need to collect the higher resolution bins into the |
1150 | + # corresponding lower one. |
1151 | + for other_idx, bin in enumerate(other.bins): |
1152 | + idx = int(other_idx * ratio) |
1153 | + total.bins[idx][1] += bin[1] |
1154 | + |
1155 | + return total |
1156 | + |
1157 | + |
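The different-bins_size branch above can be illustrated standalone (illustrative only, not part of the diff; assumes a 3:1 width ratio):

```python
# A coarse histogram (bin width 3) folded into a fine one (bin width 1):
# each coarse bin's frequency is split evenly across 3 fine bins,
# mirroring the ratio >= 1 branch of Histogram.__add__.
coarse = [[0, 9], [3, 3]]            # [bin start, frequency]
ratio = 3                            # coarse width / fine width
merged = [[i, 0] for i in range(len(coarse) * ratio)]
for idx, (start, freq) in enumerate(coarse):
    share = freq / ratio             # assume an even distribution
    for j in range(idx * ratio, (idx + 1) * ratio):
        merged[j][1] += share
print(merged)  # [[0, 3.0], [1, 3.0], [2, 3.0], [3, 1.0], [4, 1.0], [5, 1.0]]
```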
1158 | +class RequestTimes: |
1159 | + """Collect statistics from requests. |
1160 | + |
1161 | + Statistics are updated by calling the add_request() method. |
1162 | + |
1163 | + Statistics for mean/stddev/total/median for request times, SQL times and |
1164 | + number of SQL statements are collected. |
1165 | + |
1166 | + They are grouped by Category, URL or PageID. |
1167 | + """ |
1168 | + |
1169 | + def __init__(self, categories, options): |
1170 | + self.by_pageids = options.pageids |
1171 | + self.top_urls = options.top_urls |
1172 | + # We only keep in memory 50 times the number of URLs we want to |
1173 | + # return. The number of URLs can go pretty high (because of the |
1174 | + # distinct query parameters). |
1175 | + # |
1176 | + # Keeping all in memory at once is prohibitive. On a small but |
1177 | + # representative sample, keeping 50 times the possible number of |
1178 | + # candidates and culling to 90% on overflow generated a report |
1179 | + # identical to keeping all the candidates in memory. |
1180 | + # |
1181 | + # Keeping 10 times or culling at 90% generated a near-identical report |
1182 | + # (it differed a little in the tail.) |
1183 | + # |
1184 | + # The size/cull parameters might need to change if the request |
1185 | + # distribution becomes very different from what it currently is. |
1186 | + self.top_urls_cache_size = self.top_urls * 50 |
1187 | + |
1188 | + # Histogram has a bin per resolution up to our timeout |
1189 | + # (and an extra bin). |
1190 | + self.histogram_resolution = float(options.resolution) |
1191 | + self.histogram_width = int( |
1192 | + options.timeout / self.histogram_resolution) + 1 |
1193 | + self.category_times = [ |
1194 | + (category, OnlineStats( |
1195 | + self.histogram_width, self.histogram_resolution)) |
1196 | + for category in categories] |
1197 | + self.url_times = {} |
1198 | + self.pageid_times = {} |
1199 | + |
1200 | + def add_request(self, request): |
1201 | + """Add request to the set of requests we collect stats for.""" |
1202 | + matched = [] |
1203 | + for category, stats in self.category_times: |
1204 | + if category.match(request): |
1205 | + stats.update(request) |
1206 | + if category.partition: |
1207 | + matched.append(category.title) |
1208 | + |
1209 | + if len(matched) > 1: |
1210 | + log.warning( |
1211 | + "Multiple partition categories matched by %s (%s)", |
1212 | + request.url, ", ".join(matched)) |
1213 | + elif not matched: |
1214 | + log.warning("%s isn't part of the partition", request.url) |
1215 | + |
1216 | + if self.by_pageids: |
1217 | + pageid = request.pageid or 'Unknown' |
1218 | + stats = self.pageid_times.setdefault( |
1219 | + pageid, OnlineStats( |
1220 | + self.histogram_width, self.histogram_resolution)) |
1221 | + stats.update(request) |
1222 | + |
1223 | + if self.top_urls: |
1224 | + stats = self.url_times.setdefault( |
1225 | + request.url, OnlineStats( |
1226 | + self.histogram_width, self.histogram_resolution)) |
1227 | + stats.update(request) |
1228 | + # Whenever we have more URLs than we need, discard the 10% |
1229 | + # least likely to end up in the top. |
1230 | + if len(self.url_times) > self.top_urls_cache_size: |
1231 | + cutoff = int(self.top_urls_cache_size*0.90) |
1232 | + self.url_times = dict( |
1233 | + sorted(self.url_times.items(), |
1234 | + key=lambda (url, stats): stats.total_time, |
1235 | + reverse=True)[:cutoff]) |
1236 | + |
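The cull step above can be shown in isolation (illustrative only, not part of the diff; a toy cache size of 5, with plain floats standing in for stats.total_time):

```python
# Keep only the top 90% of the cache, ranked by total time, once the
# cache overflows -- a toy version of the culling in add_request().
url_times = {'/a': 10.0, '/b': 1.0, '/c': 7.0, '/d': 3.0,
             '/e': 5.0, '/f': 2.0}
cache_size = 5
if len(url_times) > cache_size:
    cutoff = int(cache_size * 0.90)  # 4 entries survive
    url_times = dict(sorted(
        url_times.items(), key=lambda item: item[1],
        reverse=True)[:cutoff])
print(sorted(url_times))  # ['/a', '/c', '/d', '/e']
```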
1237 | + def get_category_times(self): |
1238 | + """Return the times for each category.""" |
1239 | + return self.category_times |
1240 | + |
1241 | + def get_top_urls_times(self): |
1242 | + """Return the times for the Top URL by total time""" |
1243 | + # Sort the result by total time |
1244 | + return sorted( |
1245 | + self.url_times.items(), |
1246 | + key=lambda (url, stats): stats.total_time, |
1247 | + reverse=True)[:self.top_urls] |
1248 | + |
1249 | + def get_pageid_times(self): |
1250 | + """Return the times for the pageids.""" |
1251 | + # Sort the result by pageid |
1252 | + return sorted(self.pageid_times.items()) |
1253 | + |
1254 | + def __add__(self, other): |
1255 | + """Merge two RequestTimes together.""" |
1256 | + results = copy.deepcopy(self) |
1257 | + for other_category, other_stats in other.category_times: |
1258 | + for i, (category, stats) in enumerate(self.category_times): |
1259 | + if category.title == other_category.title: |
1260 | + results.category_times[i] = ( |
1261 | + category, stats + other_stats) |
1262 | + break |
1263 | + else: |
1264 | + results.category_times.append( |
1265 | + (other_category, copy.deepcopy(other_stats))) |
1266 | + |
1267 | + url_times = results.url_times |
1268 | + for url, stats in other.url_times.items(): |
1269 | + if url in url_times: |
1270 | + url_times[url] += stats |
1271 | + else: |
1272 | + url_times[url] = copy.deepcopy(stats) |
1273 | + # Only keep top_urls_cache_size entries. |
1274 | + if len(self.url_times) > self.top_urls_cache_size: |
1275 | + self.url_times = dict( |
1276 | + sorted( |
1277 | + url_times.items(), |
1278 | + key=lambda (url, stats): stats.total_time, |
1279 | + reverse=True)[:self.top_urls_cache_size]) |
1280 | + |
1281 | + pageid_times = results.pageid_times |
1282 | + for pageid, stats in other.pageid_times.items(): |
1283 | + if pageid in pageid_times: |
1284 | + pageid_times[pageid] += stats |
1285 | + else: |
1286 | + pageid_times[pageid] = copy.deepcopy(stats) |
1287 | + |
1288 | + return results |
1289 | + |
1290 | + |
1291 | +def main(): |
1292 | + parser = ExtendedOptionParser("%prog [args] tracelog [...]") |
1293 | + |
1294 | + parser.add_option( |
1295 | + "-c", "--config", dest="config", |
1296 | + default="page-performance-report.ini", |
1297 | + metavar="FILE", help="Load configuration from FILE") |
1298 | + parser.add_option( |
1299 | + "--from", dest="from_ts", type="datetime", |
1300 | + default=None, metavar="TIMESTAMP", |
1301 | + help="Ignore log entries before TIMESTAMP") |
1302 | + parser.add_option( |
1303 | + "--until", dest="until_ts", type="datetime", |
1304 | + default=None, metavar="TIMESTAMP", |
1305 | + help="Ignore log entries after TIMESTAMP") |
1306 | + parser.add_option( |
1307 | + "--no-partition", dest="partition", |
1308 | + action="store_false", default=True, |
1309 | + help="Do not produce partition report") |
1310 | + parser.add_option( |
1311 | + "--no-categories", dest="categories", |
1312 | + action="store_false", default=True, |
1313 | + help="Do not produce categories report") |
1314 | + parser.add_option( |
1315 | + "--no-pageids", dest="pageids", |
1316 | + action="store_false", default=True, |
1317 | + help="Do not produce pageids report") |
1318 | + parser.add_option( |
1319 | + "--top-urls", dest="top_urls", type=int, metavar="N", |
1320 | + default=50, help="Generate report for top N urls by hitcount.") |
1321 | + parser.add_option( |
1322 | + "--directory", dest="directory", |
1323 | + default=os.getcwd(), metavar="DIR", |
1324 | + help="Output reports in DIR directory") |
1325 | + parser.add_option( |
1326 | + "--timeout", dest="timeout", |
1327 | + # Default to 9: our production timeout. |
1328 | + default=9, type="int", metavar="SECONDS", |
1329 | + help="The configured timeout value: used to determine high risk " + |
1330 | + "page ids. That would be pages which 99% under render time is " |
1331 | + "greater than timeoout - 2s. Default is %defaults.") |
1332 | + parser.add_option( |
1333 | + "--histogram-resolution", dest="resolution", |
1334 | + # Default to 0.5s |
1335 | + default=0.5, type="float", metavar="SECONDS", |
1336 | + help="The resolution of the histogram bin width. Detault to " |
1337 | + "%defaults.") |
1338 | + parser.add_option( |
1339 | + "--merge", dest="merge", |
1340 | + default=False, action='store_true', |
1341 | + help="Files are interpreted as pickled stats and are aggregated " + |
1342 | + "for the report.") |
1343 | + |
1344 | + options, args = parser.parse_args() |
1345 | + |
1346 | + if not os.path.isdir(options.directory): |
1347 | + parser.error("Directory %s does not exist" % options.directory) |
1348 | + |
1349 | + if len(args) == 0: |
1350 | + parser.error("At least one zserver tracelog file must be provided") |
1351 | + |
1352 | + if options.from_ts is not None and options.until_ts is not None: |
1353 | + if options.from_ts > options.until_ts: |
1354 | + parser.error( |
1355 | + "--from timestamp %s is before --until timestamp %s" |
1356 | + % (options.from_ts, options.until_ts)) |
1357 | + if options.from_ts is not None or options.until_ts is not None: |
1358 | + if options.merge: |
1359 | + parser.error('--from and --until cannot be used with --merge') |
1360 | + |
1361 | + for filename in args: |
1362 | + if not os.path.exists(filename): |
1363 | + parser.error("Tracelog file %s not found." % filename) |
1364 | + |
1365 | + if not os.path.exists(options.config): |
1366 | + parser.error("Config file %s not found." % options.config) |
1367 | + |
1368 | + # Need a better config mechanism as ConfigParser doesn't preserve order. |
1369 | + script_config = RawConfigParser() |
1370 | + script_config.optionxform = str # Make keys case sensitive. |
1371 | + script_config.readfp(open(options.config)) |
1372 | + |
1373 | + categories = [] # A list of Category, in report order. |
1374 | + for option in script_config.options('categories'): |
1375 | + regexp = script_config.get('categories', option) |
1376 | + try: |
1377 | + categories.append(Category(option, regexp)) |
1378 | + except sre_constants.error as x: |
1379 | + log.fatal("Unable to compile regexp %r (%s)" % (regexp, x)) |
1380 | + return 1 |
1381 | + categories.sort() |
1382 | + |
1383 | + if len(categories) == 0: |
1384 | + parser.error("No data in [categories] section of configuration.") |
1385 | + |
1386 | + # Determine the categories making a partition of the requests |
1387 | + for option in script_config.options('partition'): |
1388 | + for category in categories: |
1389 | + if category.title == option: |
1390 | + category.partition = True |
1391 | + break |
1392 | + else: |
1393 | + log.warning( |
1394 | + "In partition definition: %s isn't a defined category", |
1395 | + option) |
1396 | + |
1397 | + times = RequestTimes(categories, options) |
1398 | + |
1399 | + if options.merge: |
1400 | + for filename in args: |
1401 | + log.info('Merging %s...' % filename) |
1402 | + f = bz2.BZ2File(filename, 'r') |
1403 | + times += cPickle.load(f) |
1404 | + f.close() |
1405 | + else: |
1406 | + parse(args, times, options) |
1407 | + |
1408 | + category_times = times.get_category_times() |
1409 | + |
1410 | + pageid_times = [] |
1411 | + url_times = [] |
1412 | + if options.top_urls: |
1413 | + url_times = times.get_top_urls_times() |
1414 | + if options.pageids: |
1415 | + pageid_times = times.get_pageid_times() |
1416 | + |
1417 | + def _report_filename(filename): |
1418 | + return os.path.join(options.directory, filename) |
1419 | + |
1420 | + # Partition report |
1421 | + if options.partition: |
1422 | + report_filename = _report_filename('partition.html') |
1423 | + log.info("Generating %s", report_filename) |
1424 | + partition_times = [ |
1425 | + category_time |
1426 | + for category_time in category_times |
1427 | + if category_time[0].partition] |
1428 | + html_report( |
1429 | + open(report_filename, 'w'), partition_times, None, None, |
1430 | + histogram_resolution=options.resolution, |
1431 | + category_name='Partition') |
1432 | + |
1433 | + # Category only report. |
1434 | + if options.categories: |
1435 | + report_filename = _report_filename('categories.html') |
1436 | + log.info("Generating %s", report_filename) |
1437 | + html_report( |
1438 | + open(report_filename, 'w'), category_times, None, None, |
1439 | + histogram_resolution=options.resolution) |
1440 | + |
1441 | + # Pageid only report. |
1442 | + if options.pageids: |
1443 | + report_filename = _report_filename('pageids.html') |
1444 | + log.info("Generating %s", report_filename) |
1445 | + html_report( |
1446 | + open(report_filename, 'w'), None, pageid_times, None, |
1447 | + histogram_resolution=options.resolution) |
1448 | + |
1449 | + # Top URL only report. |
1450 | + if options.top_urls: |
1451 | + report_filename = _report_filename('top%d.html' % options.top_urls) |
1452 | + log.info("Generating %s", report_filename) |
1453 | + html_report( |
1454 | + open(report_filename, 'w'), None, None, url_times, |
1455 | + histogram_resolution=options.resolution) |
1456 | + |
1457 | + # Combined report. |
1458 | + if options.categories and options.pageids: |
1459 | + report_filename = _report_filename('combined.html') |
1460 | + html_report( |
1461 | + open(report_filename, 'w'), |
1462 | + category_times, pageid_times, url_times, |
1463 | + histogram_resolution=options.resolution) |
1464 | + |
1465 | + # Report of likely timeout candidates |
1466 | + report_filename = _report_filename('timeout-candidates.html') |
1467 | + log.info("Generating %s", report_filename) |
1468 | + html_report( |
1469 | + open(report_filename, 'w'), None, pageid_times, None, |
1470 | + options.timeout - 2, |
1471 | + histogram_resolution=options.resolution) |
1472 | + |
1473 | + # Save the times cache for later merging. |
1474 | + report_filename = _report_filename('stats.pck.bz2') |
1475 | + log.info("Saving times database in %s", report_filename) |
1476 | + stats_file = bz2.BZ2File(report_filename, 'w') |
1477 | + cPickle.dump(times, stats_file, protocol=cPickle.HIGHEST_PROTOCOL) |
1478 | + stats_file.close() |
1479 | + |
1480 | + # Output metrics for selected categories. |
1481 | + report_filename = _report_filename('metrics.dat') |
1482 | + log.info('Saving category_metrics %s', report_filename) |
1483 | + metrics_file = open(report_filename, 'w') |
1484 | + writer = csv.writer(metrics_file, delimiter=':') |
1485 | + date = options.until_ts or options.from_ts or datetime.utcnow() |
1486 | + date = time.mktime(date.timetuple()) |
1487 | + |
1488 | + for option in script_config.options('metrics'): |
1489 | + name = script_config.get('metrics', option) |
1490 | + for category, stats in category_times: |
1491 | + if category.title == name: |
1492 | + writer.writerows([ |
1493 | + ("%s_99" % option, "%f@%d" % ( |
1494 | + stats.ninetyninth_percentile_time, date)), |
1495 | + ("%s_hits" % option, "%d@%d" % (stats.total_hits, date))]) |
1496 | + break |
1497 | + else: |
1498 | + log.warning("Can't find category %s for metric %s" % ( |
1499 | + option, name)) |
1500 | + metrics_file.close() |
1501 | + |
1502 | + return 0 |
1503 | + |
1504 | + |
1505 | +def smart_open(filename, mode='r'): |
1506 | + """Open a file, transparently handling compressed files. |
1507 | + |
1508 | + Compressed files are detected by file extension. |
1509 | + """ |
1510 | + ext = os.path.splitext(filename)[1] |
1511 | + if ext == '.bz2': |
1512 | + return bz2.BZ2File(filename, 'r') |
1513 | + elif ext == '.gz': |
1514 | + return gzip.GzipFile(filename, 'r') |
1515 | + else: |
1516 | + return open(filename, mode) |
1517 | + |
1518 | + |
1519 | +class MalformedLine(Exception): |
1520 | + """A malformed line was found in the trace log.""" |
1521 | + |
1522 | + |
1523 | +_ts_re = re.compile( |
1524 | + r'^(\d{4})-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)(?:\.(\d{6}))?$') |
1525 | + |
1526 | + |
1527 | +def parse_timestamp(ts_string): |
1528 | + match = _ts_re.search(ts_string) |
1529 | + if match is None: |
1530 | + raise ValueError("Invalid timestamp") |
1531 | + return datetime( |
1532 | + *(int(elem) for elem in match.groups() if elem is not None)) |
1533 | + |
1534 | + |
1535 | +def parse(tracefiles, times, options): |
1536 | + requests = {} |
1537 | + total_requests = 0 |
1538 | + for tracefile in tracefiles: |
1539 | + log.info('Processing %s', tracefile) |
1540 | + for line in smart_open(tracefile): |
1541 | + line = line.rstrip() |
1542 | + try: |
1543 | + record = line.split(' ', 7) |
1544 | + try: |
1545 | + record_type, request_id, date, time_ = record[:4] |
1546 | + except ValueError: |
1547 | + raise MalformedLine() |
1548 | + |
1549 | + if record_type == 'S': |
1550 | + # Short circuit - we don't care about these entries. |
1551 | + continue |
1552 | + |
1553 | + # Parse the timestamp. |
1554 | + ts_string = '%s %s' % (date, time_) |
1555 | + try: |
1556 | + dt = parse_timestamp(ts_string) |
1557 | + except ValueError: |
1558 | + raise MalformedLine( |
1559 | + 'Invalid timestamp %s' % repr(ts_string)) |
1560 | + |
1561 | + # Filter entries by command line date range. |
1562 | + if options.from_ts is not None and dt < options.from_ts: |
1563 | + continue # Skip to next line. |
1564 | + if options.until_ts is not None and dt > options.until_ts: |
1565 | + break # Skip to next log file. |
1566 | + |
1567 | + args = record[4:] |
1568 | + |
1569 | + def require_args(count): |
1570 | + if len(args) < count: |
1571 | + raise MalformedLine() |
1572 | + |
1573 | + if record_type == 'B': # Request begins. |
1574 | + require_args(2) |
1575 | + requests[request_id] = Request(dt, args[0], args[1]) |
1576 | + continue |
1577 | + |
1578 | + request = requests.get(request_id, None) |
1579 | + if request is None: # Just ignore partial records. |
1580 | + continue |
1581 | + |
1582 | + # Old style extension record from Launchpad. It just |
1583 | + # contains the URL. |
1584 | + if (record_type == '-' and len(args) == 1 |
1585 | + and args[0].startswith('http')): |
1586 | + request.url = args[0] |
1587 | + |
1588 | + # New style extension record with a prefix. |
1589 | + elif record_type == '-': |
1590 | + # Launchpad outputs several things as tracelog |
1591 | + # extension records. We include a prefix to tell |
1592 | + # them apart. |
1593 | + require_args(1) |
1594 | + |
1595 | + parse_extension_record(request, args) |
1596 | + |
1597 | + elif record_type == 'I': # Got request input. |
1598 | + require_args(1) |
1599 | + request.I(dt, args[0]) |
1600 | + |
1601 | + elif record_type == 'C': # Entered application thread. |
1602 | + request.C(dt) |
1603 | + |
1604 | + elif record_type == 'A': # Application done. |
1605 | + require_args(2) |
1606 | + request.A(dt, args[0], args[1]) |
1607 | + |
1608 | + elif record_type == 'E': # Request done. |
1609 | + del requests[request_id] |
1610 | + request.E(dt) |
1611 | + total_requests += 1 |
1612 | + if total_requests % 10000 == 0: |
1613 | + log.debug("Parsed %d requests", total_requests) |
1614 | + |
1615 | + # Add the request to any matching categories. |
1616 | + times.add_request(request) |
1617 | + else: |
1618 | +                raise MalformedLine('Unknown record type %s' % record_type) |
1619 | + except MalformedLine as x: |
1620 | + log.error( |
1621 | + "Malformed line %s (%s)" % (repr(line), x)) |
1622 | + |
1623 | + |
1624 | +def parse_extension_record(request, args): |
1625 | +    """Decode a ZServer extension record and annotate the request.""" |
1626 | + prefix = args[0] |
1627 | + |
1628 | + if prefix == 'u': |
1629 | + request.url = ' '.join(args[1:]) or None |
1630 | + elif prefix == 'p': |
1631 | + request.pageid = ' '.join(args[1:]) or None |
1632 | + elif prefix == 't': |
1633 | + if len(args) != 4: |
1634 | + raise MalformedLine("Wrong number of arguments %s" % (args,)) |
1635 | + request.sql_statements = int(args[2]) |
1636 | + request.sql_seconds = float(args[3]) / 1000 |
1637 | + else: |
1638 | + raise MalformedLine( |
1639 | + "Unknown extension prefix %s" % prefix) |
1640 | + |
1641 | + |
1642 | +def html_report( |
1643 | + outf, category_times, pageid_times, url_times, |
1644 | + ninetyninth_percentile_threshold=None, histogram_resolution=0.5, |
1645 | + category_name='Category'): |
1646 | + """Write an html report to outf. |
1647 | + |
1648 | + :param outf: A file object to write the report to. |
1649 | + :param category_times: The time statistics for categories. |
1650 | + :param pageid_times: The time statistics for pageids. |
1651 | + :param url_times: The time statistics for the top XXX urls. |
1652 | + :param ninetyninth_percentile_threshold: Lower threshold for inclusion of |
1653 | + pages in the pageid section; pages where 99 percent of the requests are |
1654 | + served under this threshold will not be included. |
1655 | + :param histogram_resolution: used as the histogram bar width |
1656 | + :param category_name: The name to use for category report. Defaults to |
1657 | + 'Category'. |
1658 | + """ |
1659 | + |
1660 | + print >> outf, dedent('''\ |
1661 | + <!DOCTYPE html> |
1662 | + <html> |
1663 | + <head> |
1664 | + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
1665 | + <title>Launchpad Page Performance Report %(date)s</title> |
1666 | + <script language="javascript" type="text/javascript" |
1667 | + src="https://devpad.canonical.com/~lpqateam/ppr/js/flot/jquery.min.js" |
1668 | + ></script> |
1669 | + <script language="javascript" type="text/javascript" |
1670 | + src="https://devpad.canonical.com/~lpqateam/ppr/js/jquery.appear-1.1.1.min.js" |
1671 | + ></script> |
1672 | + <script language="javascript" type="text/javascript" |
1673 | + src="https://devpad.canonical.com/~lpqateam/ppr/js/flot/jquery.flot.min.js" |
1674 | + ></script> |
1675 | + <script language="javascript" type="text/javascript" |
1676 | + src="https://devpad.canonical.com/~lpqateam/ppr/js/sorttable.js"></script> |
1677 | + <style type="text/css"> |
1678 | + h3 { font-weight: normal; font-size: 1em; } |
1679 | + thead th { padding-left: 1em; padding-right: 1em; } |
1680 | + .category-title { text-align: right; padding-right: 2em; |
1681 | + max-width: 25em; } |
1682 | + .regexp { font-size: x-small; font-weight: normal; } |
1683 | + .mean { text-align: right; padding-right: 1em; } |
1684 | + .median { text-align: right; padding-right: 1em; } |
1685 | + .standard-deviation { text-align: right; padding-right: 1em; } |
1686 | + .histogram { padding: 0.5em 1em; width:400px; height:250px; } |
1687 | + .odd-row { background-color: #eeeeff; } |
1688 | + .even-row { background-color: #ffffee; } |
1689 | + table.sortable thead { |
1690 | + background-color:#eee; |
1691 | + color:#666666; |
1692 | + font-weight: bold; |
1693 | + cursor: default; |
1694 | + } |
1695 | + td.numeric { |
1696 | + font-family: monospace; |
1697 | + text-align: right; |
1698 | + padding: 1em; |
1699 | + } |
1700 | + .clickable { cursor: hand; } |
1701 | + .total-hits, .histogram, .median-sqltime, |
1702 | + .median-sqlstatements { border-right: 1px dashed #000000; } |
1703 | + </style> |
1704 | + </head> |
1705 | + <body> |
1706 | + <h1>Launchpad Page Performance Report</h1> |
1707 | + <h3>%(date)s</h3> |
1708 | + ''' % {'date': time.ctime()}) |
1709 | + |
1710 | + table_header = dedent('''\ |
1711 | + <table class="sortable page-performance-report"> |
1712 | + <caption align="top">Click on column headings to sort.</caption> |
1713 | + <thead> |
1714 | + <tr> |
1715 | + <th class="clickable">Name</th> |
1716 | + |
1717 | + <th class="clickable">Total Hits</th> |
1718 | + |
1719 | + <th class="clickable">99% Under Time (secs)</th> |
1720 | + |
1721 | + <th class="clickable">Mean Time (secs)</th> |
1722 | + <th class="clickable">Time Standard Deviation</th> |
1723 | + <th class="clickable">Median Time (secs)</th> |
1724 | + <th class="sorttable_nosort">Time Distribution</th> |
1725 | + |
1726 | + <th class="clickable">99% Under SQL Time (secs)</th> |
1727 | + <th class="clickable">Mean SQL Time (secs)</th> |
1728 | + <th class="clickable">SQL Time Standard Deviation</th> |
1729 | + <th class="clickable">Median SQL Time (secs)</th> |
1730 | + |
1731 | + <th class="clickable">99% Under SQL Statements</th> |
1732 | + <th class="clickable">Mean SQL Statements</th> |
1733 | + <th class="clickable">SQL Statement Standard Deviation</th> |
1734 | + <th class="clickable">Median SQL Statements</th> |
1735 | + |
1736 | + <th class="clickable">Hits * 99% Under SQL Statement</th> |
1737 | + </tr> |
1738 | + </thead> |
1739 | + <tbody> |
1740 | + ''') |
1741 | + table_footer = "</tbody></table>" |
1742 | + |
1743 | + # Store our generated histograms to output Javascript later. |
1744 | + histograms = [] |
1745 | + |
1746 | + def handle_times(html_title, stats): |
1747 | + histograms.append(stats.histogram) |
1748 | + print >> outf, dedent("""\ |
1749 | + <tr> |
1750 | + <th class="category-title">%s</th> |
1751 | + <td class="numeric total-hits">%d</td> |
1752 | + <td class="numeric 99pc-under-time">%.2f</td> |
1753 | + <td class="numeric mean-time">%.2f</td> |
1754 | + <td class="numeric std-time">%.2f</td> |
1755 | + <td class="numeric median-time">%.2f</td> |
1756 | + <td> |
1757 | + <div class="histogram" id="histogram%d"></div> |
1758 | + </td> |
1759 | + <td class="numeric 99pc-under-sqltime">%.2f</td> |
1760 | + <td class="numeric mean-sqltime">%.2f</td> |
1761 | + <td class="numeric std-sqltime">%.2f</td> |
1762 | + <td class="numeric median-sqltime">%.2f</td> |
1763 | + |
1764 | + <td class="numeric 99pc-under-sqlstatement">%.f</td> |
1765 | + <td class="numeric mean-sqlstatements">%.2f</td> |
1766 | + <td class="numeric std-sqlstatements">%.2f</td> |
1767 | + <td class="numeric median-sqlstatements">%.2f</td> |
1768 | + |
1769 | + <td class="numeric high-db-usage">%.f</td> |
1770 | + </tr> |
1771 | + """ % ( |
1772 | + html_title, |
1773 | + stats.total_hits, stats.ninetyninth_percentile_time, |
1774 | + stats.mean, stats.std, stats.median, |
1775 | + len(histograms) - 1, |
1776 | + stats.ninetyninth_percentile_sqltime, stats.mean_sqltime, |
1777 | + stats.std_sqltime, stats.median_sqltime, |
1778 | + stats.ninetyninth_percentile_sqlstatements, |
1779 | + stats.mean_sqlstatements, |
1780 | + stats.std_sqlstatements, stats.median_sqlstatements, |
1781 | +            stats.ninetyninth_percentile_sqlstatements * stats.total_hits, |
1782 | + )) |
1783 | + |
1784 | + # Table of contents |
1785 | + print >> outf, '<ol>' |
1786 | + if category_times: |
1787 | + print >> outf, '<li><a href="#catrep">%s Report</a></li>' % ( |
1788 | + category_name) |
1789 | + if pageid_times: |
1790 | + print >> outf, '<li><a href="#pageidrep">Pageid Report</a></li>' |
1791 | + if url_times: |
1792 | + print >> outf, '<li><a href="#topurlrep">Top URL Report</a></li>' |
1793 | + print >> outf, '</ol>' |
1794 | + |
1795 | + if category_times: |
1796 | + print >> outf, '<h2 id="catrep">%s Report</h2>' % ( |
1797 | + category_name) |
1798 | + print >> outf, table_header |
1799 | + for category, times in category_times: |
1800 | + html_title = '%s<br/><span class="regexp">%s</span>' % ( |
1801 | + html_quote(category.title), html_quote(category.regexp)) |
1802 | + handle_times(html_title, times) |
1803 | + print >> outf, table_footer |
1804 | + |
1805 | + if pageid_times: |
1806 | + print >> outf, '<h2 id="pageidrep">Pageid Report</h2>' |
1807 | + print >> outf, table_header |
1808 | + for pageid, times in pageid_times: |
1809 | + if (ninetyninth_percentile_threshold is not None and |
1810 | + (times.ninetyninth_percentile_time < |
1811 | + ninetyninth_percentile_threshold)): |
1812 | + continue |
1813 | + handle_times(html_quote(pageid), times) |
1814 | + print >> outf, table_footer |
1815 | + |
1816 | + if url_times: |
1817 | + print >> outf, '<h2 id="topurlrep">Top URL Report</h2>' |
1818 | + print >> outf, table_header |
1819 | + for url, times in url_times: |
1820 | + handle_times(html_quote(url), times) |
1821 | + print >> outf, table_footer |
1822 | + |
1823 | +    # Output the JavaScript to render our histograms nicely, replacing |
1824 | + # the placeholder <div> tags output earlier. |
1825 | + print >> outf, dedent("""\ |
1826 | + <script language="javascript" type="text/javascript"> |
1827 | + $(function () { |
1828 | + var options = { |
1829 | + series: { |
1830 | + bars: {show: true, barWidth: %s} |
1831 | + }, |
1832 | + xaxis: { |
1833 | + tickFormatter: function (val, axis) { |
1834 | + return val.toFixed(axis.tickDecimals) + "s"; |
1835 | + } |
1836 | + }, |
1837 | + yaxis: { |
1838 | + min: 0, |
1839 | + max: 1, |
1840 | + transform: function (v) { |
1841 | + return Math.pow(Math.log(v*100+1)/Math.LN2, 0.5); |
1842 | + }, |
1843 | + inverseTransform: function (v) { |
1844 | + return Math.pow(Math.exp(v*100+1)/Math.LN2, 2); |
1845 | + }, |
1846 | + tickDecimals: 1, |
1847 | + tickFormatter: function (val, axis) { |
1848 | + return (val * 100).toFixed(axis.tickDecimals) + "%%"; |
1849 | + }, |
1850 | + ticks: [0.001,0.01,0.10,0.50,1.0] |
1851 | + }, |
1852 | + grid: { |
1853 | + aboveData: true, |
1854 | + labelMargin: 15 |
1855 | + } |
1856 | + }; |
1857 | + """ % histogram_resolution) |
1858 | + |
1859 | + for i, histogram in enumerate(histograms): |
1860 | + if histogram.count == 0: |
1861 | + continue |
1862 | + print >> outf, dedent("""\ |
1863 | + function plot_histogram_%(id)d() { |
1864 | + var d = %(data)s; |
1865 | + |
1866 | + $.plot( |
1867 | + $("#histogram%(id)d"), |
1868 | + [{data: d}], options); |
1869 | + } |
1870 | + $('#histogram%(id)d').appear(function() { |
1871 | + plot_histogram_%(id)d(); |
1872 | + }); |
1873 | + |
1874 | + """ % {'id': i, 'data': json.dumps(histogram.bins_relative)}) |
1875 | + |
1876 | + print >> outf, dedent("""\ |
1877 | + }); |
1878 | + </script> |
1879 | + </body> |
1880 | + </html> |
1881 | + """) |
1882 | |
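The record-dispatch loop above keeps one in-flight request per tracelog request id: created by a 'B' record, annotated by extension records, and retired by 'E'. As a minimal standalone sketch of that shape — with a simplified `Request`, string timestamps, and only the 'B', old-style '-', and 'E' records handled, so the names and fields here are illustrative stand-ins, not the diff's exact code:

```python
class Request(object):
    """Stripped-down stand-in for the diff's Request class."""

    def __init__(self, start, method, channel):
        self.start = start
        self.method = method
        self.channel = channel
        self.url = None
        self.end = None


def parse_lines(lines):
    """Dispatch tracelog records of the form 'TYPE REQUEST_ID TIMESTAMP args...'."""
    requests = {}    # in-flight requests, keyed by request id
    completed = []
    for line in lines:
        record = line.split()
        if len(record) < 3:
            continue                     # too short to classify; skip
        record_type, request_id, ts = record[:3]
        args = record[3:]
        if record_type == 'B':           # request begins
            requests[request_id] = Request(ts, args[0], args[1])
        elif request_id not in requests:
            continue                     # ignore partial records
        elif record_type == '-' and args and args[0].startswith('http'):
            requests[request_id].url = args[0]   # old-style URL record
        elif record_type == 'E':         # request done: retire it
            request = requests.pop(request_id)
            request.end = ts
            completed.append(request)
    return completed
```

Records for request ids that were never begun fall through the `request_id not in requests` guard, mirroring the loop's "just ignore partial records" behaviour.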
1883 | === added file 'setup.py' |
1884 | --- setup.py 1970-01-01 00:00:00 +0000 |
1885 | +++ setup.py 2012-08-09 04:56:19 +0000 |
1886 | @@ -0,0 +1,50 @@ |
1887 | +#!/usr/bin/env python |
1888 | +# |
1889 | +# Copyright (c) 2012, Canonical Ltd |
1890 | +# |
1891 | +# This program is free software: you can redistribute it and/or modify |
1892 | +# it under the terms of the GNU Lesser General Public License as published by |
1893 | +# the Free Software Foundation, version 3 only. |
1894 | +# |
1895 | +# This program is distributed in the hope that it will be useful, |
1896 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
1897 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1898 | +# GNU Lesser General Public License for more details. |
1899 | +# |
1900 | +# You should have received a copy of the GNU Lesser General Public License |
1901 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. |
1902 | +# GNU Lesser General Public License version 3 (see the file LICENSE). |
1903 | + |
1904 | +from distutils.core import setup |
1905 | +import os.path |
1906 | + |
1907 | +description = file( |
1908 | + os.path.join(os.path.dirname(__file__), 'README'), 'rb').read() |
1909 | + |
1910 | +setup(name="lp-dev-utils", |
1911 | + version="0.0.0", |
1912 | + description=\ |
1913 | + "Tools for working on or with Launchpad.", |
1914 | + long_description=description, |
1915 | + maintainer="Launchpad Developers", |
1916 | + maintainer_email="launchpad-dev@lists.launchpad.net", |
1917 | + url="https://launchpad.net/lp-dev-utils", |
1918 | + packages=['ec2test'], |
1919 | + package_dir = {'':'.'}, |
1920 | + classifiers = [ |
1921 | + 'Development Status :: 2 - Pre-Alpha', |
1922 | + 'Intended Audience :: Developers', |
1923 | + 'License :: OSI Approved :: GNU General Public License v3 (GPLv3)', |
1924 | + 'Operating System :: OS Independent', |
1925 | + 'Programming Language :: Python', |
1926 | + ], |
1927 | + install_requires = [ |
1928 | + 'zc.zservertracelog', |
1929 | + ], |
1930 | + extras_require = dict( |
1931 | + test=[ |
1932 | + 'fixtures', |
1933 | + 'testtools', |
1934 | + ] |
1935 | + ), |
1936 | + ) |
1937 | |
1938 | === added file 'test_pageperformancereport.py' |
1939 | --- test_pageperformancereport.py 1970-01-01 00:00:00 +0000 |
1940 | +++ test_pageperformancereport.py 2012-08-09 04:56:19 +0000 |
1941 | @@ -0,0 +1,486 @@ |
1942 | +# Copyright 2010 Canonical Ltd. This software is licensed under the |
1943 | +# GNU Affero General Public License version 3 (see the file LICENSE). |
1944 | + |
1945 | +"""Test the pageperformancereport script.""" |
1946 | + |
1947 | +__metaclass__ = type |
1948 | + |
1949 | +import fixtures |
1950 | +from testtools import TestCase |
1951 | + |
1952 | +from pageperformancereport import ( |
1953 | + Category, |
1954 | + Histogram, |
1955 | + OnlineApproximateMedian, |
1956 | + OnlineStats, |
1957 | + OnlineStatsCalculator, |
1958 | + RequestTimes, |
1959 | + Stats, |
1960 | + ) |
1961 | + |
1962 | + |
1963 | +class FakeOptions: |
1964 | + timeout = 5 |
1965 | + db_file = None |
1966 | + pageids = True |
1967 | + top_urls = 3 |
1968 | + resolution = 1 |
1969 | + |
1970 | + def __init__(self, **kwargs): |
1971 | + """Assign all arguments as attributes.""" |
1972 | + self.__dict__.update(kwargs) |
1973 | + |
1974 | + |
1975 | +class FakeRequest: |
1976 | + |
1977 | + def __init__(self, url, app_seconds, sql_statements=None, |
1978 | + sql_seconds=None, pageid=None): |
1979 | + self.url = url |
1980 | + self.pageid = pageid |
1981 | + self.app_seconds = app_seconds |
1982 | + self.sql_statements = sql_statements |
1983 | + self.sql_seconds = sql_seconds |
1984 | + |
1985 | + |
1986 | +class FakeStats(Stats): |
1987 | + |
1988 | + def __init__(self, **kwargs): |
1989 | + # Override the constructor to just store the values. |
1990 | + self.__dict__.update(kwargs) |
1991 | + |
1992 | + |
1993 | +FAKE_REQUESTS = [ |
1994 | + FakeRequest('/', 0.5, pageid='+root'), |
1995 | + FakeRequest('/bugs', 4.5, 56, 3.0, pageid='+bugs'), |
1996 | + FakeRequest('/bugs', 4.2, 56, 2.2, pageid='+bugs'), |
1997 | + FakeRequest('/bugs', 5.5, 76, 4.0, pageid='+bugs'), |
1998 | + FakeRequest('/ubuntu', 2.5, 6, 2.0, pageid='+distribution'), |
1999 | + FakeRequest('/launchpad', 3.5, 3, 3.0, pageid='+project'), |
2000 | + FakeRequest('/bzr', 2.5, 4, 2.0, pageid='+project'), |
2001 | + FakeRequest('/bugs/1', 20.5, 567, 14.0, pageid='+bug'), |
2002 | + FakeRequest('/bugs/1', 15.5, 567, 9.0, pageid='+bug'), |
2003 | + FakeRequest('/bugs/5', 1.5, 30, 1.2, pageid='+bug'), |
2004 | + FakeRequest('/lazr', 1.0, 16, 0.3, pageid='+project'), |
2005 | + FakeRequest('/drizzle', 0.9, 11, 1.3, pageid='+project'), |
2006 | + ] |
2007 | + |
2008 | + |
2009 | +# The category stats computed for the above 12 requests. |
2010 | +CATEGORY_STATS = [ |
2011 | + # Median is an approximation. |
2012 | + # Real values are: 2.50, 2.20, 30 |
2013 | + (Category('All', ''), FakeStats( |
2014 | + total_hits=12, total_time=62.60, mean=5.22, median=4.20, std=5.99, |
2015 | + total_sqltime=42, mean_sqltime=3.82, median_sqltime=3.0, |
2016 | + std_sqltime=3.89, |
2017 | + total_sqlstatements=1392, mean_sqlstatements=126.55, |
2018 | + median_sqlstatements=56, std_sqlstatements=208.94, |
2019 | + histogram=[[0, 2], [1, 2], [2, 2], [3, 1], [4, 2], [5, 3]], |
2020 | + )), |
2021 | + (Category('Test', ''), FakeStats( |
2022 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]])), |
2023 | + (Category('Bugs', ''), FakeStats( |
2024 | + total_hits=6, total_time=51.70, mean=8.62, median=4.5, std=6.90, |
2025 | + total_sqltime=33.40, mean_sqltime=5.57, median_sqltime=3, |
2026 | + std_sqltime=4.52, |
2027 | + total_sqlstatements=1352, mean_sqlstatements=225.33, |
2028 | + median_sqlstatements=56, std_sqlstatements=241.96, |
2029 | + histogram=[[0, 0], [1, 1], [2, 0], [3, 0], [4, 2], [5, 3]], |
2030 | + )), |
2031 | + ] |
2032 | + |
2033 | + |
2034 | +# The top 3 URL stats computed for the above 12 requests. |
2035 | +TOP_3_URL_STATS = [ |
2036 | + ('/bugs/1', FakeStats( |
2037 | + total_hits=2, total_time=36.0, mean=18.0, median=15.5, std=2.50, |
2038 | + total_sqltime=23.0, mean_sqltime=11.5, median_sqltime=9.0, |
2039 | + std_sqltime=2.50, |
2040 | + total_sqlstatements=1134, mean_sqlstatements=567.0, |
2041 | +        median_sqlstatements=567, std_sqlstatements=0, |
2042 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 2]], |
2043 | + )), |
2044 | + ('/bugs', FakeStats( |
2045 | + total_hits=3, total_time=14.2, mean=4.73, median=4.5, std=0.56, |
2046 | + total_sqltime=9.2, mean_sqltime=3.07, median_sqltime=3, |
2047 | + std_sqltime=0.74, |
2048 | + total_sqlstatements=188, mean_sqlstatements=62.67, |
2049 | + median_sqlstatements=56, std_sqlstatements=9.43, |
2050 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 2], [5, 1]], |
2051 | + )), |
2052 | + ('/launchpad', FakeStats( |
2053 | + total_hits=1, total_time=3.5, mean=3.5, median=3.5, std=0, |
2054 | + total_sqltime=3.0, mean_sqltime=3, median_sqltime=3, std_sqltime=0, |
2055 | + total_sqlstatements=3, mean_sqlstatements=3, |
2056 | + median_sqlstatements=3, std_sqlstatements=0, |
2057 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 1], [4, 0], [5, 0]], |
2058 | + )), |
2059 | + ] |
2060 | + |
2061 | + |
2062 | +# The pageid stats computed for the above 12 requests. |
2063 | +PAGEID_STATS = [ |
2064 | + ('+bug', FakeStats( |
2065 | + total_hits=3, total_time=37.5, mean=12.5, median=15.5, std=8.04, |
2066 | + total_sqltime=24.2, mean_sqltime=8.07, median_sqltime=9, |
2067 | + std_sqltime=5.27, |
2068 | + total_sqlstatements=1164, mean_sqlstatements=388, |
2069 | + median_sqlstatements=567, std_sqlstatements=253.14, |
2070 | + histogram=[[0, 0], [1, 1], [2, 0], [3, 0], [4, 0], [5, 2]], |
2071 | + )), |
2072 | + ('+bugs', FakeStats( |
2073 | + total_hits=3, total_time=14.2, mean=4.73, median=4.5, std=0.56, |
2074 | + total_sqltime=9.2, mean_sqltime=3.07, median_sqltime=3, |
2075 | + std_sqltime=0.74, |
2076 | + total_sqlstatements=188, mean_sqlstatements=62.67, |
2077 | + median_sqlstatements=56, std_sqlstatements=9.43, |
2078 | + histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 2], [5, 1]], |
2079 | + )), |
2080 | + ('+distribution', FakeStats( |
2081 | + total_hits=1, total_time=2.5, mean=2.5, median=2.5, std=0, |
2082 | + total_sqltime=2.0, mean_sqltime=2, median_sqltime=2, std_sqltime=0, |
2083 | + total_sqlstatements=6, mean_sqlstatements=6, |
2084 | + median_sqlstatements=6, std_sqlstatements=0, |
2085 | + histogram=[[0, 0], [1, 0], [2, 1], [3, 0], [4, 0], [5, 0]], |
2086 | + )), |
2087 | + ('+project', FakeStats( |
2088 | + total_hits=4, total_time=7.9, mean=1.98, median=1, std=1.08, |
2089 | + total_sqltime=6.6, mean_sqltime=1.65, median_sqltime=1.3, |
2090 | + std_sqltime=0.99, |
2091 | + total_sqlstatements=34, mean_sqlstatements=8.5, |
2092 | + median_sqlstatements=4, std_sqlstatements=5.32, |
2093 | + histogram=[[0, 1], [1, 1], [2, 1], [3, 1], [4, 0], [5, 0]], |
2094 | + )), |
2095 | + ('+root', FakeStats( |
2096 | + total_hits=1, total_time=0.5, mean=0.5, median=0.5, std=0, |
2097 | + histogram=[[0, 1], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]], |
2098 | + )), |
2099 | + ] |
2100 | + |
2101 | + |
2102 | +class TestRequestTimes(TestCase): |
2103 | + """Tests the RequestTimes backend.""" |
2104 | + |
2105 | + def setUp(self): |
2106 | + super(TestRequestTimes, self).setUp() |
2107 | + self.categories = [ |
2108 | + Category('All', '.*'), Category('Test', '.*test.*'), |
2109 | + Category('Bugs', '.*bugs.*')] |
2110 | + self.db = RequestTimes(self.categories, FakeOptions()) |
2111 | + self.useFixture(fixtures.LoggerFixture()) |
2112 | + |
2113 | + def setUpRequests(self): |
2114 | + """Insert some requests into the db.""" |
2115 | + for r in FAKE_REQUESTS: |
2116 | + self.db.add_request(r) |
2117 | + |
2118 | + def assertStatsAreEquals(self, expected, results): |
2119 | + self.assertEquals( |
2120 | + len(expected), len(results), 'Wrong number of results') |
2121 | + for idx in range(len(results)): |
2122 | + self.assertEquals(expected[idx][0], results[idx][0], |
2123 | + "Wrong key for results %d" % idx) |
2124 | + key = results[idx][0] |
2125 | + self.assertEquals(expected[idx][1].text(), results[idx][1].text(), |
2126 | + "Wrong stats for results %d (%s)" % (idx, key)) |
2127 | + self.assertEquals( |
2128 | + Histogram.from_bins_data(expected[idx][1].histogram), |
2129 | + results[idx][1].histogram, |
2130 | + "Wrong histogram for results %d (%s)" % (idx, key)) |
2131 | + |
2132 | + def test_get_category_times(self): |
2133 | + self.setUpRequests() |
2134 | + category_times = self.db.get_category_times() |
2135 | + self.assertStatsAreEquals(CATEGORY_STATS, category_times) |
2136 | + |
2137 | + def test_get_url_times(self): |
2138 | + self.setUpRequests() |
2139 | + url_times = self.db.get_top_urls_times() |
2140 | + self.assertStatsAreEquals(TOP_3_URL_STATS, url_times) |
2141 | + |
2142 | + def test_get_pageid_times(self): |
2143 | + self.setUpRequests() |
2144 | + pageid_times = self.db.get_pageid_times() |
2145 | + self.assertStatsAreEquals(PAGEID_STATS, pageid_times) |
2146 | + |
2147 | + def test___add__(self): |
2148 | +        # Ensure that adding two RequestTimes together results in |
2149 | +        # a merge of their constituent data. |
2150 | + db1 = self.db |
2151 | + db2 = RequestTimes(self.categories, FakeOptions()) |
2152 | + db1.add_request(FakeRequest('/', 1.5, 5, 1.0, '+root')) |
2153 | + db1.add_request(FakeRequest('/bugs', 3.5, 15, 1.0, '+bugs')) |
2154 | + db2.add_request(FakeRequest('/bugs/1', 5.0, 30, 4.0, '+bug')) |
2155 | + results = db1 + db2 |
2156 | + self.assertEquals(3, results.category_times[0][1].total_hits) |
2157 | + self.assertEquals(0, results.category_times[1][1].total_hits) |
2158 | + self.assertEquals(2, results.category_times[2][1].total_hits) |
2159 | + self.assertEquals(1, results.pageid_times['+root'].total_hits) |
2160 | + self.assertEquals(1, results.pageid_times['+bugs'].total_hits) |
2161 | + self.assertEquals(1, results.pageid_times['+bug'].total_hits) |
2162 | + self.assertEquals(1, results.url_times['/'].total_hits) |
2163 | + self.assertEquals(1, results.url_times['/bugs'].total_hits) |
2164 | + self.assertEquals(1, results.url_times['/bugs/1'].total_hits) |
2165 | + |
2166 | + def test_histogram_init_with_resolution(self): |
2167 | +        # Test that the resolution parameter increases the number of bins |
2168 | + db = RequestTimes( |
2169 | + self.categories, FakeOptions(timeout=4, resolution=1)) |
2170 | + self.assertEquals(5, db.histogram_width) |
2171 | + self.assertEquals(1, db.histogram_resolution) |
2172 | + db = RequestTimes( |
2173 | + self.categories, FakeOptions(timeout=4, resolution=0.5)) |
2174 | + self.assertEquals(9, db.histogram_width) |
2175 | + self.assertEquals(0.5, db.histogram_resolution) |
2176 | + db = RequestTimes( |
2177 | + self.categories, FakeOptions(timeout=4, resolution=2)) |
2178 | + self.assertEquals(3, db.histogram_width) |
2179 | + self.assertEquals(2, db.histogram_resolution) |
2180 | + |
2181 | + |
2182 | +class TestOnlineStats(TestCase): |
2183 | + """Tests for the OnlineStats class.""" |
2184 | + |
2185 | + def test___add__(self): |
2186 | +        # Ensure that adding two OnlineStats merges all their constituent data. |
2187 | + stats1 = OnlineStats(4, 1) |
2188 | + stats1.update(FakeRequest('/', 2.0, 5, 1.5)) |
2189 | + stats2 = OnlineStats(4, 1) |
2190 | + stats2.update(FakeRequest('/', 1.5, 2, 3.0)) |
2191 | + stats2.update(FakeRequest('/', 5.0, 2, 2.0)) |
2192 | + results = stats1 + stats2 |
2193 | + self.assertEquals(3, results.total_hits) |
2194 | + self.assertEquals(2, results.median) |
2195 | + self.assertEquals(9, results.total_sqlstatements) |
2196 | + self.assertEquals(2, results.median_sqlstatements) |
2197 | + self.assertEquals(6.5, results.total_sqltime) |
2198 | + self.assertEquals(2.0, results.median_sqltime) |
2199 | + self.assertEquals( |
2200 | + Histogram.from_bins_data([[0, 0], [1, 1], [2, 1], [3, 1]]), |
2201 | + results.histogram) |
2202 | + |
2203 | + |
2204 | +class TestOnlineStatsCalculator(TestCase): |
2205 | + """Tests for the online stats calculator.""" |
2206 | + |
2207 | + def setUp(self): |
2208 | + TestCase.setUp(self) |
2209 | + self.stats = OnlineStatsCalculator() |
2210 | + |
2211 | + def test_stats_for_empty_set(self): |
2212 | + # Test the stats when there is no input. |
2213 | + self.assertEquals(0, self.stats.count) |
2214 | + self.assertEquals(0, self.stats.sum) |
2215 | + self.assertEquals(0, self.stats.mean) |
2216 | + self.assertEquals(0, self.stats.variance) |
2217 | + self.assertEquals(0, self.stats.std) |
2218 | + |
2219 | + def test_stats_for_one_value(self): |
2220 | + # Test the stats when adding one element. |
2221 | + self.stats.update(5) |
2222 | + self.assertEquals(1, self.stats.count) |
2223 | + self.assertEquals(5, self.stats.sum) |
2224 | + self.assertEquals(5, self.stats.mean) |
2225 | + self.assertEquals(0, self.stats.variance) |
2226 | + self.assertEquals(0, self.stats.std) |
2227 | + |
2228 | + def test_None_are_ignored(self): |
2229 | + self.stats.update(None) |
2230 | + self.assertEquals(0, self.stats.count) |
2231 | + |
2232 | + def test_stats_for_3_values(self): |
2233 | + for x in [3, 6, 9]: |
2234 | + self.stats.update(x) |
2235 | + self.assertEquals(3, self.stats.count) |
2236 | + self.assertEquals(18, self.stats.sum) |
2237 | + self.assertEquals(6, self.stats.mean) |
2238 | + self.assertEquals(6, self.stats.variance) |
2239 | + self.assertEquals("2.45", "%.2f" % self.stats.std) |
2240 | + |
2241 | + def test___add___two_empty_together(self): |
2242 | + stats2 = OnlineStatsCalculator() |
2243 | + results = self.stats + stats2 |
2244 | + self.assertEquals(0, results.count) |
2245 | + self.assertEquals(0, results.sum) |
2246 | + self.assertEquals(0, results.mean) |
2247 | + self.assertEquals(0, results.variance) |
2248 | + |
2249 | + def test___add___one_empty(self): |
2250 | + stats2 = OnlineStatsCalculator() |
2251 | + for x in [1, 2, 3]: |
2252 | + self.stats.update(x) |
2253 | + results = self.stats + stats2 |
2254 | + self.assertEquals(3, results.count) |
2255 | + self.assertEquals(6, results.sum) |
2256 | + self.assertEquals(2, results.mean) |
2257 | + self.assertEquals(2, results.M2) |
2258 | + |
2259 | + def test___add__(self): |
2260 | + stats2 = OnlineStatsCalculator() |
2261 | + for x in [3, 6, 9]: |
2262 | + self.stats.update(x) |
2263 | + for x in [1, 2, 3]: |
2264 | + stats2.update(x) |
2265 | + results = self.stats + stats2 |
2266 | + self.assertEquals(6, results.count) |
2267 | + self.assertEquals(24, results.sum) |
2268 | + self.assertEquals(4, results.mean) |
2269 | + self.assertEquals(44, results.M2) |
2270 | + |
2271 | + |
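The `test___add__` figures above — count 6, sum 24, mean 4, M2 44 for [3, 6, 9] merged with [1, 2, 3] — are exactly what the standard parallel-variance merge produces. A minimal sketch, assuming Welford-style accumulators (the class and attribute names here are stand-ins, not the diff's `OnlineStatsCalculator`):

```python
class RunningStats(object):
    """Welford-style online mean/variance with a parallel merge."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, value):
        if value is None:
            return                       # None inputs are skipped
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other):
        """Combine two independently accumulated streams."""
        merged = RunningStats()
        merged.count = self.count + other.count
        if merged.count == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = (
            self.count * self.mean + other.count * other.mean) / merged.count
        # Parallel form: M2 = M2a + M2b + delta^2 * na*nb / (na+nb)
        merged.m2 = (
            self.m2 + other.m2
            + delta * delta * self.count * other.count / merged.count)
        return merged

    @property
    def variance(self):
        return self.m2 / self.count if self.count else 0.0
```

Feeding [3, 6, 9] into one accumulator and [1, 2, 3] into another, then merging, reproduces the M2 of 44 asserted in the test above.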
2272 | +SHUFFLE_RANGE_100 = [ |
2273 | + 25, 79, 99, 76, 60, 63, 87, 77, 51, 82, 42, 96, 93, 58, 32, 66, 75, |
2274 | + 2, 26, 22, 11, 73, 61, 83, 65, 68, 44, 81, 64, 3, 33, 34, 15, 1, |
2275 | + 92, 27, 90, 74, 46, 57, 59, 31, 13, 19, 89, 29, 56, 94, 50, 49, 62, |
2276 | + 37, 21, 35, 5, 84, 88, 16, 8, 23, 40, 6, 48, 10, 97, 0, 53, 17, 30, |
2277 | + 18, 43, 86, 12, 71, 38, 78, 36, 7, 45, 47, 80, 54, 39, 91, 98, 24, |
2278 | + 55, 14, 52, 20, 69, 85, 95, 28, 4, 9, 67, 70, 41, 72, |
2279 | + ] |
2280 | + |
2281 | + |
2282 | +class TestOnlineApproximateMedian(TestCase): |
2283 | + """Tests for the approximate median computation.""" |
2284 | + |
2285 | + def setUp(self): |
2286 | + TestCase.setUp(self) |
2287 | + self.estimator = OnlineApproximateMedian() |
2288 | + |
2289 | + def test_median_is_0_when_no_input(self): |
2290 | + self.assertEquals(0, self.estimator.median) |
2291 | + |
2292 | + def test_median_is_true_median_for_n_lower_than_bucket_size(self): |
2293 | + for x in range(9): |
2294 | + self.estimator.update(x) |
2295 | + self.assertEquals(4, self.estimator.median) |
2296 | + |
2297 | + def test_None_input_is_ignored(self): |
2298 | + self.estimator.update(1) |
2299 | + self.estimator.update(None) |
2300 | + self.assertEquals(1, self.estimator.median) |
2301 | + |
2302 | + def test_approximate_median_is_good_enough(self): |
2303 | + for x in SHUFFLE_RANGE_100: |
2304 | + self.estimator.update(x) |
2305 | + # True median is 50, 49 is good enough :-) |
2306 | + self.assertIn(self.estimator.median, range(49,52)) |
2307 | + |
2308 | + def test___add__(self): |
2309 | + median1 = OnlineApproximateMedian(3) |
2310 | + median1.buckets = [[1, 3], [4, 5], [6, 3]] |
2311 | + median2 = OnlineApproximateMedian(3) |
2312 | + median2.buckets = [[], [3, 6], [3, 7]] |
2313 | + results = median1 + median2 |
2314 | + self.assertEquals([[1, 3], [6], [3, 7], [4]], results.buckets) |
2315 | + |
2316 | + |
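The approximate-median tests above fit a remedian-style estimator: values fill fixed-size buckets, a full bucket is replaced by its exact median promoted one level up, and the reported value is the median of whatever is still held. A minimal sketch under that assumption — the bucket size and promotion rule here are illustrative, not necessarily the diff's exact `OnlineApproximateMedian`:

```python
def exact_median(values):
    """Exact median: midpoint of the sorted values."""
    s = sorted(values)
    n = len(s)
    if n % 2:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2.0


class ApproxMedian(object):
    """Remedian-style online median estimate over cascading buckets."""

    def __init__(self, bucket_size=9):
        self.bucket_size = bucket_size
        self.buckets = [[]]

    def update(self, value, level=0):
        if value is None:
            return                       # None input is ignored
        if level == len(self.buckets):
            self.buckets.append([])      # grow a new cascade level
        self.buckets[level].append(value)
        if len(self.buckets[level]) == self.bucket_size:
            full = self.buckets[level]
            self.buckets[level] = []     # promote the bucket's median
            self.update(exact_median(full), level + 1)

    @property
    def median(self):
        held = [v for bucket in self.buckets for v in bucket]
        if not held:
            return 0                     # matches "0 when no input"
        return exact_median(held)
```

For fewer values than one bucket holds, the estimate degenerates to the true median, which is why the test above can assert an exact 4 for `range(9)`.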
2317 | +class TestHistogram(TestCase): |
2318 | + """Test the histogram computation.""" |
2319 | + |
2320 | + def test__init__(self): |
2321 | + hist = Histogram(4, 1) |
2322 | + self.assertEquals(4, hist.bins_count) |
2323 | + self.assertEquals(1, hist.bins_size) |
2324 | + self.assertEquals([[0, 0], [1, 0], [2, 0], [3, 0]], hist.bins) |
2325 | + |
2326 | + def test__init__bins_size_float(self): |
2327 | + hist = Histogram(9, 0.5) |
2328 | + self.assertEquals(9, hist.bins_count) |
2329 | + self.assertEquals(0.5, hist.bins_size) |
2330 | + self.assertEquals( |
2331 | + [[0, 0], [0.5, 0], [1.0, 0], [1.5, 0], |
2332 | + [2.0, 0], [2.5, 0], [3.0, 0], [3.5, 0], [4.0, 0]], hist.bins) |
2333 | + |
2334 | + def test_update(self): |
2335 | + hist = Histogram(4, 1) |
2336 | + hist.update(1) |
2337 | + self.assertEquals(1, hist.count) |
2338 | + self.assertEquals([[0, 0], [1, 1], [2, 0], [3, 0]], hist.bins) |
2339 | + |
2340 | + hist.update(1.3) |
2341 | + self.assertEquals(2, hist.count) |
2342 | + self.assertEquals([[0, 0], [1, 2], [2, 0], [3, 0]], hist.bins) |
2343 | + |
2344 | + def test_update_float_bin_size(self): |
2345 | + hist = Histogram(4, 0.5) |
2346 | + hist.update(1.3) |
2347 | + self.assertEquals([[0, 0], [0.5, 0], [1.0, 1], [1.5, 0]], hist.bins) |
2348 | + hist.update(0.5) |
2349 | + self.assertEquals([[0, 0], [0.5, 1], [1.0, 1], [1.5, 0]], hist.bins) |
2350 | + hist.update(0.6) |
2351 | + self.assertEquals([[0, 0], [0.5, 2], [1.0, 1], [1.5, 0]], hist.bins) |
2352 | + |
2353 | + def test_update_max_goes_in_last_bin(self): |
2354 | + hist = Histogram(4, 1) |
2355 | + hist.update(9) |
2356 | + self.assertEquals([[0, 0], [1, 0], [2, 0], [3, 1]], hist.bins) |
2357 | + |
2358 | + def test_bins_relative(self): |
2359 | + hist = Histogram(4, 1) |
2360 | + for x in range(4): |
2361 | + hist.update(x) |
2362 | + self.assertEquals( |
2363 | + [[0, 0.25], [1, 0.25], [2, 0.25], [3, 0.25]], hist.bins_relative) |
2364 | + |
2365 | + def test_from_bins_data(self): |
2366 | + hist = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2367 | + self.assertEquals(4, hist.bins_count) |
2368 | + self.assertEquals(1, hist.bins_size) |
2369 | + self.assertEquals(6, hist.count) |
2370 | + self.assertEquals([[0, 1], [1, 3], [2, 1], [3, 1]], hist.bins) |
2371 | + |
2372 | + def test___repr__(self): |
2373 | + hist = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2374 | + self.assertEquals( |
2375 | + "<Histogram [[0, 1], [1, 3], [2, 1], [3, 1]]>", repr(hist)) |
2376 | + |
2377 | + def test___eq__(self): |
2378 | + hist1 = Histogram(4, 1) |
2379 | + hist2 = Histogram(4, 1) |
2380 | + self.assertEquals(hist1, hist2) |
2381 | + |
2382 | + def test__eq___with_data(self): |
2383 | + hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2384 | + hist2 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2385 | + self.assertEquals(hist1, hist2) |
2386 | + |
2387 | + def test___add__(self): |
2388 | + hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2389 | + hist2 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2390 | + hist3 = Histogram.from_bins_data([[0, 2], [1, 6], [2, 2], [3, 2]]) |
2391 | + total = hist1 + hist2 |
2392 | + self.assertEquals(hist3, total) |
2393 | + self.assertEquals(12, total.count) |
2394 | + |
2395 | + def test___add___uses_widest(self): |
2396 | + # Make sure that the resulting histogram is as wide as the widest one. |
2397 | + hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]]) |
2398 | + hist2 = Histogram.from_bins_data( |
2399 | + [[0, 1], [1, 3], [2, 1], [3, 1], [4, 2], [5, 3]]) |
2400 | + hist3 = Histogram.from_bins_data( |
2401 | + [[0, 2], [1, 6], [2, 2], [3, 2], [4, 2], [5, 3]]) |
2402 | + self.assertEquals(hist3, hist1 + hist2) |
2403 | + |
2404 | + def test___add___interpolate_lower_resolution(self): |
2405 | + # Make sure that when the other histogram has a bigger bin_size |
2406 | + # the frequency is correctly split across the different bins. |
2407 | + hist1 = Histogram.from_bins_data( |
2408 | + [[0, 1], [0.5, 3], [1.0, 1], [1.5, 1]]) |
2409 | + hist2 = Histogram.from_bins_data( |
2410 | + [[0, 1], [1, 2], [2, 3], [3, 1], [4, 1]]) |
2411 | + |
2412 | + hist3 = Histogram.from_bins_data( |
2413 | + [[0, 1.5], [0.5, 3.5], [1.0, 2], [1.5, 2], |
2414 | + [2.0, 1.5], [2.5, 1.5], [3.0, 0.5], [3.5, 0.5], |
2415 | + [4.0, 0.5], [4.5, 0.5]]) |
2416 | + self.assertEquals(hist3, hist1 + hist2) |
2417 | + |
2418 | + def test___add___higher_resolution(self): |
2419 | + # Make sure that when the other histogram has a smaller bin_size |
2420 | + # the frequency is correctly added. |
2421 | + hist1 = Histogram.from_bins_data([[0, 1], [1, 2], [2, 3]]) |
2422 | + hist2 = Histogram.from_bins_data( |
2423 | + [[0, 1], [0.5, 3], [1.0, 1], [1.5, 1], [2.0, 3], [2.5, 1], |
2424 | + [3, 4], [3.5, 2]]) |
2425 | + |
2426 | + hist3 = Histogram.from_bins_data([[0, 5], [1, 4], [2, 7], [3, 6]]) |
2427 | + self.assertEquals(hist3, hist1 + hist2) |
2428 | |
2429 | === added file 'versions.cfg' |
2430 | --- versions.cfg 1970-01-01 00:00:00 +0000 |
2431 | +++ versions.cfg 2012-08-09 04:56:19 +0000 |
2432 | @@ -0,0 +1,111 @@ |
2433 | +[buildout] |
2434 | +versions = versions |
2435 | + |
2436 | +[versions] |
2437 | +# Alphabetical, case-insensitive, please! :-) |
2438 | +fixtures = 0.3.9 |
2439 | +pytz = 2012c |
2440 | +RestrictedPython = 3.5.1 |
2441 | +setuptools = 0.6c11 |
2442 | +testtools = 0.9.14 |
2443 | +transaction = 1.0.0 |
2444 | +# Also upgrade the zc.buildout version in the Makefile's bin/buildout section. |
2445 | +zc.buildout = 1.5.1 |
2446 | +zc.lockfile = 1.0.0 |
2447 | +zc.recipe.egg = 1.3.2 |
2448 | +z3c.recipe.scripts = 1.0.1 |
2449 | +zc.zservertracelog = 1.1.5 |
2450 | +ZConfig = 2.9.1dev-20110728 |
2451 | +zdaemon = 2.0.4 |
2452 | +ZODB3 = 3.9.2 |
2453 | +zope.annotation = 3.5.0 |
2454 | +zope.app.applicationcontrol = 3.5.1 |
2455 | +zope.app.appsetup = 3.12.0 |
2456 | +zope.app.authentication = 3.6.1 |
2457 | +zope.app.basicskin = 3.4.1 |
2458 | +zope.app.component = 3.8.3 |
2459 | +zope.app.container = 3.8.0 |
2460 | +zope.app.form = 3.8.1 |
2461 | +zope.app.pagetemplate = 3.7.1 |
2462 | +zope.app.publication = 3.9.0 |
2463 | +zope.app.publisher = 3.10.0 |
2464 | +zope.app.server = 3.4.2 |
2465 | +zope.app.wsgi = 3.6.0 |
2466 | +zope.authentication = 3.7.0 |
2467 | +zope.broken = 3.5.0 |
2468 | +zope.browser = 1.2 |
2469 | +zope.browsermenu = 3.9.0 |
2470 | +zope.browserpage = 3.9.0 |
2471 | +zope.browserresource = 3.9.0 |
2472 | +zope.cachedescriptors = 3.5.0 |
2473 | +zope.component = 3.9.3 |
2474 | +zope.componentvocabulary = 1.0 |
2475 | +zope.configuration = 3.6.0 |
2476 | +zope.container = 3.9.0 |
2477 | +zope.contenttype = 3.5.0 |
2478 | +zope.copy = 3.5.0 |
2479 | +zope.copypastemove = 3.5.2 |
2480 | +zope.datetime = 3.4.0 |
2481 | +zope.deferredimport = 3.5.0 |
2482 | +zope.deprecation = 3.4.0 |
2483 | +zope.dottedname = 3.4.6 |
2484 | +zope.dublincore = 3.5.0 |
2485 | +zope.error = 3.7.0 |
2486 | +zope.event = 3.4.1 |
2487 | +zope.exceptions = 3.5.2 |
2488 | +zope.filerepresentation = 3.5.0 |
2489 | +zope.formlib = 3.6.0 |
2490 | +zope.hookable = 3.4.1 |
2491 | +zope.i18n = 3.7.1 |
2492 | +zope.i18nmessageid = 3.5.0 |
2493 | +zope.interface = 3.5.2 |
2494 | +zope.lifecycleevent = 3.5.2 |
2495 | +zope.location = 3.7.0 |
2496 | +zope.minmax = 1.1.1 |
2497 | +# Build of lp:~wallyworld/zope.pagetemplate/fix-isinstance |
2498 | +# This version adds a small change to the traversal logic so that the |
2499 | +# optimisation which applies if the object is a dict also works for subclasses |
2500 | +# of dict. The change has been approved for merge into the official zope code |
2501 | +# base. This patch is a temporary fix until the next official release. |
2502 | +zope.pagetemplate = 3.5.0-p1 |
2503 | +zope.password = 3.5.1 |
2504 | +zope.processlifetime = 1.0 |
2505 | +zope.proxy = 3.5.0 |
2506 | +zope.ptresource = 3.9.0 |
2507 | +zope.publisher = 3.12.0 |
2508 | +zope.schema = 3.5.4 |
2509 | +zope.security = 3.7.1 |
2510 | +zope.server = 3.6.1 |
2511 | +zope.session = 3.9.1 |
2512 | +zope.site = 3.7.0 |
2513 | +zope.size = 3.4.1 |
2514 | +zope.tal = 3.5.1 |
2515 | +zope.tales = 3.4.0 |
2516 | +# p1 Build of lp:~mars/zope.testing/3.9.4-p1. Fixes bugs 570380 and 587886. |
2517 | +# p2 With patch for thread leaks to make them skips, fixes windmill errors |
2518 | +# with 'new threads' in hudson/ec2 builds. |
2519 | +# p3 And always tear down layers, because that's the Right Thing To Do. |
2520 | +# p4 fixes --subunit --list to really just list the tests. |
2521 | +# p5 Build of lp:~launchpad/zope.testing/3.9.4-p5. Fixes bug #609986. |
2522 | +# p6 reinstates fix from p4. Build of lp:~launchpad/zope.testing/3.9.4-fork |
2523 | +# revision 26. |
2524 | +# p7 was unused |
2525 | +# p8 redirects stdout and stderr to a black hole device when --subunit is used |
2526 | +# p9 adds the redirection of __stderr__ to a black hole device |
2527 | +# p10 changed the test reporting to use test.id() rather than |
2528 | +# str(test) since only the id is unique. |
2529 | +# p11 reverts p9. |
2530 | +# p12 reverts p11, restoring p9. |
2531 | +# p13 Add a new --require-unique flag to the testrunner. When set, |
2532 | +# this will cause the testrunner to check all tests IDs to ensure they |
2533 | +# haven't been loaded before. If it encounters a duplicate, it will |
2534 | +# raise an error and quit. |
2535 | +# p14 Adds test data written to stderr and stdout into the subunit output. |
2536 | +# p15 Fixed internal tests. |
2537 | +# p16 Adds support for skips in Python 2.7. |
2538 | +# p17 Fixes skip support for Python 2.6. |
2539 | +# To build (use Python 2.6) run "python bootstrap.py; ./bin/buildout". Then to |
2540 | +# build the distribution run "bin/buildout setup . sdist" |
2541 | +# Make sure you have subunit installed. |
2542 | +zope.testing = 3.9.4-p17 |
2543 | +zope.traversing = 3.8.0 |
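The added test suite pins down the Histogram semantics fairly precisely: fixed-width bins, values at or beyond the range clamped into the last bin, and an addition operator that keeps the left operand's bin size while re-binning the other histogram (splitting coarser bins evenly, aggregating finer ones). A minimal sketch satisfying those expectations might look like the following. This is a hypothetical reconstruction for illustration only — the real class lives in pageperformancereport.py, which is not shown in this hunk, and any detail beyond what the tests assert is an assumption.

```python
class Histogram:
    """Hypothetical reconstruction of the Histogram the new tests exercise.

    Not the actual pageperformancereport.py implementation; behavior is
    inferred solely from the assertions in test_pageperformancereport.py.
    """

    def __init__(self, bins_count, bins_size):
        self.bins_count = bins_count
        self.bins_size = bins_size
        self.count = 0
        # Each bin is [lower_edge, frequency].
        self.bins = [[i * bins_size, 0] for i in range(bins_count)]

    @classmethod
    def from_bins_data(cls, data):
        # Infer the bin size from the first two edges; assumes >= 2 bins.
        hist = cls(len(data), data[1][0] - data[0][0])
        hist.count = sum(freq for _, freq in data)
        hist.bins = [list(pair) for pair in data]
        return hist

    def update(self, value):
        # Values past the top edge are clamped into the last bin.
        self.count += 1
        index = min(int(value / self.bins_size), self.bins_count - 1)
        self.bins[index][1] += 1

    @property
    def bins_relative(self):
        return [[edge, freq / self.count] for edge, freq in self.bins]

    def __eq__(self, other):
        return self.bins == other.bins

    def __repr__(self):
        return "<Histogram %r>" % self.bins

    def __add__(self, other):
        # The result keeps self's bin size and spans the wider histogram.
        span = max(self.bins_count * self.bins_size,
                   other.bins_count * other.bins_size)
        total = Histogram(int(round(span / self.bins_size)), self.bins_size)
        total.count = self.count + other.count
        for edge, freq in self.bins:
            total.bins[int(round(edge / self.bins_size))][1] += freq
        if other.bins_size >= self.bins_size:
            # Coarser (or equal) bins: split each frequency evenly
            # across the finer bins it covers.
            n = int(round(other.bins_size / self.bins_size))
            for edge, freq in other.bins:
                start = int(round(edge / self.bins_size))
                for j in range(n):
                    total.bins[start + j][1] += freq / n
        else:
            # Finer bins: aggregate into the enclosing coarse bin.
            for edge, freq in other.bins:
                total.bins[int(edge / self.bins_size)][1] += freq
        return total
```

The asymmetry in `__add__` (the left operand's resolution wins) matches both interpolation tests: a coarser right operand is interpolated down, a finer one is summed up.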
I don't condone the buildout approach, but OK.