Merge lp:~ev/oops-repository/bucket-versions into lp:~daisy-pluckers/oops-repository/trunk

Proposed by Evan on 2013-04-15
Status: Merged
Merged at revision: 69
Proposed branch: lp:~ev/oops-repository/bucket-versions
Merge into: lp:~daisy-pluckers/oops-repository/trunk
Diff against target: 115 lines (+78/-4)
3 files modified
oopsrepository/oopses.py (+30/-4)
oopsrepository/schema.py (+22/-0)
oopsrepository/tests/test_oopses.py (+26/-0)
To merge this branch: bzr merge lp:~ev/oops-repository/bucket-versions
Reviewer: Daisy Pluckers
Review type: (blank)
Date requested: 2013-04-15
Status: Pending
Review via email: mp+158946@code.launchpad.net

Description of the change

This branch changes how the package versions are updated for each bucket. It now keeps track of the release as well as the version the bucket was updated for, ordering versions with the dpkg binary version number comparator.
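For anyone unfamiliar with dpkg ordering, here is a minimal sketch of the comparison rules in Python. It is a simplified reimplementation for illustration only, not the com.canonical.dpkgversiontype.DpkgVersionType comparator this branch relies on, and the function names are made up for this example. The point is that '~' sorts before everything, including the end of the string, so a plain string sort gives a different order.

```python
import functools


def _is_digit(ch):
    return '0' <= ch <= '9'


def _order(ch):
    # dpkg character weights: '~' sorts first, then end-of-string and
    # digits, then letters, then everything else.
    if ch == '~':
        return -1
    if ch == '' or _is_digit(ch):
        return 0
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_fragment(a, b):
    # Compare alternating non-digit and digit runs, as dpkg does.
    i = j = 0
    while i < len(a) or j < len(b):
        while ((i < len(a) and not _is_digit(a[i])) or
               (j < len(b) and not _is_digit(b[j]))):
            oa = _order(a[i] if i < len(a) else '')
            ob = _order(b[j] if j < len(b) else '')
            if oa != ob:
                return -1 if oa < ob else 1
            i += 1
            j += 1
        si, sj = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[si:i] or '0'), int(b[sj:j] or '0')
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    # [epoch:]upstream[-revision]
    epoch, rest = '0', version
    if ':' in version:
        epoch, rest = version.split(':', 1)
    upstream, revision = rest, ''
    if '-' in rest:
        upstream, revision = rest.rsplit('-', 1)
    return int(epoch), upstream, revision


def dpkg_compare(a, b):
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_fragment(ua, ub) or _compare_fragment(ra, rb)


versions = ['1.0', '1.0~ev1', '1.0+ev1']
dpkg_sorted = sorted(versions, key=functools.cmp_to_key(dpkg_compare))
plain_sorted = sorted(versions)
```

Here dpkg_sorted puts '1.0~ev1' before '1.0' and '1.0+ev1' after it, whereas plain_sorted puts '1.0~ev1' last.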

It uses three new column families:

BucketVersionsFull - A wide row of OOPS IDs for each combination of (bucket, DistroRelease, binary package version).

BucketVersionsCount - A counter column family that keeps track of the (DistroRelease, binary package version) count for each bucket. The binary package version is ordered using the dpkg comparator, which produces the following:
{('Ubuntu 12.04', '1.0~ppa1') : 12, ('Ubuntu 12.04', '1.0') : 9, ('Ubuntu 12.10', '1.0') : 36, ...}

BucketVersionsDay - A column family used only to keep track of which buckets we've updated the versions for today ("today" meaning the day they were bucketed, not the day the crash occurred; see the comment in the code). We'll use this to repair counters that were over- or under-counted.

I'm tempted to only add to BucketVersionsDay when an exception is encountered while modifying the counter, rather than letting it grow alongside the BucketVersionsFull CF. We could also use a TTL to expire the data after a week or two (since it's not a counter CF). Can you think of any reason we might not want to do this? Is there any reason to keep around the knowledge of when we incremented the counters for BucketVersionsCount?

I haven't been able to think up one myself yet, but I'm not ruling it out :)
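The nightly repair mentioned above (and in the code comment in update_bucket_versions) might look roughly like the following. This is only a sketch under stated assumptions: plain dictionaries stand in for the three column families, repair_counts is a hypothetical name, and the counter columns are assumed to be keyed (release, version) as in the description.

```python
def repair_counts(day_columns, full_rows, counts):
    """Bring the counters in line with the number of OOPS IDs actually stored.

    day_columns: iterable of (bucketid, release, version) tuples, standing in
        for the columns of one BucketVersionsDay row.
    full_rows: dict mapping (bucketid, release, version) to a set of OOPS IDs,
        standing in for BucketVersionsFull.
    counts: dict mapping bucketid to {(release, version): int}, standing in
        for BucketVersionsCount.

    Returns the deltas that were applied, keyed by (bucketid, release, version).
    """
    corrections = {}
    for bucketid, release, version in day_columns:
        # The wide row is the source of truth: one column per OOPS ID.
        expected = len(full_rows.get((bucketid, release, version), ()))
        row = counts.setdefault(bucketid, {})
        current = row.get((release, version), 0)
        # Counter columns are adjusted by a delta rather than overwritten,
        # so work out the difference and apply that.
        delta = expected - current
        if delta:
            row[(release, version)] = current + delta
            corrections[(bucketid, release, version)] = delta
    return corrections


full = {('bucket-id', 'Ubuntu 12.04', '1.2.3'): {'oops-1', 'oops-2', 'oops-3'}}
counts = {'bucket-id': {('Ubuntu 12.04', '1.2.3'): 2}}  # under-counted by one
day = [('bucket-id', 'Ubuntu 12.04', '1.2.3')]
fixes = repair_counts(day, full, counts)
```

After this runs, the counter for ('Ubuntu 12.04', '1.2.3') matches the three stored OOPS IDs, and running it a second time applies no further corrections. Only buckets listed for the day are visited, which is why the day row only needs to live as long as the repair window.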

There's a script in lp:daisy that will be used to back-populate this data (updated to match this merge in https://code.launchpad.net/~ev/daisy/bucket-versions/+merge/158948).

Brian Murray (brian-murray) wrote :

In the Description you indicate that BucketVersionsCount is keyed on "(DistroRelease, binary package version)"; however, the code is written as "bv_count.add(bucketid, (version, release))". I'm not sure why, but I'd expect release to come before version rather than the other way around. Is there a specific reason version comes before release?

Brian Murray (brian-murray) wrote :

Additionally, why is pycassa.util.uuid.UUID used? It seems to be just the same as uuid.UUID.

Evan (ev) wrote :

I've made the requested fixes as r68 (uuid) and r69 (release and version swap).

Evan (ev) wrote :

I made the supporting change to lp:daisy as r316 (lp:~ev/daisy/bucket-versions is now merged).

Preview Diff

=== modified file 'oopsrepository/oopses.py'
--- oopsrepository/oopses.py 2013-04-12 14:10:15 +0000
+++ oopsrepository/oopses.py 2013-04-15 14:57:05 +0000
@@ -186,11 +186,37 @@
     daybucketcount_cf.add(resolution, bucketid)
     return day_key
 
-def update_bucket_versions(config, bucketid, version):
+def update_bucket_versions(config, bucketid, version,
+                           release=None, oopsid=None):
+
     pool = connection_pool(config)
-    bucketversions_cf = pycassa.ColumnFamily(pool, 'BucketVersions',
-                                             retry_counter_mutations=True)
-    bucketversions_cf.add(bucketid, version)
+    if release:
+        bv_full = pycassa.ColumnFamily(pool, 'BucketVersionsFull')
+        bv_count = pycassa.ColumnFamily(pool, 'BucketVersionsCount')
+        bv_day = pycassa.ColumnFamily(pool, 'BucketVersionsDay')
+
+        # Use the current day, rather than the day of the OOPS because this is
+        # specifically used for cleaning up counters nightly. If a very old
+        # OOPS gets processed here, we should still clean it up when we're
+        # handling the data for today.
+        day_key = time.strftime('%Y%m%d', time.gmtime())
+
+        o = pycassa.util.uuid.UUID(oopsid)
+        # When correcting the counts in bv_count, we'll iterate
+        # BucketVersionsDay for day_key. For each of these columns, we'll look
+        # up the correct value by calling bv_full.get_count(...).
+        bv_full.insert((bucketid, release, version), {o: ''})
+        bv_day.insert(day_key, {(bucketid, release, version): ''})
+
+        try:
+            bv_count.add(bucketid, (version, release))
+        except pycassa.MaximumRetryException:
+            # We'll clean these up nightly, so ignore timeouts.
+            pass
+    else:
+        bucketversions_cf = pycassa.ColumnFamily(pool, 'BucketVersions',
+                                                 retry_counter_mutations=True)
+        bucketversions_cf.add(bucketid, version)
 
 def query_bucket_versions(config, bucketid):
     pool = connection_pool(config)

=== modified file 'oopsrepository/schema.py'
--- oopsrepository/schema.py 2013-04-03 11:56:23 +0000
+++ oopsrepository/schema.py 2013-04-15 14:57:05 +0000
@@ -70,6 +70,28 @@
         workaround_1779(mgr.create_column_family, keyspace, 'BucketVersions',
                         comparator_type=UTF8_TYPE,
                         default_validation_class=CounterColumnType())
+    if 'BucketVersionsCount' not in cfs:
+        dpkg_comparator = 'com.canonical.dpkgversiontype.DpkgVersionType'
+        composite = CompositeType(dpkg_comparator, AsciiType())
+        workaround_1779(mgr.create_column_family,
+                        keyspace,
+                        'BucketVersionsCount',
+                        key_validation_class=UTF8_TYPE,
+                        comparator_type=composite,
+                        default_validation_class=CounterColumnType())
+    if 'BucketVersionsFull' not in cfs:
+        composite = CompositeType(UTF8Type(), AsciiType(), AsciiType())
+        workaround_1779(mgr.create_column_family,
+                        keyspace,
+                        'BucketVersionsFull',
+                        key_validation_class=composite,
+                        comparator_type=TIME_UUID_TYPE)
+    if 'BucketVersionsDay' not in cfs:
+        composite = CompositeType(UTF8Type(), AsciiType(), AsciiType())
+        workaround_1779(mgr.create_column_family,
+                        keyspace,
+                        'BucketVersionsDay',
+                        comparator_type=composite)
     if 'BucketSystems' not in cfs:
         workaround_1779(mgr.create_column_family, keyspace, 'BucketSystems',
                         key_validation_class=UTF8_TYPE,

=== modified file 'oopsrepository/tests/test_oopses.py'
--- oopsrepository/tests/test_oopses.py 2013-04-03 11:56:23 +0000
+++ oopsrepository/tests/test_oopses.py 2013-04-15 14:57:05 +0000
@@ -212,6 +212,32 @@
         oopses.update_bucket_versions(self.config, 'bucket-id', '1.2.3')
         self.assertEqual(bucketversions_cf.get('bucket-id')['1.2.3'], 1)
 
+        bv_full = pycassa.ColumnFamily(self.pool, 'BucketVersionsFull')
+        bv_count = pycassa.ColumnFamily(self.pool, 'BucketVersionsCount')
+        u = uuid.uuid1()
+        args = (self.config, 'bucket-id', '1.2.3', 'Ubuntu 12.04', str(u))
+        oopses.update_bucket_versions(*args)
+        d = bv_full.get(('bucket-id', 'Ubuntu 12.04', '1.2.3')).items()[0]
+        self.assertEqual((u, ''), d)
+        c = bv_count.get('bucket-id', columns=[('1.2.3', 'Ubuntu 12.04')])
+        c = c.values()[0]
+        self.assertEqual(1, c)
+
+    def test_dpkg_comparator(self):
+        keyspace = self.useFixture(TemporaryOOPSDB()).keyspace
+        bv_count = pycassa.ColumnFamily(self.pool, 'BucketVersionsCount')
+        bv_count.add('bucket-id', ('1.0', 'release'))
+        bv_count.get('bucket-id', columns=[('1.0', 'release')])
+        bv_count.get('bucket-id', column_start=[('1.0', 'release')])
+        bv_count.get('bucket-id', column_finish=[('1.0', 'release')])
+        bv_count.add('bucket-id', ('1.0~ev1', 'release'))
+        self.assertEqual(1,
+            bv_count.get('bucket-id', column_count=1)[('1.0~ev1', 'release')])
+        bv_count.add('bucket-id', ('1.0+ev1', 'release'))
+        c = [('1.0+', 'release')]
+        self.assertEqual(1,
+            bv_count.get('bucket-id', column_start=c)[('1.0+ev1', 'release')])
+
     def test_update_errors_by_release(self):
         keyspace = self.useFixture(TemporaryOOPSDB()).keyspace
         firsterror = pycassa.ColumnFamily(self.pool, 'FirstError')
