Merge lp:~ev/oops-repository/bucket-versions into lp:~daisy-pluckers/oops-repository/trunk

Proposed by Evan
Status: Merged
Merged at revision: 69
Proposed branch: lp:~ev/oops-repository/bucket-versions
Merge into: lp:~daisy-pluckers/oops-repository/trunk
Diff against target: 115 lines (+78/-4)
3 files modified
oopsrepository/oopses.py (+30/-4)
oopsrepository/schema.py (+22/-0)
oopsrepository/tests/test_oopses.py (+26/-0)
To merge this branch: bzr merge lp:~ev/oops-repository/bucket-versions
Reviewer        Review Type  Date Requested  Status
Daisy Pluckers                               Pending
Review via email: mp+158946@code.launchpad.net

Description of the change

This branch changes how the package versions are updated for each bucket. It now tracks the release as well as the version that the bucket was updated for, ordering the versions with the dpkg binary version number comparator.
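As a rough illustration of that ordering, here is a pure-Python sketch of the dpkg comparison rules: epoch first, then upstream version, then revision, with '~' sorting before everything, including the end of the string. This is not the DpkgVersionType comparator that Cassandra uses; the function names are made up for this example, and it assumes well-formed version strings.

```python
import functools
import string


def _order(ch):
    # Weight of a single non-digit character; '' stands for end of string.
    if ch == '' or ch in string.digits:
        return 0
    if ch in string.ascii_letters:
        return ord(ch)
    if ch == '~':
        return -1  # a tilde sorts before anything, even end of string
    return ord(ch) + 256  # other punctuation sorts after letters


def _compare_part(a, b):
    # Compare two upstream (or two revision) strings; returns -1, 0 or 1.
    i = j = 0
    while i < len(a) or j < len(b):
        # Step 1: compare the non-digit prefixes character by character.
        while ((i < len(a) and a[i] not in string.digits) or
               (j < len(b) and b[j] not in string.digits)):
            ac = _order(a[i] if i < len(a) else '')
            bc = _order(b[j] if j < len(b) else '')
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Step 2: compare the following digit runs numerically.
        start_i, start_j = i, j
        while i < len(a) and a[i] in string.digits:
            i += 1
        while j < len(b) and b[j] in string.digits:
            j += 1
        na = int(a[start_i:i] or '0')
        nb = int(b[start_j:j] or '0')
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    # Split '[epoch:]upstream[-revision]' into its three parts.
    epoch, sep, rest = version.partition(':')
    if not sep:
        epoch, rest = '0', version
    upstream, sep, revision = rest.rpartition('-')
    if not sep:
        upstream, revision = rest, ''
    return int(epoch), upstream, revision


def dpkg_compare(a, b):
    epoch_a, upstream_a, revision_a = _split(a)
    epoch_b, upstream_b, revision_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return (_compare_part(upstream_a, upstream_b) or
            _compare_part(revision_a, revision_b))


versions = ['1.0', '1.0+ev1', '1.0~ppa1']
ordered = sorted(versions, key=functools.cmp_to_key(dpkg_compare))
```

Here `ordered` puts '1.0~ppa1' ahead of '1.0', and '1.0+ev1' after both, which is the property the counter column family relies on.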

It uses three new column families:

BucketVersionsFull - A wide row of OOPS IDs for each combination of (bucket, DistroRelease, binary package version).

BucketVersionsCount - A counter column family that keeps track of the (DistroRelease, binary package version) count for each bucket. The binary package version is ordered using the dpkg comparator, which produces the following:
{('Ubuntu 12.04', '1.0~ppa1') : 12, ('Ubuntu 12.04', '1.0') : 9, ('Ubuntu 12.10', '1.0') : 36, ...}

BucketVersionsDay - A column family that's only used to keep track of which buckets we've updated the versions for today. Here "today" means the day they were bucketed, not the day the crash occurred (see the comment in the code). We'll use this to repair counters that were over- or under-counted.
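To make the intended repair pass concrete, here is a minimal in-memory sketch. Plain dicts stand in for the three column families, the function name is hypothetical, and the counter columns are assumed to be keyed (release, version) as in the description above. Against Cassandra, the expected value would come from bv_full.get_count(...) and the fix would be applied by adding the difference to the counter, since counter columns cannot be overwritten.

```python
def repair_bucket_version_counts(day_row, full_rows, counts):
    """Bring the counters back in line with the full rows.

    day_row   -- iterable of (bucketid, release, version) touched today
                 (stands in for one BucketVersionsDay row)
    full_rows -- dict of (bucketid, release, version) -> set of OOPS IDs
                 (stands in for BucketVersionsFull)
    counts    -- dict of bucketid -> {(release, version): int}
                 (stands in for BucketVersionsCount; modified in place)

    Returns a list of (key, old_count, new_count) for each correction.
    """
    corrections = []
    for bucketid, release, version in day_row:
        key = (bucketid, release, version)
        # The number of OOPS IDs in the wide row is the authoritative count.
        expected = len(full_rows.get(key, ()))
        row = counts.setdefault(bucketid, {})
        current = row.get((release, version), 0)
        if current != expected:
            # Apply the difference, as a counter increment would.
            row[(release, version)] = current + (expected - current)
            corrections.append((key, current, expected))
    return corrections
```

Only the keys recorded for the day are visited, so the cost of the pass scales with the day's activity rather than with the size of BucketVersionsFull.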

I'm tempted to add to this only when an exception is encountered while modifying the counter, rather than letting it grow along with the BucketVersionsFull CF. We could also use a TTL to expire the data after a week or two (since it's not a counter CF). Can you think of any reason we might not want to do this? Is there any reason to keep around the knowledge of when we incremented the counters for BucketVersionsCount?

I haven't been able to think up one myself yet, but I'm not ruling it out :)
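A sketch of what that alternative could look like, for discussion only. It assumes bv_count and bv_day behave like pycassa ColumnFamily objects (add(key, column) and insert(key, columns, ttl=...)); the function name, the retry_error parameter and the two-week default are made up for this example. In real use retry_error would be pycassa.MaximumRetryException.

```python
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # TTL in seconds; an arbitrary choice


def count_bucket_version(bv_count, bv_day, bucketid, release, version,
                         retry_error=Exception, ttl=TWO_WEEKS):
    """Increment the counter; note the key for repair only if that fails.

    Returns True if the increment succeeded, False if it was recorded
    in bv_day for the nightly repair instead.
    """
    try:
        bv_count.add(bucketid, (release, version))
        return True
    except retry_error:
        # Same day key as the branch uses: the day of bucketing (UTC).
        day_key = time.strftime('%Y%m%d', time.gmtime())
        # The marker expires on its own, so the row cannot grow unbounded.
        bv_day.insert(day_key, {(bucketid, release, version): ''}, ttl=ttl)
        return False
```

One thing this would give up is any record of increments that succeeded, so it only helps if failed or timed-out mutations are the sole source of miscounts.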

There's a script in lp:daisy that will be used to back-populate this data (it has been updated to match this merge in https://code.launchpad.net/~ev/daisy/bucket-versions/+merge/158948).

Brian Murray (brian-murray) wrote :

In the Description you indicate that BucketVersionsCount is "(DistroRelease, binary package version)"; however, the code is written so that it is "bv_count.add(bucketid, (version, release))". I'm not sure why, but I'd expect the ordering to have release before version instead of the other way around. Is there a specific reason it is done with version before release?

Brian Murray (brian-murray) wrote :

Additionally, why is pycassa.util.uuid.UUID used? It seems to be just the same as uuid.UUID.

Evan (ev) wrote :

I've made the requested fixes as r68 (uuid) and r69 (release and version swap).

Evan (ev) wrote :

I made the supporting change to lp:daisy (as lp:~ev/daisy/bucket-versions is now merged) in r316.

Preview Diff

=== modified file 'oopsrepository/oopses.py'
--- oopsrepository/oopses.py 2013-04-12 14:10:15 +0000
+++ oopsrepository/oopses.py 2013-04-15 14:57:05 +0000
@@ -186,11 +186,37 @@
     daybucketcount_cf.add(resolution, bucketid)
     return day_key
 
-def update_bucket_versions(config, bucketid, version):
+def update_bucket_versions(config, bucketid, version,
+                           release=None, oopsid=None):
+
     pool = connection_pool(config)
-    bucketversions_cf = pycassa.ColumnFamily(pool, 'BucketVersions',
-            retry_counter_mutations=True)
-    bucketversions_cf.add(bucketid, version)
+    if release:
+        bv_full = pycassa.ColumnFamily(pool, 'BucketVersionsFull')
+        bv_count = pycassa.ColumnFamily(pool, 'BucketVersionsCount')
+        bv_day = pycassa.ColumnFamily(pool, 'BucketVersionsDay')
+
+        # Use the current day, rather than the day of the OOPS because this is
+        # specifically used for cleaning up counters nightly. If a very old
+        # OOPS gets processed here, we should still clean it up when we're
+        # handling the data for today.
+        day_key = time.strftime('%Y%m%d', time.gmtime())
+
+        o = pycassa.util.uuid.UUID(oopsid)
+        # When correcting the counts in bv_count, we'll iterate
+        # BucketVersionsDay for day_key. For each of these columns, we'll look
+        # up the correct value by calling bv_full.get_count(...).
+        bv_full.insert((bucketid, release, version), {o: ''})
+        bv_day.insert(day_key, {(bucketid, release, version): ''})
+
+        try:
+            bv_count.add(bucketid, (version, release))
+        except pycassa.MaximumRetryException:
+            # We'll clean these up nightly, so ignore timeouts.
+            pass
+    else:
+        bucketversions_cf = pycassa.ColumnFamily(pool, 'BucketVersions',
+            retry_counter_mutations=True)
+        bucketversions_cf.add(bucketid, version)
 
 def query_bucket_versions(config, bucketid):
     pool = connection_pool(config)

=== modified file 'oopsrepository/schema.py'
--- oopsrepository/schema.py 2013-04-03 11:56:23 +0000
+++ oopsrepository/schema.py 2013-04-15 14:57:05 +0000
@@ -70,6 +70,28 @@
         workaround_1779(mgr.create_column_family, keyspace, 'BucketVersions',
             comparator_type=UTF8_TYPE,
             default_validation_class=CounterColumnType())
+    if 'BucketVersionsCount' not in cfs:
+        dpkg_comparator = 'com.canonical.dpkgversiontype.DpkgVersionType'
+        composite = CompositeType(dpkg_comparator, AsciiType())
+        workaround_1779(mgr.create_column_family,
+            keyspace,
+            'BucketVersionsCount',
+            key_validation_class=UTF8_TYPE,
+            comparator_type=composite,
+            default_validation_class=CounterColumnType())
+    if 'BucketVersionsFull' not in cfs:
+        composite = CompositeType(UTF8Type(), AsciiType(), AsciiType())
+        workaround_1779(mgr.create_column_family,
+            keyspace,
+            'BucketVersionsFull',
+            key_validation_class=composite,
+            comparator_type=TIME_UUID_TYPE)
+    if 'BucketVersionsDay' not in cfs:
+        composite = CompositeType(UTF8Type(), AsciiType(), AsciiType())
+        workaround_1779(mgr.create_column_family,
+            keyspace,
+            'BucketVersionsDay',
+            comparator_type=composite)
     if 'BucketSystems' not in cfs:
         workaround_1779(mgr.create_column_family, keyspace, 'BucketSystems',
             key_validation_class=UTF8_TYPE,

=== modified file 'oopsrepository/tests/test_oopses.py'
--- oopsrepository/tests/test_oopses.py 2013-04-03 11:56:23 +0000
+++ oopsrepository/tests/test_oopses.py 2013-04-15 14:57:05 +0000
@@ -212,6 +212,32 @@
         oopses.update_bucket_versions(self.config, 'bucket-id', '1.2.3')
         self.assertEqual(bucketversions_cf.get('bucket-id')['1.2.3'], 1)
 
+        bv_full = pycassa.ColumnFamily(self.pool, 'BucketVersionsFull')
+        bv_count = pycassa.ColumnFamily(self.pool, 'BucketVersionsCount')
+        u = uuid.uuid1()
+        args = (self.config, 'bucket-id', '1.2.3', 'Ubuntu 12.04', str(u))
+        oopses.update_bucket_versions(*args)
+        d = bv_full.get(('bucket-id', 'Ubuntu 12.04', '1.2.3')).items()[0]
+        self.assertEqual((u, ''), d)
+        c = bv_count.get('bucket-id', columns=[('1.2.3', 'Ubuntu 12.04')])
+        c = c.values()[0]
+        self.assertEqual(1, c)
+
+    def test_dpkg_comparator(self):
+        keyspace = self.useFixture(TemporaryOOPSDB()).keyspace
+        bv_count = pycassa.ColumnFamily(self.pool, 'BucketVersionsCount')
+        bv_count.add('bucket-id', ('1.0', 'release'))
+        bv_count.get('bucket-id', columns=[('1.0', 'release')])
+        bv_count.get('bucket-id', column_start=[('1.0', 'release')])
+        bv_count.get('bucket-id', column_finish=[('1.0', 'release')])
+        bv_count.add('bucket-id', ('1.0~ev1', 'release'))
+        self.assertEqual(1,
+            bv_count.get('bucket-id', column_count=1)[('1.0~ev1', 'release')])
+        bv_count.add('bucket-id', ('1.0+ev1', 'release'))
+        c = [('1.0+', 'release')]
+        self.assertEqual(1,
+            bv_count.get('bucket-id', column_start=c)[('1.0+ev1', 'release')])
+
     def test_update_errors_by_release(self):
         keyspace = self.useFixture(TemporaryOOPSDB()).keyspace
         firsterror = pycassa.ColumnFamily(self.pool, 'FirstError')
