Merge lp:~brian-murray/daisy/rebuild-bucketversions into lp:daisy

Proposed by Brian Murray
Status: Rejected
Rejected by: Brian Murray
Proposed branch: lp:~brian-murray/daisy/rebuild-bucketversions
Merge into: lp:daisy
Diff against target: 51 lines (+47/-0)
1 file modified
tools/rebuild_bucketversions.py (+47/-0)
To merge this branch: bzr merge lp:~brian-murray/daisy/rebuild-bucketversions
Reviewer          Review Type    Date Requested    Status
Daisy Pluckers                                     Pending
Review via email: mp+149176@code.launchpad.net

Description of the change

This requires a bit more logic than just copying the data from bucketversions to bucketversion. Because we will be recording data in both column families for a while, the same crash and package version may exist in both of them. We therefore need to check the count for each package version and, if it is greater in the old cf (bucketversions), add the difference between the old cf and the new cf to the new cf.

I've tested this by manipulating the data in the old cf and it worked for me, but I'd appreciate a second set of eyes.
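
The reconciliation rule described above can be illustrated with plain dictionaries. This is a minimal sketch, not the code in the branch: the function name `missing_counts` and the sample version strings are hypothetical, and it assumes that a version absent from the new cf should be treated as having a count of 0.

```python
def missing_counts(old_row, new_row):
    """Return {version: increment} needed for new_row to catch up with old_row.

    old_row / new_row map package version -> counter value for one bucket.
    Assumption: a version missing from new_row counts as 0.
    """
    increments = {}
    for version, old_count in old_row.items():
        new_count = new_row.get(version, 0)
        # Only top up when the old cf has seen more crashes than the new one.
        if old_count > new_count:
            increments[version] = old_count - new_count
    return increments


# Hypothetical sample data for a single bucket.
old_row = {'1.0-1': 5, '1.0-2': 3, '1.1-1': 2}
new_row = {'1.0-1': 5, '1.0-2': 1, '1.2-1': 7}
deltas = missing_counts(old_row, new_row)
```

Here only '1.0-2' and '1.1-1' need an increment; '1.0-1' is already equal, and '1.2-1' exists only in the new cf, so it is left alone.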

Unmerged revisions

215. By Brian Murray

add in a tool to copy data from bucketversions to bucketversion which uses the new package version comparator

Preview Diff

=== added file 'tools/rebuild_bucketversions.py'
--- tools/rebuild_bucketversions.py 1970-01-01 00:00:00 +0000
+++ tools/rebuild_bucketversions.py 2013-02-18 22:52:28 +0000
@@ -0,0 +1,47 @@
+#!/usr/bin/python
+
+import sys
+import pycassa
+from pycassa.cassandra.ttypes import NotFoundException, InvalidRequestException
+
+configuration = None
+try:
+    import local_config as configuration
+except ImportError:
+    pass
+
+if not configuration:
+    import configuration
+
+creds = {'username': configuration.cassandra_username,
+         'password': configuration.cassandra_password}
+pool = pycassa.ConnectionPool(configuration.cassandra_keyspace,
+                              configuration.cassandra_hosts, timeout=600,
+                              max_retries=100, credentials=creds)
+
+bucketversion_cf = pycassa.ColumnFamily(pool, 'BucketVersion')
+bucketversions_cf = pycassa.ColumnFamily(pool, 'BucketVersions')
+
+row_count = 0
+dry_run = '--dry-run' in sys.argv
+
+for k,v in bucketversions_cf.get_range():
+    row_count += 1
+    if not dry_run:
+        try:
+            newversions = bucketversion_cf.get(k)
+        except:
+            newversions = None
+        if newversions:
+            for version in v:
+                if newversions[version] == v[version]:
+                    continue
+                if newversions[version] > v[version]:
+                    continue
+                if newversions[version] < v[version]:
+                    diff = v[version] - newversions[version]
+                    bucketversion_cf.add(k, version, value=diff)
+        else:
+            bucketversion_cf.insert(k, v)
+    if row_count % 100000 == 0:
+        print 'Copied', row_count, 'rows.'
