Merge lp:~sinzui/launchpad/revision-karma-1 into lp:launchpad

Proposed by Curtis Hovey on 2012-12-03
Status: Merged
Approved by: Curtis Hovey on 2012-12-03
Approved revision: no longer in the source branch.
Merged at revision: 16337
Proposed branch: lp:~sinzui/launchpad/revision-karma-1
Merge into: lp:launchpad
Diff against target: 357 lines (+50/-167)
3 files modified
lib/lp/code/model/revision.py (+6/-19)
lib/lp/code/model/tests/test_revision.py (+11/-59)
lib/lp/code/scripts/tests/test_revisionkarma.py (+33/-89)
To merge this branch: bzr merge lp:~sinzui/launchpad/revision-karma-1
Reviewer Review Type Date Requested Status
Benji York (community) code 2012-12-03 Approve on 2012-12-03
Review via email: mp+137660@code.launchpad.net

Commit Message

Make allocate-revision-karma fast.

Description of the Change

allocate-revision-karma.py is having its DB connection reaped while it's
running, because the query in RevisionSet
getRevisionsNeedingKarmaAllocated is terribly slow.

Right now, we have 21.1 million revisions. 15.9 million of these are
flagged as not having karma allocated. These 15.9 million revisions are
the combination of revisions that have never been processed, or only
exist in +junk branches, or only exist in branches owned by invalid
people. Each run, the scanner needs to check all 15.9 million revisions.

--------------------------------------------------------------------

RULES

    Pre-implementation: stub
    * There are a few parts to a fix, both in lp/code/model/revision.py
      * in getRevisionsNeedingKarmaAllocated should simply return
        revisions where karma_allocated IS FALSE.
      * in allocateKarma, self.karma_allocated should always be set
        to True.
      * This changes the existing behavior, so if a revision first ends
        up on the system in a +junk branch or in a branch owned by an
        invalid user (such as ~vcs-imports), it will never have karma
        allocated to it. Lp will not longer give users karma for
        another user's commit.

QA

    * As a webops to run
      ./cronscripts/allocate-revision-karma.py
      for qastaging's script host (gandwana).
    * Verify there are no errors after a 10 minutes. The proc
      can be killed since we do not need to Verify all 15 million
      revisions.

LINT

    lib/lp/code/model/revision.py
    lib/lp/code/model/tests/test_revision.py
    lib/lp/code/scripts/tests/test_revisionkarma.py

TEST

    ./bin/test -vc -t TestRevisionKarma lp.code.model.tests.test_revision
    ./bin/test -vc lp.code.scripts.tests.test_revisionkarma

IMPLEMENTATION

I modified allocateKarma() to always set self.karma_allocated = True and
getRevisionsNeedingKarmaAllocated() to only look for elf.karma_allocated
is False. This simplification also allowed me to remove the dedupe code
from getRevisionsNeedingKarmaAllocated(). I removed the tests for
allocating karma to retargeted junk branches and claimed profiles with
imported revisions because we don't want to support these cases anymore.
All the tests in test_revisionkarma was invalidated, but I wrote 3 tests
to verify the 1 positive case and 2 negative cases that the script
iterates over.
    lib/lp/code/model/revision.py
    lib/lp/code/model/tests/test_revision.py
    lib/lp/code/scripts/tests/test_revisionkarma.py

To post a comment you must log in.
Benji York (benji) wrote :

The branch looks good, especially the code consolodation.

Changing the "fail.." calls to "assert..." calls was a nice touch.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/code/model/revision.py'
2--- lib/lp/code/model/revision.py 2012-09-19 01:19:35 +0000
3+++ lib/lp/code/model/revision.py 2012-12-03 19:40:28 +0000
4@@ -32,7 +32,6 @@
5 Asc,
6 Desc,
7 Join,
8- Not,
9 Or,
10 Select,
11 )
12@@ -60,7 +59,6 @@
13 from lp.registry.interfaces.person import validate_public_person
14 from lp.registry.interfaces.product import IProduct
15 from lp.registry.interfaces.projectgroup import IProjectGroup
16-from lp.registry.model.person import ValidPersonCache
17 from lp.services.database.bulk import create
18 from lp.services.database.constants import (
19 DEFAULT,
20@@ -130,6 +128,9 @@
21
22 def allocateKarma(self, branch):
23 """See `IRevision`."""
24+ # Always set karma_allocated to True so that Lp does not reprocess
25+ # junk and invalid user branches because they do not get karma.
26+ self.karma_allocated = True
27 # If we know who the revision author is, give them karma.
28 author = self.revision_author.person
29 if author is not None:
30@@ -143,8 +144,6 @@
31 karma_date = min(self.revision_date, self.date_created)
32 karma = branch.target.assignKarma(
33 author, 'revisionadded', karma_date)
34- if karma is not None:
35- self.karma_allocated = True
36 return karma
37 else:
38 return None
39@@ -457,23 +456,11 @@
40 @staticmethod
41 def getRevisionsNeedingKarmaAllocated(limit=None):
42 """See `IRevisionSet`."""
43- # Here to stop circular imports.
44- from lp.code.model.branch import Branch
45- from lp.code.model.branchrevision import BranchRevision
46-
47 store = IStore(Revision)
48- results_with_dupes = store.find(
49+ results = store.find(
50 Revision,
51- Revision.revision_author == RevisionAuthor.id,
52- RevisionAuthor.person == ValidPersonCache.id,
53- Not(Revision.karma_allocated),
54- BranchRevision.revision == Revision.id,
55- BranchRevision.branch == Branch.id,
56- Or(Branch.product != None, Branch.distroseries != None))[:limit]
57- # Eliminate duplicate rows, returning <= limit rows
58- return store.find(
59- Revision, Revision.id.is_in(
60- results_with_dupes.get_select_expr(Revision.id)))
61+ Revision.karma_allocated is False)[:limit]
62+ return results
63
64 @staticmethod
65 def getPublicRevisionsForPerson(person, day_limit=30):
66
67=== modified file 'lib/lp/code/model/tests/test_revision.py'
68--- lib/lp/code/model/tests/test_revision.py 2012-10-03 04:52:37 +0000
69+++ lib/lp/code/model/tests/test_revision.py 2012-12-03 19:40:28 +0000
70@@ -28,7 +28,6 @@
71 RevisionSet,
72 )
73 from lp.registry.model.karma import Karma
74-from lp.scripts.garbo import RevisionAuthorEmailLinker
75 from lp.services.database.interfaces import (
76 DEFAULT_FLAVOR,
77 IStoreSelector,
78@@ -36,7 +35,6 @@
79 )
80 from lp.services.database.sqlbase import cursor
81 from lp.services.identity.interfaces.account import AccountStatus
82-from lp.services.log.logger import DevNullLogger
83 from lp.testing import (
84 login,
85 logout,
86@@ -84,19 +82,19 @@
87 def test_revisionWithUnknownEmail(self):
88 # A revision when created does not have karma allocated.
89 rev = self.factory.makeRevision()
90- self.failIf(rev.karma_allocated)
91+ self.assertFalse(rev.karma_allocated)
92 # Even if the revision author is someone we know.
93 author = self.factory.makePerson()
94 rev = self.factory.makeRevision(
95 author=author.preferredemail.email)
96- self.failIf(rev.karma_allocated)
97+ self.assertFalse(rev.karma_allocated)
98
99 def test_noKarmaForUnknownAuthor(self):
100 # If the revision author is unknown, karma isn't allocated.
101 rev = self.factory.makeRevision()
102 branch = self.factory.makeProductBranch()
103 branch.createBranchRevision(1, rev)
104- self.failIf(rev.karma_allocated)
105+ self.assertTrue(rev.karma_allocated)
106
107 def test_noRevisionsNeedingAllocation(self):
108 # There are no outstanding revisions needing karma allocated.
109@@ -111,7 +109,7 @@
110 revision_date=datetime.now(pytz.UTC) - timedelta(days=5))
111 branch = self.factory.makeProductBranch()
112 branch.createBranchRevision(1, rev)
113- self.failUnless(rev.karma_allocated)
114+ self.assertTrue(rev.karma_allocated)
115 # Get the karma event.
116 [karma] = list(Store.of(author).find(
117 Karma,
118@@ -125,14 +123,15 @@
119
120 def test_karmaNotAllocatedForKnownAuthorWithInactiveAccount(self):
121 # If the revision author is known, but the account is not active,
122- # don't allocate karma.
123+ # don't allocate karma, and record that karma_allocated was done.
124 author = self.factory.makePerson()
125 rev = self.factory.makeRevision(
126 author=author.preferredemail.email)
127 author.account.status = AccountStatus.SUSPENDED
128 branch = self.factory.makeProductBranch()
129 branch.createBranchRevision(1, rev)
130- self.failIf(rev.karma_allocated)
131+ self.assertTrue(rev.karma_allocated)
132+ self.assertEqual(0, rev.revision_author.person.karma)
133 # Even though the revision author is connected to the person, since
134 # the account status is suspended, the person is not "valid", and so
135 # the revisions are not returned as needing karma allocated.
136@@ -140,66 +139,19 @@
137 [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
138
139 def test_noKarmaForJunk(self):
140- # Revisions only associated with junk branches don't get karma.
141+ # Revisions only associated with junk branches don't get karma,
142+ # and Lp records that karma_allocated was done.
143 author = self.factory.makePerson()
144 rev = self.factory.makeRevision(
145 author=author.preferredemail.email)
146 branch = self.factory.makePersonalBranch()
147 branch.createBranchRevision(1, rev)
148- self.failIf(rev.karma_allocated)
149+ self.assertTrue(rev.karma_allocated)
150+ self.assertEqual(0, rev.revision_author.person.karma)
151 # Nor is this revision identified as needing karma allocated.
152 self.assertEqual(
153 [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
154
155- def test_junkBranchMovedToProductNeedsKarma(self):
156- # A junk branch that moves to a product needs karma allocated.
157- author = self.factory.makePerson()
158- rev = self.factory.makeRevision(author=author)
159- branch = self.factory.makePersonalBranch()
160- branch.createBranchRevision(1, rev)
161- # Once the branch is connected to the revision, we now specify
162- # a product for the branch.
163- project = self.factory.makeProduct()
164- branch.setTarget(user=branch.owner, project=project)
165- # The revision is now identified as needing karma allocated.
166- self.assertEqual(
167- [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
168-
169- def test_junkBranchMovedToPackageNeedsKarma(self):
170- # A junk branch that moves to a package needs karma allocated.
171- author = self.factory.makePerson()
172- rev = self.factory.makeRevision(author=author)
173- branch = self.factory.makePersonalBranch()
174- branch.createBranchRevision(1, rev)
175- # Once the branch is connected to the revision, we now specify
176- # a product for the branch.
177- source_package = self.factory.makeSourcePackage()
178- branch.setTarget(user=branch.owner, source_package=source_package)
179- # The revision is now identified as needing karma allocated.
180- self.assertEqual(
181- [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
182-
183- def test_newRevisionAuthorLinkNeedsKarma(self):
184- # If Launchpad knows of revisions by a particular author, and later
185- # that authoer registers with launchpad, the revisions need karma
186- # allocated.
187- email = self.factory.getUniqueEmailAddress()
188- rev = self.factory.makeRevision(author=email)
189- branch = self.factory.makeProductBranch()
190- branch.createBranchRevision(1, rev)
191- self.failIf(rev.karma_allocated)
192- # Since the revision author is not known, the revisions do not at this
193- # stage need karma allocated.
194- self.assertEqual(
195- [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
196- # The person registers with Launchpad.
197- self.factory.makePerson(email=email)
198- # Garbo runs the RevisionAuthorEmailLinker job.
199- RevisionAuthorEmailLinker(log=DevNullLogger()).run()
200- # Now the kama needs allocating.
201- self.assertEqual(
202- [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
203-
204 def test_karmaDateForFutureRevisions(self):
205 # If the revision date is some time in the future, then the karma date
206 # is set to be the time that the revision was created.
207
208=== modified file 'lib/lp/code/scripts/tests/test_revisionkarma.py'
209--- lib/lp/code/scripts/tests/test_revisionkarma.py 2012-01-20 15:42:44 +0000
210+++ lib/lp/code/scripts/tests/test_revisionkarma.py 2012-12-03 19:40:28 +0000
211@@ -8,13 +8,9 @@
212 from storm.store import Store
213 import transaction
214
215-from lp.code.model.revision import RevisionSet
216 from lp.code.scripts.revisionkarma import RevisionKarmaAllocator
217 from lp.registry.model.karma import Karma
218-from lp.scripts.garbo import RevisionAuthorEmailLinker
219 from lp.services.config import config
220-from lp.services.identity.model.emailaddress import EmailAddressSet
221-from lp.services.log.logger import DevNullLogger
222 from lp.testing import TestCaseWithFactory
223 from lp.testing.dbuser import switch_dbuser
224 from lp.testing.layers import LaunchpadZopelessLayer
225@@ -25,99 +21,47 @@
226
227 layer = LaunchpadZopelessLayer
228
229- def setUp(self):
230- # Use an administrator for the factory
231- TestCaseWithFactory.setUp(self, 'admin@canonical.com')
232+ def runScript(self):
233+ transaction.commit()
234+ switch_dbuser(config.revisionkarma.dbuser)
235+ script = RevisionKarmaAllocator(
236+ 'test', config.revisionkarma.dbuser, ['-q'])
237+ script.main()
238
239- def assertOneKarmaEvent(self, person, product):
240- # Make sure there is one and only one karma event for the person and
241- # product.
242- result = Store.of(person).find(
243+ def assertKarmaEvent(self, person, product, count):
244+ # Make sure the count of karma events matches the script iteration
245+ # over revision.allocateKarma()
246+ instance = person or product
247+ result = Store.of(instance).find(
248 Karma,
249 Karma.person == person,
250 Karma.product == product)
251- self.assertEqual(1, result.count())
252-
253- def test_junkBranchMoved(self):
254- # When a junk branch is moved to a product, the revision author will
255- # get karma on the product.
256- author = self.factory.makePerson()
257- rev = self.factory.makeRevision(
258- author=author.preferredemail.email)
259+ self.assertEqual(count, result.count())
260+
261+ def test_branch_allocated_karma(self):
262+ # Revision authors that are Lp users are awarded karma for non-junk
263+ # branches.
264+ author = self.factory.makePerson()
265+ rev = self.factory.makeRevision(author=author.preferredemail.email)
266+ branch = self.factory.makeBranch()
267+ branch.createBranchRevision(1, rev)
268+ self.assertTrue(rev.karma_allocated)
269+ self.assertKarmaEvent(author, branch.product, 1)
270+
271+ def test_junk_branch_not_allocated_karma(self):
272+ # Revision authors do not get karma for junk branches.
273+ author = self.factory.makePerson()
274+ rev = self.factory.makeRevision(author=author.preferredemail.email)
275 branch = self.factory.makePersonalBranch()
276 branch.createBranchRevision(1, rev)
277- # Once the branch is connected to the revision, we now specify
278- # a product for the branch.
279- project = self.factory.makeProduct()
280- branch.setTarget(user=branch.owner, project=project)
281- # Commit and switch to the script db user.
282- transaction.commit()
283- switch_dbuser(config.revisionkarma.dbuser)
284- script = RevisionKarmaAllocator(
285- 'test', config.revisionkarma.dbuser, ['-q'])
286- script.main()
287- self.assertOneKarmaEvent(author, branch.product)
288+ self.assertTrue(rev.karma_allocated)
289+ self.assertKarmaEvent(author, branch.product, 0)
290
291- def test_newRevisionAuthor(self):
292- # When a user validates an email address that is part of a revision
293- # author, and that author has revisions associated with a product, we
294- # give the karma to the user.
295+ def test_unknown_revision_author_not_allocated_karma(self):
296+ # Karma is not allocated when the revision author is not an Lp user.
297 email = self.factory.getUniqueEmailAddress()
298 rev = self.factory.makeRevision(author=email)
299 branch = self.factory.makeAnyBranch()
300 branch.createBranchRevision(1, rev)
301- transaction.commit()
302- self.failIf(rev.karma_allocated)
303- # Since the revision author is not known, the revisions do not at this
304- # stage need karma allocated.
305- self.assertEqual(
306- [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
307- # The person registers with Launchpad.
308- author = self.factory.makePerson(email=email)
309- transaction.commit()
310- # Run the RevisionAuthorEmailLinker garbo job.
311- RevisionAuthorEmailLinker(log=DevNullLogger()).run()
312- switch_dbuser(config.revisionkarma.dbuser)
313- script = RevisionKarmaAllocator(
314- 'test', config.revisionkarma.dbuser, ['-q'])
315- script.main()
316- self.assertOneKarmaEvent(author, branch.product)
317-
318- def test_ownerJunkBranchWithAnotherProductBranch(self):
319- # If the revision author has the revision in a junk branch but someone
320- # else has the revision in the ancestry of a branch associated with a
321- # product, then we use the branch with the product rather than the
322- # junk branch owned by the revision author.
323- author = self.factory.makePerson()
324- email = self.factory.getUniqueEmailAddress()
325- rev = self.factory.makeRevision(author=email)
326- branch = self.factory.makePersonalBranch(owner=author)
327- branch.createBranchRevision(1, rev)
328- self.failIf(rev.karma_allocated)
329- # Now we have a junk branch which has a revision with an email address
330- # that is not yet claimed by the author.
331-
332- # Now create a non junk branch owned by someone else that has the
333- # revision.
334- b2 = self.factory.makeProductBranch()
335- # Put the author's revision in the ancestry.
336- b2.createBranchRevision(None, rev)
337-
338- # Now link the revision author to the author.
339- author.validateAndEnsurePreferredEmail(
340- EmailAddressSet().new(email, author))
341- transaction.commit()
342- # Run the RevisionAuthorEmailLinker garbo job.
343- RevisionAuthorEmailLinker(log=DevNullLogger()).run()
344-
345- # Now that the revision author is linked to the person, the revision
346- # needs karma allocated.
347- self.assertEqual(
348- [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
349-
350- # Commit and switch to the script db user.
351- switch_dbuser(config.revisionkarma.dbuser)
352- script = RevisionKarmaAllocator(
353- 'test', config.revisionkarma.dbuser, ['-q'])
354- script.main()
355- self.assertOneKarmaEvent(author, b2.product)
356+ self.assertTrue(rev.karma_allocated)
357+ self.assertKarmaEvent(rev.revision_author.person, branch.product, 0)