Merge lp:~abentley/launchpad/bulk-insert into lp:launchpad

Proposed by Aaron Bentley
Status: Merged
Approved by: j.c.sackett
Approved revision: no longer in the source branch.
Merged at revision: 14865
Proposed branch: lp:~abentley/launchpad/bulk-insert
Merge into: lp:launchpad
Diff against target: 638 lines (+201/-134)
13 files modified
lib/lp/code/configure.zcml (+2/-2)
lib/lp/code/interfaces/revision.py (+13/-2)
lib/lp/code/model/branch.py (+6/-8)
lib/lp/code/model/branchjob.py (+2/-3)
lib/lp/code/model/revision.py (+83/-60)
lib/lp/code/model/tests/test_branchjob.py (+3/-4)
lib/lp/code/model/tests/test_revision.py (+41/-1)
lib/lp/codehosting/scanner/buglinks.py (+4/-5)
lib/lp/codehosting/scanner/bzrsync.py (+15/-21)
lib/lp/codehosting/scanner/events.py (+13/-20)
lib/lp/codehosting/scanner/tests/test_buglinks.py (+3/-5)
lib/lp/codehosting/scanner/tests/test_bzrsync.py (+3/-3)
lib/lp/testing/factory.py (+13/-0)
To merge this branch: bzr merge lp:~abentley/launchpad/bulk-insert
Reviewer Review Type Date Requested Status
j.c.sackett (community) Approve
Richard Harding (community) code* Approve
Review via email: mp+94271@code.launchpad.net

Commit message

Optimize branch scanner.

Description of the change

= Summary =
Partially address bug #808930: Timeout running branch scanner job

== Proposed fix ==
Optimize DB access in the branch scanner by doing bulk row insertions.

This takes a full scan of lp:~irar/gcc-linaro/slp-for-any-nops-4.6/ down to ~2 minutes locally.

== Pre-implementation notes ==
Discussed with gmb

== Implementation details ==
Change INewRevisionEvent to INewMainlineRevisions event, so that multiple revisions can be handled at once. The INewRevisionEvent subscribers only cared about mainline revisions, so providing only mainline revisions simplifies the code path.

Similarly, change got_new_revision to got_new_mainline_revisions.

Change createBranchRevisionFromIDs to perform bulk insertions using new Storm syntax.

Change RevisionSet.acquireRevisionAuthor to acquireRevisionAuthors, so that it does efficient queries for existing RevisionAuthors.

Change RevisionSet.newFromBazaarRevision to newFromBazaarRevisions. It now inserts multiple rows at once, using bulk.create.

Simplify RevisionSet.isPresent (The original did not give a performance advantage.)

syncOneRevision becomes syncRevisions.

Scanner loops use chunks of 10k, as this shows some performance advantage over 1k loops. (100k does not.)

== Tests ==
bin/test -vt test_branchjob -t test_revision -t test_buglinks -t test_bzrsync

== Demo and Q/A ==
None

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/codehosting/scanner/tests/test_bzrsync.py
  lib/lp/code/configure.zcml
  lib/lp/codehosting/scanner/events.py
  lib/lp/code/model/tests/test_branchjob.py
  lib/lp/codehosting/scanner/buglinks.py
  lib/lp/testing/factory.py
  lib/lp/code/interfaces/revision.py
  lib/lp/code/model/branch.py
  lib/lp/codehosting/scanner/tests/test_buglinks.py
  lib/lp/codehosting/scanner/bzrsync.py
  lib/lp/code/model/tests/test_revision.py
  lib/lp/code/model/revision.py
  lib/lp/code/model/branchjob.py

To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

lifeless has approved LOC increase: "I agree that your branch should land because of decreased costs elsewhere paying for the code..."

Revision history for this message
Richard Harding (rharding) wrote :

Overall this looks good. As someone not familiar with the code in question a few comments would have helped. It took me a lot of reads through newFromBazaarRevisions to trace what I think is going on in there. I look forward to a day when we can use namedtuples!

Is there a general frowning on *args? A lot of places needed to be updated to change their calling arguments to be wrapped in a list:
    RevisionSet().newFromBazaarRevisions([bzr_revision])

I can't help but wonder if allowing these to come in as
    def newFromBazaarRevisions(*revisions):

would end up being a bit cleaner. I've not seen much of that practice so far though.

review: Approve (code*)
Revision history for this message
Aaron Bentley (abentley) wrote :

> Is there a general frowning on *args?

I don't know if there's a general one, but I tend to avoid them for this kind of use, because it makes it harder to change the signature later. *args and optional arguments don't play well together. When "foo(*branches)" becomes "foo(dry_run=False, *branches)", lots of code breaks, and the method becomes more awkward to call. [And foo(*args, **kwargs) isn't explicit that dry_run is an acceptable kwarg]

I feel like the gain from using *args that way is negligible, and there are real problems, so I avoid it.

I'm comfortable using *args and **kwargs when I'm wrapping things.

Aaron

Revision history for this message
j.c.sackett (jcsackett) wrote :

Aaron--

This looks good; I agree with Rick's request for a bit more comments in newFromBazaarRevisions--while it eventually becomes clear what data a given loop or clause is assembling, I wouldn't mind a quick "# Get parent data out of revisions" or similar to help a reader keep track of what's going on where.

Rick--

You're right that the *args format would be cleaner in some cases, but here it looks like the more common case is that the param passed in is a list already, in which case the function would have to do something like `revisions = args[0]`. While I agree it may be a good idea to start raising the use of the *args pattern more, I don't think it's a net gain here.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/code/configure.zcml'
2--- lib/lp/code/configure.zcml 2011-12-24 17:49:30 +0000
3+++ lib/lp/code/configure.zcml 2012-02-24 02:20:24 +0000
4@@ -514,8 +514,8 @@
5 for="lp.codehosting.scanner.events.IRevisionsRemoved"
6 handler="lp.codehosting.scanner.email.send_removed_revision_emails"/>
7 <subscriber
8- for="lp.codehosting.scanner.events.INewRevision"
9- handler="lp.codehosting.scanner.buglinks.got_new_revision"/>
10+ for="lp.codehosting.scanner.events.INewMainlineRevisions"
11+ handler="lp.codehosting.scanner.buglinks.got_new_mainline_revisions"/>
12 <subscriber
13 for="lp.codehosting.scanner.events.IScanCompleted"
14 handler="lp.codehosting.scanner.mergedetection.auto_merge_proposals"/>
15
16=== modified file 'lib/lp/code/interfaces/revision.py'
17--- lib/lp/code/interfaces/revision.py 2011-12-24 16:54:44 +0000
18+++ lib/lp/code/interfaces/revision.py 2012-02-24 02:20:24 +0000
19@@ -134,8 +134,19 @@
20 parent_ids, properties):
21 """Create a new Revision with the given revision ID."""
22
23- def newFromBazaarRevision(bzr_revision):
24- """Create a new Revision from the given Bazaar Revision object."""
25+ def newFromBazaarRevisions(revision_batch):
26+ """Create new Revisions from the given Bazaar Revisions.
27+
28+ This method allows us to insert Revisions in bulk, which is
29+ important for larger branches.
30+ """
31+
32+ def acquireRevisionAuthors(names):
33+ """Return a dict of RevisionAuthors with the specified names.
34+
35+ If a RevisionAuthor is not present in the database, it is created
36+ first.
37+ """
38
39 def getTipRevisionsForBranches(branches):
40 """Get the tip branch revisions for the specified branches.
41
42=== modified file 'lib/lp/code/model/branch.py'
43--- lib/lp/code/model/branch.py 2012-02-20 02:07:55 +0000
44+++ lib/lp/code/model/branch.py 2012-02-24 02:20:24 +0000
45@@ -28,6 +28,7 @@
46 And,
47 Count,
48 Desc,
49+ Insert,
50 NamedFunc,
51 Not,
52 Or,
53@@ -132,7 +133,9 @@
54 validate_public_person,
55 )
56 from lp.services.config import config
57-from lp.services.database.bulk import load_related
58+from lp.services.database.bulk import (
59+ load_related,
60+ )
61 from lp.services.database.constants import (
62 DEFAULT,
63 UTC_NOW,
64@@ -883,13 +886,8 @@
65 CREATE TEMPORARY TABLE RevidSequence
66 (revision_id text, sequence integer)
67 """)
68- data = []
69- for revid, sequence in revision_id_sequence_pairs:
70- data.append('(%s, %s)' % sqlvalues(revid, sequence))
71- data = ', '.join(data)
72- store.execute(
73- "INSERT INTO RevidSequence (revision_id, sequence) VALUES %s"
74- % data)
75+ store.execute(Insert(('revision_id', 'sequence'),
76+ table=['RevidSequence'], expr=revision_id_sequence_pairs))
77 store.execute(
78 """
79 INSERT INTO BranchRevision (branch, revision, sequence)
80
81=== modified file 'lib/lp/code/model/branchjob.py'
82--- lib/lp/code/model/branchjob.py 2012-02-15 22:01:27 +0000
83+++ lib/lp/code/model/branchjob.py 2012-02-24 02:20:24 +0000
84@@ -657,11 +657,10 @@
85 merged_revisions = self.getMergedRevisionIDs(revision_id, graph)
86 authors = self.getAuthors(merged_revisions, graph)
87 revision_set = RevisionSet()
88- rev_authors = set(revision_set.acquireRevisionAuthor(author) for
89- author in authors)
90+ rev_authors = revision_set.acquireRevisionAuthors(authors)
91 outf = StringIO()
92 pretty_authors = []
93- for rev_author in rev_authors:
94+ for rev_author in rev_authors.values():
95 if rev_author.person is None:
96 displayname = rev_author.name
97 else:
98
99=== modified file 'lib/lp/code/model/revision.py'
100--- lib/lp/code/model/revision.py 2011-12-30 06:14:56 +0000
101+++ lib/lp/code/model/revision.py 2012-02-24 02:20:24 +0000
102@@ -25,7 +25,6 @@
103 ForeignKey,
104 IntCol,
105 SQLMultipleJoin,
106- SQLObjectNotFound,
107 StringCol,
108 )
109 from storm.expr import (
110@@ -61,6 +60,7 @@
111 from lp.registry.interfaces.product import IProduct
112 from lp.registry.interfaces.projectgroup import IProjectGroup
113 from lp.registry.model.person import ValidPersonCache
114+from lp.services.database.bulk import create
115 from lp.services.database.constants import (
116 DEFAULT,
117 UTC_NOW,
118@@ -73,18 +73,12 @@
119 from lp.services.database.sqlbase import (
120 quote,
121 SQLBase,
122- sqlvalues,
123 )
124 from lp.services.helpers import shortlist
125 from lp.services.identity.interfaces.emailaddress import (
126 EmailAddressStatus,
127 IEmailAddressSet,
128 )
129-from lp.services.webapp.interfaces import (
130- DEFAULT_FLAVOR,
131- IStoreSelector,
132- MAIN_STORE,
133- )
134
135
136 class Revision(SQLBase):
137@@ -277,13 +271,13 @@
138 properties = {}
139 if _date_created is None:
140 _date_created = UTC_NOW
141- author = self.acquireRevisionAuthor(revision_author)
142+ authors = self.acquireRevisionAuthors([revision_author])
143
144 revision = Revision(
145 revision_id=revision_id,
146 log_body=log_body,
147 revision_date=revision_date,
148- revision_author=author,
149+ revision_author=authors[revision_author],
150 date_created=_date_created)
151 # Don't create future revisions.
152 if revision.revision_date > revision.date_created:
153@@ -303,22 +297,29 @@
154
155 return revision
156
157- def acquireRevisionAuthor(self, name):
158- """Find or create the RevisionAuthor with the specified name.
159+ def acquireRevisionAuthors(self, author_names):
160+ """Find or create the RevisionAuthors with the specified names.
161
162- Name may be any arbitrary string, but if it is an email-id, and
163+ A name may be any arbitrary string, but if it is an email-id, and
164 its email address is a verified email address, it will be
165 automatically linked to the corresponding Person.
166
167 Email-ids come in two major forms:
168 "Foo Bar" <foo@bar.com>
169 foo@bar.com (Foo Bar)
170+ :return: a dict of name -> RevisionAuthor
171 """
172- # create a RevisionAuthor if necessary:
173- try:
174- return RevisionAuthor.byName(name)
175- except SQLObjectNotFound:
176- return self._createRevisionAuthor(name)
177+ store = IMasterStore(Revision)
178+ author_names = set(author_names)
179+ authors = {}
180+ for author in store.find(RevisionAuthor,
181+ RevisionAuthor.name.is_in(author_names)):
182+ authors[author.name] = author
183+ missing = author_names - set(authors.keys())
184+ # create missing RevisionAuthors
185+ for name in missing:
186+ authors[name] = self._createRevisionAuthor(name)
187+ return authors
188
189 def _timestampToDatetime(self, timestamp):
190 """Convert the given timestamp to a datetime object.
191@@ -339,54 +340,76 @@
192 revision_date += timedelta(seconds=timestamp - int_timestamp)
193 return revision_date
194
195- def newFromBazaarRevision(self, bzr_revision):
196+ def newFromBazaarRevisions(self, revisions):
197 """See `IRevisionSet`."""
198- revision_id = bzr_revision.revision_id
199- revision_date = self._timestampToDatetime(bzr_revision.timestamp)
200- authors = bzr_revision.get_apparent_authors()
201- # XXX: JonathanLange 2009-05-01 bug=362686: We can only have one
202- # author per revision, so we use the first on the assumption that
203- # this is the primary author.
204- try:
205- author = authors[0]
206- except IndexError:
207- author = None
208- return self.new(
209- revision_id=revision_id,
210- log_body=bzr_revision.message,
211- revision_date=revision_date,
212- revision_author=author,
213- parent_ids=bzr_revision.parent_ids,
214- properties=bzr_revision.properties)
215+
216+ # Find all author names for these revisions.
217+ author_names = []
218+ for bzr_revision in revisions:
219+ authors = bzr_revision.get_apparent_authors()
220+ try:
221+ author = authors[0]
222+ except IndexError:
223+ author = None
224+ author_names.append(author)
225+ # Get or make every RevisionAuthor for these revisions.
226+ revision_authors = dict(
227+ (name, author.id) for name, author in
228+ self.acquireRevisionAuthors(author_names).items())
229+
230+ # Collect all data for making Revision objects.
231+ data = []
232+ for bzr_revision, author_name in zip(revisions, author_names):
233+ revision_id = bzr_revision.revision_id
234+ revision_date = self._timestampToDatetime(bzr_revision.timestamp)
235+ revision_author = revision_authors[author_name]
236+
237+ data.append(
238+ (revision_id, bzr_revision.message, revision_date,
239+ revision_author))
240+ # Create all Revision objects.
241+ db_revisions = create((
242+ Revision.revision_id, Revision.log_body, Revision.revision_date,
243+ Revision.revision_author_id), data)
244+
245+ # Map revision_id to Revision database ID.
246+ revision_db_id = dict(
247+ (rev.revision_id, rev.id) for rev in db_revisions)
248+
249+ # Collect all data for making RevisionParent and RevisionProperty
250+ # objects.
251+ parent_data = []
252+ property_data = []
253+ for bzr_revision in revisions:
254+ db_id = revision_db_id[bzr_revision.revision_id]
255+ # Property data: revision DB id, name, value.
256+ for name, value in bzr_revision.properties.iteritems():
257+ property_data.append((db_id, name, value))
258+ parent_ids = bzr_revision.parent_ids
259+ # Parent data: revision DB id, sequence, revision_id
260+ seen_parents = set()
261+ for sequence, parent_id in enumerate(parent_ids):
262+ if parent_id in seen_parents:
263+ continue
264+ seen_parents.add(parent_id)
265+ parent_data.append((db_id, sequence, parent_id))
266+ # Create all RevisionParent objects.
267+ create((
268+ RevisionParent.revisionID, RevisionParent.sequence,
269+ RevisionParent.parent_id), parent_data, return_created=False)
270+
271+ # Create all RevisionProperty objects.
272+ create((
273+ RevisionProperty.revisionID, RevisionProperty.name,
274+ RevisionProperty.value), property_data, return_created=False)
275
276 @staticmethod
277 def onlyPresent(revids):
278 """See `IRevisionSet`."""
279- if not revids:
280- return set()
281- store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
282- store.execute(
283- """
284- CREATE TEMPORARY TABLE Revids
285- (revision_id text)
286- """)
287- data = []
288- for revid in revids:
289- data.append('(%s)' % sqlvalues(revid))
290- data = ', '.join(data)
291- store.execute(
292- "INSERT INTO Revids (revision_id) VALUES %s" % data)
293- result = store.execute(
294- """
295- SELECT Revids.revision_id
296- FROM Revids, Revision
297- WHERE Revids.revision_id = Revision.revision_id
298- """)
299- present = set()
300- for row in result.get_all():
301- present.add(row[0])
302- store.execute("DROP TABLE Revids")
303- return present
304+ store = IStore(Revision)
305+ clause = Revision.revision_id.is_in(revids)
306+ present = store.find(Revision.revision_id, clause)
307+ return set(present)
308
309 def checkNewVerifiedEmail(self, email):
310 """See `IRevisionSet`."""
311
312=== modified file 'lib/lp/code/model/tests/test_branchjob.py'
313--- lib/lp/code/model/tests/test_branchjob.py 2012-02-15 22:01:27 +0000
314+++ lib/lp/code/model/tests/test_branchjob.py 2012-02-24 02:20:24 +0000
315@@ -392,10 +392,9 @@
316 existing = branch.getBranchRevision(
317 revision_id=bzr_revision.revision_id)
318 if existing is None:
319- revision = RevisionSet().newFromBazaarRevision(bzr_revision)
320- else:
321- revision = RevisionSet().getByRevisionId(
322- bzr_revision.revision_id)
323+ RevisionSet().newFromBazaarRevisions([bzr_revision])
324+ revision = RevisionSet().getByRevisionId(
325+ bzr_revision.revision_id)
326 try:
327 revno = bzr_branch.revision_id_to_revno(revision.revision_id)
328 except bzr_errors.NoSuchRevision:
329
330=== modified file 'lib/lp/code/model/tests/test_revision.py'
331--- lib/lp/code/model/tests/test_revision.py 2011-12-30 06:14:56 +0000
332+++ lib/lp/code/model/tests/test_revision.py 2012-02-24 02:20:24 +0000
333@@ -9,6 +9,7 @@
334 datetime,
335 timedelta,
336 )
337+from testtools.matchers import Equals
338 import time
339 from unittest import TestCase
340
341@@ -27,7 +28,9 @@
342 )
343 from lp.registry.model.karma import Karma
344 from lp.scripts.garbo import RevisionAuthorEmailLinker
345-from lp.services.database.lpstorm import IMasterObject
346+from lp.services.database.lpstorm import (
347+ IMasterObject,
348+ )
349 from lp.services.database.sqlbase import cursor
350 from lp.services.identity.interfaces.account import AccountStatus
351 from lp.services.log.logger import DevNullLogger
352@@ -39,11 +42,13 @@
353 from lp.testing import (
354 login,
355 logout,
356+ StormStatementRecorder,
357 TestCaseWithFactory,
358 time_counter,
359 )
360 from lp.testing.factory import LaunchpadObjectFactory
361 from lp.testing.layers import DatabaseFunctionalLayer
362+from lp.testing.matchers import HasQueryCount
363
364
365 class TestRevisionCreationDate(TestCaseWithFactory):
366@@ -257,6 +262,41 @@
367 found = self.revision_set.getByRevisionId('nonexistent')
368 self.assertIs(None, found)
369
370+ def test_newFromBazaarRevisions(self):
371+ # newFromBazaarRevisions behaves as expected.
372+ # only branchscanner can SELECT revisionproperties.
373+ self.becomeDbUser('branchscanner')
374+ bzr_revisions = [
375+ self.factory.makeBzrRevision('rev-1', prop1="foo"),
376+ self.factory.makeBzrRevision('rev-2', parent_ids=['rev-1'])
377+ ]
378+ with StormStatementRecorder() as recorder:
379+ self.revision_set.newFromBazaarRevisions(bzr_revisions)
380+ rev_1 = self.revision_set.getByRevisionId('rev-1')
381+ self.assertEqual(
382+ bzr_revisions[0].committer, rev_1.revision_author.name)
383+ self.assertEqual(
384+ bzr_revisions[0].message, rev_1.log_body)
385+ self.assertEqual(
386+ datetime(1970, 1, 1, 0, 0, tzinfo=pytz.UTC), rev_1.revision_date)
387+ self.assertEqual([], rev_1.parents)
388+ self.assertEqual({'prop1': 'foo'}, rev_1.getProperties())
389+ rev_2 = self.revision_set.getByRevisionId('rev-2')
390+ self.assertEqual(['rev-1'], rev_2.parent_ids)
391+ # Really, less than 9 is great, but if the count improves, we should
392+ # tighten this restriction.
393+ self.assertThat(recorder, HasQueryCount(Equals(8)))
394+
395+ def test_acquireRevisionAuthors(self):
396+ # AcquireRevisionAuthors creates new authors only if none exists with
397+ # that name.
398+ author1 = self.revision_set.acquireRevisionAuthors(['name1'])['name1']
399+ self.assertEqual(author1.name, 'name1')
400+ Store.of(author1).flush()
401+ author2 = self.revision_set.acquireRevisionAuthors(['name1'])['name1']
402+ self.assertEqual(
403+ removeSecurityProxy(author1).id, removeSecurityProxy(author2).id)
404+
405
406 class TestRevisionGetBranch(TestCaseWithFactory):
407 """Test the `getBranch` method of the revision."""
408
409=== modified file 'lib/lp/codehosting/scanner/buglinks.py'
410--- lib/lp/codehosting/scanner/buglinks.py 2011-05-27 21:12:25 +0000
411+++ lib/lp/codehosting/scanner/buglinks.py 2012-02-24 02:20:24 +0000
412@@ -89,8 +89,7 @@
413 registrant=getUtility(ILaunchpadCelebrities).janitor)
414
415
416-def got_new_revision(new_revision):
417- if new_revision.isMainline():
418- linker = BugBranchLinker(new_revision.db_branch)
419- linker.createBugBranchLinksForRevision(new_revision.bzr_revision)
420-
421+def got_new_mainline_revisions(new_mainline_revisions):
422+ linker = BugBranchLinker(new_mainline_revisions.db_branch)
423+ for bzr_revision in new_mainline_revisions.bzr_revisions:
424+ linker.createBugBranchLinksForRevision(bzr_revision)
425
426=== modified file 'lib/lp/codehosting/scanner/bzrsync.py'
427--- lib/lp/codehosting/scanner/bzrsync.py 2012-02-15 17:29:54 +0000
428+++ lib/lp/codehosting/scanner/bzrsync.py 2012-02-24 02:20:24 +0000
429@@ -48,6 +48,7 @@
430 if logger is None:
431 logger = logging.getLogger(self.__class__.__name__)
432 self.logger = logger
433+ self.revision_set = getUtility(IRevisionSet)
434
435 def syncBranchAndClose(self, bzr_branch=None):
436 """Synchronize the database with a Bazaar branch, handling locking.
437@@ -97,15 +98,9 @@
438 new_db_revs = (
439 new_ancestry - getUtility(IRevisionSet).onlyPresent(new_ancestry))
440 self.logger.info("Adding %s new revisions.", len(new_db_revs))
441- for revids in iter_list_chunks(list(new_db_revs), 1000):
442+ for revids in iter_list_chunks(list(new_db_revs), 10000):
443 revisions = self.getBazaarRevisions(bzr_branch, revids)
444- for revision in revisions:
445- # This would probably go much faster if we found some way to
446- # bulk-load multiple revisions at once, but as this is only
447- # executed for revisions new to Launchpad, it doesn't seem
448- # worth it at this stage.
449- self.syncOneRevision(
450- bzr_branch, revision, revids_to_insert)
451+ self.syncRevisions(bzr_branch, revisions, revids_to_insert)
452 self.deleteBranchRevisions(branchrevisions_to_delete)
453 self.insertBranchRevisions(bzr_branch, revids_to_insert)
454 transaction.commit()
455@@ -239,24 +234,23 @@
456 revisions = bzr_branch.repository.get_parent_map(revisions)
457 return bzr_branch.repository.get_revisions(revisions.keys())
458
459- def syncOneRevision(self, bzr_branch, bzr_revision, revids_to_insert):
460- """Import the revision with the given revision_id.
461+ def syncRevisions(self, bzr_branch, bzr_revisions, revids_to_insert):
462+ """Import the supplied revisions.
463
464 :param bzr_branch: The Bazaar branch that's being scanned.
465- :param bzr_revision: the revision to import
466+ :param bzr_revisions: the revisions to import
467 :type bzr_revision: bzrlib.revision.Revision
468 :param revids_to_insert: a dict of revision ids to integer
469 revno. Non-mainline revisions will be mapped to None.
470 """
471- revision_id = bzr_revision.revision_id
472- revision_set = getUtility(IRevisionSet)
473- # Revision not yet in the database. Load it.
474- self.logger.debug("Inserting revision: %s", revision_id)
475- db_revision = revision_set.newFromBazaarRevision(bzr_revision)
476- notify(
477- events.NewRevision(
478- self.db_branch, bzr_branch, db_revision, bzr_revision,
479- revids_to_insert[revision_id]))
480+ self.revision_set.newFromBazaarRevisions(bzr_revisions)
481+ mainline_revisions = []
482+ for bzr_revision in bzr_revisions:
483+ if revids_to_insert[bzr_revision.revision_id] is None:
484+ continue
485+ mainline_revisions.append(bzr_revision)
486+ notify(events.NewMainlineRevisions(
487+ self.db_branch, bzr_branch, mainline_revisions))
488
489 @staticmethod
490 def revisionsToInsert(added_history, last_revno, added_ancestry):
491@@ -293,7 +287,7 @@
492 self.logger.info("Inserting %d branchrevision records.",
493 len(revids_to_insert))
494 revid_seq_pairs = revids_to_insert.items()
495- for revid_seq_pair_chunk in iter_list_chunks(revid_seq_pairs, 1000):
496+ for revid_seq_pair_chunk in iter_list_chunks(revid_seq_pairs, 10000):
497 self.db_branch.createBranchRevisionFromIDs(revid_seq_pair_chunk)
498
499 def updateBranchStatus(self, bzr_history):
500
501=== modified file 'lib/lp/codehosting/scanner/events.py'
502--- lib/lp/codehosting/scanner/events.py 2010-10-15 18:33:07 +0000
503+++ lib/lp/codehosting/scanner/events.py 2012-02-24 02:20:24 +0000
504@@ -5,7 +5,7 @@
505
506 __metaclass__ = type
507 __all__ = [
508- 'NewRevision',
509+ 'NewMainlineRevisions',
510 'RevisionsRemoved',
511 'TipChanged',
512 ]
513@@ -32,18 +32,17 @@
514 self.bzr_branch = bzr_branch
515
516
517-class INewRevision(IObjectEvent):
518- """A new revision has been found in the branch."""
519-
520-
521-class NewRevision(ScannerEvent):
522- """A new revision has been found in the branch."""
523-
524- implements(INewRevision)
525-
526- def __init__(self, db_branch, bzr_branch, db_revision, bzr_revision,
527- revno):
528- """Construct a `NewRevision` event.
529+class INewMainlineRevisions(IObjectEvent):
530+ """A new revision has been found in the branch."""
531+
532+
533+class NewMainlineRevisions(ScannerEvent):
534+ """A new revision has been found in the branch."""
535+
536+ implements(INewMainlineRevisions)
537+
538+ def __init__(self, db_branch, bzr_branch, bzr_revisions):
539+ """Construct a `NewRevisions` event.
540
541 :param db_branch: The database branch.
542 :param bzr_branch: The Bazaar branch.
543@@ -53,13 +52,7 @@
544 mainline.
545 """
546 ScannerEvent.__init__(self, db_branch, bzr_branch)
547- self.db_revision = db_revision
548- self.bzr_revision = bzr_revision
549- self.revno = revno
550-
551- def isMainline(self):
552- """Is the new revision a mainline one?"""
553- return self.revno is not None
554+ self.bzr_revisions = bzr_revisions
555
556
557 class ITipChanged(IObjectEvent):
558
559=== modified file 'lib/lp/codehosting/scanner/tests/test_buglinks.py'
560--- lib/lp/codehosting/scanner/tests/test_buglinks.py 2012-02-07 06:48:22 +0000
561+++ lib/lp/codehosting/scanner/tests/test_buglinks.py 2012-02-24 02:20:24 +0000
562@@ -8,7 +8,6 @@
563 from bzrlib.revision import Revision
564 from zope.component import getUtility
565 from zope.event import notify
566-from zope.security.proxy import removeSecurityProxy
567
568 from lp.app.errors import NotFoundError
569 from lp.bugs.interfaces.bug import IBugSet
570@@ -270,10 +269,9 @@
571 revprops={
572 'bugs': 'https://launchpad.net/bugs/%d fixed' % bug.id})
573 bzr_revision = tree.branch.repository.get_revision(revision_id)
574- revno = 1
575 revision_set = getUtility(IRevisionSet)
576- db_revision = revision_set.newFromBazaarRevision(bzr_revision)
577- notify(events.NewRevision(
578- db_branch, tree.branch, db_revision, bzr_revision, revno))
579+ revision_set.newFromBazaarRevisions([bzr_revision])
580+ notify(events.NewMainlineRevisions(
581+ db_branch, tree.branch, [bzr_revision]))
582 bug_branch = getUtility(IBugBranchSet).getBugBranch(bug, db_branch)
583 self.assertIsNot(None, bug_branch)
584
585=== modified file 'lib/lp/codehosting/scanner/tests/test_bzrsync.py'
586--- lib/lp/codehosting/scanner/tests/test_bzrsync.py 2012-02-21 22:46:28 +0000
587+++ lib/lp/codehosting/scanner/tests/test_bzrsync.py 2012-02-24 02:20:24 +0000
588@@ -582,8 +582,8 @@
589 self.assertIn(merge_id, branchrevisions_to_delete)
590
591
592-class TestBzrSyncOneRevision(BzrSyncTestCase):
593- """Tests for `BzrSync.syncOneRevision`."""
594+class TestBzrSyncRevisions(BzrSyncTestCase):
595+ """Tests for `BzrSync.syncRevisions`."""
596
597 def setUp(self):
598 BzrSyncTestCase.setUp(self)
599@@ -606,7 +606,7 @@
600
601 # Sync the revision. The second parameter is a dict of revision ids
602 # to revnos, and will error if the revision id is not in the dict.
603- self.bzrsync.syncOneRevision(None, fake_rev, {'rev42': None})
604+ self.bzrsync.syncRevisions(None, [fake_rev], {'rev42': None})
605
606 # Find the revision we just synced and check that it has the correct
607 # date.
608
609=== modified file 'lib/lp/testing/factory.py'
610--- lib/lp/testing/factory.py 2012-02-20 02:07:55 +0000
611+++ lib/lp/testing/factory.py 2012-02-24 02:20:24 +0000
612@@ -48,6 +48,7 @@
613
614 from bzrlib.merge_directive import MergeDirective2
615 from bzrlib.plugins.builder.recipe import BaseRecipeBranch
616+from bzrlib.revision import Revision as BzrRevision
617 import pytz
618 from pytz import UTC
619 import simplejson
620@@ -1545,6 +1546,18 @@
621 return merge_proposal.generateIncrementalDiff(
622 old_revision, new_revision, diff)
623
624+ def makeBzrRevision(self, revision_id=None, parent_ids=None, **kwargs):
625+ if revision_id is None:
626+ revision_id = self.getUniqueString('revision-id')
627+ if parent_ids is None:
628+ parent_ids = []
629+ return BzrRevision(
630+ message=self.getUniqueString('message'),
631+ revision_id=revision_id,
632+ committer=self.getUniqueString('committer'),
633+ parent_ids=parent_ids,
634+ timestamp=0, timezone=0, properties=kwargs)
635+
636 def makeRevision(self, author=None, revision_date=None, parent_ids=None,
637 rev_id=None, log_body=None, date_created=None):
638 """Create a single `Revision`."""