Merge lp:~wallyworld/launchpad/another-reporting-cache-garbojob-1071581 into lp:launchpad

Proposed by Ian Booth
Status: Merged
Approved by: William Grant
Approved revision: no longer in the source branch.
Merged at revision: 16266
Proposed branch: lp:~wallyworld/launchpad/another-reporting-cache-garbojob-1071581
Merge into: lp:launchpad
Diff against target: 848 lines (+263/-189)
10 files modified
lib/lp/registry/browser/tests/test_person.py (+3/-3)
lib/lp/registry/model/person.py (+15/-16)
lib/lp/scripts/garbo.py (+155/-137)
lib/lp/scripts/tests/test_garbo.py (+14/-16)
lib/lp/services/database/bulk.py (+5/-4)
lib/lp/services/database/stormexpr.py (+58/-0)
lib/lp/soyuz/configure.zcml (+2/-2)
lib/lp/soyuz/interfaces/reporting.py (+3/-3)
lib/lp/soyuz/model/reporting.py (+6/-6)
lib/lp/soyuz/stories/soyuz/xx-person-packages.txt (+2/-2)
To merge this branch: bzr merge lp:~wallyworld/launchpad/another-reporting-cache-garbojob-1071581
Reviewer: William Grant (community)
Review type: code
Status: Approve
Review via email: mp+133859@code.launchpad.net

Commit message

Improve performance of latest source package release cache garbo job.

Description of the change

== Implementation ==

The previous version of the job selected single, distinct records to insert into the cache table, so the query it used suffered from performance issues similar to those of the query that pulled the data live.

This version of the job iterates over SourcePackagePublishingHistory (SPPH) records, ordered by id, starting from the watermark of the last-processed SPPH id. The query joins the SourcePackageRelease (SPR) table to pick up published SPRs.

A cache record update is done using Greatest() for the date_uploaded column; any records that do not already exist are inserted.

Hopefully this will run a lot faster than the previous version.
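
For illustration, a minimal sketch of the iteration only (simplified and not the exact code in this branch; the cache upsert itself is elided, and the model and import names are those used elsewhere in the diff):

    from storm.expr import And, Join
    from lp.soyuz.model.publishing import SourcePackagePublishingHistory
    from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease

    def process_batch(store, last_spph_id, chunk_size):
        # Walk SPPH records past the watermark, joined to their published SPRs.
        spph = SourcePackagePublishingHistory
        spr = SourcePackageRelease
        origin = [
            spr,
            Join(spph, And(
                spph.sourcepackagereleaseID == spr.id,
                spph.archiveID == spr.upload_archiveID))]
        rows = store.using(*origin).find(
            (spph.id, spr.id, spr.dateuploaded),
            spph.id > last_spph_id).order_by(spph.id)[:chunk_size]
        for spph_id, spr_id, dateuploaded in rows:
            # The update or insert of the corresponding cache record goes
            # here (see the preview diff for the real upsert logic).
            last_spph_id = spph_id
        return last_spph_id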

== Tests ==

Existing tests suffice.

== Lint ==

Linting changed files:
  lib/lp/scripts/garbo.py
  lib/lp/scripts/tests/test_garbo.py

Revision history for this message
Curtis Hovey (sinzui) wrote:

I don't think you want to disable half of the production garbo jobs or the threading. I think you need to uncomment them to restore the garbo features.

Revision history for this message
Ian Booth (wallyworld) wrote:

Bollocks. That was so I could debug. I forgot to uncomment it.

On Mon 12 Nov 2012 23:47:31 EST, Curtis Hovey wrote:
> I don't think you want to disable half of the production garbo jobs or the threading. I think you need to uncomment them to restore the garbo features.

Revision history for this message
William Grant (wgrant) wrote:

73 spph = ClassAlias(SourcePackagePublishingHistory, "spph")

I don't understand why this needs to be a ClassAlias; if you just want a shorter Python name for the class without aliasing in SQL then 'spph = SourcePackagePublishingHistory' is fine and relatively idiomatic.

113 + SourcePackageRelease.dateuploaded, Alias(spph.id, 'spph_id')),

The custom alias seems pointless here.

114 + spph.id > self.next_spph_id

That sounds, then, rather like it's actually last_spph_id.

132 + return self.getPendingUpdates().count() == 0

We don't care about the exact count. is_empty() is cheaper and does what we want.

134 - def update_cache(self, updates):
135 + def update_cache(self, update, inserts):

This should be updateCache.

167 + def perform_update(spr_id, creator_id, maintainer_id, archive_id,
168 + purpose, distroseries_id, spn_id, dateuploaded,
169 + spph_id):

The update here is perhaps slightly excessive. The only fields that can ever change are publication, date_uploaded, and sourcepackagerelease, and it will overwrite all but date_uploaded even if the old record is newer. I suspect we want the update to be conditional on its dateuploaded being newer (for correctness), and to avoid setting the immutable columns (for compactness and efficiency).

Additionally, you probably want to aggregate the inserts; I think the current code will crash if two of the same key show up new in a single batch. There's also likely to be benefit in applying the same aggregation technique to updates, not to avoid crashes but to avoid duplicating work. update_cache will likely end up compact enough to be inlined.

It's probably also cleaner to reword the update as a self.store.find(LPSPRC, LPSPRC.upload_archive_id == archive_id, [...], LPSPRC.dateuploaded < dateuploaded).set(dateuploaded=dateuploaded, [...]).
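
As a sketch, that rewording might look roughly like the following, with the elided key columns filled in from the existing update_cache filter (illustrative only, not the landed code; LPSPRC abbreviates LatestPersonSourcePackageReleaseCache and the variables are those already in update_cache):

    # Hypothetical expansion of the suggestion above.
    self.store.find(
        LPSPRC,
        LPSPRC.upload_archive_id == archive_id,
        LPSPRC.upload_distroseries_id == distroseries_id,
        LPSPRC.sourcepackagename_id == spn_id,
        LPSPRC.creator_id == creator_id,
        LPSPRC.maintainer_id == maintainer_id,
        LPSPRC.dateuploaded < dateuploaded).set(
            dateuploaded=dateuploaded,
            sourcepackagerelease_id=spr_id,
            publication_id=spph_id)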

I'd lastly like to see the Storm class renamed to LatestPersonSourcePackageReleaseCache (note the 'Package' rather than 'package'), as the existing compound concept capitalisation scheme was phased out and exterminated from the codebase in around 2005.

209 + for update in (self.getPendingUpdates()[:chunk_size]):

Extraneous parens are extraneous.

245 - self.runDaily()
246 - self.runHourly()
247 +# self.runDaily()
248 +# self.runHourly()

This seems like an accidental change.

review: Approve (code)
Revision history for this message
William Grant (wgrant) wrote:

You'll also greatly benefit from ordering the getPendingUpdates joins the way we want the query to be executed: SPPH, SPR, then Archive.
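For instance, a sketch of that join ordering (hypothetical; the landed getPendingUpdates still starts from SourcePackageRelease):

    # Suggested ordering: SPPH first, then SPR, then Archive.
    origin = [
        SourcePackagePublishingHistory,
        Join(SourcePackageRelease,
             SourcePackagePublishingHistory.sourcepackagereleaseID ==
                 SourcePackageRelease.id),
        Join(Archive,
             Archive.id == SourcePackagePublishingHistory.archiveID)]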

And it might be worth looking at batching the updates (using a CASE expression is the best way, sadly), but that's a far less critical optimisation.
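
For what it's worth, a CASE-batched update might generate SQL roughly along these lines (purely illustrative and simplified; it assumes the ids of the existing cache rows were looked up first, and the branch as landed batches via the VALUES-based BulkUpdate expression instead):

    # Hypothetical sketch only; id1/date1/pub1 etc. stand for per-row values.
    store.execute("""
        UPDATE LatestPersonSourcePackageReleaseCache
        SET date_uploaded = CASE id WHEN ? THEN ? WHEN ? THEN ? END,
            publication = CASE id WHEN ? THEN ? WHEN ? THEN ? END
        WHERE id IN (?, ?)
        """, params=[
            id1, date1, id2, date2,
            id1, pub1, id2, pub2,
            id1, id2])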

Revision history for this message
William Grant (wgrant):
review: Approve (code)

Preview Diff

1=== modified file 'lib/lp/registry/browser/tests/test_person.py'
2--- lib/lp/registry/browser/tests/test_person.py 2012-11-08 05:55:16 +0000
3+++ lib/lp/registry/browser/tests/test_person.py 2012-11-14 07:45:27 +0000
4@@ -37,7 +37,7 @@
5 )
6 from lp.registry.model.karma import KarmaCategory
7 from lp.registry.model.milestone import milestone_sort_key
8-from lp.scripts.garbo import PopulateLatestPersonSourcepackageReleaseCache
9+from lp.scripts.garbo import PopulateLatestPersonSourcePackageReleaseCache
10 from lp.services.config import config
11 from lp.services.features.testing import FeatureFixture
12 from lp.services.identity.interfaces.account import AccountStatus
13@@ -849,7 +849,7 @@
14 spphs.append(spph)
15 # Update the releases cache table.
16 switch_dbuser('garbo_frequently')
17- job = PopulateLatestPersonSourcepackageReleaseCache(FakeLogger())
18+ job = PopulateLatestPersonSourcePackageReleaseCache(FakeLogger())
19 while not job.isDone():
20 job(chunk_size=100)
21 switch_dbuser('launchpad')
22@@ -1062,7 +1062,7 @@
23 self.build.archive = publisher.distroseries.main_archive
24 # Update the releases cache table.
25 switch_dbuser('garbo_frequently')
26- job = PopulateLatestPersonSourcepackageReleaseCache(FakeLogger())
27+ job = PopulateLatestPersonSourcePackageReleaseCache(FakeLogger())
28 while not job.isDone():
29 job(chunk_size=100)
30 switch_dbuser('launchpad')
31
32=== modified file 'lib/lp/registry/model/person.py'
33--- lib/lp/registry/model/person.py 2012-11-08 05:35:59 +0000
34+++ lib/lp/registry/model/person.py 2012-11-14 07:45:27 +0000
35@@ -328,7 +328,7 @@
36 Archive,
37 validate_ppa,
38 )
39-from lp.soyuz.model.reporting import LatestPersonSourcepackageReleaseCache
40+from lp.soyuz.model.reporting import LatestPersonSourcePackageReleaseCache
41 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
42 from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
43 from lp.translations.model.hastranslationimports import (
44@@ -2270,8 +2270,8 @@
45 ('QuestionSubscription', 'person'),
46 ('SpecificationSubscription', 'person'),
47 ('AnswerContact', 'person'),
48- ('LatestPersonSourcepackageReleaseCache', 'creator'),
49- ('LatestPersonSourcepackageReleaseCache', 'maintainer')]
50+ ('LatestPersonSourcePackageReleaseCache', 'creator'),
51+ ('LatestPersonSourcePackageReleaseCache', 'maintainer')]
52 cur = cursor()
53 for table, person_id_column in removals:
54 cur.execute("DELETE FROM %s WHERE %s=%d"
55@@ -2838,33 +2838,33 @@
56 clauses = []
57 if uploader_only:
58 clauses.append(
59- LatestPersonSourcepackageReleaseCache.creator_id == self.id)
60+ LatestPersonSourcePackageReleaseCache.creator_id == self.id)
61 if ppa_only:
62 # Source maintainer is irrelevant for PPA uploads.
63 pass
64 elif uploader_only:
65- lpspr = ClassAlias(LatestPersonSourcepackageReleaseCache, 'lpspr')
66+ lpspr = ClassAlias(LatestPersonSourcePackageReleaseCache, 'lpspr')
67 clauses.append(Not(Exists(Select(1,
68 where=And(
69 lpspr.sourcepackagename_id ==
70- LatestPersonSourcepackageReleaseCache.sourcepackagename_id,
71+ LatestPersonSourcePackageReleaseCache.sourcepackagename_id,
72 lpspr.upload_archive_id ==
73- LatestPersonSourcepackageReleaseCache.upload_archive_id,
74+ LatestPersonSourcePackageReleaseCache.upload_archive_id,
75 lpspr.upload_distroseries_id ==
76- LatestPersonSourcepackageReleaseCache.upload_distroseries_id,
77+ LatestPersonSourcePackageReleaseCache.upload_distroseries_id,
78 lpspr.archive_purpose != ArchivePurpose.PPA,
79 lpspr.maintainer_id == self.id),
80 tables=lpspr))))
81 else:
82 clauses.append(
83- LatestPersonSourcepackageReleaseCache.maintainer_id == self.id)
84+ LatestPersonSourcePackageReleaseCache.maintainer_id == self.id)
85 if ppa_only:
86 clauses.append(
87- LatestPersonSourcepackageReleaseCache.archive_purpose ==
88+ LatestPersonSourcePackageReleaseCache.archive_purpose ==
89 ArchivePurpose.PPA)
90 else:
91 clauses.append(
92- LatestPersonSourcepackageReleaseCache.archive_purpose !=
93+ LatestPersonSourcePackageReleaseCache.archive_purpose !=
94 ArchivePurpose.PPA)
95 return clauses
96
97@@ -2876,8 +2876,8 @@
98 return self._legacy_hasReleasesQuery(uploader_only, ppa_only)
99
100 clauses = self._releasesQueryFilter(uploader_only, ppa_only)
101- rs = Store.of(self).using(LatestPersonSourcepackageReleaseCache).find(
102- LatestPersonSourcepackageReleaseCache.publication_id, *clauses)
103+ rs = Store.of(self).using(LatestPersonSourcePackageReleaseCache).find(
104+ LatestPersonSourcePackageReleaseCache.publication_id, *clauses)
105 return not rs.is_empty()
106
107 def _latestReleasesQuery(self, uploader_only=False, ppa_only=False):
108@@ -2889,8 +2889,8 @@
109
110 clauses = self._releasesQueryFilter(uploader_only, ppa_only)
111 rs = Store.of(self).find(
112- LatestPersonSourcepackageReleaseCache, *clauses).order_by(
113- Desc(LatestPersonSourcepackageReleaseCache.dateuploaded))
114+ LatestPersonSourcePackageReleaseCache, *clauses).order_by(
115+ Desc(LatestPersonSourcePackageReleaseCache.dateuploaded))
116
117 def load_related_objects(rows):
118 if rows and rows[0].maintainer_id:
119@@ -2904,7 +2904,6 @@
120
121 return DecoratedResultSet(rs, pre_iter_hook=load_related_objects)
122
123-
124 def _legacy_releasesQueryFilter(self, uploader_only=False, ppa_only=False):
125 """Return the filter used to find sourcepackagereleases (SPRs)
126 related to this person.
127
128=== modified file 'lib/lp/scripts/garbo.py'
129--- lib/lp/scripts/garbo.py 2012-11-09 14:18:45 +0000
130+++ lib/lp/scripts/garbo.py 2012-11-14 07:45:27 +0000
131@@ -31,22 +31,19 @@
132 from psycopg2 import IntegrityError
133 import pytz
134 from storm.expr import (
135- Alias,
136 And,
137- Desc,
138 In,
139- Insert,
140 Join,
141 Like,
142+ Max,
143+ Min,
144+ Or,
145+ Row,
146 Select,
147+ SQL,
148 Update,
149 )
150 from storm.info import ClassAlias
151-from storm.locals import (
152- Max,
153- Min,
154- SQL,
155- )
156 from storm.store import EmptyResultSet
157 import transaction
158 from zope.component import getUtility
159@@ -75,6 +72,10 @@
160 from lp.registry.model.product import Product
161 from lp.services.config import config
162 from lp.services.database import postgresql
163+from lp.services.database.bulk import (
164+ create,
165+ dbify_value,
166+ )
167 from lp.services.database.constants import UTC_NOW
168 from lp.services.database.interfaces import (
169 IStoreSelector,
170@@ -87,6 +88,10 @@
171 session_store,
172 sqlvalues,
173 )
174+from lp.services.database.stormexpr import (
175+ BulkUpdate,
176+ Values,
177+ )
178 from lp.services.features import (
179 getFeatureFlag,
180 install_feature_controller,
181@@ -116,7 +121,7 @@
182 from lp.services.verification.model.logintoken import LoginToken
183 from lp.soyuz.model.archive import Archive
184 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
185-from lp.soyuz.model.reporting import LatestPersonSourcepackageReleaseCache
186+from lp.soyuz.model.reporting import LatestPersonSourcePackageReleaseCache
187 from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
188 from lp.translations.interfaces.potemplate import IPOTemplateSet
189 from lp.translations.model.potmsgset import POTMsgSet
190@@ -463,164 +468,177 @@
191 transaction.commit()
192
193
194-class PopulateLatestPersonSourcepackageReleaseCache(TunableLoop):
195- """Populate the LatestPersonSourcepackageReleaseCache table.
196+class PopulateLatestPersonSourcePackageReleaseCache(TunableLoop):
197+ """Populate the LatestPersonSourcePackageReleaseCache table.
198
199- The LatestPersonSourcepackageReleaseCache contains 2 sets of data, one set
200- for package maintainers and another for package creators. This job first
201- populates the creator data and then does the maintainer data.
202+ The LatestPersonSourcePackageReleaseCache contains 2 sets of data, one set
203+ for package maintainers and another for package creators. This job iterates
204+ over the SPPH records, populating the cache table.
205 """
206 maximum_chunk_size = 1000
207
208+ cache_columns = (
209+ LatestPersonSourcePackageReleaseCache.maintainer_id,
210+ LatestPersonSourcePackageReleaseCache.creator_id,
211+ LatestPersonSourcePackageReleaseCache.upload_archive_id,
212+ LatestPersonSourcePackageReleaseCache.upload_distroseries_id,
213+ LatestPersonSourcePackageReleaseCache.sourcepackagename_id,
214+ LatestPersonSourcePackageReleaseCache.archive_purpose,
215+ LatestPersonSourcePackageReleaseCache.publication_id,
216+ LatestPersonSourcePackageReleaseCache.dateuploaded,
217+ LatestPersonSourcePackageReleaseCache.sourcepackagerelease_id,
218+ )
219+
220 def __init__(self, log, abort_time=None):
221- super_cl = super(PopulateLatestPersonSourcepackageReleaseCache, self)
222+ super_cl = super(PopulateLatestPersonSourcePackageReleaseCache, self)
223 super_cl.__init__(log, abort_time)
224 self.store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
225 # Keep a record of the processed source package release id and data
226 # type (creator or maintainer) so we know where to job got up to.
227- self.next_id_for_creator = 0
228- self.next_id_for_maintainer = 0
229- self.current_person_filter_type = 'creator'
230- self.starting_person_filter_type = self.current_person_filter_type
231+ self.last_spph_id = 0
232 self.job_name = self.__class__.__name__
233 job_data = load_garbo_job_state(self.job_name)
234 if job_data:
235- self.next_id_for_creator = job_data['next_id_for_creator']
236- self.next_id_for_maintainer = job_data['next_id_for_maintainer']
237- self.current_person_filter_type = job_data['person_filter_type']
238- self.starting_person_filter_type = self.current_person_filter_type
239+ self.last_spph_id = job_data.get('last_spph_id', 0)
240
241 def getPendingUpdates(self):
242- # Load the latest published source package release data keyed on either
243- # creator or maintainer as required.
244- if self.current_person_filter_type == 'creator':
245- person_filter = SourcePackageRelease.creatorID
246- next_id = self.next_id_for_creator
247- else:
248- person_filter = SourcePackageRelease.maintainerID
249- next_id = self.next_id_for_maintainer
250- spph = ClassAlias(SourcePackagePublishingHistory, "spph")
251+ # Load the latest published source package release data.
252+ spph = SourcePackagePublishingHistory
253 origin = [
254 SourcePackageRelease,
255 Join(
256 spph,
257 And(spph.sourcepackagereleaseID == SourcePackageRelease.id,
258- spph.archiveID == SourcePackageRelease.upload_archiveID))]
259- spr_select = self.store.using(*origin).find(
260- (SourcePackageRelease.id, Alias(spph.id, 'spph_id')),
261- SourcePackageRelease.id > next_id
262- ).order_by(
263- person_filter,
264- SourcePackageRelease.upload_distroseriesID,
265- SourcePackageRelease.sourcepackagenameID,
266- SourcePackageRelease.upload_archiveID,
267- Desc(SourcePackageRelease.dateuploaded),
268- SourcePackageRelease.id
269- ).config(distinct=(
270- person_filter,
271- SourcePackageRelease.upload_distroseriesID,
272- SourcePackageRelease.sourcepackagenameID,
273- SourcePackageRelease.upload_archiveID))._get_select()
274-
275- spr = Alias(spr_select, 'spr')
276- origin = [
277- SourcePackageRelease,
278- Join(spr, SQL('spr.id') == SourcePackageRelease.id),
279- Join(Archive, Archive.id == SourcePackageRelease.upload_archiveID)]
280+ spph.archiveID == SourcePackageRelease.upload_archiveID)),
281+ Join(Archive, Archive.id == spph.archiveID)]
282 rs = self.store.using(*origin).find(
283 (SourcePackageRelease.id,
284- person_filter,
285+ SourcePackageRelease.creatorID,
286+ SourcePackageRelease.maintainerID,
287 SourcePackageRelease.upload_archiveID,
288 Archive.purpose,
289 SourcePackageRelease.upload_distroseriesID,
290 SourcePackageRelease.sourcepackagenameID,
291- SourcePackageRelease.dateuploaded, SQL('spph_id'))
292- ).order_by(SourcePackageRelease.id)
293+ SourcePackageRelease.dateuploaded, spph.id),
294+ spph.id > self.last_spph_id
295+ ).order_by(spph.id)
296 return rs
297
298 def isDone(self):
299- # If there is no more data to process for creators, switch over to
300- # processing data for maintainers, or visa versa.
301- current_count = self.getPendingUpdates().count()
302- if current_count == 0:
303- if (self.current_person_filter_type !=
304- self.starting_person_filter_type):
305- return True
306- if self.current_person_filter_type == 'creator':
307- self.current_person_filter_type = 'maintainer'
308- else:
309- self.current_person_filter_type = 'creator'
310- current_count = self.getPendingUpdates().count()
311- return current_count == 0
312-
313- def update_cache(self, updates):
314- # Update the LatestPersonSourcepackageReleaseCache table. Records for
315- # each creator/maintainer will either be new inserts or updates. We try
316- # to update first, and gather data for missing (new) records along the
317- # way. At the end, a bulk insert is done for any new data.
318- # Updates is a list of data records (tuples of values).
319- # Each record is keyed on:
320- # - (creator/maintainer), archive, distroseries, sourcepackagename
321- inserts = []
322- columns = (
323- LatestPersonSourcepackageReleaseCache.sourcepackagerelease_id,
324- LatestPersonSourcepackageReleaseCache.creator_id,
325- LatestPersonSourcepackageReleaseCache.maintainer_id,
326- LatestPersonSourcepackageReleaseCache.upload_archive_id,
327- LatestPersonSourcepackageReleaseCache.archive_purpose,
328- LatestPersonSourcepackageReleaseCache.upload_distroseries_id,
329- LatestPersonSourcepackageReleaseCache.sourcepackagename_id,
330- LatestPersonSourcepackageReleaseCache.dateuploaded,
331- LatestPersonSourcepackageReleaseCache.publication_id,
332- )
333- for update in updates:
334- (spr_id, person_id, archive_id, purpose,
335- distroseries_id, spn_id, dateuploaded, spph_id) = update
336- if self.current_person_filter_type == 'creator':
337- creator_id = person_id
338- maintainer_id = None
339- else:
340- creator_id = None
341- maintainer_id = person_id
342- values = (
343- spr_id, creator_id, maintainer_id, archive_id, purpose.value,
344- distroseries_id, spn_id, dateuploaded, spph_id)
345- data = dict(zip(columns, values))
346- result = self.store.execute(Update(
347- data, And(
348- LatestPersonSourcepackageReleaseCache.upload_archive_id ==
349- archive_id,
350- LatestPersonSourcepackageReleaseCache.upload_distroseries_id ==
351- distroseries_id,
352- LatestPersonSourcepackageReleaseCache.sourcepackagename_id ==
353- spn_id,
354- LatestPersonSourcepackageReleaseCache.creator_id ==
355- creator_id,
356- LatestPersonSourcepackageReleaseCache.maintainer_id ==
357- maintainer_id)))
358- if result.rowcount == 0:
359- inserts.append(values)
360+ return self.getPendingUpdates().is_empty()
361+
362+ def __call__(self, chunk_size):
363+ cache_filter_data = []
364+ new_records = dict()
365+ # Create a map of new published spr data for creators and maintainers.
366+ # The map is keyed on (creator/maintainer, archive, spn, distroseries).
367+ for new_published_spr_data in self.getPendingUpdates()[:chunk_size]:
368+ (spr_id, creator_id, maintainer_id, archive_id, purpose,
369+ distroseries_id, spn_id, dateuploaded,
370+ spph_id) = new_published_spr_data
371+ cache_filter_data.append((archive_id, distroseries_id, spn_id))
372+
373+ value = (purpose, spph_id, dateuploaded, spr_id)
374+ maintainer_key = (
375+ maintainer_id, None, archive_id, distroseries_id, spn_id)
376+ creator_key = (
377+ None, creator_id, archive_id, distroseries_id, spn_id)
378+ new_records[maintainer_key] = maintainer_key + value
379+ new_records[creator_key] = creator_key + value
380+ self.last_spph_id = spph_id
381+
382+ # Gather all the current cached reporting records corresponding to the
383+ # data in the current batch. We select matching records from the
384+ # reporting cache table based on
385+ # (archive_id, distroseries_id, sourcepackagename_id).
386+ existing_records = dict()
387+ lpsprc = LatestPersonSourcePackageReleaseCache
388+ rs = self.store.find(
389+ lpsprc,
390+ In(
391+ Row(
392+ lpsprc.upload_archive_id,
393+ lpsprc.upload_distroseries_id,
394+ lpsprc.sourcepackagename_id),
395+ map(Row, cache_filter_data)))
396+ for lpsprc_record in rs:
397+ key = (
398+ lpsprc_record.maintainer_id,
399+ lpsprc_record.creator_id,
400+ lpsprc_record.upload_archive_id,
401+ lpsprc_record.upload_distroseries_id,
402+ lpsprc_record.sourcepackagename_id)
403+ existing_records[key] = pytz.UTC.localize(
404+ lpsprc_record.dateuploaded)
405+
406+ # Figure out what records from the new published spr data need to be
407+ # inserted and updated into the cache table.
408+ inserts = dict()
409+ updates = dict()
410+ for key, new_published_spr_data in new_records.items():
411+ existing_dateuploaded = existing_records.get(key, None)
412+ new_dateuploaded = new_published_spr_data[7]
413+ if existing_dateuploaded is None:
414+ target = inserts
415+ else:
416+ target = updates
417+
418+ existing_action = target.get(key, None)
419+ if (existing_action is None
420+ or existing_action[7] < new_dateuploaded):
421+ target[key] = new_published_spr_data
422+
423 if inserts:
424- self.store.execute(Insert(columns, values=inserts))
425-
426- def __call__(self, chunk_size):
427- max_id = None
428- updates = []
429- for update in (self.getPendingUpdates()[:chunk_size]):
430- updates.append(update)
431- max_id = update[0]
432- self.update_cache(updates)
433-
434- if max_id:
435- if self.current_person_filter_type == 'creator':
436- self.next_id_for_creator = max_id
437- else:
438- self.next_id_for_maintainer = max_id
439+ # Do a bulk insert.
440+ create(self.cache_columns, inserts.values())
441+ if updates:
442+ # Do a bulk update.
443+ cols = [
444+ ("maintainer", "integer"),
445+ ("creator", "integer"),
446+ ("upload_archive", "integer"),
447+ ("upload_distroseries", "integer"),
448+ ("sourcepackagename", "integer"),
449+ ("archive_purpose", "integer"),
450+ ("publication", "integer"),
451+ ("date_uploaded", "timestamp without time zone"),
452+ ("sourcepackagerelease", "integer"),
453+ ]
454+ values = [
455+ [dbify_value(col, val)[0]
456+ for (col, val) in zip(self.cache_columns, data)]
457+ for data in updates.values()]
458+
459+ cache_data_expr = Values('cache_data', cols, values)
460+ cache_data = ClassAlias(lpsprc, "cache_data")
461+
462+ # The columns to be updated.
463+ updated_columns = dict([
464+ (lpsprc.dateuploaded, cache_data.dateuploaded),
465+ (lpsprc.sourcepackagerelease_id,
466+ cache_data.sourcepackagerelease_id),
467+ (lpsprc.publication_id, cache_data.publication_id)])
468+ # The update filter.
469+ filter = And(
470+ Or(
471+ cache_data.creator_id == None,
472+ lpsprc.creator_id == cache_data.creator_id),
473+ Or(
474+ cache_data.maintainer_id == None,
475+ lpsprc.maintainer_id == cache_data.maintainer_id),
476+ lpsprc.upload_archive_id == cache_data.upload_archive_id,
477+ lpsprc.upload_distroseries_id ==
478+ cache_data.upload_distroseries_id,
479+ lpsprc.sourcepackagename_id == cache_data.sourcepackagename_id)
480+
481+ self.store.execute(
482+ BulkUpdate(
483+ updated_columns,
484+ table=LatestPersonSourcePackageReleaseCache,
485+ values=cache_data_expr, where=filter))
486 self.store.flush()
487 save_garbo_job_state(self.job_name, {
488- 'next_id_for_creator': self.next_id_for_creator,
489- 'next_id_for_maintainer': self.next_id_for_maintainer,
490- 'person_filter_type': self.current_person_filter_type})
491+ 'last_spph_id': self.last_spph_id})
492 transaction.commit()
493
494
495@@ -1560,7 +1578,7 @@
496 OpenIDConsumerAssociationPruner,
497 AntiqueSessionPruner,
498 VoucherRedeemer,
499- PopulateLatestPersonSourcepackageReleaseCache,
500+ PopulateLatestPersonSourcePackageReleaseCache,
501 ]
502 experimental_tunable_loops = []
503
504
505=== modified file 'lib/lp/scripts/tests/test_garbo.py'
506--- lib/lp/scripts/tests/test_garbo.py 2012-11-09 14:18:45 +0000
507+++ lib/lp/scripts/tests/test_garbo.py 2012-11-14 07:45:27 +0000
508@@ -114,7 +114,7 @@
509 from lp.services.verification.model.logintoken import LoginToken
510 from lp.services.worlddata.interfaces.language import ILanguageSet
511 from lp.soyuz.enums import PackagePublishingStatus
512-from lp.soyuz.model.reporting import LatestPersonSourcepackageReleaseCache
513+from lp.soyuz.model.reporting import LatestPersonSourcePackageReleaseCache
514 from lp.testing import (
515 FakeAdapterMixin,
516 person_logged_in,
517@@ -1166,7 +1166,7 @@
518 self.assertEqual(0, store.find(Product,
519 Product._information_type == None).count())
520
521- def test_PopulateLatestPersonSourcepackageReleaseCache(self):
522+ def test_PopulateLatestPersonSourcePackageReleaseCache(self):
523 switch_dbuser('testadmin')
524 # Make some same test data - we create published source package
525 # releases for 2 different creators and maintainers.
526@@ -1204,25 +1204,25 @@
527 creator=creators[1], maintainer=maintainers[1],
528 distroseries=distroseries, sourcepackagename=spn,
529 date_uploaded=datetime(2010, 12, 4, tzinfo=pytz.UTC))
530- self.factory.makeSourcePackagePublishingHistory(
531+ spph_1 = self.factory.makeSourcePackagePublishingHistory(
532 status=PackagePublishingStatus.PUBLISHED,
533 sourcepackagerelease=spr4)
534
535 transaction.commit()
536 self.runFrequently()
537
538- store = IMasterStore(LatestPersonSourcepackageReleaseCache)
539+ store = IMasterStore(LatestPersonSourcePackageReleaseCache)
540 # Check that the garbo state table has data.
541 self.assertIsNotNone(
542 store.execute(
543 'SELECT * FROM GarboJobState WHERE name=?',
544- params=[u'PopulateLatestPersonSourcepackageReleaseCache']
545+ params=[u'PopulateLatestPersonSourcePackageReleaseCache']
546 ).get_one())
547
548 def _assert_release_by_creator(creator, spr):
549 release_records = store.find(
550- LatestPersonSourcepackageReleaseCache,
551- LatestPersonSourcepackageReleaseCache.creator_id == creator.id)
552+ LatestPersonSourcePackageReleaseCache,
553+ LatestPersonSourcePackageReleaseCache.creator_id == creator.id)
554 [record] = list(release_records)
555 self.assertEqual(spr.creator, record.creator)
556 self.assertIsNone(record.maintainer_id)
557@@ -1231,8 +1231,8 @@
558
559 def _assert_release_by_maintainer(maintainer, spr):
560 release_records = store.find(
561- LatestPersonSourcepackageReleaseCache,
562- LatestPersonSourcepackageReleaseCache.maintainer_id ==
563+ LatestPersonSourcePackageReleaseCache,
564+ LatestPersonSourcePackageReleaseCache.maintainer_id ==
565 maintainer.id)
566 [record] = list(release_records)
567 self.assertEqual(spr.maintainer, record.maintainer)
568@@ -1246,9 +1246,8 @@
569 _assert_release_by_maintainer(maintainers[1], spr4)
570
571 job_data = load_garbo_job_state(
572- 'PopulateLatestPersonSourcepackageReleaseCache')
573- self.assertEqual(spr4.id, job_data['next_id_for_creator'])
574- self.assertEqual(spr4.id, job_data['next_id_for_maintainer'])
575+ 'PopulateLatestPersonSourcePackageReleaseCache')
576+ self.assertEqual(spph_1.id, job_data['last_spph_id'])
577
578 # Create a newer published source package release and ensure the
579 # release cache table is correctly updated.
580@@ -1257,7 +1256,7 @@
581 creator=creators[1], maintainer=maintainers[1],
582 distroseries=distroseries, sourcepackagename=spn,
583 date_uploaded=datetime(2010, 12, 5, tzinfo=pytz.UTC))
584- self.factory.makeSourcePackagePublishingHistory(
585+ spph_2 = self.factory.makeSourcePackagePublishingHistory(
586 status=PackagePublishingStatus.PUBLISHED,
587 sourcepackagerelease=spr5)
588
589@@ -1270,9 +1269,8 @@
590 _assert_release_by_maintainer(maintainers[1], spr5)
591
592 job_data = load_garbo_job_state(
593- 'PopulateLatestPersonSourcepackageReleaseCache')
594- self.assertEqual(spr5.id, job_data['next_id_for_creator'])
595- self.assertEqual(spr5.id, job_data['next_id_for_maintainer'])
596+ 'PopulateLatestPersonSourcePackageReleaseCache')
597+ self.assertEqual(spph_2.id, job_data['last_spph_id'])
598
599
600 class TestGarboTasks(TestCaseWithFactory):
601
602=== modified file 'lib/lp/services/database/bulk.py'
603--- lib/lp/services/database/bulk.py 2012-10-17 00:22:31 +0000
604+++ lib/lp/services/database/bulk.py 2012-11-14 07:45:27 +0000
605@@ -6,6 +6,7 @@
606 __metaclass__ = type
607 __all__ = [
608 'create',
609+ 'dbify_value',
610 'load',
611 'load_referencing',
612 'load_related',
613@@ -167,7 +168,7 @@
614 return load(object_type, keys)
615
616
617-def _dbify_value(col, val):
618+def dbify_value(col, val):
619 """Convert a value into a form that Storm can compile directly."""
620 if isinstance(val, SQL):
621 return (val,)
622@@ -184,7 +185,7 @@
623 return (col.variable_factory(value=val),)
624
625
626-def _dbify_column(col):
627+def dbify_column(col):
628 """Convert a column into a form that Storm can compile directly."""
629 if isinstance(col, Reference):
630 # References are mainly meant to be used as descriptors, so we
631@@ -207,7 +208,7 @@
632 :return: A list of the created objects if get_created, otherwise None.
633 """
634 # Flatten Reference faux-columns into their primary keys.
635- db_cols = list(chain.from_iterable(map(_dbify_column, columns)))
636+ db_cols = list(chain.from_iterable(map(dbify_column, columns)))
637 clses = set(col.cls for col in db_cols)
638 if len(clses) != 1:
639 raise ValueError(
640@@ -228,7 +229,7 @@
641 # squashed into primary key variables.
642 db_values = [
643 list(chain.from_iterable(
644- _dbify_value(col, val) for col, val in zip(columns, value)))
645+ dbify_value(col, val) for col, val in zip(columns, value)))
646 for value in values]
647
648 if get_objects or get_primary_keys:
649
650=== modified file 'lib/lp/services/database/stormexpr.py'
651--- lib/lp/services/database/stormexpr.py 2012-09-24 16:03:53 +0000
652+++ lib/lp/services/database/stormexpr.py 2012-11-14 07:45:27 +0000
653@@ -8,6 +8,7 @@
654 'ArrayAgg',
655 'ArrayContains',
656 'ArrayIntersects',
657+ 'BulkUpdate',
658 'ColumnSelect',
659 'Concatenate',
660 'CountDistinct',
661@@ -17,11 +18,14 @@
662 'NullCount',
663 'TryAdvisoryLock',
664 'Unnest',
665+ 'Values',
666 ]
667
668+from storm import Undef
669 from storm.exceptions import ClassInfoError
670 from storm.expr import (
671 BinaryOper,
672+ COLUMN_NAME,
673 ComparableExpr,
674 compile,
675 CompoundOper,
676@@ -31,6 +35,7 @@
677 NamedFunc,
678 Or,
679 SQL,
680+ TABLE,
681 )
682 from storm.info import (
683 get_cls_info,
684@@ -38,6 +43,59 @@
685 )
686
687
688+class BulkUpdate(Expr):
689+ # Perform a bulk table update using literal values.
690+ __slots__ = ("map", "where", "table", "values")
691+
692+ def __init__(self, map, table, values, where=Undef):
693+ self.map = map
694+ self.where = where
695+ self.table = table
696+ self.values = values
697+
698+
699+@compile.when(BulkUpdate)
700+def compile_bulkupdate(compile, update, state):
701+ pairs = update.map.items()
702+ state.push("context", COLUMN_NAME)
703+ col_names = [compile(col, state, token=True) for col, val in pairs]
704+ state.context = EXPR
705+ col_values = [compile(val, state) for col, val in pairs]
706+ sets = ["%s=%s" % (col, val) for col, val in zip(col_names, col_values)]
707+ state.context = TABLE
708+ tokens = ["UPDATE ", compile(update.table, state, token=True), " SET ",
709+ ", ".join(sets), " FROM "]
710+ state.context = EXPR
711+ # We don't want the values expression wrapped in parenthesis.
712+ state.precedence = 0
713+ tokens.append(compile(update.values, state, raw=True))
714+ if update.where is not Undef:
715+ tokens.append(" WHERE ")
716+ tokens.append(compile(update.where, state, raw=True))
717+ state.pop()
718+ return "".join(tokens)
719+
720+
721+class Values(Expr):
722+ __slots__ = ("name", "cols", "values")
723+
724+ def __init__(self, name, cols, values):
725+ self.name = name
726+ self.cols = cols
727+ self.values = values
728+
729+
730+@compile.when(Values)
731+def compile_values(compile, expr, state):
732+ col_names, col_types = zip(*expr.cols)
733+ first_row = ", ".join(
734+ "%s::%s" % (compile(value, state), type)
735+ for value, type in zip(expr.values[0], col_types))
736+ rows = [first_row] + [compile(value, state) for value in expr.values[1:]]
737+ return "(VALUES (%s)) AS %s(%s)" % (
738+ "), (".join(rows), expr.name, ', '.join(col_names))
739+
740+
741 class ColumnSelect(Expr):
742 # Wrap a select statement in braces so that it can be used as a column
743 # expression in another query.
744
745=== modified file 'lib/lp/soyuz/configure.zcml'
746--- lib/lp/soyuz/configure.zcml 2012-11-02 04:29:26 +0000
747+++ lib/lp/soyuz/configure.zcml 2012-11-14 07:45:27 +0000
748@@ -1003,9 +1003,9 @@
749 </class>
750
751 <class
752- class="lp.soyuz.model.reporting.LatestPersonSourcepackageReleaseCache">
753+ class="lp.soyuz.model.reporting.LatestPersonSourcePackageReleaseCache">
754 <allow
755- interface="lp.soyuz.interfaces.reporting.ILatestPersonSourcepackageReleaseCache"/>
756+ interface="lp.soyuz.interfaces.reporting.ILatestPersonSourcePackageReleaseCache"/>
757 </class>
758
759 <!-- ProcessAcceptedBugsJobSource -->
760
761=== modified file 'lib/lp/soyuz/interfaces/reporting.py'
762--- lib/lp/soyuz/interfaces/reporting.py 2012-11-07 12:38:19 +0000
763+++ lib/lp/soyuz/interfaces/reporting.py 2012-11-14 07:45:27 +0000
764@@ -3,7 +3,7 @@
765
766 __metaclass__ = type
767 __all__ = [
768- 'ILatestPersonSourcepackageReleaseCache',
769+ 'ILatestPersonSourcePackageReleaseCache',
770 ]
771
772
773@@ -11,7 +11,7 @@
774 from lp.soyuz.interfaces.sourcepackagerelease import ISourcePackageRelease
775
776
777-class ILatestPersonSourcepackageReleaseCache(ISourcePackageRelease):
778+class ILatestPersonSourcePackageReleaseCache(ISourcePackageRelease):
779 """Published source package release information for a person.
780
781 The records represented by this object are the latest published source
782@@ -22,7 +22,7 @@
783 """
784
785 cache_id = Attribute(
786- "The id of the associated LatestPersonSourcepackageReleaseCache"
787+ "The id of the associated LatestPersonSourcePackageReleaseCache"
788 "record.")
789 sourcepackagerelease = Attribute(
790 "The SourcePackageRelease which this object represents.")
791
792=== modified file 'lib/lp/soyuz/model/reporting.py'
793--- lib/lp/soyuz/model/reporting.py 2012-11-07 12:38:19 +0000
794+++ lib/lp/soyuz/model/reporting.py 2012-11-14 07:45:27 +0000
795@@ -3,7 +3,7 @@
796
797 __metaclass__ = type
798 __all__ = [
799- 'LatestPersonSourcepackageReleaseCache',
800+ 'LatestPersonSourcePackageReleaseCache',
801 ]
802
803 from lazr.delegates import delegates
804@@ -18,17 +18,17 @@
805 from lp.services.database.enumcol import EnumCol
806 from lp.soyuz.enums import ArchivePurpose
807 from lp.soyuz.interfaces.reporting import (
808- ILatestPersonSourcepackageReleaseCache,
809+ ILatestPersonSourcePackageReleaseCache,
810 )
811 from lp.soyuz.interfaces.sourcepackagerelease import ISourcePackageRelease
812
813
814-class LatestPersonSourcepackageReleaseCache(Storm):
815- """See `LatestPersonSourcepackageReleaseCache`."""
816- implements(ILatestPersonSourcepackageReleaseCache)
817+class LatestPersonSourcePackageReleaseCache(Storm):
818+ """See `LatestPersonSourcePackageReleaseCache`."""
819+ implements(ILatestPersonSourcePackageReleaseCache)
820 delegates(ISourcePackageRelease, context='sourcepackagerelease')
821
822- __storm_table__ = 'LatestPersonSourcepackageReleaseCache'
823+ __storm_table__ = 'LatestPersonSourcePackageReleaseCache'
824
825 cache_id = Int(name='id', primary=True)
826 publication_id = Int(name='publication')
827
828=== modified file 'lib/lp/soyuz/stories/soyuz/xx-person-packages.txt'
829--- lib/lp/soyuz/stories/soyuz/xx-person-packages.txt 2012-11-07 12:38:19 +0000
830+++ lib/lp/soyuz/stories/soyuz/xx-person-packages.txt 2012-11-14 07:45:27 +0000
831@@ -163,7 +163,7 @@
832 Make a function to update the cached latest person source package release
833 records.
834
835- >>> from lp.scripts.garbo import PopulateLatestPersonSourcepackageReleaseCache
836+ >>> from lp.scripts.garbo import PopulateLatestPersonSourcePackageReleaseCache
837 >>> from lp.services.database.sqlbase import flush_database_updates
838 >>> from lp.services.log.logger import FakeLogger
839 >>> from lp.testing.dbuser import switch_dbuser
840@@ -180,7 +180,7 @@
841 ... "delete from latestpersonsourcepackagereleasecache")
842 ... flush_database_updates()
843 ... switch_dbuser('garbo_frequently')
844- ... job = PopulateLatestPersonSourcepackageReleaseCache(FakeLogger())
845+ ... job = PopulateLatestPersonSourcePackageReleaseCache(FakeLogger())
846 ... while not job.isDone():
847 ... job(chunk_size=100)
848 ... switch_dbuser('launchpad')