cron.germinate is very slow

Bug #899972 reported by Colin Watson
This bug affects 1 person
Affects           Status        Importance  Assigned to   Milestone
Launchpad itself  Fix Released  High        Colin Watson

Bug Description

cron.germinate currently takes on the order of ten minutes, and really doesn't have a particularly good excuse for taking so long. Fixing this would probably be enough to let us move back to 30-minute publisher cycles for Ubuntu (as we used to have with dak, back in the dawn of time), which would increase our velocity and make some of us very happy.

The main reason it's so slow is that it runs germinate as a separate process once for each of eight flavours (Ubuntu, Kubuntu, etc., each with its own seed collection) on each of five architectures. There are a few inefficiencies inherent in this approach, but the most significant is that it has to expand dependencies and build-dependencies of the seeds that are common to all the flavours eight times as often as necessary. Since the build-dependency chain in particular of the base system winds its way through a good fraction of main, this winds up being rather a lot of duplicated work.

I've been working on this problem for some time now, and have just released germinate 2.0 to support solving it properly. The most important change here is that germinate can now process multiple seed collections for a single architecture in a single instance of the core Germinator class, which allows reusing the expansions of common seeds. While I could technically have extended the command-line interface further for this, I felt that the command-line interface was already far too complicated, and decided instead to export a public, documented, and stable Python interface which can be used for this purpose.
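The saving from sharing one instance across seed collections can be illustrated with a toy model. This is only a sketch: it does not use python-germinate's real interface, and the `ToyExpander` class, the dependency data, and the seed collections are all hypothetical names invented for illustration.

```python
# Hypothetical toy data: package -> direct dependencies.
DEPENDS = {
    "base-files": [],
    "libc6": [],
    "bash": ["libc6", "base-files"],
    "gnome-shell": ["libc6"],
    "plasma-desktop": ["libc6"],
}

# Hypothetical seed collections; "minimal" is common to both flavours.
COLLECTIONS = {
    "ubuntu": {"minimal": ["bash"], "desktop": ["gnome-shell"]},
    "kubuntu": {"minimal": ["bash"], "desktop": ["plasma-desktop"]},
}


class ToyExpander:
    """Toy stand-in for one dependency expander per architecture."""

    def __init__(self, depends):
        self._depends = depends
        self._cache = {}
        self.expansions = 0  # seeds that actually had to be expanded

    def expand(self, seed_name, packages):
        # Key on name and contents, so identical seeds are shared but
        # same-named seeds with different contents are not confused.
        key = (seed_name, tuple(sorted(packages)))
        if key in self._cache:
            return self._cache[key]
        self.expansions += 1
        closure = set()
        stack = list(packages)
        while stack:
            pkg = stack.pop()
            if pkg in closure:
                continue
            closure.add(pkg)
            stack.extend(self._depends.get(pkg, ()))
        result = frozenset(closure)
        self._cache[key] = result
        return result


def count_expansions(shared):
    """Count seed expansions with one shared expander or one per flavour."""
    one = ToyExpander(DEPENDS)
    total = 0
    for seeds in COLLECTIONS.values():
        expander = one if shared else ToyExpander(DEPENDS)
        before = expander.expansions
        for name, packages in seeds.items():
            expander.expand(name, packages)
        total += expander.expansions - before
    return total


print("separate runs:", count_expansions(shared=False))
print("shared instance:", count_expansions(shared=True))
```

With real seeds the common part is far larger than in this toy data, since the base system's build-dependency chain covers a good fraction of main.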

I have been writing a Python replacement for most of cron.germinate in parallel with this, to ensure that the interface is sufficient to meet Launchpad's needs. It runs in about three minutes on my laptop and under five minutes on mawson (I haven't yet timed it on ftpmaster).

The deployment steps should be as follows:

 * Get germinate 2.1 into the Launchpad PPA. I'm preparing backports that should be usable for this.

 * Get germinate 2.1 deployed on relevant datacentre machines. This does not require any Launchpad changes, and so should be done early; also, there was a bug fix in 2.0 that changes the output in a few cases (I've manually verified that the changes are correct), and so it will be simpler to deploy this first so that we can more easily check that changes in the output due to the Python rewrite are harmless.

 * Change launchpad-soyuz-dependencies to depend on python-germinate. Get this deployed everywhere relevant.

 * Land my branch.

Tags: qa-ok

Colin Watson (cjwatson) wrote :

As a matter of interest (although it would be a separate bug, and is lower priority), a good item of future work would be to move the process of generating extra overrides to before apt-ftparchive runs, and have it read the state of the archive from the Launchpad database rather than from the published archive on disk. This would fix the extremely long-standing problem that germinate output is itself an input to the archive state, and so there are some changes that require multiple publisher runs to publish completely.

I've tried to design python-germinate's archive interface so that this should be possible, although I haven't yet actually used it this way in practice.

Changed in launchpad:
assignee: nobody → Colin Watson (cjwatson)
Colin Watson (cjwatson) wrote :

I've filed https://rt.admin.canonical.com/Ticket/Display.html?id=49745 (internal only) for the sysadmin work involved here.

Changed in launchpad:
status: New → Triaged
importance: Undecided → High
Colin Watson (cjwatson)
description: updated
Colin Watson (cjwatson)
Changed in launchpad:
status: Triaged → In Progress
Colin Watson (cjwatson)
description: updated
Colin Watson (cjwatson) wrote :

I'm waiting for qastaging to update before I can actually tag this bug, but I've QAed it on dogfood and everything seems fine.

cron.germinate runtime before (r14515):

  real 7m39.023s
  user 2m29.057s
  sys 0m40.159s

cron.germinate runtime after (r14517):

  real 3m13.435s
  user 1m24.801s
  sys 0m9.617s

(A good amount of the remaining time is maintenance-check.py, which I have plans to rewrite in the future but which doesn't need to be part of this bug.)
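For reference, the two sets of timings above can be converted to seconds and compared with a few lines of Python. This is a small sketch; the `parse_time` helper is a hypothetical name, and it only handles the `NmS.SSSs` form shown in this comment.

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '7m39.023s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the r14515 and r14517 runs quoted above.
before = {"real": "7m39.023s", "user": "2m29.057s", "sys": "0m40.159s"}
after = {"real": "3m13.435s", "user": "1m24.801s", "sys": "0m9.617s"}

for key in before:
    old = parse_time(before[key])
    new = parse_time(after[key])
    print(f"{key}: {old:.3f}s -> {new:.3f}s "
          f"(saved {old - new:.3f}s, {old / new:.2f}x faster)")
```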

I had some initial problems because dogfood's database has an ambiguity about what the Ubuntu development series might be: all of rusty, precise, and oneiric are in status FROZEN. This caused generate-extra-overrides.py to get confused as apparently its find_operable_series produces different results from 'lp-query-distro.py development'. I changed the status of rusty and precise to FUTURE temporarily in order to do this test, and then everything went through cleanly. This will not be a problem on production because there will only ever be one Ubuntu series there in either DEVELOPMENT or FROZEN status. I've filed bug 904538 for this, though, as we should clean it up at some point.

I compared the old and new ubuntu-misc and ubuntu-germinate trees with these commands:

  diff -u <(<ubuntu-misc.r14515/more-extra.override.oneiric.main sed 's/Task */Task /' | sort) <(<ubuntu-misc/more-extra.override.oneiric.main sort)
  # ... no output
  diff -bru ubuntu-germinate.r14515 ubuntu-germinate
  # ... considerable output, but it's all either in _maintenance-check.*.stderr (don't care) or expected changes to the germinate.output log file due to the new and more efficient ordering of operations
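The first comparison above (normalise the spacing after "Task" in the old file, sort both sides, then diff) could also be sketched in Python. The function names and the sample override lines below are hypothetical and only illustrate the technique; Python's sort order may differ from that of `sort` under a given locale, which does not matter here because both sides are sorted the same way.

```python
import difflib
import re


def normalise_old(lines):
    """Mimic `sed 's/Task */Task /' | sort` on the old override lines."""
    # sed without the g flag only replaces the first match on each line.
    return sorted(re.sub(r"Task *", "Task ", line, count=1) for line in lines)


def compare_overrides(old_lines, new_lines):
    """Return unified-diff lines; an empty list means no differences."""
    old = normalise_old(old_lines)
    new = sorted(new_lines)
    return list(difflib.unified_diff(
        old, new, fromfile="old", tofile="new", lineterm=""))


# Hypothetical sample data: same content, different spacing and order.
old_sample = ["bash Task  minimal", "adduser Task minimal"]
new_sample = ["adduser Task minimal", "bash Task minimal"]
print(compare_overrides(old_sample, new_sample) or "no output")
```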

Thus, all is well. qa-ok.

Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Steve Kowalik (stevenk)
tags: added: qa-ok
removed: qa-needstesting
Colin Watson (cjwatson) wrote :

Rolled out this morning. The last five publisher runs have completed at :30, :31, :37, :26, and :25 (starting at :03), and the problem in the :37 run is being addressed in bug 905333, so I think we're going to try switching to 30-minute runs now.
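As a quick check of those figures, the run durations can be worked out as follows. This sketch assumes every run started at :03 and completed within the same hour, which is how the times above read.

```python
START_MINUTE = 3  # assumption: every run starts at :03
completions = [30, 31, 37, 26, 25]  # completion minutes quoted above

# Duration of each run in minutes, wrapping around the hour if needed.
durations = [(end - START_MINUTE) % 60 for end in completions]
over_budget = [d for d in durations if d > 30]

print("durations (minutes):", durations)
print("runs over 30 minutes:", over_budget)
```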

Changed in launchpad:
status: Fix Committed → Fix Released