Merge lp:~jtv/launchpad/bug-526462 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/bug-526462
Merge into: lp:launchpad
Diff against target: 347 lines (+338/-0)
2 files modified
lib/lp/soyuz/doc/sampledata-cleanup.txt (+22/-0)
utilities/soyuz-sampledata-cleanup.py (+316/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-526462
Reviewer: Michael Nelson (community)
Review type: code
Status: Approve
Review via email: mp+20052@code.launchpad.net

Commit message

Integrate make-ubuntu-sane.py sample-data cleanup script into Launchpad.

Jeroen T. Vermeulen (jtv) wrote :

= Bug 526462 =

To run Soyuz locally on a development machine, one needs to run a few scripts that are not being maintained, and are kept outside of the Launchpad source tree. See https://dev.launchpad.net/Soyuz/HowToUseSoyuzLocally

This branch takes the make-ubuntu-sane.py script from that page and integrates it into the Launchpad source tree as utilities/soyuz-sampledata-cleanup.py. The original was written by William Grant, but I discussed it with him on IRC today and he has no objection to this being included under Canonical's copyright with a "based on code by wgrant" notice.

As you'll see, the script replaces the existing Ubuntu test series in the playground sample data with a set of series mirroring real-world Ubuntu releases. This is meant for the playground sample data that you would run on your local machine while testing manually. We discussed updating the sample data as included in the source code, but that would involve uploading tarballs to the local librarian, which would have complicated the job and bloated the branch.

Since the script is not meant to run on production databases, it attempts to check for that condition. Our procedures should prevent this, however, so it's a pair of suspenders in addition to the belt we already have. There is no guarantee that the check will prevent any attempt at disastrously wrong use, and I don't think it's worth the hassle of automated testing. If deviating from the "happy path" in this way should break the script, then the script is still dealing with the situation properly: die before it deletes any Ubuntu data.

A doctest now verifies that the script will execute properly. Normally the test would have to force a dirty database in order to avoid test isolation problems, but it runs the script with the --dry-run option (not present in the original script), which avoids making any changes to the database.
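
The --dry-run behaviour described above can be sketched as follows. This is only a minimal, self-contained illustration that uses Python's sqlite3 module instead of Launchpad's transaction manager; the `series` table and the helpers `run_cleanup`, `make_sample_db` and `count_series` are hypothetical stand-ins, and the sketch targets current Python 3 rather than the Python 2.5 the script itself uses.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database holding two sample rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE series (name TEXT)")
    connection.executemany(
        "INSERT INTO series (name) VALUES (?)", [("hoary",), ("warty",)])
    connection.commit()
    return connection


def count_series(connection):
    """Return the number of rows currently in the series table."""
    return connection.execute("SELECT count(*) FROM series").fetchone()[0]


def run_cleanup(connection, dry_run=False):
    """Do destructive work, then commit it or roll it back.

    The DELETE is a stand-in for the real cleanup; with dry_run set,
    everything done inside the transaction is undone.
    """
    connection.execute("DELETE FROM series")
    if dry_run:
        connection.rollback()
    else:
        connection.commit()
```

Because the work and the commit-or-abort decision are separate steps, a test can exercise the whole code path with `dry_run=True` and still leave the database as it found it.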

No lint. To Q/A, just use it on your local dev system. To test:
{{{
./bin/test -vv -t sampledata-cleanup
}}}

Jeroen

Michael Nelson (michael.nelson) wrote :

Hi Jeroen,

This is great! Thanks for getting this in with a test to ensure it'll stay in sync with the code-base. I think it would also be worthwhile to add a test to ensure that, as expected, it won't run in other envs (you never know who will come along and modify your code with unintended effects).

As mentioned on IRC, I realise this is the suspenders and not the belt, but I still think we should *only* allow this to run when LPCONFIG == 'development'. If someone wants to run it in another env. (or for some reason, hasn't set LPCONFIG), they'll know enough to modify the script for their needs. I mean, looking at lp-production-configs, there are *loads*, and so I'm not sure that blacklisting a few is that useful. This would also mean you could get rid of the 'guess' based on the ids?

Thoughts?

review: Needs Fixing (code)
Michael Nelson (michael.nelson) wrote :

As discussed, we have other damaging scripts intended for local development that do not even do any checks. I do think it would be a simpler (and sufficient) implementation to just check LPCONFIG == 'development', but I'll leave that up to you.
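
For comparison, the simpler allowlist check suggested here might look roughly like the sketch below. It is an assumption about how such a check could be written, not code from the branch: the function name `check_config_is_development` and its `environ` parameter are hypothetical, and only the `LPCONFIG` variable and the `DoNotRunOnProduction` exception name come from the proposal.

```python
import os


class DoNotRunOnProduction(Exception):
    """Raised when the environment is not a known development config."""


def check_config_is_development(environ=None):
    """Allow the script to run only when LPCONFIG is 'development'.

    An unset LPCONFIG is rejected as well, since it cannot be shown
    to be a development setup.
    """
    if environ is None:
        environ = os.environ
    current_config = environ.get('LPCONFIG')
    if current_config != 'development':
        raise DoNotRunOnProduction(
            "Refusing to run with LPCONFIG=%r; expected 'development'."
            % (current_config,))
```

The trade-off raised in the discussion below applies: an allowlist of one value is short and hard to get wrong, but it also rejects customized local configs until someone edits the script.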

16:42 < jtv> noodles775: thanks for the review!
16:43 < jtv> noodles775: I'm a bit worried though that being this strict would stop people from playing with customized configs.
16:43 < jtv> I realize it's always a tradeoff, but there's also the "do we have a really large production-like table" test.
16:44 < noodles775> jtv: well, if they're playing with customized configs, they're not going to have a problem spending 10 seconds to adjust the script if
                    they want to.
16:45 < jtv> noodles775: true, but this script already has another check to ensure it's not running on production; how many scripts do we have (without ever
             any trouble) that are just as dangerous without anyone ever adding checks like this?
16:45 < jtv> I mean, if I hadn't added the check, would the average reviewer have thought of adding it?
16:45 < noodles775> jtv: which scripts? make schema?
16:46 < jtv> noodles775: I guess, though I've no idea whether that has any checks.
16:46 < noodles775> jtv: yeah, I'm not sure. All the other scripts that I can think of *add* info, not delete.
16:46 < jtv> launchpad-database-setup
16:46 * noodles775 looks
16:47 < jtv> mock-code-import ("warning! run make schema first!")
16:48 < jtv> We also have a script now, apparently, to remove private data. There may be more.
16:49 < noodles775> Yep, you're right.
16:50 < jtv> So I don't want to spend my time guarding against the admin who accidentally goes through the rigmarole for running scripts against a
             production db, with the --force option added.
16:51 < noodles775> yeah, I agree. I was more worried about the situation where a person runs it against a smaller DB with a different config.
16:53 < jtv> noodles775: there's a good chance that the script might fail, and I'd estimate the risk of them running "make schema" by accident considerably
             larger.
16:53 < jtv> (from shell history, f'rinstance)
16:54 < noodles775> jtv: I just thought it would have been a simpler implementation, not more difficult (ie. LPCONFIG == 'development'), but yep, I'm
                    updating the MP now.

review: Approve (code)
Jeroen T. Vermeulen (jtv) wrote :

Thanks! I did start out thinking that one check should be enough. I ended up with one that might produce false positives, plus one that might produce false negatives. For that situation, a --force against false positives plus a final just-in-case safeguard seemed the right approach. In any case the check is easy; maybe someday we'll build reusable helpers for this.
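
The two-check arrangement can be summarised as a small decision function. This is a sketch under assumptions: the threshold and the config pattern are taken from the diff, but the function `may_run` and its parameters are hypothetical, and reading the id heuristic as the "false positives" check and the config pattern as the "false negatives" check is an interpretation of the comment above.

```python
import re

# Pattern and threshold as they appear in the diff.
FORBIDDEN_CONFIGS = re.compile('(edge|lpnet|production)')
PRODUCTION_LIKE_ID = 1000000


def may_run(max_id, config_name, force=False):
    """Decide whether the cleanup may proceed.

    :param max_id: Highest id in a large table, or None if it is empty.
    :param config_name: Value of LPCONFIG, or a placeholder if unset.
    :param force: Whether the user passed --force.
    """
    # Final safeguard: forbidden configs can never be forced.
    # Note that re.match only matches at the start of the string.
    if FORBIDDEN_CONFIGS.match(config_name):
        return False
    # Heuristic: a large id suggests production data.  It can be wrong
    # for a well-populated dev database, so --force overrides it.
    looks_like_production = (
        max_id is not None and max_id >= PRODUCTION_LIKE_ID)
    if looks_like_production and not force:
        return False
    return True
```

For example, `may_run(2000000, 'development')` is refused until `force=True` is passed, while `may_run(10, 'production', force=True)` is refused regardless.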

Preview Diff

=== added file 'lib/lp/soyuz/doc/sampledata-cleanup.txt'
--- lib/lp/soyuz/doc/sampledata-cleanup.txt 1970-01-01 00:00:00 +0000
+++ lib/lp/soyuz/doc/sampledata-cleanup.txt 2010-02-24 12:36:31 +0000
@@ -0,0 +1,22 @@
+= Sample data cleanup =
+
+In order to run Soyuz locally on a development system, the sample data
+must be cleaned up and customized a bit. This is done by the script
+utilities/soyuz-sampledata-cleanup.py.
+
+We only need this script for the playground sample data, so there's
+little point in inspecting what it does to the test database in detail.
+
+    >>> from canonical.launchpad.ftests.script import run_script
+
+The --dry-run option makes the script roll back its changes.
+
+    >>> return_code, output, error = run_script(
+    ...     'utilities/soyuz-sampledata-cleanup.py', args=['--dry-run'])
+
+    >>> print return_code
+    0
+
+    >>> print error
+    INFO ...
+    INFO Done.

=== added file 'utilities/soyuz-sampledata-cleanup.py'
--- utilities/soyuz-sampledata-cleanup.py 1970-01-01 00:00:00 +0000
+++ utilities/soyuz-sampledata-cleanup.py 2010-02-24 12:36:31 +0000
@@ -0,0 +1,316 @@
+#!/usr/bin/python2.5
+# pylint: disable-msg=W0403
+
+# Copyright 2010 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+#
+# This code is based on William Grant's make-ubuntu-sane.py script, but
+# reorganized to fit Launchpad coding guidelines, and extended. The
+# code is included under Canonical copyright with his permission
+# (2010-02-24).
+
+"""Clean up sample data so it will allow Soyuz to run locally.
+
+DO NOT RUN ON PRODUCTION SYSTEMS. This script deletes lots of
+Ubuntu-related data.
+"""
+
+__metaclass__ = type
+
+import _pythonpath
+
+from optparse import OptionParser
+from os import getenv
+import re
+import sys
+
+from zope.component import getUtility
+from zope.event import notify
+from zope.lifecycleevent import ObjectCreatedEvent
+from zope.security.proxy import removeSecurityProxy
+
+from storm.store import Store
+
+from canonical.database.sqlbase import sqlvalues
+
+from canonical.lp import initZopeless
+
+from canonical.launchpad.interfaces.launchpad import (
+    ILaunchpadCelebrities)
+from canonical.launchpad.scripts import execute_zcml_for_scripts
+from canonical.launchpad.scripts.logger import logger, logger_options
+from canonical.launchpad.webapp.interfaces import (
+    IStoreSelector, MAIN_STORE, SLAVE_FLAVOR)
+
+from lp.registry.interfaces.series import SeriesStatus
+from lp.soyuz.interfaces.component import IComponentSet
+from lp.soyuz.interfaces.section import ISectionSet
+from lp.soyuz.interfaces.sourcepackageformat import (
+    ISourcePackageFormatSelectionSet, SourcePackageFormat)
+from lp.soyuz.model.section import SectionSelection
+from lp.soyuz.model.component import ComponentSelection
+
+
+class DoNotRunOnProduction(Exception):
+    """Error: do not run this script on production (-like) systems."""
+
+
+def get_max_id(store, table_name):
+    """Find highest assigned id in given table."""
+    max_id = store.execute("SELECT max(id) FROM %s" % table_name).get_one()
+    if max_id is None:
+        return None
+    else:
+        return max_id[0]
+
+
+def check_preconditions(options):
+    """Try to ensure that it's safe to run.
+
+    This script must not run on a production server, or anything
+    remotely like it.
+    """
+    store = getUtility(IStoreSelector).get(MAIN_STORE, SLAVE_FLAVOR)
+
+    # Just a guess, but dev systems aren't likely to have ids this high
+    # in this table. Production data does.
+    real_data = (get_max_id(store, "TranslationMessage") >= 1000000)
+    if real_data and not options.force:
+        raise DoNotRunOnProduction(
+            "Refusing to delete Ubuntu data unless you --force me.")
+
+    # For some configs it's just absolutely clear this script shouldn't
+    # run. Don't even accept --force there.
+    forbidden_configs = re.compile('(edge|lpnet|production)')
+    current_config = getenv('LPCONFIG', 'an unknown config')
+    if forbidden_configs.match(current_config):
+        raise DoNotRunOnProduction(
+            "I won't delete Ubuntu data on %s and you can't --force me."
+            % current_config)
+
+
+def parse_args(arguments):
+    """Parse command-line arguments.
+
+    :return: (options, args, logger)
+    """
+    parser = OptionParser(
+        description="Delete existing Ubuntu releases and set up new ones.")
+    parser.add_option('-f', '--force', action='store_true', dest='force',
+        help="DANGEROUS: run even if the database looks production-like.")
+    parser.add_option('-n', '--dry-run', action='store_true', dest='dry_run',
+        help="Do not commit changes.")
+    logger_options(parser)
+
+    options, args = parser.parse_args(arguments)
+    return options, args, logger(options)
+
+
+def get_person(name):
+    """Return the person with the given `name`."""
+    # Avoid circular import.
+    from lp.registry.interfaces.person import IPersonSet
+    return getUtility(IPersonSet).getByName(name)
+
+
+def retire_series(distribution):
+    """Mark all `DistroSeries` for `distribution` as obsolete."""
+    for series in distribution.series:
+        series.status = SeriesStatus.OBSOLETE
+
+
+def retire_active_publishing_histories(histories, requester):
+    """Retire all active publishing histories in the given collection."""
+    # Avoid circular import.
+    from lp.soyuz.interfaces.publishing import active_publishing_status
+    for history in histories(status=active_publishing_status):
+        history.requestDeletion(
+            requester, "Cleaned up because of missing Librarian files.")
+
+
+def retire_distro_archives(distribution, culprit):
+    """Retire all items in `distribution`'s archives."""
+    for archive in distribution.all_distro_archives:
+        retire_active_publishing_histories(
+            archive.getPublishedSources, culprit)
+        retire_active_publishing_histories(
+            archive.getAllPublishedBinaries, culprit)
+
+
+def retire_ppas(distribution):
+    """Disable all PPAs for `distribution`."""
+    for ppa in distribution.getAllPPAs():
+        removeSecurityProxy(ppa).publish = False
+
+
+def set_lucille_config(distribution):
+    """Set lucilleconfig on all series of `distribution`."""
+    for series in distribution.series:
+        removeSecurityProxy(series).lucilleconfig = '''[publishing]
+components = main restricted universe multiverse'''
+
+
+def create_sections(distroseries):
+    """Set up some sections for `distroseries`."""
+    section_names = (
+        'admin', 'cli-mono', 'comm', 'database', 'devel', 'debug', 'doc',
+        'editors', 'electronics', 'embedded', 'fonts', 'games', 'gnome',
+        'graphics', 'gnu-r', 'gnustep', 'hamradio', 'haskell', 'httpd',
+        'interpreters', 'java', 'kde', 'kernel', 'libs', 'libdevel', 'lisp',
+        'localization', 'mail', 'math', 'misc', 'net', 'news', 'ocaml',
+        'oldlibs', 'otherosfs', 'perl', 'php', 'python', 'ruby', 'science',
+        'shells', 'sound', 'tex', 'text', 'utils', 'vcs', 'video', 'web',
+        'x11', 'xfce', 'zope')
+    store = Store.of(distroseries)
+    for section_name in section_names:
+        section = getUtility(ISectionSet).ensure(section_name)
+        if section not in distroseries.sections:
+            store.add(
+                SectionSelection(distroseries=distroseries, section=section))
+
+
+def create_components(distroseries, uploader):
+    """Set up some components for `distroseries`."""
+    component_names = ('main', 'restricted', 'universe', 'multiverse')
+    store = Store.of(distroseries)
+    main_archive = distroseries.distribution.main_archive
+    for component_name in component_names:
+        component = getUtility(IComponentSet).ensure(component_name)
+        if component not in distroseries.components:
+            store.add(
+                ComponentSelection(
+                    distroseries=distroseries, component=component))
+        main_archive.newComponentUploader(uploader, component)
+        main_archive.newQueueAdmin(uploader, component)
+
+
+def create_series(parent, full_name, version, status):
+    """Set up a `DistroSeries`."""
+    distribution = parent.distribution
+    owner = parent.owner
+    name = full_name.split()[0].lower()
+    title = "The " + full_name
+    displayname = full_name.split()[0]
+    new_series = distribution.newSeries(name=name, title=title,
+        displayname=displayname, summary='Ubuntu %s is good.' % version,
+        description='%s is awesome.' % version, version=version,
+        parent_series=parent, owner=owner)
+    new_series.status = status
+    notify(ObjectCreatedEvent(new_series))
+
+    # This bit copied from scripts/ftpmaster-tools/initialise-from-parent.py.
+    assert new_series.architectures.count() == 0, (
+        "Cannot copy distroarchseries from parent; this series already has "
+        "distroarchseries.")
+
+    store = Store.of(parent)
+    store.execute("""
+        INSERT INTO DistroArchSeries
+          (distroseries, processorfamily, architecturetag, owner, official)
+        SELECT %s, processorfamily, architecturetag, %s, official
+        FROM DistroArchSeries WHERE distroseries = %s
+        """ % sqlvalues(new_series, owner, parent))
+
+    i386 = new_series.getDistroArchSeries('i386')
+    i386.supports_virtualized = True
+    new_series.nominatedarchindep = i386
+
+    new_series.initialiseFromParent()
+    return new_series
+
+
+def create_sample_series(original_series, log):
+    """Set up sample `DistroSeries`.
+
+    :param original_series: The parent for the first new series to be
+        created. The second new series will have the first as a parent,
+        and so on.
+    """
+    series_descriptions = [
+        ('Dapper Drake', SeriesStatus.SUPPORTED, '6.06'),
+        ('Edgy Eft', SeriesStatus.OBSOLETE, '6.10'),
+        ('Feisty Fawn', SeriesStatus.OBSOLETE, '7.04'),
+        ('Gutsy Gibbon', SeriesStatus.OBSOLETE, '7.10'),
+        ('Hardy Heron', SeriesStatus.SUPPORTED, '8.04'),
+        ('Intrepid Ibex', SeriesStatus.SUPPORTED, '8.10'),
+        ('Jaunty Jackalope', SeriesStatus.SUPPORTED, '9.04'),
+        ('Karmic Koala', SeriesStatus.CURRENT, '9.10'),
+        ('Lucid Lynx', SeriesStatus.DEVELOPMENT, '10.04'),
+        ]
+
+    parent = original_series
+    for full_name, status, version in series_descriptions:
+        log.info('Creating %s...' % full_name)
+        parent = create_series(parent, full_name, version, status)
+
+
+def clean_up(distribution, log):
+    # First we eliminate all active publishings in the Ubuntu main archives.
+    # None of the librarian files exist, so it kills the publisher.
+
+    # Could use IPublishingSet.requestDeletion() on the published sources to
+    # get rid of the binaries too, but I don't trust that there aren't
+    # published binaries without corresponding sources.
+
+    log.info("Deleting all items in official archives...")
+    retire_distro_archives(distribution, get_person('name16'))
+
+    # Disable publishing of all PPAs, as they probably have broken
+    # publishings too.
+    log.info("Disabling all PPAs...")
+    retire_ppas(distribution)
+
+    retire_series(distribution)
+
+
+def set_source_package_format(distroseries):
+    """Register a series' source package format selection."""
+    utility = getUtility(ISourcePackageFormatSelectionSet)
+    format = SourcePackageFormat.FORMAT_1_0
+    if utility.getBySeriesAndFormat(distroseries, format) is None:
+        utility.add(distroseries, format)
+
+
+def populate(distribution, parent_series_name, uploader_name, log):
+    """Set up sample data on `distribution`."""
+    parent_series = distribution.getSeries(parent_series_name)
+
+    # Set up lucilleconfig on all series. The sample data lacks this.
+    log.info("Setting lucilleconfig...")
+    set_lucille_config(distribution)
+
+    log.info("Configuring sections...")
+    create_sections(parent_series)
+
+    log.info("Configuring components and permissions...")
+    create_components(parent_series, get_person(uploader_name))
+
+    set_source_package_format(parent_series)
+
+    create_sample_series(parent_series, log)
+
+
+def main(argv):
+    options, args, log = parse_args(argv[1:])
+
+    execute_zcml_for_scripts()
+    txn = initZopeless(dbuser='launchpad')
+
+    check_preconditions(options)
+
+    ubuntu = getUtility(ILaunchpadCelebrities).ubuntu
+    clean_up(ubuntu, log)
+
+    # Use Hoary as the root, as Breezy and Grumpy are broken.
+    populate(ubuntu, 'hoary', 'ubuntu-team', log)
+
+    if options.dry_run:
+        txn.abort()
+    else:
+        txn.commit()
+
+    log.info("Done.")
+
+
+if __name__ == "__main__":
+    main(sys.argv)
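
The way create_series derives names from a full release name, and the way create_sample_series chains each new series onto the previous one, can be illustrated without any Launchpad objects. The sketch below is an assumption-laden stand-in: `derive_series_names` and `chain_series` are hypothetical helpers that mirror only the string handling and the parent hand-over, using plain strings where the script uses `DistroSeries` objects.

```python
def derive_series_names(full_name):
    """Split a full name such as 'Lucid Lynx' the way create_series does."""
    displayname = full_name.split()[0]
    return {
        'name': displayname.lower(),
        'displayname': displayname,
        'title': "The " + full_name,
    }


def chain_series(root_name, full_names):
    """Return (series name, parent name) pairs.

    The first series is parented on root_name; every later series is
    parented on the one created just before it.
    """
    pairs = []
    parent = root_name
    for full_name in full_names:
        name = derive_series_names(full_name)['name']
        pairs.append((name, parent))
        parent = name
    return pairs
```

With 'hoary' as the root, 'Dapper Drake' becomes the series 'dapper' with parent 'hoary', and 'Edgy Eft' becomes 'edgy' with parent 'dapper'.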