Merge lp:~gary/launchpad/bug-687951 into lp:launchpad

Proposed by Gary Poster
Status: Rejected
Rejected by: Robert Collins
Proposed branch: lp:~gary/launchpad/bug-687951
Merge into: lp:launchpad
Diff against target: 52 lines (+28/-0)
1 file modified
test_on_merge.py (+28/-0)
To merge this branch: bzr merge lp:~gary/launchpad/bug-687951
Reviewer Review Type Date Requested Status
Robert Collins (community) Disapprove
Review via email: mp+44300@code.launchpad.net

Description of the change

This is an expedient solution to the problem described in bug 687951--or put another way, it is a good solution to the buildbot and operational problem, while not addressing the underlying Launchpad problem. I will create another bug for the Launchpad problem, which Robert addressed in the IRC conversation I quoted in that bug's comments.

The test_on_merge.py is operational code for which we do not have tests. To demonstrate the way this patch works, do the following.

1) See if your machine already has some of the problematic old databases. Run psql -l and you may see database names of the form launchpad_ftest_NUMBER, like lines 10 through 43 of https://pastebin.canonical.com/40714/ . If you don't have any, you can create some for the test using psql. "create database launchpad_ftest_30322;" will give you one to work with.

2) run test_on_merge.py, with some argument to limit the tests run. For instance, "./test_on_merge.py -t foobar" will not run any tests. There's probably a nicer way to do this, but I'm not worrying about it right now. When you do this, the first one or more lines will look like this: "Dropped old ftest database launchpad_ftest_30322. This should not happen. See bug 687951." After test_on_merge.py runs, psql -l will show you that the databases have been dropped.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (3.6 KiB)

As this stands it will break parallel testing via testr :(

On 21/12/2010 12:18 PM, "Gary Poster" <email address hidden> wrote:

Gary Poster has proposed merging lp:~gary/launchpad/bug-687951 into
lp:launchpad.

Requested reviews:
 Launchpad code reviewers (launchpad-reviewers)
Related bugs:
 #687951 Large number of DBs left around after test suite runs in buildbot
 https://bugs.launchpad.net/bugs/687951

This is an expedient solution to the problem described in bug 687951--or put
another way, it is a good solution to the buildbot and operational problem,
while not addressing the underlying Launchpad problem. I will create
another bug for the Launchpad problem, which Robert addressed in the IRC
conversation I quoted in that bug's comments.

The test_on_merge.py is operational code for which we do not have tests. To
demonstrate the way this patch works, do the following.

1) See if your machine already has some of the problematic old databases.
 Run psql -l and you may see database names of the form
launchpad_ftest_NUMBER, like lines 10 through 43 of
https://pastebin.canonical.com/40714/ . If you don't have any, you can
create some for the test using psql. "create database
launchpad_ftest_30322;" will give you one to work with.

2) run test_on_merge.py, with some argument to limit the tests run. For
instance, "./test_on_merge.py -t foobar" will not run any tests. There's
probably a nicer way to do this, but I'm not worrying about it right now.
 When you do this, the first one or more lines will look like this: "Dropped
old ftest database launchpad_ftest_30322. This should not happen. See bug
687951
." After test_on_merge.py runs, psql -l will show you that the
databases have been dropped.
--
https://code.launchpad.net/~gary/launchpad/bug-687951/+merge/44300
Your team Launchpad code reviewers from Canonical is subscribed to branch
lp:launchpad.

=== modified file 'test_on_merge.py'
--- test_on_merge.py 2010-09-28 12:37:55 +0000
+++ test_on_merge.py 2010-12-20 23:18:07 +0000
@@ -8,6 +8,7 @@

 import sys, time
 import os, errno
+from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
 import tabnanny
 from StringIO import StringIO
 import psycopg2
@@ -34,6 +35,7 @@

    Returns 1 on error, otherwise it returns the testrunner's exit code.
    """
+ cleanup_old_ftest_databases() # See bug 687951.
    if setup_test_database() != 0:
        return 1

@@ -224,6 +226,32 @@
    return rv

+def cleanup_old_ftest_databases():
+ """Kill old ftest databases."""
+ conn = psycopg2.connect('dbname=template1')
+ conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
+ cur = conn.cursor()
+ try:
+ cur.execute(r"""
+ SELECT d.datname
+ FROM pg_catalog.pg_database d
+ WHERE d.datname ~ '^launchpad_ftest_[0-9]+$';""")
+ old_databases = cur.fetchall()
+ for (database_name,) in old_databases:
+ print ("Dropped old ftest database %s. "
+ "This should not happen. See bug 687951.") % (
+ database_name,)
+ # We are not doing db-level parameterization because it does
not work
+ # (is not designed for) schema e...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

Wow, android handle mail reply is terrible. Sorry about that.

Anyhow, while expedient, this will regress efforts to make the test environment sane.

buildbot is fixable when the alerts flash, and we /need/ to track down and fix the causes of these leaks otherwise more serious parallel test runs will be very hard to execute.

This stopgap will actively prevent us fixing this: the cure would be worse than the disease.

review: Disapprove
Revision history for this message
Gary Poster (gary) wrote :

Unfortunately, I suspect that I don't have time or resources to address the root cause between now and the end of the year. I will check with mthaddon as to whether this is an operational issue that needs a stopgap for the rest of 2010.

Revision history for this message
Gary Poster (gary) wrote :

mthaddon reports that addressing this problem can be postponed to 2011 as long as it has a high priority then. Therefore, I'm moving on.

Unmerged revisions

12120. By Gary Poster

This is an expedient solution to the immediate problem of buildbot falling over.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'test_on_merge.py'
--- test_on_merge.py 2010-09-28 12:37:55 +0000
+++ test_on_merge.py 2010-12-20 23:18:07 +0000
@@ -8,6 +8,7 @@
88
9import sys, time9import sys, time
10import os, errno10import os, errno
11from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
11import tabnanny12import tabnanny
12from StringIO import StringIO13from StringIO import StringIO
13import psycopg214import psycopg2
@@ -34,6 +35,7 @@
3435
35 Returns 1 on error, otherwise it returns the testrunner's exit code.36 Returns 1 on error, otherwise it returns the testrunner's exit code.
36 """37 """
38 cleanup_old_ftest_databases() # See bug 687951.
37 if setup_test_database() != 0:39 if setup_test_database() != 0:
38 return 140 return 1
3941
@@ -224,6 +226,32 @@
224 return rv226 return rv
225227
226228
229def cleanup_old_ftest_databases():
230 """Kill old ftest databases."""
231 conn = psycopg2.connect('dbname=template1')
232 conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
233 cur = conn.cursor()
234 try:
235 cur.execute(r"""
236 SELECT d.datname
237 FROM pg_catalog.pg_database d
238 WHERE d.datname ~ '^launchpad_ftest_[0-9]+$';""")
239 old_databases = cur.fetchall()
240 for (database_name,) in old_databases:
241 print ("Dropped old ftest database %s. "
242 "This should not happen. See bug 687951.") % (
243 database_name,)
244 # We are not doing db-level parameterization because it does not work
245 # (is not designed for) schema entities.
246 cur.execute(r"""
247 DROP DATABASE %s;""" % (database_name,))
248 finally:
249 cur.close()
250 del cur
251 conn.close()
252 del conn
253
254
227def cleanup_hung_testrunner(process):255def cleanup_hung_testrunner(process):
228 """Kill and clean up the testrunner process and its children."""256 """Kill and clean up the testrunner process and its children."""
229 print257 print