Merge lp:~gary/launchpad/bug-687951 into lp:launchpad

Proposed by Gary Poster on 2010-12-20
Status: Rejected
Rejected by: Robert Collins on 2010-12-21
Proposed branch: lp:~gary/launchpad/bug-687951
Merge into: lp:launchpad
Diff against target: 52 lines (+28/-0)
1 file modified
test_on_merge.py (+28/-0)
To merge this branch: bzr merge lp:~gary/launchpad/bug-687951
Reviewer Review Type Date Requested Status
Robert Collins (community) 2010-12-20 Disapprove on 2010-12-21
Review via email: mp+44300@code.launchpad.net

Description of the change

This is an expedient solution to the problem described in bug 687951--or put another way, it is a good solution to the buildbot and operational problem, while not addressing the underlying Launchpad problem. I will create another bug for the Launchpad problem, which Robert addressed in the IRC conversation I quoted in that bug's comments.

The test_on_merge.py is operational code for which we do not have tests. To demonstrate the way this patch works, do the following.

1) See if your machine already has some of the problematic old databases. Run psql -l and you may see database names of the form launchpad_ftest_NUMBER, like lines 10 through 43 of https://pastebin.canonical.com/40714/ . If you don't have any, you can create some for the test using psql. "create database launchpad_ftest_30322;" will give you one to work with.

2) run test_on_merge.py, with some argument to limit the tests run. For instance, "./test_on_merge.py -t foobar" will not run any tests. There's probably a nicer way to do this, but I'm not worrying about it right now. When you do this, the first one or more lines will look like this: "Dropped old ftest database launchpad_ftest_30322. This should not happen. See bug 687951." After test_on_merge.py runs, psql -l will show you that the databases have been dropped.

To post a comment you must log in.
Robert Collins (lifeless) wrote :
Download full text (3.6 KiB)

As this stands it will break parallel testing via testr :(

On 21/12/2010 12:18 PM, "Gary Poster" <email address hidden> wrote:

Gary Poster has proposed merging lp:~gary/launchpad/bug-687951 into
lp:launchpad.

Requested reviews:
 Launchpad code reviewers (launchpad-reviewers)
Related bugs:
 #687951 Large number of DBs left around after test suite runs in buildbot
 https://bugs.launchpad.net/bugs/687951

This is an expedient solution to the problem described in bug 687951--or put
another way, it is a good solution to the buildbot and operational problem,
while not addressing the underlying Launchpad problem. I will create
another bug for the Launchpad problem, which Robert addressed in the IRC
conversation I quoted in that bug's comments.

The test_on_merge.py is operational code for which we do not have tests. To
demonstrate the way this patch works, do the following.

1) See if your machine already has some of the problematic old databases.
 Run psql -l and you may see database names of the form
launchpad_ftest_NUMBER, like lines 10 through 43 of
https://pastebin.canonical.com/40714/ . If you don't have any, you can
create some for the test using psql. "create database
launchpad_ftest_30322;" will give you one to work with.

2) run test_on_merge.py, with some argument to limit the tests run. For
instance, "./test_on_merge.py -t foobar" will not run any tests. There's
probably a nicer way to do this, but I'm not worrying about it right now.
 When you do this, the first one or more lines will look like this: "Dropped
old ftest database launchpad_ftest_30322. This should not happen. See bug
687951
." After test_on_merge.py runs, psql -l will show you that the
databases have been dropped.
--
https://code.launchpad.net/~gary/launchpad/bug-687951/+merge/44300
Your team Launchpad code reviewers from Canonical is subscribed to branch
lp:launchpad.

=== modified file 'test_on_merge.py'
--- test_on_merge.py 2010-09-28 12:37:55 +0000
+++ test_on_merge.py 2010-12-20 23:18:07 +0000
@@ -8,6 +8,7 @@

 import sys, time
 import os, errno
+from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
 import tabnanny
 from StringIO import StringIO
 import psycopg2
@@ -34,6 +35,7 @@

    Returns 1 on error, otherwise it returns the testrunner's exit code.
    """
+ cleanup_old_ftest_databases() # See bug 687951.
    if setup_test_database() != 0:
        return 1

@@ -224,6 +226,32 @@
    return rv

+def cleanup_old_ftest_databases():
+ """Kill old ftest databases."""
+ conn = psycopg2.connect('dbname=template1')
+ conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
+ cur = conn.cursor()
+ try:
+ cur.execute(r"""
+ SELECT d.datname
+ FROM pg_catalog.pg_database d
+ WHERE d.datname ~ '^launchpad_ftest_[0-9]+$';""")
+ old_databases = cur.fetchall()
+ for (database_name,) in old_databases:
+ print ("Dropped old ftest database %s. "
+ "This should not happen. See bug 687951.") % (
+ database_name,)
+ # We are not doing db-level parameterization because it does
not work
+ # (is not designed for) schema e...

Read more...

Robert Collins (lifeless) wrote :

Wow, android handle mail reply is terrible. Sorry about that.

Anyhow, while expedient, this will regress efforts to make the test environment sane.

buildbot is fixable when the alerts flash, and we /need/ to track down and fix the causes of these leaks otherwise more serious parallel test runs will be very hard to execute.

This stopgap will actively prevent us fixing this: the cure would be worse than the disease.

review: Disapprove
Gary Poster (gary) wrote :

Unfortunately, I suspect that I don't have time or resources to address the root cause between now and the end of the year. I will check with mthaddon as to whether this is an operational issue that needs a stopgap for the rest of 2010.

Gary Poster (gary) wrote :

mthaddon reports that addressing this problem can be postponed to 2011 as long as it has a high priority then. Therefore, I'm moving on.

Unmerged revisions

12120. By Gary Poster on 2010-12-20

This is an expedient solution to the immediate problem of buildbot falling over.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'test_on_merge.py'
2--- test_on_merge.py 2010-09-28 12:37:55 +0000
3+++ test_on_merge.py 2010-12-20 23:18:07 +0000
4@@ -8,6 +8,7 @@
5
6 import sys, time
7 import os, errno
8+from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
9 import tabnanny
10 from StringIO import StringIO
11 import psycopg2
12@@ -34,6 +35,7 @@
13
14 Returns 1 on error, otherwise it returns the testrunner's exit code.
15 """
16+ cleanup_old_ftest_databases() # See bug 687951.
17 if setup_test_database() != 0:
18 return 1
19
20@@ -224,6 +226,32 @@
21 return rv
22
23
24+def cleanup_old_ftest_databases():
25+ """Kill old ftest databases."""
26+ conn = psycopg2.connect('dbname=template1')
27+ conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
28+ cur = conn.cursor()
29+ try:
30+ cur.execute(r"""
31+ SELECT d.datname
32+ FROM pg_catalog.pg_database d
33+ WHERE d.datname ~ '^launchpad_ftest_[0-9]+$';""")
34+ old_databases = cur.fetchall()
35+ for (database_name,) in old_databases:
36+ print ("Dropped old ftest database %s. "
37+ "This should not happen. See bug 687951.") % (
38+ database_name,)
39+ # We are not doing db-level parameterization because it does not work
40+ # (is not designed for) schema entities.
41+ cur.execute(r"""
42+ DROP DATABASE %s;""" % (database_name,))
43+ finally:
44+ cur.close()
45+ del cur
46+ conn.close()
47+ del conn
48+
49+
50 def cleanup_hung_testrunner(process):
51 """Kill and clean up the testrunner process and its children."""
52 print