Merge lp:~stub/python-pgbouncer/devel into lp:python-pgbouncer

Proposed by Stuart Bishop
Status: Merged
Merged at revision: 6
Proposed branch: lp:~stub/python-pgbouncer/devel
Merge into: lp:python-pgbouncer
Diff against target: 129 lines (+35/-27)
3 files modified
pgbouncer/__init__.py (+1/-1)
pgbouncer/fixture.py (+32/-24)
setup.py (+2/-2)
To merge this branch: bzr merge lp:~stub/python-pgbouncer/devel
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve
Review via email: mp+74885@code.launchpad.net

Description of the change

Avoid race conditions with pidfile. In fact, ignore the pidfile entirely by running pgbouncer in foreground mode.

Consider the process available when it is accepting socket connections, rather than when it happens to have written its pidfile.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Much better approach I think, thanks.

There's a small wart still, as far as I can see: a timeout seems likely to manifest itself in the form of a nonpositive timeout value being passed to the connection call.

That will still raise a timeout-related exception when that happens so it's not altogether wrong, but it might confuse post-mortem analysis. In my quick experiments I get “socket.error: [Errno 115] Operation now in progress” when the timeout is zero, and “ValueError: Timeout value out of range” when it's negative.

review: Approve
Revision history for this message
Stuart Bishop (stub) wrote :

On Sat, Sep 10, 2011 at 9:49 PM, Jeroen T. Vermeulen <email address hidden> wrote:
> Review: Approve
>
> Much better approach I think, thanks.
>
> There's a small wart still, as far as I can see: a timeout seems likely to manifest itself in the form of a nonpositive timeout value being passed to the connection call.

I don't see how a nonpositive timeout can be passed to the connection
call unless someone sets the constance to <= 0. The loop is exited
with a 'raise' if we grab a new value of 'now' and find the timeout
has been hit.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Robert Collins (lifeless) wrote :

Oh, one more thing to check - without -d, pgbouncer may write to
stdout/stderr. If it does this it will deadlock eventually.

Revision history for this message
Stuart Bishop (stub) wrote :

The -q option should take care of stdout/stderr noise. We might lose console errors if the process fails to start up for some reason (missing dynamic libraries or similar).

Revision history for this message
Stuart Bishop (stub) wrote :

We could pass shell=True and redirect stdout and stderr to /dev/null to be sure, but I don't think we need to bother.

lp:~stub/python-pgbouncer/devel updated
10. By Stuart Bishop

Let pgbouncer emit noise rather than hide the failure, avoiding potential hangs

Revision history for this message
Stuart Bishop (stub) wrote :

I'm no longer redirecting stdout/stderr to a pipe and closing it. It should be silent, and if it isn't we should here about it.

Revision history for this message
Gavin Panella (allenap) wrote :

Fwiw, the following would work too:

    with open(os.devnull, "r+b") as devnull:
        process = subprocess.Popen(
            ..., stdin=devnull, stdout=devnull, stderr=devnull)

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Sep 13, 2011 at 3:50 AM, Stuart Bishop
<email address hidden> wrote:
> I'm no longer redirecting stdout/stderr to a pipe and closing it. It should be silent, and if it isn't we should here about it.

That will break lxc. Please put it back. Details are in the various
merges I did to LP itself a month back.

Revision history for this message
Robert Collins (lifeless) wrote :

Less abruptly, I've mailed the -dev list the (hard) lesson learnt
there. And to avoid creating extra work for you, I've rolled this
back, proposed a narrower fix, got that reviewed, released and its now
playing in EC2 to upgrade LP to 0.0.5.

Revision history for this message
Stuart Bishop (stub) wrote :

On Tue, Sep 13, 2011 at 7:28 AM, Robert Collins
<email address hidden> wrote:
> Less abruptly, I've mailed the -dev list the (hard) lesson learnt
> there. And to avoid creating extra work for you, I've rolled this
> back, proposed a narrower fix, got that reviewed, released and its now
> playing in EC2 to upgrade LP to 0.0.5.

So this version is missing the '\n' detection to ensure the pid number
has actually been flushed as expected, and also doesn't help with the
race condition where starting up enough to write the pid file doesn't
mean the process is bound and accepting connections. I think what I
had except for my final commit would have been better, or using
os.devnull if we don't trust -q.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Sep 13, 2011 at 3:19 PM, Stuart Bishop
<email address hidden> wrote:
> On Tue, Sep 13, 2011 at 7:28 AM, Robert Collins
> <email address hidden> wrote:
>> Less abruptly, I've mailed the -dev list the (hard) lesson learnt
>> there. And to avoid creating extra work for you, I've rolled this
>> back, proposed a narrower fix, got that reviewed, released and its now
>> playing in EC2 to upgrade LP to 0.0.5.
>
> So this version is missing the '\n' detection to ensure the pid number
> has actually been flushed as expected, and also doesn't help with the
> race condition where starting up enough to write the pid file doesn't
> mean the process is bound and accepting connections. I think what I
> had except for my final commit would have been better, or using
> os.devnull if we don't trust -q.

We can tweak further. The pid file doesn't have the \n - but as *all
of Ubuntu* depends on sane behaviour for pid files I don't think we
need a higher standard - if pgbouncer really writes the pid contents
byte-and-flush-at-a-time, then it is what needs fixing, not out
fixture.

On the not listening side, I haven't checked the code - if there is a
race there as well then yes we need to copy that part of your patch
forward as well.

os.devnull would discard what output it has - using a temp file to get
its output is simple and robust. If it writes that its started to that
file we could check there, or just probe as you did - thats a good
thing to do.

FWIW the 0.0.4 version broke jenkins on the librarian patch, and 0.0.5
works again - but we haven't drilled in enough to know if its a flakey
test or actual fallout from pgbouncer fixture changes.

Revision history for this message
Stuart Bishop (stub) wrote :

We have one failure in
lp.services.job.tests.test_runner.TestTwistedJobRunner.test_timeout_short
which doesn't go near the fixture, and another in the new Librarian
disconnection tests that might be a genuine failure (got a 200
response when 503 expected, so either connections did not get routed
through pgbouncer or there is a problem in the fixture stop() method).

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'pgbouncer/__init__.py'
2--- pgbouncer/__init__.py 2011-07-18 03:31:27 +0000
3+++ pgbouncer/__init__.py 2011-09-12 15:49:32 +0000
4@@ -25,7 +25,7 @@
5 # established at this point, and setup.py will use a version of next-$(revno).
6 # If the releaselevel is 'final', then the tarball will be major.minor.micro.
7 # Otherwise it is major.minor.micro~$(revno).
8-__version__ = (0, 0, 1, 'beta', 0)
9+__version__ = (0, 0, 4, 'beta', 0)
10
11 __all__ = [
12 'PGBouncerFixture',
13
14=== modified file 'pgbouncer/fixture.py'
15--- pgbouncer/fixture.py 2011-09-05 15:33:21 +0000
16+++ pgbouncer/fixture.py 2011-09-12 15:49:32 +0000
17@@ -26,6 +26,9 @@
18
19 from fixtures import Fixture, TempDir
20
21+YIELD_TIME = 0.02 # How much time to sleep in busy loops.
22+PROCESS_TIMEOUT = 5 # How long we wait for pgbouncer to startup and shutdown.
23+
24 def _allocate_ports(n=1):
25 """Allocate `n` unused ports.
26
27@@ -76,7 +79,6 @@
28 self.port = _allocate_ports()[0]
29 self.configdir = self.useFixture(TempDir())
30 self.auth_type = 'trust'
31- self.process_pid = None
32 self.setUpConf()
33 self.start()
34
35@@ -86,7 +88,6 @@
36 self.authpath = os.path.join(self.configdir.path, 'users.txt')
37 self.logpath = os.path.join(self.configdir.path, 'pgbouncer.log')
38 self.pidpath = os.path.join(self.configdir.path, 'pgbouncer.pid')
39- self.outputpath = os.path.join(self.configdir.path, 'output')
40 with open(self.inipath, 'wt') as inifile:
41 inifile.write('[databases]\n')
42 for item in self.databases.items():
43@@ -111,18 +112,19 @@
44 authfile.write('"%s" "%s"\n' % user_creds)
45
46 def stop(self):
47- if self.process_pid is None:
48+ if self.process is None:
49 return
50- os.kill(self.process_pid, signal.SIGTERM)
51- # Wait for the shutdown to occur
52- start = time.time()
53- stop = start + 5.0
54- while time.time() < stop:
55- if not os.path.isfile(self.pidpath):
56- self.process_pid = None
57- return
58- # If its not going away, we might want to raise an error, but for now
59- # it seems reliable.
60+ try:
61+ self.process.terminate()
62+ # Wait for the shutdown to occur
63+ start = time.time()
64+ stop = start + PROCESS_TIMEOUT
65+ while self.process.poll() is None and time.time() < stop:
66+ time.sleep(YIELD_TIME)
67+ finally:
68+ # If its not going away, we might want to raise an error,
69+ # but for now it seems reliable.
70+ self.process = None
71
72 def start(self):
73 self.addCleanup(self.stop)
74@@ -137,16 +139,22 @@
75 env = os.environ.copy()
76 env['PATH'] = ':'.join(path)
77
78- outputfile = open(self.outputpath, 'wt')
79 self.process = subprocess.Popen(
80- [self.pgbouncer, '-d', self.inipath], env=env,
81- stdout=outputfile.fileno(), stderr=outputfile.fileno())
82- self.process.communicate()
83- # Wait up to 5 seconds for the pid file to exist
84- start = time.time()
85- stop = start + 5.0
86- while time.time() < stop:
87- if os.path.isfile(self.pidpath):
88- self.process_pid = int(file(self.pidpath, 'rt').read())
89+ [self.pgbouncer, '-q', self.inipath], env=env,
90+ stderr=subprocess.STDOUT, stdin=subprocess.PIPE)
91+ self.process.stdin.close()
92+
93+ # Wait up to 5 seconds for pgbouncer to launch start responding, or
94+ # raise an exception.
95+ now = time.time()
96+ stop = now + PROCESS_TIMEOUT
97+ while True:
98+ try:
99+ test_socket = socket.create_connection(
100+ ('127.0.0.1', self.port), stop - now)
101 return
102- raise Exception('timeout waiting for pgbouncer to create pid file')
103+ except Exception:
104+ now = time.time()
105+ if now >= stop:
106+ raise
107+ time.sleep(YIELD_TIME)
108
109=== modified file 'setup.py'
110--- setup.py 2011-07-25 08:33:53 +0000
111+++ setup.py 2011-09-12 15:49:32 +0000
112@@ -22,7 +22,7 @@
113 description = file(os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()
114
115 setup(name="pgbouncer",
116- version="0.0.2",
117+ version="0.0.4",
118 description="Fixture to bring up temporary pgbouncer instance.",
119 long_description=description,
120 maintainer="Launchpad Developers",
121@@ -31,7 +31,7 @@
122 packages=['pgbouncer'],
123 package_dir = {'':'.'},
124 classifiers = [
125- 'Development Status :: 2 - Pre-Alpha',
126+ 'Development Status :: 4 - Beta',
127 'Intended Audience :: Developers',
128 'License :: OSI Approved :: GNU Affero General Public License v3',
129 'Operating System :: OS Independent',

Subscribers

People subscribed via source and target branches

to status/vote changes: