Merge lp:~florent.x/openobject-server/trunk-bug-905257-fix-reconnect into lp:openobject-server

Proposed by Florent
Status: Rejected
Rejected by: Olivier Dony (Odoo)
Proposed branch: lp:~florent.x/openobject-server/trunk-bug-905257-fix-reconnect
Merge into: lp:openobject-server
Diff against target: 32 lines (+14/-4)
1 file modified
openerp/sql_db.py (+14/-4)
To merge this branch: bzr merge lp:~florent.x/openobject-server/trunk-bug-905257-fix-reconnect
Reviewer Review Type Date Requested Status
Florent (community) Approve
Stephane Wirtel (OpenERP) Pending
OpenERP's Framework R&D Pending
OpenERP Core Team Pending
Review via email: mp+132149@code.launchpad.net

Description of the change

I've worked out the second part of the fix for issue #902527.
Now the server will silently discard the closed connections when the DB server is restarted.

The behavior is unchanged in all other cases.

To post a comment you must log in.
Revision history for this message
Florent (florent.x) wrote :

Running on the production system for two months, it works fine.

It survived `apt-get upgrade postgresql`.

review: Approve
Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Hi Florent,

Thanks for your contributions on this annoying bug! As I understand it, this patch will hide the connection errors when reusing dead connections and continue walking the pool in the hope that a non-dead one will become available.

Digging further into these mechanisms, it seems that the autodetection of dead connections would better be done in the pool.borrow() method rather than the cursor itself, since borrow() is already in charge of reaping closed connections (and already loops on the connection pool).

However the missing piece is that psycopg2 does not detect that a connection is dead until you try perform an active operation on it, and borrow() is not supposed to do anything to the connection it returns, not even calling autocommit().
After delving in the API, we found the connection.reset() method[1] that is meant to cleanup the session and does allow psycopg2 to detect that the connection is dead. This should be a fast no-op in normal cases, and the perfect way to auto-detect dead connections within borrow(). This makes a minimal and straightforward patch that accomplishes the same thing. Please have a look at it: https://code.launchpad.net/~openerp-dev/openobject-server/7.0-rstcnx-chs/+merge/148455

I think you will agree that it makes sense, and it should work fine for your customers.

If you don't mind, I'll mark this MP as rejected to avoid any confusion.

Thanks!

[1] http://initd.org/psycopg/docs/connection.html#connection.reset (introduced in psycopg2 2.0.12, and for OE 6.1 and 7.0 the minimum version is `debian stable as of the release date`, hence 2.2.1)

Revision history for this message
Florent (florent.x) wrote :

Hi Olivier,

Thank you for your feedback on this patch.
I reviewed your changes, technically it fixes the same issue as this MP.

However, I am not certain that it is a better approach.
Let's imagine that you have a pool of 16 connections. Each time you request a cursor(), the server will check the 16 connections and call cnx.reset() on each one.
According to the psycopg documentation, the reset() executes 2 commands: RESET and SET SESSION AUTHORIZATION. It's not really a no-op.
So you will execute 32 commands each time you grab a cursor, just to be sure that all connection is available. It is a bit suboptimal.

Finally, I still prefer my original patch: "easier to ask for forgiveness than permission"

BTW, there's a mismatch between the comment and the code here (and the comment is right):
http://bazaar.launchpad.net/~openerp/openobject-server/7.0/revision/4837.1.2/openerp/sql_db.py

Revision history for this message
Florent (florent.x) wrote :

An alternative approach is to move the line "self.autocommit(False)" into the method "borrow(...)" with the same "try: except:" as my original proposal.

Revision history for this message
Florent (florent.x) wrote :

Forget the "alternative approach" of my previous message... it does not exist.

Actually I don't see something better than the patch proposed here.
It is simple; it does not add useless network roundtrips; and it is tested.

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Hi Florent,

Thanks for your answer and for your review. There was indeed a typo in the connection closing code (`if` condition inverted), I'm guilty there: I messed up when applying the final version of the various patches we had tried (testing it with all psycopg2 versions was not trivial). Fortunately we noticed it quickly and had already corrected it (though the corrected version is not visible on the MP)

Concerning the performance impact, I agree that the code I merged was suboptimal, and we've just submitted a new variant, which will only call reset() as a last-second check before returning a connection. This has a minimal performance impact while still allowing borrow() to be responsible for validation connections.

I don't share your view on "easier to ask for forgiveness than permission" in this particular case because it breaks the separation of concerns: I don't think the cursor should be in charge of validating the connection it borrows, especially as borrow() is already doing that partly. That would just split the code to do it in several places.

I also believe that calling reset() is in fact a valid extra step when preparing a connection for a new cursor, because it cleans up any leftover session changes from previous cursors. And if we do that only once for the connection that is being borrowed, the impact should be negligible: 1 reset() + 1 autocommit() instead of just 1 autocommit(), i.e. O(1) instead of O(3) in terms of PG instructions - it should even be hard to measure the difference, wouldn't it?

Please see the merge proposal for the update, your feedback is appreciated: https://code.launchpad.net/~openerp-dev/openobject-server/7.0-rstcnx-chs/+merge/148848

I will probably merge it quickly as the previous code might indeed cause performance issues.

Revision history for this message
Florent (florent.x) wrote :

Before the change, "Pool.borrow" was just returning the connection from the pool (and discarding the connections marked as closed), without doing additional network roundtrips.
It does not seem to be a design flaw to check the validity of the cursor's connection in the Cursor.__init__ method, as I did in my patch.
However it does not matter really, and your proposal looks good.

I've noticed another minor issue with your patch, because the `enumerate` built-in returns an iterable:

>>> the_pool = list('Python rocks!')
>>> for idx, char in enumerate(the_pool):
... print the_pool.pop(idx),
...
P t o o k !
>>>

Unmerged revisions

4520. By Florent

[FIX] silently discard closed connection when DB server is restarted.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'openerp/sql_db.py'
2--- openerp/sql_db.py 2012-10-15 10:01:48 +0000
3+++ openerp/sql_db.py 2012-10-30 15:50:53 +0000
4@@ -180,15 +180,25 @@
5 # see also the docstring of Cursor.
6 self._serialized = serialized
7
8- self._cnx = pool.borrow(dsn(dbname))
9- self._obj = self._cnx.cursor(cursor_factory=psycopg1cursor)
10 if self.sql_log:
11 self.__caller = frame_codeinfo(currentframe(),2)
12 else:
13 self.__caller = False
14- self.__closed = False # real initialisation value
15- self.autocommit(False)
16 self.__closer = False
17+ # Ensure connection if the DB server has restarted
18+ for idx in xrange(pool._maxconn):
19+ self._cnx = pool.borrow(dsn(dbname))
20+ self._obj = self._cnx.cursor(cursor_factory=psycopg1cursor)
21+ self.__closed = False # real initialisation value
22+ try:
23+ self.autocommit(False)
24+ except psycopg2.OperationalError:
25+ self.__closed = True
26+ if not self._cnx.closed:
27+ raise
28+ # The closed connection will be discarded from the pool
29+ else:
30+ break
31
32 self._default_log_exceptions = True
33