Merge lp:~openerp-groupes/openobject-server/fix-delayed-ack into lp:openobject-server

Proposed by Julien Thewys
Status: Rejected
Rejected by: Olivier Dony (Odoo)
Proposed branch: lp:~openerp-groupes/openobject-server/fix-delayed-ack
Merge into: lp:openobject-server
Diff against target: 0 lines
To merge this branch: bzr merge lp:~openerp-groupes/openobject-server/fix-delayed-ack
Reviewer Review Type Date Requested Status
Olivier Dony (Odoo) Disapprove
Review via email: mp+78937@code.launchpad.net

This proposal supersedes a proposal from 2011-01-17.

Description of the change

Everything is in the commit message: we just need TCP_NODELAY between client and server.

Windows has a 200ms delayed-ACK timeout by default. If you deploy the client and server on different Windows machines, clicking the administration menu (with the extended view enabled) takes more than 10 seconds to complete (30+ RPC calls). Other operations are similarly slowed down. With the patch, the total time is simply reduced by 200ms times the number of RPC calls.
Linux apparently has a 40ms delayed-ACK timeout by default, which is also not negligible. I suppose we never noticed the problem because we always deploy the client and server on the same machine.
I also guess it is overkill to tune this further for the moment.
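The proposed change can be sketched as follows. This is an illustrative snippet, not the actual branch diff: the helper name `set_nodelay` and the wiring around it are assumptions for the example.

```python
import socket

def set_nodelay(sock):
    # TCP_NODELAY disables Nagle's algorithm: each send() goes on the
    # wire immediately instead of being held back while an earlier,
    # still-unacknowledged segment waits for the peer's (delayed) ACK.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = set_nodelay(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# sock.connect((host, port))  # then use the socket for RPC as usual
```

The option only changes when data is handed to the network; it does not alter the protocol itself, which is why it can be applied on either endpoint independently.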

Revision history for this message
xrg (xrg) wrote : Posted in a previous version of this proposal

Well, one alternative solution would be to use persistent connections. They are now supported for xml-rpc, but not for Net-RPC yet.

By looking at the tiny_socket.py implementation, I can spot the place where the size (8 bytes) is sent in a separate operation from the pickled data. This means we will now have 2 TCP packets[1]. That is already increased bandwidth.

Isn't there any way to "flush" the socket at the end of mysend()? Shouldn't that happen automatically when the socket is closed at the end of every RPC call?

Conclusion: I have my doubts about this patch.

[1] please confirm with wireshark

review: Needs Information
Revision history for this message
Julien Thewys (julien-thewys) wrote : Posted in a previous version of this proposal

> Well, one alternative solution would be to use persistent connections. They
> are now supported for xml-rpc, but not for Net-RPC yet.

This might also boost general performance a little, but it is unrelated to my problem: I had a 200ms (two hundred!) delay for _each_ RPC call between 2 Windows 2008 machines on the same subnet (latency <1ms, TCP open/close <1ms, so persistent connections won't help here).
All this has been tested in different environments, measured with Wireshark, and deployed in production.

Revision history for this message
Julien Thewys (julien-thewys) wrote : Posted in a previous version of this proposal

> the size (8bytes) is sent in a different operation than the pickled data. This
> means that we will now have 2 TCP packets[1]. That's increased bandwidth,

Exactly: the size is sent in a first packet. But the sender then waits for the ACK (200ms by default on Windows) before finally sending the payload, for each NETRPC call. This is about increased latency rather than increased bandwidth. The sender should not wait for the ACK in this case, hence TCP_NODELAY.

> Isn't there any way to "flush" the socket at the end of mysend()? Shouldn't
> that happen automatically when the socket is closed at the end of every RPC
> call?

The problem is not at the end of mysend(); it happens right inside mysend().
The buffer is indeed flushed when the socket is closed.

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Based on the analysis of the problem by Nagle himself in the Slashdot comment mentioned in the commit message[1], the problem arises only due to the write-write-read pattern in the original mysend() method.
This method was modified right before the release of v6.0[2] to use a single write() call, which the Nagle algorithm will always send immediately (there is no to-be-acknowledged packet on the wire at that point).
Both clients have had a mysend() method with a single write() call since 2009, too.

As a result, I really don't think we have a case for disabling Nagle's algorithm. Presumably we should even be able to remove that patch from the clients (both seem to be using TCP_NODELAY at the moment). I can't find the rationale for adding that socket option to the clients, as the fix for the write-write-read pattern in mysend() predates it for both the GTK and Web 6.0 clients.

If you can provide up-to-date data that shows this is still necessary with 6.1, I'll be glad to reconsider :-)

[1] http://developers.slashdot.org/comments.pl?sid=174457&threshold=1&commentsort=0&mode=thread&cid=14515105
[2] rev.3307 revid: <email address hidden>

review: Disapprove
