Merge lp:~afrantzis/mir/fix-1350207-unresponsive-clients into lp:mir
Status: | Merged
---|---
Approved by: | Daniel van Vugt
Approved revision: | no longer in the source branch.
Merged at revision: | 1920
Proposed branch: | lp:~afrantzis/mir/fix-1350207-unresponsive-clients
Merge into: | lp:mir
To merge this branch: | bzr merge lp:~afrantzis/mir/fix-1350207-unresponsive-clients
Diff against target: | 293 lines (+158/-15), 9 files modified

src/server/frontend/socket_messenger.cpp (+21/-5)
tests/acceptance-tests/CMakeLists.txt (+1/-0)
tests/acceptance-tests/test_unresponsive_client.cpp (+122/-0)
tests/include/mir_test/cross_process_action.h (+2/-1)
tests/include/mir_test_framework/display_server_test_fixture.h (+1/-1)
tests/include/mir_test_framework/testing_process_manager.h (+1/-1)
tests/mir_test/cross_process_action.cpp (+2/-2)
tests/mir_test_framework/display_server_test_fixture.cpp (+2/-2)
tests/mir_test_framework/testing_process_manager.cpp (+6/-3)
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Daniel van Vugt | | | Approve
Alberto Aguirre (community) | | | Approve
Kevin DuBois (community) | | | Approve
Chris Halse Rogers | | | Approve
PS Jenkins bot (community) | continuous-integration | | Approve

Review via email: mp+233934@code.launchpad.net
Commit message
server: Work around unresponsive clients causing the server to hang (LP: #1350207)
Description of the change
server: Work around unresponsive clients causing the server to hang (LP: #1350207)
This MP makes the sockets used for communicating with clients non-blocking. When the server tries to write to a socket whose send buffer is full, the write fails instead of blocking. The server then either drops the event (when sending events) or disconnects the client (when responding to a client RPC request).
This MP also increases the socket send buffer to 64KiB (default is 16KiB) to give clients a little more breathing room for transient freezes. We can increase it even more if needed (max is 4MiB).
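As an illustration of the two changes described above, here is a minimal sketch using plain POSIX calls. It is not the code from this branch: the helper names `configure_client_socket` and `try_send` and the `SendResult` enum are hypothetical, and the actual change in socket_messenger.cpp may well go through the socket wrapper the server already uses instead of raw file descriptors.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: make `fd` non-blocking and request a larger send buffer.
// The kernel may round or double the requested size, so treat it as a hint.
bool configure_client_socket(int fd, int send_buffer_bytes = 64 * 1024)
{
    assert(fd >= 0);
    int const flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return false;
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                      &send_buffer_bytes, sizeof(send_buffer_bytes)) == 0;
}

enum class SendResult { sent, would_block, partial, error };

// Hypothetical helper: attempt to send one message without ever blocking.
// `would_block` means the send buffer is full, i.e. the client is not reading.
SendResult try_send(int fd, void const* data, size_t size)
{
    ssize_t const n = send(fd, data, size, MSG_NOSIGNAL | MSG_DONTWAIT);
    if (n == static_cast<ssize_t>(size))
        return SendResult::sent;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return SendResult::would_block;
    if (n >= 0)
        return SendResult::partial;  // only part of the message was queued
    return SendResult::error;
}
```

A caller following the policy in this description would drop the message on `would_block` when sending an event, and disconnect the client when the failed write was an RPC response. A `partial` result leaves a truncated message in the stream, so disconnecting is the only safe reaction in this sketch.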
Note that this is only a workaround, not a final fix, which would involve making the data writes asynchronous, essentially providing some more buffering on the server side (instead of the kernel). Unfortunately, this is not easily achievable with asio in this particular scenario because:
1. Asio doesn't support sending ancillary data (e.g. file descriptors) asynchronously and in coordination with normal data.
2. There is no guaranteed ordering and atomicity of asio asynchronous writes, and messages may end up being interleaved without additional care.
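To make the ordering concern concrete, the following is a rough sketch of what server-side buffering could look like for normal data only. It is an assumption for discussion, not a proposal taken from this branch: the class `PendingWrites` is hypothetical, it uses POSIX `send` directly instead of asio, and it does not address point 1 at all (file descriptors would still need `sendmsg` with ancillary data, kept in step with this queue).

```cpp
#include <cerrno>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical per-client outbox: messages leave in FIFO order, and a
// partially written message is always completed before the next one starts,
// so messages cannot be interleaved.
class PendingWrites
{
public:
    explicit PendingWrites(int fd) : fd{fd} {}

    void enqueue(std::string message) { queue.push_back(std::move(message)); }

    // Write as much as the socket accepts without blocking.
    // Returns false only on a hard error (e.g. the peer has gone away).
    bool flush()
    {
        while (!queue.empty())
        {
            std::string const& front = queue.front();
            ssize_t const n = send(fd, front.data() + offset,
                                   front.size() - offset,
                                   MSG_NOSIGNAL | MSG_DONTWAIT);
            if (n == -1)
            {
                if (errno == EINTR)
                    continue;
                // A full send buffer is not an error: retry on the next flush.
                return errno == EAGAIN || errno == EWOULDBLOCK;
            }
            offset += static_cast<size_t>(n);
            if (offset == front.size())
            {
                queue.pop_front();
                offset = 0;
            }
        }
        return true;
    }

    bool empty() const { return queue.empty(); }

private:
    int const fd;
    std::deque<std::string> queue;
    size_t offset = 0;  // bytes of queue.front() already written
};
```

In a real server `flush()` would be driven by a "socket is writable" notification, and the queue would need a size limit, which leads straight back to the open question of when to drop a client.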
More discussion is needed in this area:
* what are the criteria for dropping a client (should we decide, or the shell, or both)?
* application-
* if we drop messages to clients, does it make sense for the clients to continue in a possibly inconsistent state (it depends on the kind of messages dropped)?
FAILED: Continuous integration, rev:1903
http://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-ci/2662/
Executed test runs:
SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-android-utopic-i386-build/1677
SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-clang-utopic-amd64-build/1683
SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-mediumtests-utopic-touch/1658
FAILURE: http://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-utopic-amd64-ci/1184/console
SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-mediumtests-builder-utopic-armhf/595
deb: http://jenkins.qa.ubuntu.com/job/mir-mediumtests-builder-utopic-armhf/595/artifact/work/output/*zip*/output.zip
SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-mediumtests-runner-mako/2713
SUCCESS: http://s-jenkins.ubuntu-ci:8080/job/touch-flash-device/12965
Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/mir-team-mir-development-branch-ci/2662/rebuild