Merge ~dirk.zimoch/epics-base:iocLogClientFixesTry2 into ~epics-core/epics-base/+git/epics-base:7.0

Proposed by Dirk Zimoch
Status: Merged
Merged at revision: e178fa85b6db1c798dad3fb8310652aaa2110567
Proposed branch: ~dirk.zimoch/epics-base:iocLogClientFixesTry2
Merge into: ~epics-core/epics-base/+git/epics-base:7.0
Diff against target: 708 lines (+238/-153)
11 files modified
modules/database/src/ioc/misc/dbCore.dbd (+3/-0)
modules/libcom/src/log/iocLog.c (+25/-0)
modules/libcom/src/log/logClient.c (+103/-152)
modules/libcom/src/log/logClient.h (+0/-1)
modules/libcom/src/osi/Makefile (+1/-0)
modules/libcom/src/osi/os/Darwin/osdSockUnsentCount.c (+19/-0)
modules/libcom/src/osi/os/Linux/osdSockUnsentCount.c (+19/-0)
modules/libcom/src/osi/os/WIN32/osdSockUnsentCount.c (+26/-0)
modules/libcom/src/osi/os/default/osdSockUnsentCount.c (+15/-0)
modules/libcom/src/osi/os/iOS/osdSockUnsentCount.c (+19/-0)
modules/libcom/src/osi/osiSock.h (+8/-0)
Reviewer Review Type Date Requested Status
mdavidsaver Needs Fixing
Review via email: mp+372925@code.launchpad.net

This proposal supersedes a proposal from 2019-08-30.

Commit message

fix several problems in logClient that lead to lost messages when the server (or firewall in between) disconnects

Description of the change

Bugfix: don't send errlog to all logClients, only to iocLog
Improvements to minimize message loss when server disconnects:
* detect lost connection to server early
* do not discard buffer at disconnect
* flush buffer directly after reconnect
Other improvements:
* avoid delays on ioc startup and shutdown
* avoid repeated memmove

Relates to bug report https://bugs.launchpad.net/epics-base/+bug/1841608 and Tech-Talk thread https://epics.anl.gov/tech-talk/2019/msg01329.php.

To post a comment you must log in.
Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

What necessitates removing shutdown() and adding select()/recv()? To my knowledge send() should return with an error if the socket is, or becomes, disconnected.

review: Needs Information
Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

Unfortunately send() does not return an error immediately. It simply puts the message into the TCP send queue, even when the peer has already shut down the socket. Only one or more messages later send() returns an error. But by then the messages are lost. The logClient has already "successfully" sent them ti the TCP stack but TCP failed to deliver them and there is no way to recover the messages now.

From the send man page: "No indication of failure to deliver is implicit in a send()."
It only tells you that queuing the message was successful.

When the peer closes the socket, select() will show the socket readable and read() will return 0, but only after everything the peer may have actually send has been read (and discarded).
That is the only reliable way to detect a closed connection.
However this method only works if the receiving side had not been shut down.

I had to google a lot until I found this out. It seems many people get this wrong assuming send() should fail immediately. It doesn't.

I can't find the page that helped me in the end any more, but this one comes close:
https://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

When say "closed connection" do you mean normal closure (FIN/ACK), abnormal closure (RST), and/or (ideally) timeout?

Lack of an application level timeout is a known issue with the logging protocol. Are you trying to address this too?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

I don't know the TCP details. I tested with a local server (ncat) which I terminated and also with our two production servers (syslog-ng adnd logstash) where the firewall closes idle connections after one hour. In both cases I can now detect when the connection breaks. Before I always lost messages.

Which timeout scenario do you have in mind? send() hanging? How could I test that?

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

I notice that the page you link states:

> DISCLAIMER: This is my interpretation of the RFCs (I have read all the TCP-related ones I could find), but I have not attempted to examine implementation source code or trace actual connections in order to verify it. I am satisfied that the logic is correct, though.

which makes me smile.

The situation you describe, with a "firewall" in between is actually fairly complicated. And opaque in that the firewall might be acting in different ways depending on how stateful and helpful it is (eg. whether one or both sides get RST).

What I understand about BSD sockets and TCP is that:

1. If the peer maintains the connection, but doesn't call recv() (at all, or not fast enough) then send() will eventually fill the Tx socket buffer FIFO and block until it becomes not full (by peer ACK on recv() ).

2. calling send() on a socket which has been closed (peer calls close()) while the TX FIFO is empty results in an immediate EPIPE.

3. if a send() is in progress because the Tx FIFO is full (or probably when not empty), a peer close results in ECONNRESET.

4. if a the peer has shutdown(SHUT_RD), then send() will succeed until the Tx FIFO is full, then it will block as with #2. When the peer does close(), the send will return with EPIPE. (or probably ECONNRESET if some data was sent, but not ACK'd before the FIN).

5. if the peer simply stops responding, then send() will succeed until the Tx FIFO is full, then block waiting for an ACK which never comes.

So normal, or abnormal, termination of the TCP connection will break a send() immediately after RST or the FIN+ACK handshake completes. peer shutdown(SHUT_RD) or interruption of the network path will not break or error send().

I've just verified 1,2,3,4 on Linux with short python scripts. (I was surprised that #4 doesn't error) #5 I know from past experience.

So I don't think that calling recv() serves the purpose you intend. I think that SO_SNDTIMEO would be a better fit.

That said, the recv() might serve as one piece of a mechanism for a log server to detect condition #5 faster than SO_KEEPALIVE would (imo. SO_KEEPALIVE should also be set). Some extra protocol would be needed to allow a log server to detect this protocol feature.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

4. Was the problem. send does not fail until some time later. But than messages are already lost.
So my fix is not to rely on send failing but on recv returning EOF.
SO_KEEPALIVE has a default timeout of 2 hours and you cannot change it per socket but only system wide. It is totally useless for solving the problems with lost messages. It is there to allow to clean up stale connections and not to run out of file descriptors in server programs that are constantly running. But that's about all. So it may help the log server but not the client.

The problem I solve here is that send succeeds until the FIFO is full, just as you pointed out in 4. and 5.

I cannot confirm 2 though. My tests with ncat as a server have shown that send does not detect immediately when the server has exited (and thus the kernel has closed the socket).

But leaving discussions about how TCP is supposed to work aside, my main argument for this change is: Now it works, before it didn't.

Revision history for this message
Andrew Johnson (anj) wrote : Posted in a previous version of this proposal

@Dirk: Does your new version still work with the original iocLogServer?

IIRC at connection time that server explicitly closes the socket in the log-server -> client IOC direction because the server never sends data back to the client. I'm not sure whether that might cause the client to get an EOF from a recv() call sooner than with the other log-servers that don't do that close.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

Indeed this looks like a close to the client and it closes the connection, but re-opens it on the next message. No messages are lost. The disconnect is not nice though. I'll have a look how to fix that.

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

wrt. iocLogServer, I think it would be reasonable to accept removal of the shutdown(SHUT_WR) in iocLogServer.c. So long as older IOCs can still log against newer iocLogServer (which I think would be true).

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

It seems that the broken connection is quite reliably detected at the second send() after the connection had been closed. A dummy send(socket, NULL, 0, 0) seems sufficient to detect the broken connection to iocLogServer as well as to ncat and do Michael's server.py example.

Revision history for this message
Andrew Johnson (anj) wrote : Posted in a previous version of this proposal

@Dirk: Which architectures have you tested that on?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

Linux so far. I am about to test on vxWorks and Windows right now.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

Hmm... Not all Linux versions behave the same. On some send(socket, NULL, 0, 0) is a NOOP and does not detect a broken connection.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

The send() trick with 0 bytes does not work on all systems. It seems to depend on the Linux kernel version as well.

I am testing now a different solution. On Linux the call ioctl(sock, TIOCOUTQ, &unsent) gives the number of unsent bytes in the TCP output queue. This allows to keep the unsent bytes in the buffer in case it turns out later that the connection is broken.

On macOS there seems to be a similar functionality with getsockopt(sock, SOL_SOCKET, SO_NWRITE, &unsent), but I cannot test because I have no mac.

Other Unix flavors may or may not have such a feature.

For vxWorks I could not find a way to get the number of unsent bytes, neither for version 5.5 nor for 6.7.

I don't know about RTEMS.

On Windows 10 the number of unsent bytes should be available with WSAIoctl(sock, SIO_TCP_INFO, ..._). I spent most of last week to find out why it is not working before finding out that the Win 10 version I got was too old. I am upgrading now. Older Windows versions seem to have no way to find out the number of unsent bytes.

So I can mitigate the problem of message loss for some clients but not for all. Depending on the OS.

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

> The send() trick with 0 bytes does not work on all systems.

Unless poll() is used, there is a race condition involved. A second send() will only fail if a RST has been received. This may or may not be platform dependent.

Have you discarded the idea of removing the 'shutdown(SHUT_WR)' from iocLogServer?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

I first want to stay as compatible as possible to existing iocLogServer implementations.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

I made a new branch some commits below the head of this one. How can I change the merge request to the new branch?

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

I usually do a force push to replace the existing branch content. I don't think branches can be changed, so the alternative is a new MR.

Revision history for this message
Martin Konrad (info-martin-konrad) wrote : Posted in a previous version of this proposal

Beware that force pushing messes up the diff on Launchpad's MR. Dirk, have you considered doing this in smaller steps (separate MRs for each issue you are addressing)? This way some of your modifications might get merged faster.

Revision history for this message
Ralph Lange (ralph-lange) wrote : Posted in a previous version of this proposal

iirc, when you properly supersede the MP through the web interface, you have an option to pull/link the existing conversation to the new proposal.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

The "resubmit proposal" link seems to do the job. It allows to switch branches. No force push needed. Diffs look fine and the conversation gets transferred.
Martin, the smaller steps are the individual commits. Each should be one (and in most cases small and understandable) change. Each commit I did results in working code. So if there is a problem with one particular step, it can be reverted.

I am not yet quite satisfied with the behavior, so some more changes may come. But please have a look already.

My test:
Start iocLogServer
Start softIoc and do ioclogInit
errlog msg1
Stop iocLogServer and restart it
errlog msg2
wait at least 10 seconds
errlog msg3

All 3 error messages should be logged.
On Linux (all version?) connection loss is detected latest 5 seconds after msg2.
On recent Win 10 connection loss is detected at only at msg3 because send(sock,NULL,0) has no effect,
 but no message is lost.
On vxWorks 5.5 and older Windows versions (< Win 10 version 1703) msg2 is lost because they have no o way to detect when a message is actually sent on the network.
Can someone test with Mac and RTEMS?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote : Posted in a previous version of this proposal

So, now I have both variants for discussion.
This one, which requires a change in iocLogServer as well (not to shutdown its write channel).
Or mp+372925 which does not quite work for vxWorks and older Windows and uses an unspecified feature of send() in Linux.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> Stop iocLogServer and restart it

A related test would be to suspend (SIGSTOP) iocLogServer. This way the socket remains allocated, which is a better simulation of a non-responsive peer.

You might also want to try sending (large) log messages until the various queues fill up. Messages will be lost at some point.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

I did that briefly. The client (ioc) happily sends to the server and on the server the TCP input queue fills up. If the server then dies the messages are lost.

I have not tried it until the queues filled up. The the logClient may block while holding the mutex which may make errlog (or whatever client) block. I will test that.

I suggest to remove the code that attempts to send from sendMessageChunk() and accept (and report) that messages are lost.

Can you please also review the usage of the mutex in the original code. I have the impression it does not really make sense.

Revision history for this message
Andrew Johnson (anj) wrote :

Can the necessary testing be done in modules/libcom/test/epicsErrlogTest.c or something similar? That file already has tests that implement a local log-server, needed to check iocLogPrefix().

Dirk, did you check this against a VxWorks 6.x version using the newer IP stack? We aren't supporting VxWorks 5.5 in EPICS 7 any more, but we do want it to work with the newer OS versions.

Some comments against the diff as well, but note that I haven't reviewed the functionality, just the build and APIs.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

epicsErrlogTest.c: I have not yet looked into this but I guess it should be possible.

VxWorks 6: I don't have vxWorks 6 running on my IOCs. Was always low on my work list.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

'#if defined (TIOCOUTQ)' fails when compiling for vxWorks 6. The macro does not exist in any vxWorks 6 header file. Same for SO_NWRITE. What exists is FIONWRITE, but that does not work on sockes, only on serial lines (at least for vxWorks 5).

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Hm, the in-diff comments become kind of hidden with a new push.
About epicsSocketCountUnsentBytes() [or whatever we call it] returning size_t instead of int: In vxWorks and Windows all socket operations use int for size and Windows does not have ssize_t.

Revision history for this message
Andrew Johnson (anj) wrote :

I see what you mean about viewing in-diff comments. Clicking one of the "Show diff comments" links in an existing comment helped me earlier, but older comments seem to have disappeared (at least while I have some new unsaved diff comments pending).

I withdraw my size_t suggestion, but see my new in-diff comments regarding the name.

Revision history for this message
Dirk Zimoch (dirk.zimoch) :
Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

> older comments seem to have disappeared (at least while I have some new unsaved diff
> comments pending).

That may be the result a result of force-push when an in-diff comment refers to a line in a commit that has disappeared.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Dirk, I hope you won't take this as a disincentive, but I think a better approach would be either to switch to an actually reliable logging protocol, log over TLS with the heartbeat extension, switch to simply UDP logging, and/or teach caPutLog to use the system logging (aka call 'syslog()' or open '/dev/log').

I happened upon https://rainer.gerhards.net/2008/04/on-unreliability-of-plain-tcp-syslog.html which does a good job of summarizing some of the thoughts I've been having about iocLog "protocol" and the seeming impossibility of making it truly reliable while maintaining compatibility.

I also wonder if it would be better to move iocLogClient out of Base and into the caPutLog module as an internal API? caPutLog seems to be the only (public) user of this API, and I think that removing the separation would enable a more effective restructuring of the code.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Specifically, I don't like the idea of epicsSocketCountUnsentBytes() becoming public API. This is a highly specific use case, which can't be fully supported on our range of target OSs. I would like to see it removed from the public/installed headers unless at least one external user can be identified.

review: Needs Fixing
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I think the 'send(NULL)' pattern is a race condition as it depends on a TCP reset message having arrived prior to the second send(NULL) call. With real network latency I don't think it will work as intended to quickly and reliably detect when the preceding send() elicits a RST.

I'm not going to insist on a change here because I don't think it will break anything. The RST just won't be handled (via. EPIPE) unless the next time the log queue is flushed, and send() is called again.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

cf. https://code.launchpad.net/~epics-core/epics-base/+git/Com/+merge/373098

which is not mutually exclusive to this work. I wanted to try the idea.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

It is totally fine for me to keep os-specific unsent cound inside logClient as a bunch of #if branches.

'send(NULL)' is not race condition, because I do not rely on getting the RST right after sending. It is fine to get it at some point. It simply polls for RST every 5 seconds.

Maybe it is better to use the other approach: https://code.launchpad.net/~dirk.zimoch/epics-base/+git/epics-base/+merge/372043. This one is very os-dependent and uses unspecified behavior.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Results of discussion at core dev. meeting

Add '#ifdef EPICS_PRIVATE_API' around declaration of epicsSocketCountUnsentBytes() in osiSock.h.

Also, can you confirm that you've tested this (final) version on a target where epicsSocketCountUnsentBytes()==-1 ?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

I have tested it against vxWorks 5 where epicsSocketCountUnsentBytes()==-1. The effect is that the log Client cannot keep track of the unsent bytes and messages may be lost as before.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I did a test merge. Some CI failures resulted with cross-mingw and some older MSVC. Looks like some of the defines and structs related to TCP_INFO aren't available.

https://ci.appveyor.com/project/mdavidsaver/epics-base/builds/28358563

https://travis-ci.org/mdavidsaver/epics-base/builds/602443942

review: Needs Fixing
Revision history for this message
mdavidsaver (mdavidsaver) wrote :
Revision history for this message
Andrew Johnson (anj) wrote :

This could use an entry describing the changes in the Release Notes (which are now formatted in Markdown on both branches; I merged 3.15 up over the weekend and just pushed).

Revision history for this message
mdavidsaver (mdavidsaver) wrote : Posted in a previous version of this proposal

The other variant was merged, so this one is no longer relevant.

review: Disapprove

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/modules/database/src/ioc/misc/dbCore.dbd b/modules/database/src/ioc/misc/dbCore.dbd
2index 51f9c96..adde24e 100644
3--- a/modules/database/src/ioc/misc/dbCore.dbd
4+++ b/modules/database/src/ioc/misc/dbCore.dbd
5@@ -33,3 +33,6 @@ variable(callbackParallelThreadsDefault,int)
6
7 # Real-time operation
8 variable(dbThreadRealtimeLock,int)
9+
10+# show logClient network activity
11+variable(logClientDebug,int)
12diff --git a/modules/libcom/src/log/iocLog.c b/modules/libcom/src/log/iocLog.c
13index e62da20..ba78041 100644
14--- a/modules/libcom/src/log/iocLog.c
15+++ b/modules/libcom/src/log/iocLog.c
16@@ -18,8 +18,10 @@
17
18 #define epicsExportSharedSymbols
19 #include "envDefs.h"
20+#include "errlog.h"
21 #include "logClient.h"
22 #include "iocLog.h"
23+#include "epicsExit.h"
24
25 int iocLogDisable = 0;
26
27@@ -75,6 +77,24 @@ void epicsShareAPI epicsShareAPI iocLogFlush (void)
28 }
29
30 /*
31+ * logClientSendMessage ()
32+ */
33+static void logClientSendMessage ( logClientId id, const char * message )
34+{
35+ if ( !iocLogDisable ) {
36+ logClientSend (id, message);
37+ }
38+}
39+
40+/*
41+ * iocLogClientDestroy()
42+ */
43+static void iocLogClientDestroy (logClientId id)
44+{
45+ errlogRemoveListeners (logClientSendMessage, id);
46+}
47+
48+/*
49 * iocLogClientInit()
50 */
51 static logClientId iocLogClientInit (void)
52@@ -89,6 +109,10 @@ static logClientId iocLogClientInit (void)
53 return NULL;
54 }
55 id = logClientCreate (addr, port);
56+ if (id != NULL) {
57+ errlogAddListener (logClientSendMessage, id);
58+ epicsAtExit (iocLogClientDestroy, id);
59+ }
60 return id;
61 }
62
63@@ -135,3 +159,4 @@ logClientId epicsShareAPI logClientInit (void)
64 {
65 return iocLogClientInit ();
66 }
67+
68diff --git a/modules/libcom/src/log/logClient.c b/modules/libcom/src/log/logClient.c
69index 99ee671..9a09ef7 100644
70--- a/modules/libcom/src/log/logClient.c
71+++ b/modules/libcom/src/log/logClient.c
72@@ -21,11 +21,11 @@
73 #include <string.h>
74 #include <stdio.h>
75
76+#define EPICS_PRIVATE_API
77 #define epicsExportSharedSymbols
78 #include "dbDefs.h"
79 #include "epicsEvent.h"
80 #include "iocLog.h"
81-#include "errlog.h"
82 #include "epicsMutex.h"
83 #include "epicsThread.h"
84 #include "epicsTime.h"
85@@ -33,9 +33,13 @@
86 #include "epicsAssert.h"
87 #include "epicsExit.h"
88 #include "epicsSignal.h"
89+#include "epicsExport.h"
90
91 #include "logClient.h"
92
93+int logClientDebug = 0;
94+epicsExportAddress (int, logClientDebug);
95+
96 typedef struct {
97 char msgBuf[0x4000];
98 struct sockaddr_in addr;
99@@ -44,8 +48,10 @@ typedef struct {
100 SOCKET sock;
101 epicsThreadId restartThreadId;
102 epicsEventId stateChangeNotify;
103+ epicsEventId shutdownNotify;
104 unsigned connectCount;
105 unsigned nextMsgIndex;
106+ unsigned backlog;
107 unsigned connected;
108 unsigned shutdown;
109 unsigned shutdownConfirm;
110@@ -53,7 +59,6 @@ typedef struct {
111 } logClient;
112
113 static const double LOG_RESTART_DELAY = 5.0; /* sec */
114-static const double LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT = 5.0; /* sec */
115 static const double LOG_SERVER_SHUTDOWN_TIMEOUT = 30.0; /* sec */
116
117 /*
118@@ -66,10 +71,10 @@ static char* logClientPrefix = NULL;
119 */
120 static void logClientClose ( logClient *pClient )
121 {
122-# ifdef DEBUG
123+ if (logClientDebug) {
124 fprintf (stderr, "log client: lingering for connection close...");
125 fflush (stderr);
126-# endif
127+ }
128
129 /*
130 * mutex on
131@@ -84,8 +89,6 @@ static void logClientClose ( logClient *pClient )
132 pClient->sock = INVALID_SOCKET;
133 }
134
135- pClient->nextMsgIndex = 0u;
136- memset ( pClient->msgBuf, '\0', sizeof ( pClient->msgBuf ) );
137 pClient->connected = 0u;
138
139 /*
140@@ -93,9 +96,8 @@ static void logClientClose ( logClient *pClient )
141 */
142 epicsMutexUnlock (pClient->mutex);
143
144-# ifdef DEBUG
145+ if (logClientDebug)
146 fprintf (stderr, "done\n");
147-# endif
148 }
149
150 /*
151@@ -113,6 +115,7 @@ static void logClientDestroy (logClientId id)
152 epicsMutexMustLock ( pClient->mutex );
153 pClient->shutdown = 1u;
154 epicsMutexUnlock ( pClient->mutex );
155+ epicsEventSignal ( pClient->shutdownNotify );
156
157 /* unblock log client thread blocking in send() or connect() */
158 interruptInfo =
159@@ -154,13 +157,11 @@ static void logClientDestroy (logClientId id)
160 return;
161 }
162
163- errlogRemoveListeners ( logClientSendMessage, (void *) pClient );
164-
165 logClientClose ( pClient );
166
167 epicsMutexDestroy ( pClient->mutex );
168-
169 epicsEventDestroy ( pClient->stateChangeNotify );
170+ epicsEventDestroy ( pClient->shutdownNotify );
171
172 free ( pClient );
173 }
174@@ -176,61 +177,26 @@ static void sendMessageChunk(logClient * pClient, const char * message) {
175 unsigned msgBufBytesLeft =
176 sizeof ( pClient->msgBuf ) - pClient->nextMsgIndex;
177
178- if ( strSize > msgBufBytesLeft ) {
179- int status;
180-
181- if ( ! pClient->connected ) {
182- break;
183- }
184-
185- if ( msgBufBytesLeft > 0u ) {
186- memcpy ( & pClient->msgBuf[pClient->nextMsgIndex],
187- message, msgBufBytesLeft );
188- pClient->nextMsgIndex += msgBufBytesLeft;
189- strSize -= msgBufBytesLeft;
190- message += msgBufBytesLeft;
191- }
192-
193- status = send ( pClient->sock, pClient->msgBuf,
194- pClient->nextMsgIndex, 0 );
195- if ( status > 0 ) {
196- unsigned nSent = (unsigned) status;
197- if ( nSent < pClient->nextMsgIndex ) {
198- unsigned newNextMsgIndex = pClient->nextMsgIndex - nSent;
199- memmove ( pClient->msgBuf, & pClient->msgBuf[nSent],
200- newNextMsgIndex );
201- pClient->nextMsgIndex = newNextMsgIndex;
202- }
203- else {
204- pClient->nextMsgIndex = 0u;
205- }
206- }
207- else {
208- if ( ! pClient->shutdown ) {
209- char sockErrBuf[64];
210- if ( status ) {
211- epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) );
212- }
213- else {
214- strcpy ( sockErrBuf, "server initiated disconnect" );
215- }
216- fprintf ( stderr, "log client: lost contact with log server at \"%s\" because \"%s\"\n",
217- pClient->name, sockErrBuf );
218- }
219- logClientClose ( pClient );
220- break;
221- }
222+ if ( msgBufBytesLeft < strSize && pClient->nextMsgIndex != 0u && pClient->connected)
223+ {
224+ /* buffer is full, thus flush it */
225+ logClientFlush ( pClient );
226+ msgBufBytesLeft = sizeof ( pClient->msgBuf ) - pClient->nextMsgIndex;
227 }
228- else {
229- memcpy ( & pClient->msgBuf[pClient->nextMsgIndex],
230- message, strSize );
231- pClient->nextMsgIndex += strSize;
232+ if ( msgBufBytesLeft == 0u ) {
233+ fprintf ( stderr, "log client: messages to \"%s\" are lost\n",
234+ pClient->name );
235 break;
236 }
237+ if ( msgBufBytesLeft > strSize) msgBufBytesLeft = strSize;
238+ memcpy ( & pClient->msgBuf[pClient->nextMsgIndex],
239+ message, msgBufBytesLeft );
240+ pClient->nextMsgIndex += msgBufBytesLeft;
241+ strSize -= msgBufBytesLeft;
242+ message += msgBufBytesLeft;
243 }
244 }
245
246-
247 /*
248 * logClientSend ()
249 */
250@@ -255,43 +221,54 @@ void epicsShareAPI logClientSend ( logClientId id, const char * message )
251
252 void epicsShareAPI logClientFlush ( logClientId id )
253 {
254+ unsigned nSent;
255+ int status = 0;
256+
257 logClient * pClient = ( logClient * ) id;
258
259- if ( ! pClient ) {
260+ if ( ! pClient || ! pClient->connected ) {
261 return;
262 }
263
264 epicsMutexMustLock ( pClient->mutex );
265
266- while ( pClient->nextMsgIndex && pClient->connected ) {
267- int status = send ( pClient->sock, pClient->msgBuf,
268- pClient->nextMsgIndex, 0 );
269- if ( status > 0 ) {
270- unsigned nSent = (unsigned) status;
271- if ( nSent < pClient->nextMsgIndex ) {
272- unsigned newNextMsgIndex = pClient->nextMsgIndex - nSent;
273- memmove ( pClient->msgBuf, & pClient->msgBuf[nSent],
274- newNextMsgIndex );
275- pClient->nextMsgIndex = newNextMsgIndex;
276- }
277- else {
278- pClient->nextMsgIndex = 0u;
279- }
280+ nSent = pClient->backlog;
281+ while ( nSent < pClient->nextMsgIndex && pClient->connected ) {
282+ status = send ( pClient->sock, pClient->msgBuf + nSent,
283+ pClient->nextMsgIndex - nSent, 0 );
284+ if ( status < 0 ) break;
285+ nSent += status;
286+ }
287+
288+ if ( pClient->backlog > 0 && status >= 0 )
289+ {
290+ /* On Linux send 0 bytes can detect EPIPE */
291+ /* NOOP on Windows, fails on vxWorks */
292+ errno = 0;
293+ status = send ( pClient->sock, NULL, 0, 0 );
294+ if (!(errno == ECONNRESET || errno == EPIPE)) status = 0;
295+ }
296+
297+ if ( status < 0 ) {
298+ if ( ! pClient->shutdown ) {
299+ char sockErrBuf[128];
300+ epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) );
301+ fprintf ( stderr, "log client: lost contact with log server at \"%s\" because \"%s\"\n",
302+ pClient->name, sockErrBuf );
303 }
304- else {
305- if ( ! pClient->shutdown ) {
306- char sockErrBuf[64];
307- if ( status ) {
308- epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) );
309- }
310- else {
311- strcpy ( sockErrBuf, "server initiated disconnect" );
312- }
313- fprintf ( stderr, "log client: lost contact with log server at \"%s\" because \"%s\"\n",
314- pClient->name, sockErrBuf );
315- }
316- logClientClose ( pClient );
317- break;
318+ pClient->backlog = 0;
319+ logClientClose ( pClient );
320+ }
321+ else if ( nSent > 0 && pClient->nextMsgIndex > 0 ) {
322+ int backlog = epicsSocketUnsentCount ( pClient->sock );
323+ if (backlog >= 0) {
324+ pClient->backlog = backlog;
325+ nSent -= backlog;
326+ }
327+ pClient->nextMsgIndex -= nSent;
328+ if ( nSent > 0 && pClient->nextMsgIndex > 0 ) {
329+ memmove ( pClient->msgBuf, & pClient->msgBuf[nSent],
330+ pClient->nextMsgIndex );
331 }
332 }
333 epicsMutexUnlock ( pClient->mutex );
334@@ -302,10 +279,10 @@ void epicsShareAPI logClientFlush ( logClientId id )
335 */
336 static void logClientMakeSock (logClient *pClient)
337 {
338-
339-# ifdef DEBUG
340+ if (logClientDebug) {
341 fprintf (stderr, "log client: creating socket...");
342-# endif
343+ fflush (stderr);
344+ }
345
346 epicsMutexMustLock (pClient->mutex);
347
348@@ -314,7 +291,7 @@ static void logClientMakeSock (logClient *pClient)
349 */
350 pClient->sock = epicsSocketCreate ( AF_INET, SOCK_STREAM, 0 );
351 if ( pClient->sock == INVALID_SOCKET ) {
352- char sockErrBuf[64];
353+ char sockErrBuf[128];
354 epicsSocketConvertErrnoToString (
355 sockErrBuf, sizeof ( sockErrBuf ) );
356 fprintf ( stderr, "log client: no socket error %s\n",
357@@ -323,10 +300,8 @@ static void logClientMakeSock (logClient *pClient)
358
359 epicsMutexUnlock (pClient->mutex);
360
361-# ifdef DEBUG
362+ if (logClientDebug)
363 fprintf (stderr, "done\n");
364-# endif
365-
366 }
367
368 /*
369@@ -366,7 +341,7 @@ static void logClientConnect (logClient *pClient)
370 }
371 else {
372 if ( pClient->connFailStatus != errnoCpy && ! pClient->shutdown ) {
373- char sockErrBuf[64];
374+ char sockErrBuf[128];
375 epicsSocketConvertErrnoToString (
376 sockErrBuf, sizeof ( sockErrBuf ) );
377 fprintf (stderr,
378@@ -392,7 +367,7 @@ static void logClientConnect (logClient *pClient)
379 optval = TRUE;
380 status = setsockopt (pClient->sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&optval, sizeof(optval));
381 if (status<0) {
382- char sockErrBuf[64];
383+ char sockErrBuf[128];
384 epicsSocketConvertErrnoToString (
385 sockErrBuf, sizeof ( sockErrBuf ) );
386 fprintf (stderr, "log client: unable to enable keepalive option because \"%s\"\n", sockErrBuf);
387@@ -404,11 +379,11 @@ static void logClientConnect (logClient *pClient)
388 */
389 status = shutdown (pClient->sock, SHUT_RD);
390 if (status < 0) {
391- char sockErrBuf[64];
392+ char sockErrBuf[128];
393 epicsSocketConvertErrnoToString (
394 sockErrBuf, sizeof ( sockErrBuf ) );
395- fprintf (stderr, "%s:%d shutdown(%d,SHUT_RD) error was \"%s\"\n",
396- __FILE__, __LINE__, pClient->sock, sockErrBuf);
397+ fprintf (stderr, "%s:%d shutdown(sock,SHUT_RD) error was \"%s\"\n",
398+ __FILE__, __LINE__, sockErrBuf);
399 /* not fatal (although it shouldn't happen) */
400 }
401
402@@ -425,7 +400,7 @@ static void logClientConnect (logClient *pClient)
403 lingerval.l_linger = 60*5;
404 status = setsockopt (pClient->sock, SOL_SOCKET, SO_LINGER, (char *) &lingerval, sizeof(lingerval));
405 if (status<0) {
406- char sockErrBuf[64];
407+ char sockErrBuf[128];
408 epicsSocketConvertErrnoToString (
409 sockErrBuf, sizeof ( sockErrBuf ) );
410 fprintf (stderr, "log client: unable to set linger options because \"%s\"\n", sockErrBuf);
411@@ -457,14 +432,10 @@ static void logClientRestart ( logClientId id )
412
413 epicsMutexUnlock ( pClient->mutex );
414
415- if ( isConn ) {
416- logClientFlush ( pClient );
417- }
418- else {
419- logClientConnect ( pClient );
420- }
421-
422- epicsThreadSleep ( LOG_RESTART_DELAY );
423+ if ( ! isConn ) logClientConnect ( pClient );
424+ logClientFlush ( pClient );
425+
426+ epicsEventWaitWithTimeout ( pClient->shutdownNotify, LOG_RESTART_DELAY);
427
428 epicsMutexMustLock ( pClient->mutex );
429 }
430@@ -480,9 +451,7 @@ static void logClientRestart ( logClientId id )
431 logClientId epicsShareAPI logClientCreate (
432 struct in_addr server_addr, unsigned short server_port)
433 {
434- epicsTimeStamp begin, current;
435 logClient *pClient;
436- double diff;
437
438 pClient = calloc (1, sizeof (*pClient));
439 if (pClient==NULL) {
440@@ -507,14 +476,22 @@ logClientId epicsShareAPI logClientCreate (
441 pClient->shutdownConfirm = 0;
442
443 epicsAtExit (logClientDestroy, (void*) pClient);
444-
445+
446 pClient->stateChangeNotify = epicsEventCreate (epicsEventEmpty);
447 if ( ! pClient->stateChangeNotify ) {
448 epicsMutexDestroy ( pClient->mutex );
449 free ( pClient );
450 return NULL;
451 }
452-
453+
454+ pClient->shutdownNotify = epicsEventCreate (epicsEventEmpty);
455+ if ( ! pClient->shutdownNotify ) {
456+ epicsMutexDestroy ( pClient->mutex );
457+ epicsEventDestroy ( pClient->stateChangeNotify );
458+ free ( pClient );
459+ return NULL;
460+ }
461+
462 pClient->restartThreadId = epicsThreadCreate (
463 "logRestart", epicsThreadPriorityLow,
464 epicsThreadGetStackSize(epicsThreadStackSmall),
465@@ -522,35 +499,12 @@ logClientId epicsShareAPI logClientCreate (
466 if ( pClient->restartThreadId == NULL ) {
467 epicsMutexDestroy ( pClient->mutex );
468 epicsEventDestroy ( pClient->stateChangeNotify );
469+ epicsEventDestroy ( pClient->shutdownNotify );
470 free (pClient);
471 fprintf(stderr, "log client: unable to start log client connection watch dog thread\n");
472 return NULL;
473 }
474
475- /*
476- * attempt to synchronize with circuit connect
477- */
478- epicsTimeGetCurrent ( & begin );
479- epicsMutexMustLock ( pClient->mutex );
480- do {
481- epicsMutexUnlock ( pClient->mutex );
482- epicsEventWaitWithTimeout (
483- pClient->stateChangeNotify,
484- LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT / 10.0 );
485- epicsTimeGetCurrent ( & current );
486- diff = epicsTimeDiffInSeconds ( & current, & begin );
487- epicsMutexMustLock ( pClient->mutex );
488- }
489- while ( ! pClient->connected && diff < LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT );
490- epicsMutexUnlock ( pClient->mutex );
491-
492- if ( ! pClient->connected ) {
493- fprintf (stderr, "log client create: timed out synchronizing with circuit connect to \"%s\" after %.1f seconds\n",
494- pClient->name, LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT );
495- }
496-
497- errlogAddListener ( logClientSendMessage, (void *) pClient );
498-
499 return (void *) pClient;
500 }
501
502@@ -568,24 +522,21 @@ void epicsShareAPI logClientShow (logClientId id, unsigned level)
503 printf ("log client: disconnected from log server at \"%s\"\n", pClient->name);
504 }
505
506- if (level>1) {
507- printf ("log client: sock=%s, connect cycles = %u\n",
508- pClient->sock==INVALID_SOCKET?"INVALID":"OK",
509- pClient->connectCount);
510- }
511-
512 if (logClientPrefix) {
513 printf ("log client: prefix is \"%s\"\n", logClientPrefix);
514 }
515-}
516
517-/*
518- * logClientSendMessage (); deprecated
519- */
520-void logClientSendMessage ( logClientId id, const char * message )
521-{
522- if ( !iocLogDisable ) {
523- logClientSend (id, message);
524+ if (level>0) {
525+ printf ("log client: sock %s, connect cycles = %u\n",
526+ pClient->sock==INVALID_SOCKET?"INVALID":"OK",
527+ pClient->connectCount);
528+ }
529+ if (level>1) {
530+ printf ("log client: %u bytes in buffer\n", pClient->nextMsgIndex);
531+ if (pClient->nextMsgIndex)
532+ printf("-------------------------\n"
533+ "%.*s-------------------------\n",
534+ (int)(pClient->nextMsgIndex), pClient->msgBuf);
535 }
536 }
537
538diff --git a/modules/libcom/src/log/logClient.h b/modules/libcom/src/log/logClient.h
539index 1797bbb..3b3f63a 100644
540--- a/modules/libcom/src/log/logClient.h
541+++ b/modules/libcom/src/log/logClient.h
542@@ -38,7 +38,6 @@ epicsShareFunc void epicsShareAPI iocLogPrefix(const char* prefix);
543 /* deprecated interface; retained for backward compatibility */
544 /* note: implementations are in iocLog.c, not logClient.c */
545 epicsShareFunc logClientId epicsShareAPI logClientInit (void);
546-epicsShareFunc void logClientSendMessage (logClientId id, const char *message);
547
548 #ifdef __cplusplus
549 }
550diff --git a/modules/libcom/src/osi/Makefile b/modules/libcom/src/osi/Makefile
551index ecbf4c2..71d8c3d 100644
552--- a/modules/libcom/src/osi/Makefile
553+++ b/modules/libcom/src/osi/Makefile
554@@ -86,6 +86,7 @@ endif
555
556 Com_SRCS += osdSock.c
557 Com_SRCS += osdSockAddrReuse.cpp
558+Com_SRCS += osdSockUnsentCount.c
559 Com_SRCS += osiSock.c
560 Com_SRCS += systemCallIntMech.cpp
561 Com_SRCS += epicsSocketConvertErrnoToString.cpp
562diff --git a/modules/libcom/src/osi/os/Darwin/osdSockUnsentCount.c b/modules/libcom/src/osi/os/Darwin/osdSockUnsentCount.c
563new file mode 100644
564index 0000000..20bd82b
565--- /dev/null
566+++ b/modules/libcom/src/osi/os/Darwin/osdSockUnsentCount.c
567@@ -0,0 +1,19 @@
568+/*************************************************************************\
569+* EPICS BASE is distributed subject to a Software License Agreement found
570+* in file LICENSE that is included with this distribution.
571+\*************************************************************************/
572+
573+#define EPICS_PRIVATE_API
574+#include "osiSock.h"
575+
576+/*
577+ * epicsSocketUnsentCount ()
578+ * See https://www.unix.com/man-page/osx/2/setsockopt
579+ */
580+int epicsSocketUnsentCount(SOCKET sock) {
581+ int unsent;
582+ socklen_t len = sizeof(unsent);
583+ if (getsockopt(sock, SOL_SOCKET, SO_NWRITE, &unsent, &len) == 0)
584+ return unsent;
585+ return -1;
586+}
587diff --git a/modules/libcom/src/osi/os/Linux/osdSockUnsentCount.c b/modules/libcom/src/osi/os/Linux/osdSockUnsentCount.c
588new file mode 100644
589index 0000000..3c0a8f9
590--- /dev/null
591+++ b/modules/libcom/src/osi/os/Linux/osdSockUnsentCount.c
592@@ -0,0 +1,19 @@
593+/*************************************************************************\
594+* EPICS BASE is distributed subject to a Software License Agreement found
595+* in file LICENSE that is included with this distribution.
596+\*************************************************************************/
597+
598+#include <linux/sockios.h>
599+#define EPICS_PRIVATE_API
600+#include "osiSock.h"
601+
602+/*
603+ * epicsSocketUnsentCount ()
604+ * See https://linux.die.net/man/7/tcp
605+ */
606+int epicsSocketUnsentCount(SOCKET sock) {
607+ int unsent;
608+ if (ioctl(sock, SIOCOUTQ, &unsent) == 0)
609+ return unsent;
610+ return -1;
611+}
612diff --git a/modules/libcom/src/osi/os/WIN32/osdSockUnsentCount.c b/modules/libcom/src/osi/os/WIN32/osdSockUnsentCount.c
613new file mode 100644
614index 0000000..fe68ead
615--- /dev/null
616+++ b/modules/libcom/src/osi/os/WIN32/osdSockUnsentCount.c
617@@ -0,0 +1,26 @@
618+/*************************************************************************\
619+* EPICS BASE is distributed subject to a Software License Agreement found
620+* in file LICENSE that is included with this distribution.
621+\*************************************************************************/
622+
623+#define epicsExportSharedSymbols
624+#define EPICS_PRIVATE_API
625+#include "osiSock.h"
626+#include <mstcpip.h>
627+
628+/*
629+ * epicsSocketUnsentCount ()
630+ * See https://docs.microsoft.com/en-us/windows/win32/api/mstcpip/ns-mstcpip-tcp_info_v0
631+ */
632+int epicsSocketUnsentCount(SOCKET sock) {
633+#if defined (_WIN32) && WINVER >= _WIN32_WINNT_WIN10
634+/* Windows 10 Version 1703 / Server 2016 */
635+ DWORD infoVersion = 0, bytesReturned;
636+ TCP_INFO_v0 tcpInfo;
637+ int status;
638+ if ((status = WSAIoctl(sock, SIO_TCP_INFO, &infoVersion, sizeof(infoVersion),
639+ &tcpInfo, sizeof(tcpInfo), &bytesReturned, NULL, NULL)) == 0)
640+ return tcpInfo.BytesInFlight;
641+#endif
642+ return -1;
643+}
644diff --git a/modules/libcom/src/osi/os/default/osdSockUnsentCount.c b/modules/libcom/src/osi/os/default/osdSockUnsentCount.c
645new file mode 100644
646index 0000000..ef01e9b
647--- /dev/null
648+++ b/modules/libcom/src/osi/os/default/osdSockUnsentCount.c
649@@ -0,0 +1,15 @@
650+/*************************************************************************\
651+* EPICS BASE is distributed subject to a Software License Agreement found
652+* in file LICENSE that is included with this distribution.
653+\*************************************************************************/
654+
655+#define EPICS_PRIVATE_API
656+#include "osiSock.h"
657+
658+/*
659+ * epicsSocketUnsentCount ()
660+ */
661+int epicsSocketUnsentCount(SOCKET sock) {
662+ /* not implemented */
663+ return -1;
664+}
665diff --git a/modules/libcom/src/osi/os/iOS/osdSockUnsentCount.c b/modules/libcom/src/osi/os/iOS/osdSockUnsentCount.c
666new file mode 100644
667index 0000000..20bd82b
668--- /dev/null
669+++ b/modules/libcom/src/osi/os/iOS/osdSockUnsentCount.c
670@@ -0,0 +1,19 @@
671+/*************************************************************************\
672+* EPICS BASE is distributed subject to a Software License Agreement found
673+* in file LICENSE that is included with this distribution.
674+\*************************************************************************/
675+
676+#define EPICS_PRIVATE_API
677+#include "osiSock.h"
678+
679+/*
680+ * epicsSocketUnsentCount ()
681+ * See https://www.unix.com/man-page/osx/2/setsockopt
682+ */
683+int epicsSocketUnsentCount(SOCKET sock) {
684+ int unsent;
685+ socklen_t len = sizeof(unsent);
686+ if (getsockopt(sock, SOL_SOCKET, SO_NWRITE, &unsent, &len) == 0)
687+ return unsent;
688+ return -1;
689+}
690diff --git a/modules/libcom/src/osi/osiSock.h b/modules/libcom/src/osi/osiSock.h
691index 061619e..6e3b053 100644
692--- a/modules/libcom/src/osi/osiSock.h
693+++ b/modules/libcom/src/osi/osiSock.h
694@@ -52,6 +52,14 @@ enum epicsSocketSystemCallInterruptMechanismQueryInfo {
695 epicsShareFunc enum epicsSocketSystemCallInterruptMechanismQueryInfo
696 epicsSocketSystemCallInterruptMechanismQuery ();
697
698+#ifdef EPICS_PRIVATE_API
699+/*
700+ * Some systems (e.g Linux and Windows 10) allow to check the amount
701+ * of unsent data in the output queue.
702+ * Returns -1 if the information is not available.
703+ */
704+epicsShareFunc int epicsSocketUnsentCount(SOCKET sock);
705+#endif
706
707 /*
708 * convert socket address to ASCII in this order

Subscribers

People subscribed via source and target branches