Merge ~dirk.zimoch/epics-base:iocLogClientFixes into ~epics-core/epics-base/+git/epics-base:7.0
- Git
- lp:~dirk.zimoch/epics-base
- iocLogClientFixes
- Merge into 7.0
Status: | Rejected |
---|---|
Rejected by: | mdavidsaver |
Proposed branch: | ~dirk.zimoch/epics-base:iocLogClientFixes |
Merge into: | ~epics-core/epics-base/+git/epics-base:7.0 |
Diff against target: |
579 lines (+129/-161) 5 files modified
modules/database/src/ioc/misc/dbCore.dbd (+3/-0) modules/libcom/src/log/iocLog.c (+25/-0) modules/libcom/src/log/iocLogServer.c (+0/-12) modules/libcom/src/log/logClient.c (+101/-148) modules/libcom/src/log/logClient.h (+0/-1) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
mdavidsaver | Disapprove | ||
Review via email: mp+372043@code.launchpad.net |
This proposal has been superseded by a proposal from 2019-09-18.
Commit message
fix several problems in logClient
Description of the change
Bugfix: don't send errlog to all logClients, only to iocLog
Improvements to minimize message loss when server disconnects:
* detect lost connection to server early
* do not discard buffer at disconnect
* flush buffer directly after reconnect
Other improvements:
* avoid delays on ioc startup and shutdown
* avoid repeated memmove
Relates to bug report https:/
Dirk Zimoch (dirk.zimoch) wrote : | # |
Unfortunately send() does not return an error immediately. It simply puts the message into the TCP send queue, even when the peer has already shut down the socket. Only one or more messages later send() returns an error. But by then the messages are lost. The logClient has already "successfully" sent them ti the TCP stack but TCP failed to deliver them and there is no way to recover the messages now.
From the send man page: "No indication of failure to deliver is implicit in a send()."
It only tells you that queuing the message was successful.
When the peer closes the socket, select() will show the socket readable and read() will return 0, but only after everything the peer may have actually send has been read (and discarded).
That is the only reliable way to detect a closed connection.
However this method only works if the receiving side had not been shut down.
I had to google a lot until I found this out. It seems many people get this wrong assuming send() should fail immediately. It doesn't.
I can't find the page that helped me in the end any more, but this one comes close:
https:/
mdavidsaver (mdavidsaver) wrote : | # |
When say "closed connection" do you mean normal closure (FIN/ACK), abnormal closure (RST), and/or (ideally) timeout?
Lack of an application level timeout is a known issue with the logging protocol. Are you trying to address this too?
Dirk Zimoch (dirk.zimoch) wrote : | # |
I don't know the TCP details. I tested with a local server (ncat) which I terminated and also with our two production servers (syslog-ng adnd logstash) where the firewall closes idle connections after one hour. In both cases I can now detect when the connection breaks. Before I always lost messages.
Which timeout scenario do you have in mind? send() hanging? How could I test that?
mdavidsaver (mdavidsaver) wrote : | # |
I notice that the page you link states:
> DISCLAIMER: This is my interpretation of the RFCs (I have read all the TCP-related ones I could find), but I have not attempted to examine implementation source code or trace actual connections in order to verify it. I am satisfied that the logic is correct, though.
which makes me smile.
The situation you describe, with a "firewall" in between is actually fairly complicated. And opaque in that the firewall might be acting in different ways depending on how stateful and helpful it is (eg. whether one or both sides get RST).
What I understand about BSD sockets and TCP is that:
1. If the peer maintains the connection, but doesn't call recv() (at all, or not fast enough) then send() will eventually fill the Tx socket buffer FIFO and block until it becomes not full (by peer ACK on recv() ).
2. calling send() on a socket which has been closed (peer calls close()) while the TX FIFO is empty results in an immediate EPIPE.
3. if a send() is in progress because the Tx FIFO is full (or probably when not empty), a peer close results in ECONNRESET.
4. if a the peer has shutdown(SHUT_RD), then send() will succeed until the Tx FIFO is full, then it will block as with #2. When the peer does close(), the send will return with EPIPE. (or probably ECONNRESET if some data was sent, but not ACK'd before the FIN).
5. if the peer simply stops responding, then send() will succeed until the Tx FIFO is full, then block waiting for an ACK which never comes.
So normal, or abnormal, termination of the TCP connection will break a send() immediately after RST or the FIN+ACK handshake completes. peer shutdown(SHUT_RD) or interruption of the network path will not break or error send().
I've just verified 1,2,3,4 on Linux with short python scripts. (I was surprised that #4 doesn't error) #5 I know from past experience.
So I don't think that calling recv() serves the purpose you intend. I think that SO_SNDTIMEO would be a better fit.
That said, the recv() might serve as one piece of a mechanism for a log server to detect condition #5 faster than SO_KEEPALIVE would (imo. SO_KEEPALIVE should also be set). Some extra protocol would be needed to allow a log server to detect this protocol feature.
Dirk Zimoch (dirk.zimoch) wrote : | # |
4. Was the problem. send does not fail until some time later. But than messages are already lost.
So my fix is not to rely on send failing but on recv returning EOF.
SO_KEEPALIVE has a default timeout of 2 hours and you cannot change it per socket but only system wide. It is totally useless for solving the problems with lost messages. It is there to allow to clean up stale connections and not to run out of file descriptors in server programs that are constantly running. But that's about all. So it may help the log server but not the client.
The problem I solve here is that send succeeds until the FIFO is full, just as you pointed out in 4. and 5.
I cannot confirm 2 though. My tests with ncat as a server have shown that send does not detect immediately when the server has exited (and thus the kernel has closed the socket).
But leaving discussions about how TCP is supposed to work aside, my main argument for this change is: Now it works, before it didn't.
Andrew Johnson (anj) wrote : | # |
@Dirk: Does your new version still work with the original iocLogServer?
IIRC at connection time that server explicitly closes the socket in the log-server -> client IOC direction because the server never sends data back to the client. I'm not sure whether that might cause the client to get an EOF from a recv() call sooner than with the other log-servers that don't do that close.
Dirk Zimoch (dirk.zimoch) wrote : | # |
Indeed this looks like a close to the client and it closes the connection, but re-opens it on the next message. No messages are lost. The disconnect is not nice though. I'll have a look how to fix that.
mdavidsaver (mdavidsaver) wrote : | # |
wrt. iocLogServer, I think it would be reasonable to accept removal of the shutdown(SHUT_WR) in iocLogServer.c. So long as older IOCs can still log against newer iocLogServer (which I think would be true).
Dirk Zimoch (dirk.zimoch) wrote : | # |
It seems that the broken connection is quite reliably detected at the second send() after the connection had been closed. A dummy send(socket, NULL, 0, 0) seems sufficient to detect the broken connection to iocLogServer as well as to ncat and do Michael's server.py example.
Andrew Johnson (anj) wrote : | # |
@Dirk: Which architectures have you tested that on?
Dirk Zimoch (dirk.zimoch) wrote : | # |
Linux so far. I am about to test on vxWorks and Windows right now.
Dirk Zimoch (dirk.zimoch) wrote : | # |
Hmm... Not all Linux versions behave the same. On some send(socket, NULL, 0, 0) is a NOOP and does not detect a broken connection.
Dirk Zimoch (dirk.zimoch) wrote : | # |
The send() trick with 0 bytes does not work on all systems. It seems to depend on the Linux kernel version as well.
I am testing now a different solution. On Linux the call ioctl(sock, TIOCOUTQ, &unsent) gives the number of unsent bytes in the TCP output queue. This allows to keep the unsent bytes in the buffer in case it turns out later that the connection is broken.
On macOS there seems to be a similar functionality with getsockopt(sock, SOL_SOCKET, SO_NWRITE, &unsent), but I cannot test because I have no mac.
Other Unix flavors may or may not have such a feature.
For vxWorks I could not find a way to get the number of unsent bytes, neither for version 5.5 nor for 6.7.
I don't know about RTEMS.
On Windows 10 the number of unsent bytes should be available with WSAIoctl(sock, SIO_TCP_INFO, ..._). I spent most of last week to find out why it is not working before finding out that the Win 10 version I got was too old. I am upgrading now. Older Windows versions seem to have no way to find out the number of unsent bytes.
So I can mitigate the problem of message loss for some clients but not for all. Depending on the OS.
mdavidsaver (mdavidsaver) wrote : | # |
> The send() trick with 0 bytes does not work on all systems.
Unless poll() is used, there is a race condition involved. A second send() will only fail if a RST has been received. This may or may not be platform dependent.
Have you discarded the idea of removing the 'shutdown(SHUT_WR)' from iocLogServer?
Dirk Zimoch (dirk.zimoch) wrote : | # |
I first want to stay as compatible as possible to existing iocLogServer implementations.
Dirk Zimoch (dirk.zimoch) wrote : | # |
I made a new branch some commits below the head of this one. How can I change the merge request to the new branch?
mdavidsaver (mdavidsaver) wrote : | # |
I usually do a force push to replace the existing branch content. I don't think branches can be changed, so the alternative is a new MR.
Martin Konrad (info-martin-konrad) wrote : | # |
Beware that force pushing messes up the diff on Launchpad's MR. Dirk, have you considered doing this in smaller steps (separate MRs for each issue you are addressing)? This way some of your modifications might get merged faster.
Ralph Lange (ralph-lange) wrote : | # |
iirc, when you properly supersede the MP through the web interface, you have an option to pull/link the existing conversation to the new proposal.
Dirk Zimoch (dirk.zimoch) wrote : | # |
So, now I have both variants for discussion.
This one, which requires a change in iocLogServer as well (not to shutdown its write channel).
Or mp+372925 which does not quite work for vxWorks and older Windows and uses an unspecified feature of send() in Linux.
- abbd54b... by Dirk Zimoch
-
fix wrong function name in comment
- 301a6e0... by Dirk Zimoch
-
moved logClientSendMe
ssage and made it static - 5f0df84... by Dirk Zimoch
-
increase error message buffer size for long (Windows) error messges
mdavidsaver (mdavidsaver) wrote : | # |
The other variant was merged, so this one is no longer relevant.
Unmerged commits
- 5f0df84... by Dirk Zimoch
-
increase error message buffer size for long (Windows) error messges
- 301a6e0... by Dirk Zimoch
-
moved logClientSendMe
ssage and made it static - abbd54b... by Dirk Zimoch
-
fix wrong function name in comment
- bad6295... by Dirk Zimoch
-
don't shutdown write channel of iocLogServer to allow logClient to detect disconnect
- 5801ff2... by Dirk Zimoch
-
use dynamic debug flag for logClient
- 9dfed16... by Dirk Zimoch
-
improve logClientShow to show unsent bytes on level 2 (and fix level 1)
- b925d3f... by Dirk Zimoch
-
(Re-)connect only when there is something to send. This avoids constant useless disconnect/connect cycles if the server (or firewall) keeps closing the connection.
- 2cafe6d... by Dirk Zimoch
-
detect when server closes connection. can only be done reliably with recv, not with send
Preview Diff
1 | diff --git a/modules/database/src/ioc/misc/dbCore.dbd b/modules/database/src/ioc/misc/dbCore.dbd |
2 | index 51f9c96..adde24e 100644 |
3 | --- a/modules/database/src/ioc/misc/dbCore.dbd |
4 | +++ b/modules/database/src/ioc/misc/dbCore.dbd |
5 | @@ -33,3 +33,6 @@ variable(callbackParallelThreadsDefault,int) |
6 | |
7 | # Real-time operation |
8 | variable(dbThreadRealtimeLock,int) |
9 | + |
10 | +# show logClient network activity |
11 | +variable(logClientDebug,int) |
12 | diff --git a/modules/libcom/src/log/iocLog.c b/modules/libcom/src/log/iocLog.c |
13 | index e62da20..ba78041 100644 |
14 | --- a/modules/libcom/src/log/iocLog.c |
15 | +++ b/modules/libcom/src/log/iocLog.c |
16 | @@ -18,8 +18,10 @@ |
17 | |
18 | #define epicsExportSharedSymbols |
19 | #include "envDefs.h" |
20 | +#include "errlog.h" |
21 | #include "logClient.h" |
22 | #include "iocLog.h" |
23 | +#include "epicsExit.h" |
24 | |
25 | int iocLogDisable = 0; |
26 | |
27 | @@ -75,6 +77,24 @@ void epicsShareAPI epicsShareAPI iocLogFlush (void) |
28 | } |
29 | |
30 | /* |
31 | + * logClientSendMessage () |
32 | + */ |
33 | +static void logClientSendMessage ( logClientId id, const char * message ) |
34 | +{ |
35 | + if ( !iocLogDisable ) { |
36 | + logClientSend (id, message); |
37 | + } |
38 | +} |
39 | + |
40 | +/* |
41 | + * iocLogClientDestroy() |
42 | + */ |
43 | +static void iocLogClientDestroy (logClientId id) |
44 | +{ |
45 | + errlogRemoveListeners (logClientSendMessage, id); |
46 | +} |
47 | + |
48 | +/* |
49 | * iocLogClientInit() |
50 | */ |
51 | static logClientId iocLogClientInit (void) |
52 | @@ -89,6 +109,10 @@ static logClientId iocLogClientInit (void) |
53 | return NULL; |
54 | } |
55 | id = logClientCreate (addr, port); |
56 | + if (id != NULL) { |
57 | + errlogAddListener (logClientSendMessage, id); |
58 | + epicsAtExit (iocLogClientDestroy, id); |
59 | + } |
60 | return id; |
61 | } |
62 | |
63 | @@ -135,3 +159,4 @@ logClientId epicsShareAPI logClientInit (void) |
64 | { |
65 | return iocLogClientInit (); |
66 | } |
67 | + |
68 | diff --git a/modules/libcom/src/log/iocLogServer.c b/modules/libcom/src/log/iocLogServer.c |
69 | index e707185..4d2f14f 100644 |
70 | --- a/modules/libcom/src/log/iocLogServer.c |
71 | +++ b/modules/libcom/src/log/iocLogServer.c |
72 | @@ -486,18 +486,6 @@ static void acceptNewClient ( void *pParam ) |
73 | } |
74 | } |
75 | |
76 | - status = shutdown(pclient->insock, SHUT_WR); |
77 | - if(status<0){ |
78 | - char sockErrBuf[64]; |
79 | - epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) ); |
80 | - fprintf (stderr, "%s:%d shutdown err %s\n", __FILE__, __LINE__, |
81 | - sockErrBuf); |
82 | - epicsSocketDestroy ( pclient->insock ); |
83 | - free(pclient); |
84 | - |
85 | - return; |
86 | - } |
87 | - |
88 | status = fdmgr_add_callback( |
89 | pserver->pfdctx, |
90 | pclient->insock, |
91 | diff --git a/modules/libcom/src/log/logClient.c b/modules/libcom/src/log/logClient.c |
92 | index 99ee671..986c7b8 100644 |
93 | --- a/modules/libcom/src/log/logClient.c |
94 | +++ b/modules/libcom/src/log/logClient.c |
95 | @@ -21,11 +21,9 @@ |
96 | #include <string.h> |
97 | #include <stdio.h> |
98 | |
99 | -#define epicsExportSharedSymbols |
100 | #include "dbDefs.h" |
101 | #include "epicsEvent.h" |
102 | #include "iocLog.h" |
103 | -#include "errlog.h" |
104 | #include "epicsMutex.h" |
105 | #include "epicsThread.h" |
106 | #include "epicsTime.h" |
107 | @@ -33,9 +31,13 @@ |
108 | #include "epicsAssert.h" |
109 | #include "epicsExit.h" |
110 | #include "epicsSignal.h" |
111 | +#include "epicsExport.h" |
112 | |
113 | #include "logClient.h" |
114 | |
115 | +int logClientDebug = 0; |
116 | +epicsExportAddress (int, logClientDebug); |
117 | + |
118 | typedef struct { |
119 | char msgBuf[0x4000]; |
120 | struct sockaddr_in addr; |
121 | @@ -44,6 +46,7 @@ typedef struct { |
122 | SOCKET sock; |
123 | epicsThreadId restartThreadId; |
124 | epicsEventId stateChangeNotify; |
125 | + epicsEventId shutdownNotify; |
126 | unsigned connectCount; |
127 | unsigned nextMsgIndex; |
128 | unsigned connected; |
129 | @@ -53,7 +56,6 @@ typedef struct { |
130 | } logClient; |
131 | |
132 | static const double LOG_RESTART_DELAY = 5.0; /* sec */ |
133 | -static const double LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT = 5.0; /* sec */ |
134 | static const double LOG_SERVER_SHUTDOWN_TIMEOUT = 30.0; /* sec */ |
135 | |
136 | /* |
137 | @@ -66,10 +68,10 @@ static char* logClientPrefix = NULL; |
138 | */ |
139 | static void logClientClose ( logClient *pClient ) |
140 | { |
141 | -# ifdef DEBUG |
142 | + if (logClientDebug) { |
143 | fprintf (stderr, "log client: lingering for connection close..."); |
144 | fflush (stderr); |
145 | -# endif |
146 | + } |
147 | |
148 | /* |
149 | * mutex on |
150 | @@ -84,8 +86,6 @@ static void logClientClose ( logClient *pClient ) |
151 | pClient->sock = INVALID_SOCKET; |
152 | } |
153 | |
154 | - pClient->nextMsgIndex = 0u; |
155 | - memset ( pClient->msgBuf, '\0', sizeof ( pClient->msgBuf ) ); |
156 | pClient->connected = 0u; |
157 | |
158 | /* |
159 | @@ -93,9 +93,8 @@ static void logClientClose ( logClient *pClient ) |
160 | */ |
161 | epicsMutexUnlock (pClient->mutex); |
162 | |
163 | -# ifdef DEBUG |
164 | + if (logClientDebug) |
165 | fprintf (stderr, "done\n"); |
166 | -# endif |
167 | } |
168 | |
169 | /* |
170 | @@ -113,6 +112,7 @@ static void logClientDestroy (logClientId id) |
171 | epicsMutexMustLock ( pClient->mutex ); |
172 | pClient->shutdown = 1u; |
173 | epicsMutexUnlock ( pClient->mutex ); |
174 | + epicsEventSignal ( pClient->shutdownNotify ); |
175 | |
176 | /* unblock log client thread blocking in send() or connect() */ |
177 | interruptInfo = |
178 | @@ -154,18 +154,41 @@ static void logClientDestroy (logClientId id) |
179 | return; |
180 | } |
181 | |
182 | - errlogRemoveListeners ( logClientSendMessage, (void *) pClient ); |
183 | - |
184 | logClientClose ( pClient ); |
185 | |
186 | epicsMutexDestroy ( pClient->mutex ); |
187 | - |
188 | epicsEventDestroy ( pClient->stateChangeNotify ); |
189 | + epicsEventDestroy ( pClient->shutdownNotify ); |
190 | |
191 | free ( pClient ); |
192 | } |
193 | |
194 | /* |
195 | + * logClientCheckConnection |
196 | + */ |
197 | +static void logClientCheckConnection( logClient * pClient ) |
198 | +{ |
199 | + epicsMutexMustLock ( pClient->mutex ); |
200 | + while ( pClient->connected ) { |
201 | + struct timeval timeout = { 0, 0 }; |
202 | + fd_set set; |
203 | + char buffer[256]; |
204 | + |
205 | + FD_ZERO ( &set ); |
206 | + FD_SET ( pClient->sock, &set ); |
207 | + if ( select ( pClient->sock + 1, &set, NULL, NULL, &timeout ) == 0) |
208 | + break; |
209 | + if ( recv ( pClient->sock, buffer, sizeof ( buffer ), 0 ) == 0 ) { |
210 | + fprintf ( stderr, "log client: connection closed by server \"%s\"\n", |
211 | + pClient->name ); |
212 | + logClientClose ( pClient ); |
213 | + break; |
214 | + } |
215 | + } |
216 | + epicsMutexUnlock ( pClient->mutex ); |
217 | +} |
218 | + |
219 | +/* |
220 | * This method requires the pClient->mutex be owned already. |
221 | */ |
222 | static void sendMessageChunk(logClient * pClient, const char * message) { |
223 | @@ -176,57 +199,24 @@ static void sendMessageChunk(logClient * pClient, const char * message) { |
224 | unsigned msgBufBytesLeft = |
225 | sizeof ( pClient->msgBuf ) - pClient->nextMsgIndex; |
226 | |
227 | - if ( strSize > msgBufBytesLeft ) { |
228 | - int status; |
229 | - |
230 | - if ( ! pClient->connected ) { |
231 | - break; |
232 | - } |
233 | - |
234 | - if ( msgBufBytesLeft > 0u ) { |
235 | - memcpy ( & pClient->msgBuf[pClient->nextMsgIndex], |
236 | - message, msgBufBytesLeft ); |
237 | - pClient->nextMsgIndex += msgBufBytesLeft; |
238 | - strSize -= msgBufBytesLeft; |
239 | - message += msgBufBytesLeft; |
240 | - } |
241 | - |
242 | - status = send ( pClient->sock, pClient->msgBuf, |
243 | - pClient->nextMsgIndex, 0 ); |
244 | - if ( status > 0 ) { |
245 | - unsigned nSent = (unsigned) status; |
246 | - if ( nSent < pClient->nextMsgIndex ) { |
247 | - unsigned newNextMsgIndex = pClient->nextMsgIndex - nSent; |
248 | - memmove ( pClient->msgBuf, & pClient->msgBuf[nSent], |
249 | - newNextMsgIndex ); |
250 | - pClient->nextMsgIndex = newNextMsgIndex; |
251 | - } |
252 | - else { |
253 | - pClient->nextMsgIndex = 0u; |
254 | - } |
255 | - } |
256 | - else { |
257 | - if ( ! pClient->shutdown ) { |
258 | - char sockErrBuf[64]; |
259 | - if ( status ) { |
260 | - epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) ); |
261 | - } |
262 | - else { |
263 | - strcpy ( sockErrBuf, "server initiated disconnect" ); |
264 | - } |
265 | - fprintf ( stderr, "log client: lost contact with log server at \"%s\" because \"%s\"\n", |
266 | - pClient->name, sockErrBuf ); |
267 | - } |
268 | - logClientClose ( pClient ); |
269 | - break; |
270 | - } |
271 | + if ( msgBufBytesLeft < strSize && pClient->nextMsgIndex != 0u && pClient->connected) |
272 | + { |
273 | + /* buffer is full, thus flush it */ |
274 | + logClientCheckConnection( pClient ); |
275 | + logClientFlush ( pClient ); |
276 | + msgBufBytesLeft = sizeof ( pClient->msgBuf ) - pClient->nextMsgIndex; |
277 | } |
278 | - else { |
279 | - memcpy ( & pClient->msgBuf[pClient->nextMsgIndex], |
280 | - message, strSize ); |
281 | - pClient->nextMsgIndex += strSize; |
282 | + if ( msgBufBytesLeft == 0u ) { |
283 | + fprintf ( stderr, "log client: messages to \"%s\" are lost\n", |
284 | + pClient->name ); |
285 | break; |
286 | } |
287 | + if ( msgBufBytesLeft > strSize) msgBufBytesLeft = strSize; |
288 | + memcpy ( & pClient->msgBuf[pClient->nextMsgIndex], |
289 | + message, msgBufBytesLeft ); |
290 | + pClient->nextMsgIndex += msgBufBytesLeft; |
291 | + strSize -= msgBufBytesLeft; |
292 | + message += msgBufBytesLeft; |
293 | } |
294 | } |
295 | |
296 | @@ -255,32 +245,25 @@ void epicsShareAPI logClientSend ( logClientId id, const char * message ) |
297 | |
298 | void epicsShareAPI logClientFlush ( logClientId id ) |
299 | { |
300 | + unsigned nSent = 0u; |
301 | + |
302 | logClient * pClient = ( logClient * ) id; |
303 | |
304 | - if ( ! pClient ) { |
305 | + if ( ! pClient || ! pClient->connected ) { |
306 | return; |
307 | } |
308 | |
309 | epicsMutexMustLock ( pClient->mutex ); |
310 | |
311 | - while ( pClient->nextMsgIndex && pClient->connected ) { |
312 | - int status = send ( pClient->sock, pClient->msgBuf, |
313 | - pClient->nextMsgIndex, 0 ); |
314 | + while ( nSent < pClient->nextMsgIndex && pClient->connected ) { |
315 | + int status = send ( pClient->sock, pClient->msgBuf + nSent, |
316 | + pClient->nextMsgIndex - nSent, 0 ); |
317 | if ( status > 0 ) { |
318 | - unsigned nSent = (unsigned) status; |
319 | - if ( nSent < pClient->nextMsgIndex ) { |
320 | - unsigned newNextMsgIndex = pClient->nextMsgIndex - nSent; |
321 | - memmove ( pClient->msgBuf, & pClient->msgBuf[nSent], |
322 | - newNextMsgIndex ); |
323 | - pClient->nextMsgIndex = newNextMsgIndex; |
324 | - } |
325 | - else { |
326 | - pClient->nextMsgIndex = 0u; |
327 | - } |
328 | + nSent += (unsigned) status; |
329 | } |
330 | else { |
331 | if ( ! pClient->shutdown ) { |
332 | - char sockErrBuf[64]; |
333 | + char sockErrBuf[128]; |
334 | if ( status ) { |
335 | epicsSocketConvertErrnoToString ( sockErrBuf, sizeof ( sockErrBuf ) ); |
336 | } |
337 | @@ -294,6 +277,11 @@ void epicsShareAPI logClientFlush ( logClientId id ) |
338 | break; |
339 | } |
340 | } |
341 | + pClient->nextMsgIndex -= nSent; |
342 | + if ( nSent > 0 && pClient->nextMsgIndex > 0 ) { |
343 | + memmove ( pClient->msgBuf, & pClient->msgBuf[nSent], |
344 | + pClient->nextMsgIndex ); |
345 | + } |
346 | epicsMutexUnlock ( pClient->mutex ); |
347 | } |
348 | |
349 | @@ -302,10 +290,10 @@ void epicsShareAPI logClientFlush ( logClientId id ) |
350 | */ |
351 | static void logClientMakeSock (logClient *pClient) |
352 | { |
353 | - |
354 | -# ifdef DEBUG |
355 | + if (logClientDebug) { |
356 | fprintf (stderr, "log client: creating socket..."); |
357 | -# endif |
358 | + fflush (stderr); |
359 | + } |
360 | |
361 | epicsMutexMustLock (pClient->mutex); |
362 | |
363 | @@ -314,7 +302,7 @@ static void logClientMakeSock (logClient *pClient) |
364 | */ |
365 | pClient->sock = epicsSocketCreate ( AF_INET, SOCK_STREAM, 0 ); |
366 | if ( pClient->sock == INVALID_SOCKET ) { |
367 | - char sockErrBuf[64]; |
368 | + char sockErrBuf[128]; |
369 | epicsSocketConvertErrnoToString ( |
370 | sockErrBuf, sizeof ( sockErrBuf ) ); |
371 | fprintf ( stderr, "log client: no socket error %s\n", |
372 | @@ -323,10 +311,8 @@ static void logClientMakeSock (logClient *pClient) |
373 | |
374 | epicsMutexUnlock (pClient->mutex); |
375 | |
376 | -# ifdef DEBUG |
377 | + if (logClientDebug) |
378 | fprintf (stderr, "done\n"); |
379 | -# endif |
380 | - |
381 | } |
382 | |
383 | /* |
384 | @@ -366,7 +352,7 @@ static void logClientConnect (logClient *pClient) |
385 | } |
386 | else { |
387 | if ( pClient->connFailStatus != errnoCpy && ! pClient->shutdown ) { |
388 | - char sockErrBuf[64]; |
389 | + char sockErrBuf[128]; |
390 | epicsSocketConvertErrnoToString ( |
391 | sockErrBuf, sizeof ( sockErrBuf ) ); |
392 | fprintf (stderr, |
393 | @@ -392,27 +378,13 @@ static void logClientConnect (logClient *pClient) |
394 | optval = TRUE; |
395 | status = setsockopt (pClient->sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&optval, sizeof(optval)); |
396 | if (status<0) { |
397 | - char sockErrBuf[64]; |
398 | + char sockErrBuf[128]; |
399 | epicsSocketConvertErrnoToString ( |
400 | sockErrBuf, sizeof ( sockErrBuf ) ); |
401 | fprintf (stderr, "log client: unable to enable keepalive option because \"%s\"\n", sockErrBuf); |
402 | } |
403 | |
404 | /* |
405 | - * we don't need full-duplex (clients only write), so we shutdown |
406 | - * the read end of our socket |
407 | - */ |
408 | - status = shutdown (pClient->sock, SHUT_RD); |
409 | - if (status < 0) { |
410 | - char sockErrBuf[64]; |
411 | - epicsSocketConvertErrnoToString ( |
412 | - sockErrBuf, sizeof ( sockErrBuf ) ); |
413 | - fprintf (stderr, "%s:%d shutdown(%d,SHUT_RD) error was \"%s\"\n", |
414 | - __FILE__, __LINE__, pClient->sock, sockErrBuf); |
415 | - /* not fatal (although it shouldn't happen) */ |
416 | - } |
417 | - |
418 | - /* |
419 | * set how long we will wait for the TCP state machine |
420 | * to clean up when we issue a close(). This |
421 | * guarantees that messages are serialized when we |
422 | @@ -425,7 +397,7 @@ static void logClientConnect (logClient *pClient) |
423 | lingerval.l_linger = 60*5; |
424 | status = setsockopt (pClient->sock, SOL_SOCKET, SO_LINGER, (char *) &lingerval, sizeof(lingerval)); |
425 | if (status<0) { |
426 | - char sockErrBuf[64]; |
427 | + char sockErrBuf[128]; |
428 | epicsSocketConvertErrnoToString ( |
429 | sockErrBuf, sizeof ( sockErrBuf ) ); |
430 | fprintf (stderr, "log client: unable to set linger options because \"%s\"\n", sockErrBuf); |
431 | @@ -451,20 +423,21 @@ static void logClientRestart ( logClientId id ) |
432 | /* SMP safe state inspection */ |
433 | epicsMutexMustLock ( pClient->mutex ); |
434 | while ( ! pClient->shutdown ) { |
435 | - unsigned isConn; |
436 | + unsigned isConn, dataToSend; |
437 | + |
438 | + logClientCheckConnection( pClient ); |
439 | |
440 | isConn = pClient->connected; |
441 | + dataToSend = pClient->nextMsgIndex; |
442 | |
443 | epicsMutexUnlock ( pClient->mutex ); |
444 | |
445 | - if ( isConn ) { |
446 | + if ( dataToSend ) { |
447 | + if ( ! isConn ) logClientConnect ( pClient ); |
448 | logClientFlush ( pClient ); |
449 | } |
450 | - else { |
451 | - logClientConnect ( pClient ); |
452 | - } |
453 | - |
454 | - epicsThreadSleep ( LOG_RESTART_DELAY ); |
455 | + |
456 | + epicsEventWaitWithTimeout ( pClient->shutdownNotify, LOG_RESTART_DELAY); |
457 | |
458 | epicsMutexMustLock ( pClient->mutex ); |
459 | } |
460 | @@ -480,9 +453,7 @@ static void logClientRestart ( logClientId id ) |
461 | logClientId epicsShareAPI logClientCreate ( |
462 | struct in_addr server_addr, unsigned short server_port) |
463 | { |
464 | - epicsTimeStamp begin, current; |
465 | logClient *pClient; |
466 | - double diff; |
467 | |
468 | pClient = calloc (1, sizeof (*pClient)); |
469 | if (pClient==NULL) { |
470 | @@ -507,14 +478,22 @@ logClientId epicsShareAPI logClientCreate ( |
471 | pClient->shutdownConfirm = 0; |
472 | |
473 | epicsAtExit (logClientDestroy, (void*) pClient); |
474 | - |
475 | + |
476 | pClient->stateChangeNotify = epicsEventCreate (epicsEventEmpty); |
477 | if ( ! pClient->stateChangeNotify ) { |
478 | epicsMutexDestroy ( pClient->mutex ); |
479 | free ( pClient ); |
480 | return NULL; |
481 | } |
482 | - |
483 | + |
484 | + pClient->shutdownNotify = epicsEventCreate (epicsEventEmpty); |
485 | + if ( ! pClient->shutdownNotify ) { |
486 | + epicsMutexDestroy ( pClient->mutex ); |
487 | + epicsEventDestroy ( pClient->stateChangeNotify ); |
488 | + free ( pClient ); |
489 | + return NULL; |
490 | + } |
491 | + |
492 | pClient->restartThreadId = epicsThreadCreate ( |
493 | "logRestart", epicsThreadPriorityLow, |
494 | epicsThreadGetStackSize(epicsThreadStackSmall), |
495 | @@ -522,35 +501,12 @@ logClientId epicsShareAPI logClientCreate ( |
496 | if ( pClient->restartThreadId == NULL ) { |
497 | epicsMutexDestroy ( pClient->mutex ); |
498 | epicsEventDestroy ( pClient->stateChangeNotify ); |
499 | + epicsEventDestroy ( pClient->shutdownNotify ); |
500 | free (pClient); |
501 | fprintf(stderr, "log client: unable to start log client connection watch dog thread\n"); |
502 | return NULL; |
503 | } |
504 | |
505 | - /* |
506 | - * attempt to synchronize with circuit connect |
507 | - */ |
508 | - epicsTimeGetCurrent ( & begin ); |
509 | - epicsMutexMustLock ( pClient->mutex ); |
510 | - do { |
511 | - epicsMutexUnlock ( pClient->mutex ); |
512 | - epicsEventWaitWithTimeout ( |
513 | - pClient->stateChangeNotify, |
514 | - LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT / 10.0 ); |
515 | - epicsTimeGetCurrent ( & current ); |
516 | - diff = epicsTimeDiffInSeconds ( & current, & begin ); |
517 | - epicsMutexMustLock ( pClient->mutex ); |
518 | - } |
519 | - while ( ! pClient->connected && diff < LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT ); |
520 | - epicsMutexUnlock ( pClient->mutex ); |
521 | - |
522 | - if ( ! pClient->connected ) { |
523 | - fprintf (stderr, "log client create: timed out synchronizing with circuit connect to \"%s\" after %.1f seconds\n", |
524 | - pClient->name, LOG_SERVER_CREATE_CONNECT_SYNC_TIMEOUT ); |
525 | - } |
526 | - |
527 | - errlogAddListener ( logClientSendMessage, (void *) pClient ); |
528 | - |
529 | return (void *) pClient; |
530 | } |
531 | |
532 | @@ -568,24 +524,21 @@ void epicsShareAPI logClientShow (logClientId id, unsigned level) |
533 | printf ("log client: disconnected from log server at \"%s\"\n", pClient->name); |
534 | } |
535 | |
536 | - if (level>1) { |
537 | - printf ("log client: sock=%s, connect cycles = %u\n", |
538 | - pClient->sock==INVALID_SOCKET?"INVALID":"OK", |
539 | - pClient->connectCount); |
540 | - } |
541 | - |
542 | if (logClientPrefix) { |
543 | printf ("log client: prefix is \"%s\"\n", logClientPrefix); |
544 | } |
545 | -} |
546 | |
547 | -/* |
548 | - * logClientSendMessage (); deprecated |
549 | - */ |
550 | -void logClientSendMessage ( logClientId id, const char * message ) |
551 | -{ |
552 | - if ( !iocLogDisable ) { |
553 | - logClientSend (id, message); |
554 | + if (level>0) { |
555 | + printf ("log client: sock %s, connect cycles = %u\n", |
556 | + pClient->sock==INVALID_SOCKET?"INVALID":"OK", |
557 | + pClient->connectCount); |
558 | + } |
559 | + if (level>1) { |
560 | + printf ("log client: %u bytes in buffer\n", pClient->nextMsgIndex); |
561 | + if (pClient->nextMsgIndex) |
562 | + printf("-------------------------\n" |
563 | + "%.*s-------------------------\n", |
564 | + (int)(pClient->nextMsgIndex), pClient->msgBuf); |
565 | } |
566 | } |
567 | |
568 | diff --git a/modules/libcom/src/log/logClient.h b/modules/libcom/src/log/logClient.h |
569 | index 1797bbb..3b3f63a 100644 |
570 | --- a/modules/libcom/src/log/logClient.h |
571 | +++ b/modules/libcom/src/log/logClient.h |
572 | @@ -38,7 +38,6 @@ epicsShareFunc void epicsShareAPI iocLogPrefix(const char* prefix); |
573 | /* deprecated interface; retained for backward compatibility */ |
574 | /* note: implementations are in iocLog.c, not logClient.c */ |
575 | epicsShareFunc logClientId epicsShareAPI logClientInit (void); |
576 | -epicsShareFunc void logClientSendMessage (logClientId id, const char *message); |
577 | |
578 | #ifdef __cplusplus |
579 | } |
What necessitates removing shutdown() and adding select()/recv()? To my knowledge send() should return with an error if the socket is, or becomes, disconnected.