race in rsrv on asInit

Bug #1091401 reported by mdavidsaver
This bug affects 1 person
Affects: EPICS Base
Status: Fix Released
Importance: Medium
Assigned to: Jeff Hill

Bug Description

This issue was observed with EPICS Base 3.14.11.

I've observed a race when reloading the access security file from a subroutine record (iocStats includes such a record). The symptom is a crash that occurs when the command-line caput utility is used to process this record. If caput is invoked with -c, or if a longer-running CA client is used, no crash has been observed (so far).

Stack traces for the relevant threads are given below. The crash occurs in casAccessRightsCB() when it attempts to remove a pCIU from chanList although that element is no longer in the list. This happens because clear_channel_reply() has already removed the pCIU but does not update its state before releasing chanListLock.

The client and channel in question are those connected to the subroutine record that calls asInit.

Thread 64 (Thread 0xb547db70 (LWP 30077)):
#0 0xb7d76c5d in ellDelete (pList=0x86047c0, pNode=0x86e89d0) at ../../../src/libCom/ellLib/ellLib.c:82
#1 0xb7ea8c74 in casAccessRightsCB (ascpvt=0x8659e68, type=asClientCOAR) at ../camessage.c:1107
#2 0xb7e94333 in asComputePvt (asClientPvt=0x8659e68) at ../asLibRoutines.c:1014
#3 0xb7e93f36 in asAddMemberPvt (pasMemberPvt=0xb547d030, asgName=0x8282536 "IOCMANAGERS") at ../asLibRoutines.c:893
#4 0xb7e91adc in asInitialize (inputfunction=0xb7e91be3 <myInputFunction>) at ../asLibRoutines.c:157
#5 0xb7e91e43 in asInitFP (fp=0x86a6680, substitutions=0x80d0318 "P=CtrlSwitch:") at ../asLibRoutines.c:250
#6 0xb7e91b88 in asInitFile (filename=0x80536d8 "/epics/iocs/CtrSwitch.acf", substitutions=0x80d0318 "P=CtrlSwitch:")
    at ../asLibRoutines.c:179
#7 0xb7e8fe24 in asInitCommon () at ../asDbLib.c:136
#8 0xb7e8fecf in asInitTask (pcallback=0x85618e0) at ../asDbLib.c:164
#9 0xb7d92d4a in start_routine (arg=0x86c1e68) at ../../../src/libCom/osi/os/posix/osdThread.c:392
#10 0xb7968955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#11 0xb7b9fe7e in clone () from /lib/i686/cmov/libc.so.6

Thread 63 (Thread 0xb54feb70 (LWP 30076)):
#0 0xb7fe2424 in __kernel_vsyscall ()
#1 0xb796fc39 in __lll_lock_wait () from /lib/i686/cmov/libpthread.so.0
#2 0xb796b049 in _L_lock_839 () from /lib/i686/cmov/libpthread.so.0
#3 0xb796aedb in pthread_mutex_lock () from /lib/i686/cmov/libpthread.so.0
#4 0xb7bacfd6 in pthread_mutex_lock () from /lib/i686/cmov/libc.so.6
#5 0xb7d94261 in mutexLock (id=0x8598ccc) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#6 0xb7d944c7 in epicsMutexOsdLock (pmutex=0x8598cc8) at ../../../src/libCom/osi/os/posix/osdMutex.c:116
#7 0xb7d8bbd9 in epicsMutexLock (pmutexNode=0x8598ce8) at ../../../src/libCom/osi/epicsMutex.cpp:145
#8 0xb7e9245e in asRemoveClient (asClientPvt=0x86e8a14) at ../asLibRoutines.c:391
#9 0xb7eaac79 in clear_channel_reply (mp=0xb54fe210, pPayload=0xb59d4018, client=0x8604780) at ../camessage.c:2012
#10 0xb7eabb5b in camessage (client=0x8604780) at ../camessage.c:2508
#11 0xb7ea68ca in camsgtask (pParm=0x8604780) at ../camsgtask.c:123
#12 0xb7d92d4a in start_routine (arg=0x86c0508) at ../../../src/libCom/osi/os/posix/osdThread.c:392
#13 0xb7968955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#14 0xb7b9fe7e in clone () from /lib/i686/cmov/libc.so.6
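
Frame #0 of thread 64 is ellDelete() unlinking a node that thread 63 had already removed from chanList. Why a second unlink is harmful can be seen with a minimal sketch of an intrusive doubly linked list loosely modelled on ellLib. The type and function names are hypothetical (this is not the real ellLib code), and the sketch assumes, as it implements, that unlinking neither checks membership nor clears the node's own pointers. Removing the middle node of a three-node list twice leaves the header's count lower than the number of nodes still reachable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified list in the style of ellLib: the header keeps
 * first/last pointers and a count; unlinking trusts the caller and does
 * not reset the removed node's next/prev pointers. */
typedef struct node { struct node *next, *prev; } node;
typedef struct { node *first, *last; int count; } list;

static void list_init(list *l) { l->first = l->last = NULL; l->count = 0; }

static void list_add(list *l, node *n)
{
    n->next = NULL;
    n->prev = l->last;
    if (l->last) l->last->next = n; else l->first = n;
    l->last = n;
    l->count++;
}

/* Unchecked unlink: assumes n is currently on l. */
static void list_unlink(list *l, node *n)
{
    if (l->last == n) l->last = n->prev; else n->next->prev = n->prev;
    if (l->first == n) l->first = n->next; else n->prev->next = n->next;
    l->count--;
}

/* Count the nodes actually reachable from the header. */
static int list_walk(const list *l)
{
    int k = 0;
    for (const node *p = l->first; p; p = p->next) k++;
    return k;
}

/* Unlink the middle node twice, as when both clear_channel_reply() and
 * the access-rights callback remove the same channel. Returns how many
 * linked nodes the header no longer accounts for. */
int double_unlink_middle(void)
{
    list l; node a, b, c;
    list_init(&l);
    list_add(&l, &a); list_add(&l, &b); list_add(&l, &c);
    list_unlink(&l, &b);                  /* legitimate removal */
    assert(l.count == 2 && list_walk(&l) == 2);
    list_unlink(&l, &b);                  /* stale second removal */
    return list_walk(&l) - l.count;       /* 2 reachable, count says 1 */
}
```

With a single-element list, the second list_unlink() in this sketch instead dereferences the node's NULL next pointer, i.e. an immediate crash rather than silent corruption.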

The record in question, from iocAdminSoft.db:

record( sub, "$(IOC,undefined):READACF")
{
    field( DESC, "$(IOC,undefined) ACF Update")
    field( ASG, "IOCMANAGERS")
    field( INAM, "asSubInit")
    field( SNAM, "asSubProcess")
    field( BRSV, "INVALID")
}

Jeff Hill (johill-lanl)
Changed in epics-base:
status: New → Confirmed
Jeff Hill (johill-lanl) wrote :

Thanks for your bug report, David. I propose the following fix: the channel should be removed from the access security library before rsrv starts to destroy it. Please verify that this fixes your issue; if it does, I will commit it.

=== modified file src/rsrv/camessage.c
--- src/rsrv/camessage.c 2010-11-01 21:01:04 +0000
+++ src/rsrv/camessage.c 2012-12-18 00:23:53 +0000
@@ -1992,6 +1992,15 @@
      cas_commit_msg ( client, 0u );
      SEND_UNLOCK(client);

+ /*
+ * remove from access control list
+ */
+ status = asRemoveClient(&pciu->asClientPVT);
+ if(status != 0 && status != S_asLib_asNotActive){
+ errMessage(status, RECORD_NAME(&pciu->addr));
+ return RSRV_ERROR;
+ }
+
      epicsMutexMustLock ( client->chanListLock );
      if ( pciu->state == rsrvCS_inService ||
             pciu->state == rsrvCS_pendConnectResp ) {
@@ -2011,15 +2020,6 @@
      }
      epicsMutexUnlock( client->chanListLock );

- /*
- * remove from access control list
- */
- status = asRemoveClient(&pciu->asClientPVT);
- if(status != 0 && status != S_asLib_asNotActive){
- errMessage(status, RECORD_NAME(&pciu->addr));
- return RSRV_ERROR;
- }
-
      LOCK_CLIENTQ;
      status = bucketRemoveItemUnsignedId (pCaBucket, &pciu->sid);
      if(status != S_bucket_success){
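
The ordering argument behind this patch can be replayed deterministically. The sketch below is a hypothetical single-threaded model, not rsrv code: each function stands for one locked step of clear_channel_reply() or of asInit(), the names and the two channel states are invented for illustration, and list_delete() reports a non-member node instead of corrupting memory. It assumes, as thread 64's stack trace suggests, that the access-rights callback only runs while asInit holds the access security lock, so once asRemoveClient() has returned the callback can no longer fire for that channel.

```c
#include <assert.h>
#include <stddef.h>

/* Checked doubly linked list (simplified stand-in for ellLib). */
typedef struct node { struct node *next, *prev; } node;
typedef struct { node *first, *last; } list;

static int list_contains(const list *l, const node *n)
{
    for (const node *p = l->first; p; p = p->next)
        if (p == n) return 1;
    return 0;
}

static void list_add(list *l, node *n)
{
    n->next = NULL;
    n->prev = l->last;
    if (l->last) l->last->next = n; else l->first = n;
    l->last = n;
}

static int list_delete(list *l, node *n)
{
    if (!list_contains(l, n)) return -1;   /* the real code crashed here */
    if (n->prev) n->prev->next = n->next; else l->first = n->next;
    if (n->next) n->next->prev = n->prev; else l->last = n->prev;
    n->next = n->prev = NULL;
    return 0;
}

/* Hypothetical, reduced model of the rsrv client and channel. */
typedef enum { CS_IN_SERVICE, CS_IN_SERVICE_PEND_AR } chan_state;
typedef struct { list chan_list, pend_ar_list; } client;
typedef struct { node link; chan_state state; int as_member; } channel;

static void setup(client *c, channel *ch)
{
    c->chan_list.first = c->chan_list.last = NULL;
    c->pend_ar_list.first = c->pend_ar_list.last = NULL;
    ch->state = CS_IN_SERVICE;
    ch->as_member = 1;                      /* registered with asLib */
    list_add(&c->chan_list, &ch->link);
}

/* Access-rights callback: move an in-service channel to the pending list. */
static int access_rights_cb(client *c, channel *ch)
{
    if (ch->state != CS_IN_SERVICE) return 0;
    if (list_delete(&c->chan_list, &ch->link) != 0) return -1;
    list_add(&c->pend_ar_list, &ch->link);
    ch->state = CS_IN_SERVICE_PEND_AR;
    return 0;
}

/* asInit: under the AS lock, run the callback for each member client. */
static int as_reinit(client *c, channel *ch)
{
    return ch->as_member ? access_rights_cb(c, ch) : 0;
}

/* clear_channel_reply, chanList step: picks the list from ch->state and,
 * as in the report, leaves ch->state unchanged afterwards. */
static int clear_unlink(client *c, channel *ch)
{
    list *l = ch->state == CS_IN_SERVICE ? &c->chan_list : &c->pend_ar_list;
    return list_delete(l, &ch->link);
}

/* clear_channel_reply, asRemoveClient step. */
static void clear_as_remove(channel *ch) { ch->as_member = 0; }

/* Original order, interleaved as in the stack traces. */
int replay_original(void)
{
    client c; channel ch;
    setup(&c, &ch);
    int rc = clear_unlink(&c, &ch);         /* T63, then drops chanListLock */
    if (rc == 0) rc = as_reinit(&c, &ch);   /* T64 holds the AS lock */
    clear_as_remove(&ch);                   /* T63 finally gets the AS lock */
    return rc;
}

/* Patched order; init_first selects which thread wins the AS lock. */
int replay_patched(int init_first)
{
    client c; channel ch;
    int rc = 0;
    setup(&c, &ch);
    if (init_first) rc = as_reinit(&c, &ch);  /* T64 runs, T63 blocks */
    clear_as_remove(&ch);                     /* T63: asRemoveClient */
    if (!init_first) rc = as_reinit(&c, &ch); /* T64: no longer a member */
    if (rc == 0) rc = clear_unlink(&c, &ch);  /* T63: chanList step */
    if (rc == 0) assert(c.chan_list.first == NULL && c.pend_ar_list.first == NULL);
    return rc;
}
```

In this model replay_original() fails for the interleaving in the report, while replay_patched() succeeds whichever thread wins the access security lock: either the callback runs before thread 63 touches chanList (and records the new state), or it never runs for that channel at all.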

Changed in epics-base:
assignee: nobody → Jeff Hill (johill-lanl)
importance: Undecided → Medium
mdavidsaver (mdavidsaver) wrote :

Yes, this change appears to fix the issue.

Andrew Johnson (anj) wrote :

Jeff, please commit this fix. I've just added the patch to the 3.14.12 Known Problems page; it's the first one for 3.14.12.3.

Thanks guys,

- Andrew

Jeff Hill (johill-lanl) wrote :

Run command: bzr commit -m "fixed problems with ca clear channel protocol during reload of the access security file.\nSee https://bugs.launch...
Connected (version 2.0, client Twisted)
Authentication (publickey) successful!
Secsh channel 1 opened.
Committing to: bzr+ssh://bazaar.launchpad.net/~epics-core/epics-base/3.14/
modified src/rsrv/camessage.c
Committed revision 12399.

Changed in epics-base:
status: Confirmed → Fix Committed
Andrew Johnson (anj)
Changed in epics-base:
status: Fix Committed → Fix Released