Random Segmentation faults on fileio test

Bug #1187040 reported by Joe
This bug affects 1 person
Affects    Status         Importance  Assigned to     Milestone
sysbench   Fix Committed  Undecided   Alexey Kopytov

Bug Description

I am getting random segfaults when running fileio tests.

gdb output:
(gdb) bt
#0 0x0000003755289c1f in memcpy () from /lib64/libc.so.6
#1 0x000000000040997d in sb_percentile_calculate (percentile=0x8905a0, percent=95) at sb_percentile.c:95
#2 0x000000000040b6df in file_print_stats (type=<value optimized out>) at sb_fileio.c:850
#3 0x0000000000404b4a in report_thread_proc (arg=<value optimized out>) at sysbench.c:661
#4 0x0000003755607851 in start_thread () from /lib64/libpthread.so.0
#5 0x00000037552e890d in clone () from /lib64/libc.so.6
(gdb) frame 1
#1 0x000000000040997d in sb_percentile_calculate (percentile=0x8905a0, percent=95) at sb_percentile.c:95
95 sb_percentile.c: No such file or directory.
        in sb_percentile.c
(gdb) quit

sb_percentile.c output:
<snip>
     82 double sb_percentile_calculate(sb_percentile_t *percentile, double percent)
     83 {
     84   unsigned long long ncur, nmax;
     85   unsigned int i;
     86
     87   pthread_mutex_lock(&percentile->mutex);
     88
     89   if (percentile->total == 0)
     90   {
     91     pthread_mutex_unlock(&percentile->mutex);
     92     return 0.0;
     93   }
     94
     95   memcpy(percentile->tmp, percentile->values,
     96          percentile->size * sizeof(unsigned long long));
     97   nmax = floor(percentile->total * percent / 100 + 0.5);
     98
     99   pthread_mutex_unlock(&percentile->mutex);
    100
    101   ncur = percentile->tmp[0];
    102   for (i = 1; i < percentile->size; i++)
    103   {
    104     ncur += percentile->tmp[i];
    105     if (ncur >= nmax)
    106       break;
    107   }
<snip>

Joe (joegrasse) wrote :

sysbench 0.5 rev: 116

Joe (joegrasse) wrote :

sysbench --test=fileio --file-test-mode=rndwr --file-total-size=100M --file-num=1 --num-threads=16 --file-io-mode=sync --max-time=10 --max-requests=0 --report-interval=10 --rand-init=on --file-fsync-freq=1

I believe the important piece here is the --report-interval parameter. I believe percentile is getting destroyed before the memcpy happens.

It looks like the return value of pthread_mutex_lock isn't being checked. Maybe the lock attempt isn't successful.
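
To illustrate what checking the lock result could look like, here is a minimal sketch. Both helpers, init_errorcheck_mutex and checked_lock, are hypothetical names and not part of sysbench; the sketch assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that misuse is reported rather than silently ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in sysbench): initialise an error-checking mutex,
   so that misuse such as relocking from the same thread returns an error
   instead of deadlocking. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int err;

  assert(m != NULL);
  err = pthread_mutexattr_init(&attr);
  if (err != 0)
    return err;
  err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (err == 0)
    err = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return err;
}

/* Hypothetical helper: lock the mutex and report a failure instead of
   ignoring it. Returns 0 on success or the pthread error code. */
static int checked_lock(pthread_mutex_t *m, const char *where)
{
  int err = pthread_mutex_lock(m);

  if (err != 0)
    fprintf(stderr, "%s: pthread_mutex_lock: %s\n", where, strerror(err));
  return err;
}
```

Note that POSIX leaves the result of locking an already destroyed mutex undefined, so a return-value check on its own cannot be relied on to detect a mutex that was torn down by another thread.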

Joe (joegrasse) wrote :

After more digging: when the report interval lines up with the total benchmark time, there is a race in which percentile->values and percentile->tmp are freed and then used. To mitigate most of the core dumps I encountered, I changed sb_percentile_reset and sb_percentile_done in sb_percentile.c to the following. This isn't a complete or proper fix, though.

void sb_percentile_reset(sb_percentile_t *percentile)
{
  int err;

  err = pthread_mutex_lock(&percentile->mutex);
  if( err == 0){
    percentile->total = 0;
    memset(percentile->values, 0, percentile->size * sizeof(unsigned long long));
    pthread_mutex_unlock(&percentile->mutex);
  }
}

void sb_percentile_done(sb_percentile_t *percentile)
{
  int err;

  err = pthread_mutex_destroy(&percentile->mutex);
  if( err == 0){
    free(percentile->values);
    free(percentile->tmp);
  }
}

Another workaround is to use a report interval that doesn't line up with the total test time.
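
The use-after-free comes down to ordering: the buffers must not be freed while the reporting thread can still touch them. Below is a minimal sketch of that ordering, assuming the reporter can simply be joined before teardown. The names hist_t, reporter and run_and_teardown are hypothetical stand-ins, and this is not necessarily how sysbench itself addresses the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for sb_percentile_t, using the fields seen above. */
typedef struct {
  unsigned long long *values;
  unsigned long long *tmp;
  unsigned int size;
  unsigned long long total;
  pthread_mutex_t mutex;
} hist_t;

typedef struct {
  hist_t *hist;
  int iterations;                 /* number of reports to produce */
  unsigned long long last_total;  /* total seen by the last report */
} reporter_arg_t;

static int hist_init(hist_t *h, unsigned int size)
{
  h->values = calloc(size, sizeof(unsigned long long));
  h->tmp = calloc(size, sizeof(unsigned long long));
  if (h->values == NULL || h->tmp == NULL ||
      pthread_mutex_init(&h->mutex, NULL) != 0) {
    free(h->values);
    free(h->tmp);
    return -1;
  }
  h->size = size;
  h->total = 0;
  return 0;
}

static void hist_done(hist_t *h)
{
  pthread_mutex_destroy(&h->mutex);
  free(h->values);
  free(h->tmp);
}

/* Reporter thread: snapshots the histogram under the lock, as
   sb_percentile_calculate does with memcpy. */
static void *reporter(void *p)
{
  reporter_arg_t *arg = p;
  hist_t *h = arg->hist;
  int i;

  for (i = 0; i < arg->iterations; i++) {
    pthread_mutex_lock(&h->mutex);
    memcpy(h->tmp, h->values, h->size * sizeof(unsigned long long));
    arg->last_total = h->total;
    pthread_mutex_unlock(&h->mutex);
  }
  return NULL;
}

/* Start the reporter, wait for it, and only then free the buffers.
   Returns 0 on success and stores the last total the reporter saw. */
static int run_and_teardown(unsigned int size, int iterations,
                            unsigned long long *seen_total)
{
  hist_t h;
  reporter_arg_t arg;
  pthread_t tid;

  if (hist_init(&h, size) != 0)
    return -1;
  h.values[0] = 5;
  h.total = 5;
  arg.hist = &h;
  arg.iterations = iterations;
  arg.last_total = 0;
  if (pthread_create(&tid, NULL, reporter, &arg) != 0) {
    hist_done(&h);
    return -1;
  }
  pthread_join(tid, NULL);  /* reporter can no longer touch h after this */
  hist_done(&h);            /* safe: no other thread references the buffers */
  *seen_total = arg.last_total;
  return 0;
}
```

The point of the sketch is only the order in run_and_teardown: join first, free second. Checking the result of pthread_mutex_destroy, as in the workaround above, narrows the window but does not close it, because the reporter may take the lock right after the buffers are freed.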

Alexey Kopytov (akopytov) wrote :

Thanks for the report. Fixed in the LP repository (rev. 117).

Changed in sysbench:
status: New → Fix Committed
assignee: nobody → Alexey Kopytov (akopytov)