Merge lp:~laurynas-biveinis/percona-server/bug803865-5.1 into lp:percona-server/5.1

Proposed by Laurynas Biveinis
Status: Merged
Merged at revision: 262
Proposed branch: lp:~laurynas-biveinis/percona-server/bug803865-5.1
Merge into: lp:percona-server/5.1
Diff against target: 51 lines (+12/-17)
1 file modified
response-time-distribution.patch (+12/-17)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/bug803865-5.1
Reviewer Review Type Date Requested Status
Stewart Smith (community) Approve
Review via email: mp+68769@code.launchpad.net

Description of the change

Fixes LP bug #803865: atomic_cas_64 crash.

Here is the relevant code from include/atomic/x86_gcc.h:

  asm volatile ("push %%ebx;"
                "movl (%%ecx), %%ebx;"
                "movl 4(%%ecx), %%ecx;"
                LOCK_prefix "; cmpxchg8b %0;"
                "setz %2; pop %%ebx"
                : "=m" (*a), "+A" (*cmp), "=c" (ret)
                : "c" (&set), "m" (*a)
                : "memory", "esp")

There are following issues that make it easy for GCC to break this
code by optimization:
- The zero-th operand is marked as output-only, but it is an
input-output one: CMPXCHG8B reads the memory pointed to it, and also
sets EDX:EAX if the operand and EDX:EAX are non-equal. It is also an
output operand: the instruction sets it to ECX:EBX.
- The code probably tried to say that zero-th operand is also an input
one by specifying it as the 4th operand ("m" (*a)), but this fails,
because GCC does not guarantee putting the input and output operands
referenced by the same C expression at the same location. Also this
operand is not referenced in the assembly at all.
- The code clobbers EFLAGS but does not specify it.

While we are at it, there are following issues that make the resulting
code suboptimal:
- "volatile" is not needed as all the output operands are specified
fully (i.e. no undescribed side-effects in the assembly);
- No need to force using ECX for input &set and output ret,
- Clobber "memory" is redundant as the code does not access memory in
a way undescribed by constraints;
- Clobber "esp", I fail to understand its point at all.

The fix is a full rewrite of the fragment:

  asm ("movl %%edi, -4(%%esp);"
       "leal %0, %%edi;"
       "xchgl %%ebx, %%esi;"
       LOCK_prefix "; cmpxchg8b (%%edi);"
       "movl %%esi, %%ebx;"
       "movl -4(%%esp), %%edi;"
       "setz %1;"
       : "+m" (*a), "=q" (ret), "+A" (*cmp)
       : "S" ((int32)(set & 0xFFFFFFFF)), "c" ((int32)(set >> 32))
       : "flags")

All operands are loaded and EBX is set as the very last op before CMPXCHG8B, because some of the operands might be addressed through EBX even if we specify that the lower half of set should be assigned to it. ESI is fixed for the lower half of set to keep GCC away from reusing any other already taken register if it can prove an equal live value there. The EDI register is saved manually, as simply specifying it as a clobbered register sometimes chokes GCC with "impossible reloads", probably due to high register pressure. Finally, EDI is saved through direct stack manipulation instead of PUSH/POP, as other operands might be addressed through ESP.

The test results are at
http://jenkins.percona.com/job/percona-server-5.1-param/77/

To post a comment you must log in.
Revision history for this message
Stewart Smith (stewart) wrote :

looked as well as i could in reasonable time, did actually fix the problem we were having.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'response-time-distribution.patch'
2--- response-time-distribution.patch 2011-07-20 03:48:37 +0000
3+++ response-time-distribution.patch 2011-07-22 02:44:33 +0000
4@@ -1634,7 +1634,7 @@
5 diff -ruN /dev/null b/include/atomic/x86-gcc.h
6 --- /dev/null 1970-01-01 00:00:00.000000000 +0000
7 +++ b/include/atomic/x86-gcc.h 2011-06-07 08:59:00.000000000 +0300
8-@@ -0,0 +1,145 @@
9+@@ -0,0 +1,140 @@
10 +#ifndef ATOMIC_X86_GCC_INCLUDED
11 +#define ATOMIC_X86_GCC_INCLUDED
12 +
13@@ -1745,27 +1745,22 @@
14 + v=tmp;
15 +
16 +/*
17-+ On some platforms (e.g. Mac OS X and Solaris) the ebx register
18-+ is held as a pointer to the global offset table. Thus we're not
19-+ allowed to use the b-register on those platforms when compiling
20-+ PIC code, to avoid this we push ebx and pop ebx. The new value
21-+ is copied directly from memory to avoid problems with a implicit
22-+ manipulation of the stack pointer by the push.
23-+
24 + cmpxchg8b works on both 32-bit platforms and 64-bit platforms but
25 + the code here is only used on 32-bit platforms, on 64-bit
26 + platforms the much simpler make_atomic_cas_body32 will work
27 + fine.
28 +*/
29-+#define make_atomic_cas_body64 \
30-+ asm volatile ("push %%ebx;" \
31-+ "movl (%%ecx), %%ebx;" \
32-+ "movl 4(%%ecx), %%ecx;" \
33-+ LOCK_prefix "; cmpxchg8b %0;" \
34-+ "setz %2; pop %%ebx" \
35-+ : "=m" (*a), "+A" (*cmp), "=c" (ret) \
36-+ : "c" (&set), "m" (*a) \
37-+ : "memory", "esp")
38++#define make_atomic_cas_body64 \
39++ asm ("movl %%edi, -4(%%esp);" \
40++ "leal %0, %%edi;" \
41++ "xchgl %%ebx, %%esi;" \
42++ LOCK_prefix "; cmpxchg8b (%%edi);" \
43++ "movl %%esi, %%ebx;" \
44++ "movl -4(%%esp), %%edi;" \
45++ "setz %1;" \
46++ : "+m" (*a), "=q" (ret), "+A" (*cmp) \
47++ : "S" ((int32)(set & 0xFFFFFFFF)), "c" ((int32)(set >> 32)) \
48++ : "flags")
49 +#endif
50 +
51 +/*

Subscribers

People subscribed via source and target branches