Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merge reported by: Carlos Sánchez de La Lama
Merged at revision: not available
Proposed branch: lp:~schnetter/pocl/main
Merge into: lp:~pocl/pocl/trunk
Diff against target: 5915 lines (+3164/-1842) (has conflicts)
36 files modified
configure.ac (+7/-0)
include/_kernel.h (+1048/-669)
lib/kernel/Makefile.am (+9/-3)
lib/kernel/all.cl (+68/-68)
lib/kernel/any.cl (+68/-68)
lib/kernel/as_type.cl (+1/-1)
lib/kernel/ceil.cl (+3/-131)
lib/kernel/convert_type.cl (+3/-3)
lib/kernel/copysign.cl (+3/-107)
lib/kernel/cross.cl (+4/-4)
lib/kernel/dot.cl (+56/-56)
lib/kernel/fabs.cl (+3/-108)
lib/kernel/floor.cl (+3/-129)
lib/kernel/fma.cl (+3/-1)
lib/kernel/fmax.cl (+7/-129)
lib/kernel/fmin.cl (+7/-129)
lib/kernel/max.cl (+1/-1)
lib/kernel/maxmag.cl (+15/-1)
lib/kernel/min.cl (+2/-2)
lib/kernel/minmag.cl (+15/-1)
lib/kernel/select.cl (+2/-2)
lib/kernel/sqrt.cl (+3/-99)
lib/kernel/templates.h (+138/-125)
lib/kernel/upsample.cl (+1/-1)
lib/kernel/vload.cl (+106/-0)
lib/kernel/vstore.cl (+100/-0)
lib/kernel/x86/Makefile.am (+169/-0)
lib/kernel/x86/ceil.cl (+149/-0)
lib/kernel/x86/copysign.cl (+169/-0)
lib/kernel/x86/fabs.cl (+144/-0)
lib/kernel/x86/floor.cl (+149/-0)
lib/kernel/x86/max.cl (+291/-0)
lib/kernel/x86/min.cl (+291/-0)
lib/kernel/x86/sqrt.cl (+122/-0)
scripts/pocl-standalone.in (+2/-2)
scripts/pocl-workgroup.in (+2/-2)
Text conflict in configure.ac
Text conflict in include/_kernel.h
Text conflict in lib/kernel/Makefile.am
To merge this branch: bzr merge lp:~schnetter/pocl/main
Reviewer Review Type Date Requested Status
Carlos Sánchez de La Lama Approve
Pekka Jääskeläinen Needs Fixing
Erik Schnetter Needs Resubmitting
Review via email: mp+80755@code.launchpad.net

Description of the change

I have separated x86-specific functions from generic implementations (based on libc), and have also corrected a few errors.

Note that I have also made the x86-specific version the default kernel; you may not want this. I don't know how to choose automatically.

To post a comment you must log in.
Revision history for this message
Carlos Sánchez de La Lama (csanchezdll) wrote :

I am getting some erros buiding your branch:

----

../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:156:32: error: invalid conversion between ext-vector type 'long2' and 'ulong2'
IMPLEMENT_DIRECT(max, ulong2 , (long2 )(a>=b) ? a : b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:29:12: note: expanded from:
    return EXPR; \
           ^
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:157:32: error: invalid conversion between ext-vector type 'long3' and 'ulong3'
IMPLEMENT_DIRECT(max, ulong3 , (long3 )(a>=b) ? a : b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:29:12: note: expanded from:
    return EXPR; \
           ^
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:158:32: error: invalid conversion between ext-vector type 'long4' and 'ulong4'
IMPLEMENT_DIRECT(max, ulong4 , (long4 )(a>=b) ? a : b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:29:12: note: expanded from:
    return EXPR; \
           ^
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:159:32: error: invalid conversion between ext-vector type 'long8' and 'ulong8'
IMPLEMENT_DIRECT(max, ulong8 , (long8 )(a>=b) ? a : b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:29:12: note: expanded from:
    return EXPR; \
           ^
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:160:32: error: invalid conversion between ext-vector type 'long16' and 'ulong16'
IMPLEMENT_DIRECT(max, ulong16, (long16)(a>=b) ? a : b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
../../../../../src/pocl.schnetter/lib/kernel/x86/max.cl:29:12: note: expanded from:
    return EXPR; \
           ^
5 errors generated.

----

Can you have a look at those?

BR

Carlos

lp:~schnetter/pocl/main updated
72. By Erik Schnetter

Correct/Optimise some x86 functions

73. By Erik Schnetter

Merge

74. By Erik Schnetter

Correct error in fabs/copysign

Revision history for this message
Erik Schnetter (schnetter) wrote :

I modified the implementation of max and min.

review: Needs Resubmitting
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Builds here now and make check is OK. Looks ok for merging to me.

review: Approve
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Well, actually. After merging to trunk, I get the same error Carlos got. Erik, please merge from trunk and fix the issue and I can then merge it to trunk.

review: Needs Fixing
Revision history for this message
Erik Schnetter (schnetter) wrote :

Very strange. There seem to be some pesky differences between different llvm versions.

Anyway, I tried to merge from the trunk yesterday, and found substantial problems because of the tce backend, which doesn't support long or double. I'll have to either comment out all run-time functions, or disable long and double if those are not available.

Revision history for this message
Carlos Sánchez de La Lama (csanchezdll) wrote :

I am merging already, taking into account cl_khr_int64 and cl_khr_fp64 when declaring / implementing the functions so long/ulong and doubles are only used when supported. Hopefully ready in couple of hours.

Carlos

Revision history for this message
Carlos Sánchez de La Lama (csanchezdll) wrote :

Marged. Please be careful when adding new calls or modifying existing ones to take cl_khr_int64 (long and ulong support) and cl_khe_fp64 (double support) into account.

Carlos

review: Approve
lp:~schnetter/pocl/main updated
75. By Erik Schnetter

Correct prototype of signbit.
Add optimised x86 implementation of signbit.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'configure.ac'
2--- configure.ac 2011-10-31 16:58:40 +0000
3+++ configure.ac 2011-10-31 17:03:23 +0000
4@@ -112,7 +112,12 @@
5 lib/CL/Makefile
6 lib/llvmopencl/Makefile
7 lib/kernel/Makefile
8+<<<<<<< TREE
9 lib/kernel/tce/Makefile
10+=======
11+ lib/kernel/dummy/Makefile
12+ lib/kernel/x86/Makefile
13+>>>>>>> MERGE-SOURCE
14 examples/Makefile
15 examples/example1/Makefile
16 examples/example2/Makefile
17@@ -123,5 +128,7 @@
18 scripts/Makefile
19 tests/Makefile
20 tests/atlocal])
21+# lib/kernel/bgp/Makefile
22+# lib/kernel/ppc/Makefile
23
24 AC_OUTPUT
25
26=== modified file 'include/_kernel.h'
27--- include/_kernel.h 2011-10-31 16:58:40 +0000
28+++ include/_kernel.h 2011-10-31 17:03:23 +0000
29@@ -46,6 +46,14 @@
30 */
31 #pragma OPENCL EXTENSION cl_khr_fp64: enable
32
33+<<<<<<< TREE
34+=======
35+#define __SSE4_1__
36+
37+
38+#ifndef __TCE__
39+//#define __kernel __attribute__ ((noinline))
40+>>>>>>> MERGE-SOURCE
41 #define __global __attribute__ ((address_space(3)))
42 #define __local __attribute__ ((address_space(4)))
43 #define __constant __attribute__ ((address_space(5)))
44@@ -68,6 +76,83 @@
45 typedef unsigned int uint;
46 typedef unsigned long ulong;
47
48+#if 0
49+/* 32 bit systems */
50+typedef uint size_t;
51+typedef int ptrdiff_t;
52+typedef int intptr_t;
53+typedef uint uintptr_t;
54+#else
55+/* 64 bit systems */
56+typedef ulong size_t;
57+typedef long ptrdiff_t;
58+typedef long intptr_t;
59+typedef ulong uintptr_t;
60+#endif
61+
62+// We align the 3-vectors, so that their sizeof is correct. Is there a
63+// better way? Should we also align the other vectors?
64+
65+typedef char char2 __attribute__((__ext_vector_type__(2)));
66+typedef char char3 __attribute__((__ext_vector_type__(3), __aligned__(4)));
67+typedef char char4 __attribute__((__ext_vector_type__(4)));
68+typedef char char8 __attribute__((__ext_vector_type__(8)));
69+typedef char char16 __attribute__((__ext_vector_type__(16)));
70+
71+typedef uchar uchar2 __attribute__((__ext_vector_type__(2)));
72+typedef uchar uchar3 __attribute__((__ext_vector_type__(3), __aligned__(4)));
73+typedef uchar uchar4 __attribute__((__ext_vector_type__(4)));
74+typedef uchar uchar8 __attribute__((__ext_vector_type__(8)));
75+typedef uchar uchar16 __attribute__((__ext_vector_type__(16)));
76+
77+typedef short short2 __attribute__((__ext_vector_type__(2)));
78+typedef short short3 __attribute__((__ext_vector_type__(3), __aligned__(8)));
79+typedef short short4 __attribute__((__ext_vector_type__(4)));
80+typedef short short8 __attribute__((__ext_vector_type__(8)));
81+typedef short short16 __attribute__((__ext_vector_type__(16)));
82+
83+typedef ushort ushort2 __attribute__((__ext_vector_type__(2)));
84+typedef ushort ushort3 __attribute__((__ext_vector_type__(3), __aligned__(8)));
85+typedef ushort ushort4 __attribute__((__ext_vector_type__(4)));
86+typedef ushort ushort8 __attribute__((__ext_vector_type__(8)));
87+typedef ushort ushort16 __attribute__((__ext_vector_type__(16)));
88+
89+typedef int int2 __attribute__((__ext_vector_type__(2)));
90+typedef int int3 __attribute__((__ext_vector_type__(3), __aligned__(16)));
91+typedef int int4 __attribute__((__ext_vector_type__(4)));
92+typedef int int8 __attribute__((__ext_vector_type__(8)));
93+typedef int int16 __attribute__((__ext_vector_type__(16)));
94+
95+typedef uint uint2 __attribute__((__ext_vector_type__(2)));
96+typedef uint uint3 __attribute__((__ext_vector_type__(3), __aligned__(16)));
97+typedef uint uint4 __attribute__((__ext_vector_type__(4)));
98+typedef uint uint8 __attribute__((__ext_vector_type__(8)));
99+typedef uint uint16 __attribute__((__ext_vector_type__(16)));
100+
101+typedef long long2 __attribute__((__ext_vector_type__(2)));
102+typedef long long3 __attribute__((__ext_vector_type__(3), __aligned__(32)));
103+typedef long long4 __attribute__((__ext_vector_type__(4)));
104+typedef long long8 __attribute__((__ext_vector_type__(8)));
105+typedef long long16 __attribute__((__ext_vector_type__(16)));
106+
107+typedef ulong ulong2 __attribute__((__ext_vector_type__(2)));
108+typedef ulong ulong3 __attribute__((__ext_vector_type__(3), __aligned__(32)));
109+typedef ulong ulong4 __attribute__((__ext_vector_type__(4)));
110+typedef ulong ulong8 __attribute__((__ext_vector_type__(8)));
111+typedef ulong ulong16 __attribute__((__ext_vector_type__(16)));
112+
113+typedef float float2 __attribute__((__ext_vector_type__(2)));
114+typedef float float3 __attribute__((__ext_vector_type__(3), __aligned__(16)));
115+typedef float float4 __attribute__((__ext_vector_type__(4)));
116+typedef float float8 __attribute__((__ext_vector_type__(8)));
117+typedef float float16 __attribute__((__ext_vector_type__(16)));
118+
119+typedef double double2 __attribute__((__ext_vector_type__(2)));
120+typedef double double3 __attribute__((__ext_vector_type__(3), __aligned__(32)));
121+typedef double double4 __attribute__((__ext_vector_type__(4)));
122+typedef double double8 __attribute__((__ext_vector_type__(8)));
123+typedef double double16 __attribute__((__ext_vector_type__(16)));
124+
125 /* Ensure the data types have the right sizes */
126 #define _cl_static_assert(_t, _x) typedef int ai##_t[(_x) ? 1 : -1]
127 _cl_static_assert(char , sizeof(char ) == 1);
128@@ -83,6 +168,7 @@
129 _cl_static_assert(float , sizeof(float ) == 4);
130 #ifdef cl_khr_fp64
131 _cl_static_assert(double, sizeof(double) == 8);
132+<<<<<<< TREE
133 #endif
134
135 typedef char char2 __attribute__((ext_vector_type(2)));
136@@ -144,26 +230,79 @@
137 typedef double double4 __attribute__((ext_vector_type(4)));
138 typedef double double8 __attribute__((ext_vector_type(8)));
139 typedef double double16 __attribute__((ext_vector_type(16)));
140+=======
141+_cl_static_assert(size_t, sizeof(size_t) == sizeof(void*));
142+
143+_cl_static_assert(char2 , sizeof(char2 ) == 2 *sizeof(char));
144+_cl_static_assert(char3 , sizeof(char3 ) == 4 *sizeof(char));
145+_cl_static_assert(char4 , sizeof(char4 ) == 4 *sizeof(char));
146+_cl_static_assert(char8 , sizeof(char8 ) == 8 *sizeof(char));
147+_cl_static_assert(char16, sizeof(char16) == 16*sizeof(char));
148+
149+_cl_static_assert(uchar2 , sizeof(uchar2 ) == 2 *sizeof(uchar));
150+_cl_static_assert(uchar3 , sizeof(uchar3 ) == 4 *sizeof(uchar));
151+_cl_static_assert(uchar4 , sizeof(uchar4 ) == 4 *sizeof(uchar));
152+_cl_static_assert(uchar8 , sizeof(uchar8 ) == 8 *sizeof(uchar));
153+_cl_static_assert(uchar16, sizeof(uchar16) == 16*sizeof(uchar));
154+
155+_cl_static_assert(short2 , sizeof(short2 ) == 2 *sizeof(short));
156+_cl_static_assert(short3 , sizeof(short3 ) == 4 *sizeof(short));
157+_cl_static_assert(short4 , sizeof(short4 ) == 4 *sizeof(short));
158+_cl_static_assert(short8 , sizeof(short8 ) == 8 *sizeof(short));
159+_cl_static_assert(short16, sizeof(short16) == 16*sizeof(short));
160+
161+_cl_static_assert(ushort2 , sizeof(ushort2 ) == 2 *sizeof(ushort));
162+_cl_static_assert(ushort3 , sizeof(ushort3 ) == 4 *sizeof(ushort));
163+_cl_static_assert(ushort4 , sizeof(ushort4 ) == 4 *sizeof(ushort));
164+_cl_static_assert(ushort8 , sizeof(ushort8 ) == 8 *sizeof(ushort));
165+_cl_static_assert(ushort16, sizeof(ushort16) == 16*sizeof(ushort));
166+
167+_cl_static_assert(int2 , sizeof(int2 ) == 2 *sizeof(int));
168+_cl_static_assert(int3 , sizeof(int3 ) == 4 *sizeof(int));
169+_cl_static_assert(int4 , sizeof(int4 ) == 4 *sizeof(int));
170+_cl_static_assert(int8 , sizeof(int8 ) == 8 *sizeof(int));
171+_cl_static_assert(int16, sizeof(int16) == 16*sizeof(int));
172+
173+_cl_static_assert(uint2 , sizeof(uint2 ) == 2 *sizeof(uint));
174+_cl_static_assert(uint3 , sizeof(uint3 ) == 4 *sizeof(uint));
175+_cl_static_assert(uint4 , sizeof(uint4 ) == 4 *sizeof(uint));
176+_cl_static_assert(uint8 , sizeof(uint8 ) == 8 *sizeof(uint));
177+_cl_static_assert(uint16, sizeof(uint16) == 16*sizeof(uint));
178+
179+_cl_static_assert(float2 , sizeof(float2 ) == 2 *sizeof(float));
180+_cl_static_assert(float3 , sizeof(float3 ) == 4 *sizeof(float));
181+_cl_static_assert(float4 , sizeof(float4 ) == 4 *sizeof(float));
182+_cl_static_assert(float8 , sizeof(float8 ) == 8 *sizeof(float));
183+_cl_static_assert(float16, sizeof(float16) == 16*sizeof(float));
184+
185+_cl_static_assert(double2 , sizeof(double2 ) == 2 *sizeof(double));
186+_cl_static_assert(double3 , sizeof(double3 ) == 4 *sizeof(double));
187+_cl_static_assert(double4 , sizeof(double4 ) == 4 *sizeof(double));
188+_cl_static_assert(double8 , sizeof(double8 ) == 8 *sizeof(double));
189+_cl_static_assert(double16, sizeof(double16) == 16*sizeof(double));
190+>>>>>>> MERGE-SOURCE
191
192
193
194 /* Conversion functions */
195
196-#define _CL_DECLARE_AS_TYPE(SRC, DST) \
197- DST __attribute__ ((overloadable)) as_##DST(SRC a);
198+#define _cl_overloadable __attribute__ ((__overloadable__))
199+
200+#define _CL_DECLARE_AS_TYPE(SRC, DST) \
201+ DST _cl_overloadable as_##DST(SRC a);
202
203 /* 1 byte */
204-#define _CL_DECLARE_AS_TYPE_1(SRC) \
205- _CL_DECLARE_AS_TYPE(SRC, char) \
206+#define _CL_DECLARE_AS_TYPE_1(SRC) \
207+ _CL_DECLARE_AS_TYPE(SRC, char) \
208 _CL_DECLARE_AS_TYPE(SRC, uchar)
209 _CL_DECLARE_AS_TYPE_1(char)
210 _CL_DECLARE_AS_TYPE_1(uchar)
211
212 /* 2 bytes */
213-#define _CL_DECLARE_AS_TYPE_2(SRC) \
214- _CL_DECLARE_AS_TYPE(SRC, char2) \
215- _CL_DECLARE_AS_TYPE(SRC, uchar2) \
216- _CL_DECLARE_AS_TYPE(SRC, short) \
217+#define _CL_DECLARE_AS_TYPE_2(SRC) \
218+ _CL_DECLARE_AS_TYPE(SRC, char2) \
219+ _CL_DECLARE_AS_TYPE(SRC, uchar2) \
220+ _CL_DECLARE_AS_TYPE(SRC, short) \
221 _CL_DECLARE_AS_TYPE(SRC, ushort)
222 _CL_DECLARE_AS_TYPE_2(char2)
223 _CL_DECLARE_AS_TYPE_2(uchar2)
224@@ -171,13 +310,13 @@
225 _CL_DECLARE_AS_TYPE_2(ushort)
226
227 /* 4 bytes */
228-#define _CL_DECLARE_AS_TYPE_4(SRC) \
229- _CL_DECLARE_AS_TYPE(SRC, char4) \
230- _CL_DECLARE_AS_TYPE(SRC, uchar4) \
231- _CL_DECLARE_AS_TYPE(SRC, short2) \
232- _CL_DECLARE_AS_TYPE(SRC, ushort2) \
233- _CL_DECLARE_AS_TYPE(SRC, int) \
234- _CL_DECLARE_AS_TYPE(SRC, uint) \
235+#define _CL_DECLARE_AS_TYPE_4(SRC) \
236+ _CL_DECLARE_AS_TYPE(SRC, char4) \
237+ _CL_DECLARE_AS_TYPE(SRC, uchar4) \
238+ _CL_DECLARE_AS_TYPE(SRC, short2) \
239+ _CL_DECLARE_AS_TYPE(SRC, ushort2) \
240+ _CL_DECLARE_AS_TYPE(SRC, int) \
241+ _CL_DECLARE_AS_TYPE(SRC, uint) \
242 _CL_DECLARE_AS_TYPE(SRC, float)
243 _CL_DECLARE_AS_TYPE_4(char4)
244 _CL_DECLARE_AS_TYPE_4(uchar4)
245@@ -188,16 +327,16 @@
246 _CL_DECLARE_AS_TYPE_4(float)
247
248 /* 8 bytes */
249-#define _CL_DECLARE_AS_TYPE_8(SRC) \
250- _CL_DECLARE_AS_TYPE(SRC, char8) \
251- _CL_DECLARE_AS_TYPE(SRC, uchar8) \
252- _CL_DECLARE_AS_TYPE(SRC, short4) \
253- _CL_DECLARE_AS_TYPE(SRC, ushort4) \
254- _CL_DECLARE_AS_TYPE(SRC, int2) \
255- _CL_DECLARE_AS_TYPE(SRC, uint2) \
256- _CL_DECLARE_AS_TYPE(SRC, long) \
257- _CL_DECLARE_AS_TYPE(SRC, ulong) \
258- _CL_DECLARE_AS_TYPE(SRC, float2) \
259+#define _CL_DECLARE_AS_TYPE_8(SRC) \
260+ _CL_DECLARE_AS_TYPE(SRC, char8) \
261+ _CL_DECLARE_AS_TYPE(SRC, uchar8) \
262+ _CL_DECLARE_AS_TYPE(SRC, short4) \
263+ _CL_DECLARE_AS_TYPE(SRC, ushort4) \
264+ _CL_DECLARE_AS_TYPE(SRC, int2) \
265+ _CL_DECLARE_AS_TYPE(SRC, uint2) \
266+ _CL_DECLARE_AS_TYPE(SRC, long) \
267+ _CL_DECLARE_AS_TYPE(SRC, ulong) \
268+ _CL_DECLARE_AS_TYPE(SRC, float2) \
269 _CL_DECLARE_AS_TYPE(SRC, double)
270 _CL_DECLARE_AS_TYPE_8(char8)
271 _CL_DECLARE_AS_TYPE_8(uchar8)
272@@ -211,16 +350,16 @@
273 _CL_DECLARE_AS_TYPE_8(double)
274
275 /* 16 bytes */
276-#define _CL_DECLARE_AS_TYPE_16(SRC) \
277- _CL_DECLARE_AS_TYPE(SRC, char16) \
278- _CL_DECLARE_AS_TYPE(SRC, uchar16) \
279- _CL_DECLARE_AS_TYPE(SRC, short8) \
280- _CL_DECLARE_AS_TYPE(SRC, ushort8) \
281- _CL_DECLARE_AS_TYPE(SRC, int4) \
282- _CL_DECLARE_AS_TYPE(SRC, uint4) \
283- _CL_DECLARE_AS_TYPE(SRC, long2) \
284- _CL_DECLARE_AS_TYPE(SRC, ulong2) \
285- _CL_DECLARE_AS_TYPE(SRC, float4) \
286+#define _CL_DECLARE_AS_TYPE_16(SRC) \
287+ _CL_DECLARE_AS_TYPE(SRC, char16) \
288+ _CL_DECLARE_AS_TYPE(SRC, uchar16) \
289+ _CL_DECLARE_AS_TYPE(SRC, short8) \
290+ _CL_DECLARE_AS_TYPE(SRC, ushort8) \
291+ _CL_DECLARE_AS_TYPE(SRC, int4) \
292+ _CL_DECLARE_AS_TYPE(SRC, uint4) \
293+ _CL_DECLARE_AS_TYPE(SRC, long2) \
294+ _CL_DECLARE_AS_TYPE(SRC, ulong2) \
295+ _CL_DECLARE_AS_TYPE(SRC, float4) \
296 _CL_DECLARE_AS_TYPE(SRC, double2)
297 _CL_DECLARE_AS_TYPE_16(char16)
298 _CL_DECLARE_AS_TYPE_16(uchar16)
299@@ -234,14 +373,14 @@
300 _CL_DECLARE_AS_TYPE_16(double2)
301
302 /* 32 bytes */
303-#define _CL_DECLARE_AS_TYPE_32(SRC) \
304- _CL_DECLARE_AS_TYPE(SRC, short16) \
305- _CL_DECLARE_AS_TYPE(SRC, ushort16) \
306- _CL_DECLARE_AS_TYPE(SRC, int8) \
307- _CL_DECLARE_AS_TYPE(SRC, uint8) \
308- _CL_DECLARE_AS_TYPE(SRC, long4) \
309- _CL_DECLARE_AS_TYPE(SRC, ulong4) \
310- _CL_DECLARE_AS_TYPE(SRC, float8) \
311+#define _CL_DECLARE_AS_TYPE_32(SRC) \
312+ _CL_DECLARE_AS_TYPE(SRC, short16) \
313+ _CL_DECLARE_AS_TYPE(SRC, ushort16) \
314+ _CL_DECLARE_AS_TYPE(SRC, int8) \
315+ _CL_DECLARE_AS_TYPE(SRC, uint8) \
316+ _CL_DECLARE_AS_TYPE(SRC, long4) \
317+ _CL_DECLARE_AS_TYPE(SRC, ulong4) \
318+ _CL_DECLARE_AS_TYPE(SRC, float8) \
319 _CL_DECLARE_AS_TYPE(SRC, double4)
320 _CL_DECLARE_AS_TYPE_32(short16)
321 _CL_DECLARE_AS_TYPE_32(ushort16)
322@@ -253,12 +392,12 @@
323 _CL_DECLARE_AS_TYPE_32(double4)
324
325 /* 64 bytes */
326-#define _CL_DECLARE_AS_TYPE_64(SRC) \
327- _CL_DECLARE_AS_TYPE(SRC, int16) \
328- _CL_DECLARE_AS_TYPE(SRC, uint16) \
329- _CL_DECLARE_AS_TYPE(SRC, long8) \
330- _CL_DECLARE_AS_TYPE(SRC, ulong8) \
331- _CL_DECLARE_AS_TYPE(SRC, float16) \
332+#define _CL_DECLARE_AS_TYPE_64(SRC) \
333+ _CL_DECLARE_AS_TYPE(SRC, int16) \
334+ _CL_DECLARE_AS_TYPE(SRC, uint16) \
335+ _CL_DECLARE_AS_TYPE(SRC, long8) \
336+ _CL_DECLARE_AS_TYPE(SRC, ulong8) \
337+ _CL_DECLARE_AS_TYPE(SRC, float16) \
338 _CL_DECLARE_AS_TYPE(SRC, double8)
339 _CL_DECLARE_AS_TYPE_64(int16)
340 _CL_DECLARE_AS_TYPE_64(uint16)
341@@ -268,16 +407,16 @@
342 _CL_DECLARE_AS_TYPE_64(double8)
343
344 /* 128 bytes */
345-#define _CL_DECLARE_AS_TYPE_128(SRC) \
346- _CL_DECLARE_AS_TYPE(SRC, long16) \
347- _CL_DECLARE_AS_TYPE(SRC, ulong16) \
348+#define _CL_DECLARE_AS_TYPE_128(SRC) \
349+ _CL_DECLARE_AS_TYPE(SRC, long16) \
350+ _CL_DECLARE_AS_TYPE(SRC, ulong16) \
351 _CL_DECLARE_AS_TYPE(SRC, double16)
352 _CL_DECLARE_AS_TYPE_128(long16)
353 _CL_DECLARE_AS_TYPE_128(ulong16)
354 _CL_DECLARE_AS_TYPE_128(double16)
355
356-#define _CL_DECLARE_CONVERT_TYPE(SRC, DST) \
357- DST __attribute__ ((overloadable)) convert_##DST(SRC a);
358+#define _CL_DECLARE_CONVERT_TYPE(SRC, DST) \
359+ DST _cl_overloadable convert_##DST(SRC a);
360
361 /* 1 element */
362 #define _CL_DECLARE_CONVERT_TYPE_1(SRC) \
363@@ -506,241 +645,241 @@
364 * V: vector of float or double
365 */
366
367-#define _CL_DECLARE_FUNC_V_V(NAME) \
368- float __attribute__ ((overloadable)) NAME(float ); \
369- float2 __attribute__ ((overloadable)) NAME(float2 ); \
370- float3 __attribute__ ((overloadable)) NAME(float3 ); \
371- float4 __attribute__ ((overloadable)) NAME(float4 ); \
372- float8 __attribute__ ((overloadable)) NAME(float8 ); \
373- float16 __attribute__ ((overloadable)) NAME(float16 ); \
374- double __attribute__ ((overloadable)) NAME(double ); \
375- double2 __attribute__ ((overloadable)) NAME(double2 ); \
376- double3 __attribute__ ((overloadable)) NAME(double3 ); \
377- double4 __attribute__ ((overloadable)) NAME(double4 ); \
378- double8 __attribute__ ((overloadable)) NAME(double8 ); \
379- double16 __attribute__ ((overloadable)) NAME(double16);
380-#define _CL_DECLARE_FUNC_V_VV(NAME) \
381- float __attribute__ ((overloadable)) NAME(float , float ); \
382- float2 __attribute__ ((overloadable)) NAME(float2 , float2 ); \
383- float3 __attribute__ ((overloadable)) NAME(float3 , float3 ); \
384- float4 __attribute__ ((overloadable)) NAME(float4 , float4 ); \
385- float8 __attribute__ ((overloadable)) NAME(float8 , float8 ); \
386- float16 __attribute__ ((overloadable)) NAME(float16 , float16 ); \
387- double __attribute__ ((overloadable)) NAME(double , double ); \
388- double2 __attribute__ ((overloadable)) NAME(double2 , double2 ); \
389- double3 __attribute__ ((overloadable)) NAME(double3 , double3 ); \
390- double4 __attribute__ ((overloadable)) NAME(double4 , double4 ); \
391- double8 __attribute__ ((overloadable)) NAME(double8 , double8 ); \
392- double16 __attribute__ ((overloadable)) NAME(double16, double16);
393-#define _CL_DECLARE_FUNC_V_VVV(NAME) \
394- float __attribute__ ((overloadable)) NAME(float , float , float ); \
395- float2 __attribute__ ((overloadable)) NAME(float2 , float2 , float2 ); \
396- float3 __attribute__ ((overloadable)) NAME(float3 , float3 , float3 ); \
397- float4 __attribute__ ((overloadable)) NAME(float4 , float4 , float4 ); \
398- float8 __attribute__ ((overloadable)) NAME(float8 , float8 , float8 ); \
399- float16 __attribute__ ((overloadable)) NAME(float16 , float16 , float16 ); \
400- double __attribute__ ((overloadable)) NAME(double , double , double ); \
401- double2 __attribute__ ((overloadable)) NAME(double2 , double2 , double2 ); \
402- double3 __attribute__ ((overloadable)) NAME(double3 , double3 , double3 ); \
403- double4 __attribute__ ((overloadable)) NAME(double4 , double4 , double4 ); \
404- double8 __attribute__ ((overloadable)) NAME(double8 , double8 , double8 ); \
405- double16 __attribute__ ((overloadable)) NAME(double16, double16, double16);
406-#define _CL_DECLARE_FUNC_V_VVS(NAME) \
407- float2 __attribute__ ((overloadable)) NAME(float2 , float2 , float ); \
408- float3 __attribute__ ((overloadable)) NAME(float3 , float3 , float ); \
409- float4 __attribute__ ((overloadable)) NAME(float4 , float4 , float ); \
410- float8 __attribute__ ((overloadable)) NAME(float8 , float8 , float ); \
411- float16 __attribute__ ((overloadable)) NAME(float16 , float16 , float ); \
412- double2 __attribute__ ((overloadable)) NAME(double2 , double2 , double); \
413- double3 __attribute__ ((overloadable)) NAME(double3 , double3 , double); \
414- double4 __attribute__ ((overloadable)) NAME(double4 , double4 , double); \
415- double8 __attribute__ ((overloadable)) NAME(double8 , double8 , double); \
416- double16 __attribute__ ((overloadable)) NAME(double16, double16, double);
417-#define _CL_DECLARE_FUNC_V_VSS(NAME) \
418- float2 __attribute__ ((overloadable)) NAME(float2 , float , float ); \
419- float3 __attribute__ ((overloadable)) NAME(float3 , float , float ); \
420- float4 __attribute__ ((overloadable)) NAME(float4 , float , float ); \
421- float8 __attribute__ ((overloadable)) NAME(float8 , float , float ); \
422- float16 __attribute__ ((overloadable)) NAME(float16 , float , float ); \
423- double2 __attribute__ ((overloadable)) NAME(double2 , double, double); \
424- double3 __attribute__ ((overloadable)) NAME(double3 , double, double); \
425- double4 __attribute__ ((overloadable)) NAME(double4 , double, double); \
426- double8 __attribute__ ((overloadable)) NAME(double8 , double, double); \
427- double16 __attribute__ ((overloadable)) NAME(double16, double, double);
428-#define _CL_DECLARE_FUNC_V_SSV(NAME) \
429- float2 __attribute__ ((overloadable)) NAME(float , float , float2 ); \
430- float3 __attribute__ ((overloadable)) NAME(float , float , float3 ); \
431- float4 __attribute__ ((overloadable)) NAME(float , float , float4 ); \
432- float8 __attribute__ ((overloadable)) NAME(float , float , float8 ); \
433- float16 __attribute__ ((overloadable)) NAME(float , float , float16 ); \
434- double2 __attribute__ ((overloadable)) NAME(double, double, double2 ); \
435- double3 __attribute__ ((overloadable)) NAME(double, double, double3 ); \
436- double4 __attribute__ ((overloadable)) NAME(double, double, double4 ); \
437- double8 __attribute__ ((overloadable)) NAME(double, double, double8 ); \
438- double16 __attribute__ ((overloadable)) NAME(double, double, double16);
439-#define _CL_DECLARE_FUNC_V_VVJ(NAME) \
440- float __attribute__ ((overloadable)) NAME(float , float , int ); \
441- float2 __attribute__ ((overloadable)) NAME(float2 , float2 , int2 ); \
442- float3 __attribute__ ((overloadable)) NAME(float3 , float3 , int3 ); \
443- float4 __attribute__ ((overloadable)) NAME(float4 , float4 , int4 ); \
444- float8 __attribute__ ((overloadable)) NAME(float8 , float8 , int8 ); \
445- float16 __attribute__ ((overloadable)) NAME(float16 , float16 , int16 ); \
446- double __attribute__ ((overloadable)) NAME(double , double , long ); \
447- double2 __attribute__ ((overloadable)) NAME(double2 , double2 , long2 ); \
448- double3 __attribute__ ((overloadable)) NAME(double3 , double3 , long3 ); \
449- double4 __attribute__ ((overloadable)) NAME(double4 , double4 , long4 ); \
450- double8 __attribute__ ((overloadable)) NAME(double8 , double8 , long8 ); \
451- double16 __attribute__ ((overloadable)) NAME(double16, double16, long16);
452-#define _CL_DECLARE_FUNC_V_U(NAME) \
453- float __attribute__ ((overloadable)) NAME(uint ); \
454- float2 __attribute__ ((overloadable)) NAME(uint2 ); \
455- float3 __attribute__ ((overloadable)) NAME(uint3 ); \
456- float4 __attribute__ ((overloadable)) NAME(uint4 ); \
457- float8 __attribute__ ((overloadable)) NAME(uint8 ); \
458- float16 __attribute__ ((overloadable)) NAME(uint16 ); \
459- double __attribute__ ((overloadable)) NAME(ulong ); \
460- double2 __attribute__ ((overloadable)) NAME(ulong2 ); \
461- double3 __attribute__ ((overloadable)) NAME(ulong3 ); \
462- double4 __attribute__ ((overloadable)) NAME(ulong4 ); \
463- double8 __attribute__ ((overloadable)) NAME(ulong8 ); \
464- double16 __attribute__ ((overloadable)) NAME(ulong16);
465-#define _CL_DECLARE_FUNC_V_VS(NAME) \
466- float2 __attribute__ ((overloadable)) NAME(float2 , float ); \
467- float3 __attribute__ ((overloadable)) NAME(float3 , float ); \
468- float4 __attribute__ ((overloadable)) NAME(float4 , float ); \
469- float8 __attribute__ ((overloadable)) NAME(float8 , float ); \
470- float16 __attribute__ ((overloadable)) NAME(float16 , float ); \
471- double2 __attribute__ ((overloadable)) NAME(double2 , double); \
472- double3 __attribute__ ((overloadable)) NAME(double3 , double); \
473- double4 __attribute__ ((overloadable)) NAME(double4 , double); \
474- double8 __attribute__ ((overloadable)) NAME(double8 , double); \
475- double16 __attribute__ ((overloadable)) NAME(double16, double);
476-#define _CL_DECLARE_FUNC_V_VJ(NAME) \
477- float __attribute__ ((overloadable)) NAME(float , int ); \
478- float2 __attribute__ ((overloadable)) NAME(float2 , int2 ); \
479- float3 __attribute__ ((overloadable)) NAME(float3 , int3 ); \
480- float4 __attribute__ ((overloadable)) NAME(float4 , int4 ); \
481- float8 __attribute__ ((overloadable)) NAME(float8 , int8 ); \
482- float16 __attribute__ ((overloadable)) NAME(float16 , int16); \
483- double __attribute__ ((overloadable)) NAME(double , int ); \
484- double2 __attribute__ ((overloadable)) NAME(double2 , int2 ); \
485- double3 __attribute__ ((overloadable)) NAME(double3 , int3 ); \
486- double4 __attribute__ ((overloadable)) NAME(double4 , int4 ); \
487- double8 __attribute__ ((overloadable)) NAME(double8 , int8 ); \
488- double16 __attribute__ ((overloadable)) NAME(double16, int16);
489-#define _CL_DECLARE_FUNC_J_VV(NAME) \
490- int __attribute__ ((overloadable)) NAME(float , float ); \
491- int2 __attribute__ ((overloadable)) NAME(float2 , float2 ); \
492- int3 __attribute__ ((overloadable)) NAME(float3 , float3 ); \
493- int4 __attribute__ ((overloadable)) NAME(float4 , float4 ); \
494- int8 __attribute__ ((overloadable)) NAME(float8 , float8 ); \
495- int16 __attribute__ ((overloadable)) NAME(float16 , float16 ); \
496- int __attribute__ ((overloadable)) NAME(double , double ); \
497- long2 __attribute__ ((overloadable)) NAME(double2 , double2 ); \
498- long3 __attribute__ ((overloadable)) NAME(double3 , double3 ); \
499- long4 __attribute__ ((overloadable)) NAME(double4 , double4 ); \
500- long8 __attribute__ ((overloadable)) NAME(double8 , double8 ); \
501- long16 __attribute__ ((overloadable)) NAME(double16, double16);
502-#define _CL_DECLARE_FUNC_V_VI(NAME) \
503- float2 __attribute__ ((overloadable)) NAME(float2 , int); \
504- float3 __attribute__ ((overloadable)) NAME(float3 , int); \
505- float4 __attribute__ ((overloadable)) NAME(float4 , int); \
506- float8 __attribute__ ((overloadable)) NAME(float8 , int); \
507- float16 __attribute__ ((overloadable)) NAME(float16 , int); \
508- double2 __attribute__ ((overloadable)) NAME(double2 , int); \
509- double3 __attribute__ ((overloadable)) NAME(double3 , int); \
510- double4 __attribute__ ((overloadable)) NAME(double4 , int); \
511- double8 __attribute__ ((overloadable)) NAME(double8 , int); \
512- double16 __attribute__ ((overloadable)) NAME(double16, int);
513+#define _CL_DECLARE_FUNC_V_V(NAME) \
514+ float _cl_overloadable NAME(float ); \
515+ float2 _cl_overloadable NAME(float2 ); \
516+ float3 _cl_overloadable NAME(float3 ); \
517+ float4 _cl_overloadable NAME(float4 ); \
518+ float8 _cl_overloadable NAME(float8 ); \
519+ float16 _cl_overloadable NAME(float16 ); \
520+ double _cl_overloadable NAME(double ); \
521+ double2 _cl_overloadable NAME(double2 ); \
522+ double3 _cl_overloadable NAME(double3 ); \
523+ double4 _cl_overloadable NAME(double4 ); \
524+ double8 _cl_overloadable NAME(double8 ); \
525+ double16 _cl_overloadable NAME(double16);
526+#define _CL_DECLARE_FUNC_V_VV(NAME) \
527+ float _cl_overloadable NAME(float , float ); \
528+ float2 _cl_overloadable NAME(float2 , float2 ); \
529+ float3 _cl_overloadable NAME(float3 , float3 ); \
530+ float4 _cl_overloadable NAME(float4 , float4 ); \
531+ float8 _cl_overloadable NAME(float8 , float8 ); \
532+ float16 _cl_overloadable NAME(float16 , float16 ); \
533+ double _cl_overloadable NAME(double , double ); \
534+ double2 _cl_overloadable NAME(double2 , double2 ); \
535+ double3 _cl_overloadable NAME(double3 , double3 ); \
536+ double4 _cl_overloadable NAME(double4 , double4 ); \
537+ double8 _cl_overloadable NAME(double8 , double8 ); \
538+ double16 _cl_overloadable NAME(double16, double16);
539+#define _CL_DECLARE_FUNC_V_VVV(NAME) \
540+ float _cl_overloadable NAME(float , float , float ); \
541+ float2 _cl_overloadable NAME(float2 , float2 , float2 ); \
542+ float3 _cl_overloadable NAME(float3 , float3 , float3 ); \
543+ float4 _cl_overloadable NAME(float4 , float4 , float4 ); \
544+ float8 _cl_overloadable NAME(float8 , float8 , float8 ); \
545+ float16 _cl_overloadable NAME(float16 , float16 , float16 ); \
546+ double _cl_overloadable NAME(double , double , double ); \
547+ double2 _cl_overloadable NAME(double2 , double2 , double2 ); \
548+ double3 _cl_overloadable NAME(double3 , double3 , double3 ); \
549+ double4 _cl_overloadable NAME(double4 , double4 , double4 ); \
550+ double8 _cl_overloadable NAME(double8 , double8 , double8 ); \
551+ double16 _cl_overloadable NAME(double16, double16, double16);
552+#define _CL_DECLARE_FUNC_V_VVS(NAME) \
553+ float2 _cl_overloadable NAME(float2 , float2 , float ); \
554+ float3 _cl_overloadable NAME(float3 , float3 , float ); \
555+ float4 _cl_overloadable NAME(float4 , float4 , float ); \
556+ float8 _cl_overloadable NAME(float8 , float8 , float ); \
557+ float16 _cl_overloadable NAME(float16 , float16 , float ); \
558+ double2 _cl_overloadable NAME(double2 , double2 , double); \
559+ double3 _cl_overloadable NAME(double3 , double3 , double); \
560+ double4 _cl_overloadable NAME(double4 , double4 , double); \
561+ double8 _cl_overloadable NAME(double8 , double8 , double); \
562+ double16 _cl_overloadable NAME(double16, double16, double);
563+#define _CL_DECLARE_FUNC_V_VSS(NAME) \
564+ float2 _cl_overloadable NAME(float2 , float , float ); \
565+ float3 _cl_overloadable NAME(float3 , float , float ); \
566+ float4 _cl_overloadable NAME(float4 , float , float ); \
567+ float8 _cl_overloadable NAME(float8 , float , float ); \
568+ float16 _cl_overloadable NAME(float16 , float , float ); \
569+ double2 _cl_overloadable NAME(double2 , double, double); \
570+ double3 _cl_overloadable NAME(double3 , double, double); \
571+ double4 _cl_overloadable NAME(double4 , double, double); \
572+ double8 _cl_overloadable NAME(double8 , double, double); \
573+ double16 _cl_overloadable NAME(double16, double, double);
574+#define _CL_DECLARE_FUNC_V_SSV(NAME) \
575+ float2 _cl_overloadable NAME(float , float , float2 ); \
576+ float3 _cl_overloadable NAME(float , float , float3 ); \
577+ float4 _cl_overloadable NAME(float , float , float4 ); \
578+ float8 _cl_overloadable NAME(float , float , float8 ); \
579+ float16 _cl_overloadable NAME(float , float , float16 ); \
580+ double2 _cl_overloadable NAME(double, double, double2 ); \
581+ double3 _cl_overloadable NAME(double, double, double3 ); \
582+ double4 _cl_overloadable NAME(double, double, double4 ); \
583+ double8 _cl_overloadable NAME(double, double, double8 ); \
584+ double16 _cl_overloadable NAME(double, double, double16);
585+#define _CL_DECLARE_FUNC_V_VVJ(NAME) \
586+ float _cl_overloadable NAME(float , float , int ); \
587+ float2 _cl_overloadable NAME(float2 , float2 , int2 ); \
588+ float3 _cl_overloadable NAME(float3 , float3 , int3 ); \
589+ float4 _cl_overloadable NAME(float4 , float4 , int4 ); \
590+ float8 _cl_overloadable NAME(float8 , float8 , int8 ); \
591+ float16 _cl_overloadable NAME(float16 , float16 , int16 ); \
592+ double _cl_overloadable NAME(double , double , long ); \
593+ double2 _cl_overloadable NAME(double2 , double2 , long2 ); \
594+ double3 _cl_overloadable NAME(double3 , double3 , long3 ); \
595+ double4 _cl_overloadable NAME(double4 , double4 , long4 ); \
596+ double8 _cl_overloadable NAME(double8 , double8 , long8 ); \
597+ double16 _cl_overloadable NAME(double16, double16, long16);
598+#define _CL_DECLARE_FUNC_V_U(NAME) \
599+ float _cl_overloadable NAME(uint ); \
600+ float2 _cl_overloadable NAME(uint2 ); \
601+ float3 _cl_overloadable NAME(uint3 ); \
602+ float4 _cl_overloadable NAME(uint4 ); \
603+ float8 _cl_overloadable NAME(uint8 ); \
604+ float16 _cl_overloadable NAME(uint16 ); \
605+ double _cl_overloadable NAME(ulong ); \
606+ double2 _cl_overloadable NAME(ulong2 ); \
607+ double3 _cl_overloadable NAME(ulong3 ); \
608+ double4 _cl_overloadable NAME(ulong4 ); \
609+ double8 _cl_overloadable NAME(ulong8 ); \
610+ double16 _cl_overloadable NAME(ulong16);
611+#define _CL_DECLARE_FUNC_V_VS(NAME) \
612+ float2 _cl_overloadable NAME(float2 , float ); \
613+ float3 _cl_overloadable NAME(float3 , float ); \
614+ float4 _cl_overloadable NAME(float4 , float ); \
615+ float8 _cl_overloadable NAME(float8 , float ); \
616+ float16 _cl_overloadable NAME(float16 , float ); \
617+ double2 _cl_overloadable NAME(double2 , double); \
618+ double3 _cl_overloadable NAME(double3 , double); \
619+ double4 _cl_overloadable NAME(double4 , double); \
620+ double8 _cl_overloadable NAME(double8 , double); \
621+ double16 _cl_overloadable NAME(double16, double);
622+#define _CL_DECLARE_FUNC_V_VJ(NAME) \
623+ float _cl_overloadable NAME(float , int ); \
624+ float2 _cl_overloadable NAME(float2 , int2 ); \
625+ float3 _cl_overloadable NAME(float3 , int3 ); \
626+ float4 _cl_overloadable NAME(float4 , int4 ); \
627+ float8 _cl_overloadable NAME(float8 , int8 ); \
628+ float16 _cl_overloadable NAME(float16 , int16); \
629+ double _cl_overloadable NAME(double , int ); \
630+ double2 _cl_overloadable NAME(double2 , int2 ); \
631+ double3 _cl_overloadable NAME(double3 , int3 ); \
632+ double4 _cl_overloadable NAME(double4 , int4 ); \
633+ double8 _cl_overloadable NAME(double8 , int8 ); \
634+ double16 _cl_overloadable NAME(double16, int16);
635+#define _CL_DECLARE_FUNC_J_VV(NAME) \
636+ int _cl_overloadable NAME(float , float ); \
637+ int2 _cl_overloadable NAME(float2 , float2 ); \
638+ int3 _cl_overloadable NAME(float3 , float3 ); \
639+ int4 _cl_overloadable NAME(float4 , float4 ); \
640+ int8 _cl_overloadable NAME(float8 , float8 ); \
641+ int16 _cl_overloadable NAME(float16 , float16 ); \
642+ int _cl_overloadable NAME(double , double ); \
643+ long2 _cl_overloadable NAME(double2 , double2 ); \
644+ long3 _cl_overloadable NAME(double3 , double3 ); \
645+ long4 _cl_overloadable NAME(double4 , double4 ); \
646+ long8 _cl_overloadable NAME(double8 , double8 ); \
647+ long16 _cl_overloadable NAME(double16, double16);
648+#define _CL_DECLARE_FUNC_V_VI(NAME) \
649+ float2 _cl_overloadable NAME(float2 , int); \
650+ float3 _cl_overloadable NAME(float3 , int); \
651+ float4 _cl_overloadable NAME(float4 , int); \
652+ float8 _cl_overloadable NAME(float8 , int); \
653+ float16 _cl_overloadable NAME(float16 , int); \
654+ double2 _cl_overloadable NAME(double2 , int); \
655+ double3 _cl_overloadable NAME(double3 , int); \
656+ double4 _cl_overloadable NAME(double4 , int); \
657+ double8 _cl_overloadable NAME(double8 , int); \
658+ double16 _cl_overloadable NAME(double16, int);
659 #define _CL_DECLARE_FUNC_V_VPV(NAME) \
660- float __attribute__ ((overloadable)) NAME(float , __global float *); \
661- float2 __attribute__ ((overloadable)) NAME(float2 , __global float2 *); \
662- float3 __attribute__ ((overloadable)) NAME(float3 , __global float3 *); \
663- float4 __attribute__ ((overloadable)) NAME(float4 , __global float4 *); \
664- float8 __attribute__ ((overloadable)) NAME(float8 , __global float8 *); \
665- float16 __attribute__ ((overloadable)) NAME(float16 , __global float16 *); \
666- double __attribute__ ((overloadable)) NAME(double , __global double *); \
667- double2 __attribute__ ((overloadable)) NAME(double2 , __global double2 *); \
668- double3 __attribute__ ((overloadable)) NAME(double3 , __global double3 *); \
669- double4 __attribute__ ((overloadable)) NAME(double4 , __global double4 *); \
670- double8 __attribute__ ((overloadable)) NAME(double8 , __global double8 *); \
671- double16 __attribute__ ((overloadable)) NAME(double16, __global double16*); \
672- float __attribute__ ((overloadable)) NAME(float , __local float *); \
673- float2 __attribute__ ((overloadable)) NAME(float2 , __local float2 *); \
674- float3 __attribute__ ((overloadable)) NAME(float3 , __local float3 *); \
675- float4 __attribute__ ((overloadable)) NAME(float4 , __local float4 *); \
676- float8 __attribute__ ((overloadable)) NAME(float8 , __local float8 *); \
677- float16 __attribute__ ((overloadable)) NAME(float16 , __local float16 *); \
678- double __attribute__ ((overloadable)) NAME(double , __local double *); \
679- double2 __attribute__ ((overloadable)) NAME(double2 , __local double2 *); \
680- double3 __attribute__ ((overloadable)) NAME(double3 , __local double3 *); \
681- double4 __attribute__ ((overloadable)) NAME(double4 , __local double4 *); \
682- double8 __attribute__ ((overloadable)) NAME(double8 , __local double8 *); \
683- double16 __attribute__ ((overloadable)) NAME(double16, __local double16*); \
684+ float _cl_overloadable NAME(float , __global float *); \
685+ float2 _cl_overloadable NAME(float2 , __global float2 *); \
686+ float3 _cl_overloadable NAME(float3 , __global float3 *); \
687+ float4 _cl_overloadable NAME(float4 , __global float4 *); \
688+ float8 _cl_overloadable NAME(float8 , __global float8 *); \
689+ float16 _cl_overloadable NAME(float16 , __global float16 *); \
690+ double _cl_overloadable NAME(double , __global double *); \
691+ double2 _cl_overloadable NAME(double2 , __global double2 *); \
692+ double3 _cl_overloadable NAME(double3 , __global double3 *); \
693+ double4 _cl_overloadable NAME(double4 , __global double4 *); \
694+ double8 _cl_overloadable NAME(double8 , __global double8 *); \
695+ double16 _cl_overloadable NAME(double16, __global double16*); \
696+ float _cl_overloadable NAME(float , __local float *); \
697+ float2 _cl_overloadable NAME(float2 , __local float2 *); \
698+ float3 _cl_overloadable NAME(float3 , __local float3 *); \
699+ float4 _cl_overloadable NAME(float4 , __local float4 *); \
700+ float8 _cl_overloadable NAME(float8 , __local float8 *); \
701+ float16 _cl_overloadable NAME(float16 , __local float16 *); \
702+ double _cl_overloadable NAME(double , __local double *); \
703+ double2 _cl_overloadable NAME(double2 , __local double2 *); \
704+ double3 _cl_overloadable NAME(double3 , __local double3 *); \
705+ double4 _cl_overloadable NAME(double4 , __local double4 *); \
706+ double8 _cl_overloadable NAME(double8 , __local double8 *); \
707+ double16 _cl_overloadable NAME(double16, __local double16*); \
708 /* __private is not supported yet \
709- float __attribute__ ((overloadable)) NAME(float , __private float *); \
710- float2 __attribute__ ((overloadable)) NAME(float2 , __private float2 *); \
711- float3 __attribute__ ((overloadable)) NAME(float3 , __private float3 *); \
712- float4 __attribute__ ((overloadable)) NAME(float4 , __private float4 *); \
713- float8 __attribute__ ((overloadable)) NAME(float8 , __private float8 *); \
714- float16 __attribute__ ((overloadable)) NAME(float16 , __private float16 *); \
715- double __attribute__ ((overloadable)) NAME(double , __private double *); \
716- double2 __attribute__ ((overloadable)) NAME(double2 , __private double2 *); \
717- double3 __attribute__ ((overloadable)) NAME(double3 , __private double3 *); \
718- double4 __attribute__ ((overloadable)) NAME(double4 , __private double4 *); \
719- double8 __attribute__ ((overloadable)) NAME(double8 , __private double8 *); \
720- double16 __attribute__ ((overloadable)) NAME(double16, __private double16*); \
721+ float _cl_overloadable NAME(float , __private float *); \
722+ float2 _cl_overloadable NAME(float2 , __private float2 *); \
723+ float3 _cl_overloadable NAME(float3 , __private float3 *); \
724+ float4 _cl_overloadable NAME(float4 , __private float4 *); \
725+ float8 _cl_overloadable NAME(float8 , __private float8 *); \
726+ float16 _cl_overloadable NAME(float16 , __private float16 *); \
727+ double _cl_overloadable NAME(double , __private double *); \
728+ double2 _cl_overloadable NAME(double2 , __private double2 *); \
729+ double3 _cl_overloadable NAME(double3 , __private double3 *); \
730+ double4 _cl_overloadable NAME(double4 , __private double4 *); \
731+ double8 _cl_overloadable NAME(double8 , __private double8 *); \
732+ double16 _cl_overloadable NAME(double16, __private double16*); \
733 */
734-#define _CL_DECLARE_FUNC_V_SV(NAME) \
735- float2 __attribute__ ((overloadable)) NAME(float , float2 ); \
736- float3 __attribute__ ((overloadable)) NAME(float , float3 ); \
737- float4 __attribute__ ((overloadable)) NAME(float , float4 ); \
738- float8 __attribute__ ((overloadable)) NAME(float , float8 ); \
739- float16 __attribute__ ((overloadable)) NAME(float , float16 ); \
740- double2 __attribute__ ((overloadable)) NAME(double, double2 ); \
741- double3 __attribute__ ((overloadable)) NAME(double, double3 ); \
742- double4 __attribute__ ((overloadable)) NAME(double, double4 ); \
743- double8 __attribute__ ((overloadable)) NAME(double, double8 ); \
744- double16 __attribute__ ((overloadable)) NAME(double, double16);
745-#define _CL_DECLARE_FUNC_J_V(NAME) \
746- int __attribute__ ((overloadable)) NAME(float ); \
747- int2 __attribute__ ((overloadable)) NAME(float2 ); \
748- int3 __attribute__ ((overloadable)) NAME(float3 ); \
749- int4 __attribute__ ((overloadable)) NAME(float4 ); \
750- int8 __attribute__ ((overloadable)) NAME(float8 ); \
751- int16 __attribute__ ((overloadable)) NAME(float16 ); \
752- int __attribute__ ((overloadable)) NAME(double ); \
753- int2 __attribute__ ((overloadable)) NAME(double2 ); \
754- int3 __attribute__ ((overloadable)) NAME(double3 ); \
755- int4 __attribute__ ((overloadable)) NAME(double4 ); \
756- int8 __attribute__ ((overloadable)) NAME(double8 ); \
757- int16 __attribute__ ((overloadable)) NAME(double16);
758-#define _CL_DECLARE_FUNC_S_V(NAME) \
759- float __attribute__ ((overloadable)) NAME(float ); \
760- float __attribute__ ((overloadable)) NAME(float2 ); \
761- float __attribute__ ((overloadable)) NAME(float3 ); \
762- float __attribute__ ((overloadable)) NAME(float4 ); \
763- float __attribute__ ((overloadable)) NAME(float8 ); \
764- float __attribute__ ((overloadable)) NAME(float16 ); \
765- double __attribute__ ((overloadable)) NAME(double ); \
766- double __attribute__ ((overloadable)) NAME(double2 ); \
767- double __attribute__ ((overloadable)) NAME(double3 ); \
768- double __attribute__ ((overloadable)) NAME(double4 ); \
769- double __attribute__ ((overloadable)) NAME(double8 ); \
770- double __attribute__ ((overloadable)) NAME(double16);
771-#define _CL_DECLARE_FUNC_S_VV(NAME) \
772- float __attribute__ ((overloadable)) NAME(float , float ); \
773- float __attribute__ ((overloadable)) NAME(float2 , float2 ); \
774- float __attribute__ ((overloadable)) NAME(float3 , float3 ); \
775- float __attribute__ ((overloadable)) NAME(float4 , float4 ); \
776- float __attribute__ ((overloadable)) NAME(float8 , float8 ); \
777- float __attribute__ ((overloadable)) NAME(float16 , float16 ); \
778- double __attribute__ ((overloadable)) NAME(double , double ); \
779- double __attribute__ ((overloadable)) NAME(double2 , double2 ); \
780- double __attribute__ ((overloadable)) NAME(double3 , double3 ); \
781- double __attribute__ ((overloadable)) NAME(double4 , double4 ); \
782- double __attribute__ ((overloadable)) NAME(double8 , double8 ); \
783- double __attribute__ ((overloadable)) NAME(double16, double16);
784+#define _CL_DECLARE_FUNC_V_SV(NAME) \
785+ float2 _cl_overloadable NAME(float , float2 ); \
786+ float3 _cl_overloadable NAME(float , float3 ); \
787+ float4 _cl_overloadable NAME(float , float4 ); \
788+ float8 _cl_overloadable NAME(float , float8 ); \
789+ float16 _cl_overloadable NAME(float , float16 ); \
790+ double2 _cl_overloadable NAME(double, double2 ); \
791+ double3 _cl_overloadable NAME(double, double3 ); \
792+ double4 _cl_overloadable NAME(double, double4 ); \
793+ double8 _cl_overloadable NAME(double, double8 ); \
794+ double16 _cl_overloadable NAME(double, double16);
795+#define _CL_DECLARE_FUNC_J_V(NAME) \
796+ int _cl_overloadable NAME(float ); \
797+ int2 _cl_overloadable NAME(float2 ); \
798+ int3 _cl_overloadable NAME(float3 ); \
799+ int4 _cl_overloadable NAME(float4 ); \
800+ int8 _cl_overloadable NAME(float8 ); \
801+ int16 _cl_overloadable NAME(float16 ); \
802+ int _cl_overloadable NAME(double ); \
803+ int2 _cl_overloadable NAME(double2 ); \
804+ int3 _cl_overloadable NAME(double3 ); \
805+ int4 _cl_overloadable NAME(double4 ); \
806+ int8 _cl_overloadable NAME(double8 ); \
807+ int16 _cl_overloadable NAME(double16);
808+#define _CL_DECLARE_FUNC_S_V(NAME) \
809+ float _cl_overloadable NAME(float ); \
810+ float _cl_overloadable NAME(float2 ); \
811+ float _cl_overloadable NAME(float3 ); \
812+ float _cl_overloadable NAME(float4 ); \
813+ float _cl_overloadable NAME(float8 ); \
814+ float _cl_overloadable NAME(float16 ); \
815+ double _cl_overloadable NAME(double ); \
816+ double _cl_overloadable NAME(double2 ); \
817+ double _cl_overloadable NAME(double3 ); \
818+ double _cl_overloadable NAME(double4 ); \
819+ double _cl_overloadable NAME(double8 ); \
820+ double _cl_overloadable NAME(double16);
821+#define _CL_DECLARE_FUNC_S_VV(NAME) \
822+ float _cl_overloadable NAME(float , float ); \
823+ float _cl_overloadable NAME(float2 , float2 ); \
824+ float _cl_overloadable NAME(float3 , float3 ); \
825+ float _cl_overloadable NAME(float4 , float4 ); \
826+ float _cl_overloadable NAME(float8 , float8 ); \
827+ float _cl_overloadable NAME(float16 , float16 ); \
828+ double _cl_overloadable NAME(double , double ); \
829+ double _cl_overloadable NAME(double2 , double2 ); \
830+ double _cl_overloadable NAME(double3 , double3 ); \
831+ double _cl_overloadable NAME(double4 , double4 ); \
832+ double _cl_overloadable NAME(double8 , double8 ); \
833+ double _cl_overloadable NAME(double16, double16);
834
835 /* Move built-in declarations out of the way. (There should be a
836 better way of doing so.) These five functions are built-in math
837@@ -779,11 +918,26 @@
838 _CL_DECLARE_FUNC_V_V(fabs)
839 _CL_DECLARE_FUNC_V_VV(fdim)
840 _CL_DECLARE_FUNC_V_V(floor)
841-_CL_DECLARE_FUNC_V_VVV(fma)
842-_CL_DECLARE_FUNC_V_VV(fmax)
843-_CL_DECLARE_FUNC_V_VS(fmax)
844-_CL_DECLARE_FUNC_V_VV(fmin)
845-_CL_DECLARE_FUNC_V_VS(fmin)
846+#if __FAST__RELAXED__MATH__
847+# define _cl_fma _cl_fast_fma
848+#else
849+# define _cl_fma _cl_std_fma
850+#endif
851+#define _cl_fast_fma mad
852+_CL_DECLARE_FUNC_V_VVV(_cl_std_fma)
853+#if __FAST__RELAXED__MATH__
854+# define fmax _cl_fast_fmax
855+# define fmin _cl_fast_fmin
856+#else
857+# define fmax _cl_std_fmax
858+# define fmin _cl_std_fmin
859+#endif
860+#define _cl_fast_fmax max
861+#define _cl_fast_fmin min
862+_CL_DECLARE_FUNC_V_VV(_cl_std_fmax)
863+_CL_DECLARE_FUNC_V_VS(_cl_std_fmax)
864+_CL_DECLARE_FUNC_V_VV(_cl_std_fmin)
865+_CL_DECLARE_FUNC_V_VS(_cl_std_fmin)
866 _CL_DECLARE_FUNC_V_VV(fmod)
867 _CL_DECLARE_FUNC_V_VPV(fract)
868 // frexp
869@@ -850,380 +1004,380 @@
870
871 /* Integer Functions */
872
873-#define _CL_DECLARE_FUNC_G_G(NAME) \
874- char __attribute__ ((overloadable)) NAME(char ); \
875- char2 __attribute__ ((overloadable)) NAME(char2 ); \
876- char3 __attribute__ ((overloadable)) NAME(char3 ); \
877- char4 __attribute__ ((overloadable)) NAME(char4 ); \
878- char8 __attribute__ ((overloadable)) NAME(char8 ); \
879- char16 __attribute__ ((overloadable)) NAME(char16 ); \
880- short __attribute__ ((overloadable)) NAME(short ); \
881- short2 __attribute__ ((overloadable)) NAME(short2 ); \
882- short3 __attribute__ ((overloadable)) NAME(short3 ); \
883- short4 __attribute__ ((overloadable)) NAME(short4 ); \
884- short8 __attribute__ ((overloadable)) NAME(short8 ); \
885- short16 __attribute__ ((overloadable)) NAME(short16 ); \
886- int __attribute__ ((overloadable)) NAME(int ); \
887- int2 __attribute__ ((overloadable)) NAME(int2 ); \
888- int3 __attribute__ ((overloadable)) NAME(int3 ); \
889- int4 __attribute__ ((overloadable)) NAME(int4 ); \
890- int8 __attribute__ ((overloadable)) NAME(int8 ); \
891- int16 __attribute__ ((overloadable)) NAME(int16 ); \
892- long __attribute__ ((overloadable)) NAME(long ); \
893- long2 __attribute__ ((overloadable)) NAME(long2 ); \
894- long3 __attribute__ ((overloadable)) NAME(long3 ); \
895- long4 __attribute__ ((overloadable)) NAME(long4 ); \
896- long8 __attribute__ ((overloadable)) NAME(long8 ); \
897- long16 __attribute__ ((overloadable)) NAME(long16 ); \
898- uchar __attribute__ ((overloadable)) NAME(uchar ); \
899- uchar2 __attribute__ ((overloadable)) NAME(uchar2 ); \
900- uchar3 __attribute__ ((overloadable)) NAME(uchar3 ); \
901- uchar4 __attribute__ ((overloadable)) NAME(uchar4 ); \
902- uchar8 __attribute__ ((overloadable)) NAME(uchar8 ); \
903- uchar16 __attribute__ ((overloadable)) NAME(uchar16 ); \
904- ushort __attribute__ ((overloadable)) NAME(ushort ); \
905- ushort2 __attribute__ ((overloadable)) NAME(ushort2 ); \
906- ushort3 __attribute__ ((overloadable)) NAME(ushort3 ); \
907- ushort4 __attribute__ ((overloadable)) NAME(ushort4 ); \
908- ushort8 __attribute__ ((overloadable)) NAME(ushort8 ); \
909- ushort16 __attribute__ ((overloadable)) NAME(ushort16); \
910- uint __attribute__ ((overloadable)) NAME(uint ); \
911- uint2 __attribute__ ((overloadable)) NAME(uint2 ); \
912- uint3 __attribute__ ((overloadable)) NAME(uint3 ); \
913- uint4 __attribute__ ((overloadable)) NAME(uint4 ); \
914- uint8 __attribute__ ((overloadable)) NAME(uint8 ); \
915- uint16 __attribute__ ((overloadable)) NAME(uint16 ); \
916- ulong __attribute__ ((overloadable)) NAME(ulong ); \
917- ulong2 __attribute__ ((overloadable)) NAME(ulong2 ); \
918- ulong3 __attribute__ ((overloadable)) NAME(ulong3 ); \
919- ulong4 __attribute__ ((overloadable)) NAME(ulong4 ); \
920- ulong8 __attribute__ ((overloadable)) NAME(ulong8 ); \
921- ulong16 __attribute__ ((overloadable)) NAME(ulong16 );
922-#define _CL_DECLARE_FUNC_G_GG(NAME) \
923- char __attribute__ ((overloadable)) NAME(char , char ); \
924- char2 __attribute__ ((overloadable)) NAME(char2 , char2 ); \
925- char3 __attribute__ ((overloadable)) NAME(char3 , char3 ); \
926- char4 __attribute__ ((overloadable)) NAME(char4 , char4 ); \
927- char8 __attribute__ ((overloadable)) NAME(char8 , char8 ); \
928- char16 __attribute__ ((overloadable)) NAME(char16 , char16 ); \
929- short __attribute__ ((overloadable)) NAME(short , short ); \
930- short2 __attribute__ ((overloadable)) NAME(short2 , short2 ); \
931- short3 __attribute__ ((overloadable)) NAME(short3 , short3 ); \
932- short4 __attribute__ ((overloadable)) NAME(short4 , short4 ); \
933- short8 __attribute__ ((overloadable)) NAME(short8 , short8 ); \
934- short16 __attribute__ ((overloadable)) NAME(short16 , short16 ); \
935- int __attribute__ ((overloadable)) NAME(int , int ); \
936- int2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \
937- int3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \
938- int4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \
939- int8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \
940- int16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \
941- long __attribute__ ((overloadable)) NAME(long , long ); \
942- long2 __attribute__ ((overloadable)) NAME(long2 , long2 ); \
943- long3 __attribute__ ((overloadable)) NAME(long3 , long3 ); \
944- long4 __attribute__ ((overloadable)) NAME(long4 , long4 ); \
945- long8 __attribute__ ((overloadable)) NAME(long8 , long8 ); \
946- long16 __attribute__ ((overloadable)) NAME(long16 , long16 ); \
947- uchar __attribute__ ((overloadable)) NAME(uchar , uchar ); \
948- uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \
949- uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \
950- uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \
951- uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \
952- uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \
953- ushort __attribute__ ((overloadable)) NAME(ushort , ushort ); \
954- ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \
955- ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \
956- ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \
957- ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \
958- ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \
959- uint __attribute__ ((overloadable)) NAME(uint , uint ); \
960- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \
961- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \
962- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \
963- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \
964- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); \
965- ulong __attribute__ ((overloadable)) NAME(ulong , ulong ); \
966- ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 ); \
967- ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 ); \
968- ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 ); \
969- ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 ); \
970- ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 );
971-#define _CL_DECLARE_FUNC_G_GGG(NAME) \
972- char __attribute__ ((overloadable)) NAME(char , char , char ); \
973- char2 __attribute__ ((overloadable)) NAME(char2 , char2 , char2 ); \
974- char3 __attribute__ ((overloadable)) NAME(char3 , char3 , char3 ); \
975- char4 __attribute__ ((overloadable)) NAME(char4 , char4 , char4 ); \
976- char8 __attribute__ ((overloadable)) NAME(char8 , char8 , char8 ); \
977- char16 __attribute__ ((overloadable)) NAME(char16 , char16 , char16 ); \
978- short __attribute__ ((overloadable)) NAME(short , short , short ); \
979- short2 __attribute__ ((overloadable)) NAME(short2 , short2 , short2 ); \
980- short3 __attribute__ ((overloadable)) NAME(short3 , short3 , short3 ); \
981- short4 __attribute__ ((overloadable)) NAME(short4 , short4 , short4 ); \
982- short8 __attribute__ ((overloadable)) NAME(short8 , short8 , short8 ); \
983- short16 __attribute__ ((overloadable)) NAME(short16 , short16 , short16 ); \
984- int __attribute__ ((overloadable)) NAME(int , int , int ); \
985- int2 __attribute__ ((overloadable)) NAME(int2 , int2 , int2 ); \
986- int3 __attribute__ ((overloadable)) NAME(int3 , int3 , int3 ); \
987- int4 __attribute__ ((overloadable)) NAME(int4 , int4 , int4 ); \
988- int8 __attribute__ ((overloadable)) NAME(int8 , int8 , int8 ); \
989- int16 __attribute__ ((overloadable)) NAME(int16 , int16 , int16 ); \
990- long __attribute__ ((overloadable)) NAME(long , long , long ); \
991- long2 __attribute__ ((overloadable)) NAME(long2 , long2 , long2 ); \
992- long3 __attribute__ ((overloadable)) NAME(long3 , long3 , long3 ); \
993- long4 __attribute__ ((overloadable)) NAME(long4 , long4 , long4 ); \
994- long8 __attribute__ ((overloadable)) NAME(long8 , long8 , long8 ); \
995- long16 __attribute__ ((overloadable)) NAME(long16 , long16 , long16 ); \
996- uchar __attribute__ ((overloadable)) NAME(uchar , uchar , uchar ); \
997- uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 , uchar2 ); \
998- uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 , uchar3 ); \
999- uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 , uchar4 ); \
1000- uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 , uchar8 ); \
1001- uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 , uchar16 ); \
1002- ushort __attribute__ ((overloadable)) NAME(ushort , ushort , ushort ); \
1003- ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 , ushort2 ); \
1004- ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 , ushort3 ); \
1005- ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 , ushort4 ); \
1006- ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 , ushort8 ); \
1007- ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16, ushort16); \
1008- uint __attribute__ ((overloadable)) NAME(uint , uint , uint ); \
1009- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 , uint2 ); \
1010- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 , uint3 ); \
1011- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 , uint4 ); \
1012- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 , uint8 ); \
1013- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 , uint16 ); \
1014- ulong __attribute__ ((overloadable)) NAME(ulong , ulong , ulong ); \
1015- ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 , ulong2 ); \
1016- ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 , ulong3 ); \
1017- ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 , ulong4 ); \
1018- ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 , ulong8 ); \
1019- ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 , ulong16 );
1020-#define _CL_DECLARE_FUNC_G_GS(NAME) \
1021- char2 __attribute__ ((overloadable)) NAME(char2 , char ); \
1022- char3 __attribute__ ((overloadable)) NAME(char3 , char ); \
1023- char4 __attribute__ ((overloadable)) NAME(char4 , char ); \
1024- char8 __attribute__ ((overloadable)) NAME(char8 , char ); \
1025- char16 __attribute__ ((overloadable)) NAME(char16 , char ); \
1026- short2 __attribute__ ((overloadable)) NAME(short2 , short ); \
1027- short3 __attribute__ ((overloadable)) NAME(short3 , short ); \
1028- short4 __attribute__ ((overloadable)) NAME(short4 , short ); \
1029- short8 __attribute__ ((overloadable)) NAME(short8 , short ); \
1030- short16 __attribute__ ((overloadable)) NAME(short16 , short ); \
1031- int2 __attribute__ ((overloadable)) NAME(int2 , int ); \
1032- int3 __attribute__ ((overloadable)) NAME(int3 , int ); \
1033- int4 __attribute__ ((overloadable)) NAME(int4 , int ); \
1034- int8 __attribute__ ((overloadable)) NAME(int8 , int ); \
1035- int16 __attribute__ ((overloadable)) NAME(int16 , int ); \
1036- long2 __attribute__ ((overloadable)) NAME(long2 , long ); \
1037- long3 __attribute__ ((overloadable)) NAME(long3 , long ); \
1038- long4 __attribute__ ((overloadable)) NAME(long4 , long ); \
1039- long8 __attribute__ ((overloadable)) NAME(long8 , long ); \
1040- long16 __attribute__ ((overloadable)) NAME(long16 , long ); \
1041- uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar ); \
1042- uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar ); \
1043- uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar ); \
1044- uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar ); \
1045- uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar ); \
1046- ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort); \
1047- ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort); \
1048- ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort); \
1049- ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort); \
1050- ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort); \
1051- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint ); \
1052- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint ); \
1053- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint ); \
1054- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint ); \
1055- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint ); \
1056- ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong ); \
1057- ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong ); \
1058- ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong ); \
1059- ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong ); \
1060- ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong );
1061-#define _CL_DECLARE_FUNC_UG_G(NAME) \
1062- uchar __attribute__ ((overloadable)) NAME(char ); \
1063- uchar2 __attribute__ ((overloadable)) NAME(char2 ); \
1064- uchar3 __attribute__ ((overloadable)) NAME(char3 ); \
1065- uchar4 __attribute__ ((overloadable)) NAME(char4 ); \
1066- uchar8 __attribute__ ((overloadable)) NAME(char8 ); \
1067- uchar16 __attribute__ ((overloadable)) NAME(char16 ); \
1068- ushort __attribute__ ((overloadable)) NAME(short ); \
1069- ushort2 __attribute__ ((overloadable)) NAME(short2 ); \
1070- ushort3 __attribute__ ((overloadable)) NAME(short3 ); \
1071- ushort4 __attribute__ ((overloadable)) NAME(short4 ); \
1072- ushort8 __attribute__ ((overloadable)) NAME(short8 ); \
1073- ushort16 __attribute__ ((overloadable)) NAME(short16 ); \
1074- uint __attribute__ ((overloadable)) NAME(int ); \
1075- uint2 __attribute__ ((overloadable)) NAME(int2 ); \
1076- uint3 __attribute__ ((overloadable)) NAME(int3 ); \
1077- uint4 __attribute__ ((overloadable)) NAME(int4 ); \
1078- uint8 __attribute__ ((overloadable)) NAME(int8 ); \
1079- uint16 __attribute__ ((overloadable)) NAME(int16 ); \
1080- ulong __attribute__ ((overloadable)) NAME(long ); \
1081- ulong2 __attribute__ ((overloadable)) NAME(long2 ); \
1082- ulong3 __attribute__ ((overloadable)) NAME(long3 ); \
1083- ulong4 __attribute__ ((overloadable)) NAME(long4 ); \
1084- ulong8 __attribute__ ((overloadable)) NAME(long8 ); \
1085- ulong16 __attribute__ ((overloadable)) NAME(long16 ); \
1086- uchar __attribute__ ((overloadable)) NAME(uchar ); \
1087- uchar2 __attribute__ ((overloadable)) NAME(uchar2 ); \
1088- uchar3 __attribute__ ((overloadable)) NAME(uchar3 ); \
1089- uchar4 __attribute__ ((overloadable)) NAME(uchar4 ); \
1090- uchar8 __attribute__ ((overloadable)) NAME(uchar8 ); \
1091- uchar16 __attribute__ ((overloadable)) NAME(uchar16 ); \
1092- ushort __attribute__ ((overloadable)) NAME(ushort ); \
1093- ushort2 __attribute__ ((overloadable)) NAME(ushort2 ); \
1094- ushort3 __attribute__ ((overloadable)) NAME(ushort3 ); \
1095- ushort4 __attribute__ ((overloadable)) NAME(ushort4 ); \
1096- ushort8 __attribute__ ((overloadable)) NAME(ushort8 ); \
1097- ushort16 __attribute__ ((overloadable)) NAME(ushort16); \
1098- uint __attribute__ ((overloadable)) NAME(uint ); \
1099- uint2 __attribute__ ((overloadable)) NAME(uint2 ); \
1100- uint3 __attribute__ ((overloadable)) NAME(uint3 ); \
1101- uint4 __attribute__ ((overloadable)) NAME(uint4 ); \
1102- uint8 __attribute__ ((overloadable)) NAME(uint8 ); \
1103- uint16 __attribute__ ((overloadable)) NAME(uint16 ); \
1104- ulong __attribute__ ((overloadable)) NAME(ulong ); \
1105- ulong2 __attribute__ ((overloadable)) NAME(ulong2 ); \
1106- ulong3 __attribute__ ((overloadable)) NAME(ulong3 ); \
1107- ulong4 __attribute__ ((overloadable)) NAME(ulong4 ); \
1108- ulong8 __attribute__ ((overloadable)) NAME(ulong8 ); \
1109- ulong16 __attribute__ ((overloadable)) NAME(ulong16 );
1110-#define _CL_DECLARE_FUNC_UG_GG(NAME) \
1111- uchar __attribute__ ((overloadable)) NAME(char , char ); \
1112- uchar2 __attribute__ ((overloadable)) NAME(char2 , char2 ); \
1113- uchar3 __attribute__ ((overloadable)) NAME(char3 , char3 ); \
1114- uchar4 __attribute__ ((overloadable)) NAME(char4 , char4 ); \
1115- uchar8 __attribute__ ((overloadable)) NAME(char8 , char8 ); \
1116- uchar16 __attribute__ ((overloadable)) NAME(char16 , char16 ); \
1117- ushort __attribute__ ((overloadable)) NAME(short , short ); \
1118- ushort2 __attribute__ ((overloadable)) NAME(short2 , short2 ); \
1119- ushort3 __attribute__ ((overloadable)) NAME(short3 , short3 ); \
1120- ushort4 __attribute__ ((overloadable)) NAME(short4 , short4 ); \
1121- ushort8 __attribute__ ((overloadable)) NAME(short8 , short8 ); \
1122- ushort16 __attribute__ ((overloadable)) NAME(short16 , short16 ); \
1123- uint __attribute__ ((overloadable)) NAME(int , int ); \
1124- uint2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \
1125- uint3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \
1126- uint4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \
1127- uint8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \
1128- uint16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \
1129- ulong __attribute__ ((overloadable)) NAME(long , long ); \
1130- ulong2 __attribute__ ((overloadable)) NAME(long2 , long2 ); \
1131- ulong3 __attribute__ ((overloadable)) NAME(long3 , long3 ); \
1132- ulong4 __attribute__ ((overloadable)) NAME(long4 , long4 ); \
1133- ulong8 __attribute__ ((overloadable)) NAME(long8 , long8 ); \
1134- ulong16 __attribute__ ((overloadable)) NAME(long16 , long16 ); \
1135- uchar __attribute__ ((overloadable)) NAME(uchar , uchar ); \
1136- uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \
1137- uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \
1138- uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \
1139- uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \
1140- uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \
1141- ushort __attribute__ ((overloadable)) NAME(ushort , ushort ); \
1142- ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \
1143- ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \
1144- ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \
1145- ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \
1146- ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \
1147- uint __attribute__ ((overloadable)) NAME(uint , uint ); \
1148- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \
1149- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \
1150- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \
1151- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \
1152- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); \
1153- ulong __attribute__ ((overloadable)) NAME(ulong , ulong ); \
1154- ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 ); \
1155- ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 ); \
1156- ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 ); \
1157- ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 ); \
1158- ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 );
1159-#define _CL_DECLARE_FUNC_LG_GUG(NAME) \
1160- short __attribute__ ((overloadable)) NAME(char , uchar ); \
1161- short2 __attribute__ ((overloadable)) NAME(char2 , uchar2 ); \
1162- short3 __attribute__ ((overloadable)) NAME(char3 , uchar3 ); \
1163- short4 __attribute__ ((overloadable)) NAME(char4 , uchar4 ); \
1164- short8 __attribute__ ((overloadable)) NAME(char8 , uchar8 ); \
1165- short16 __attribute__ ((overloadable)) NAME(char16 , uchar16 ); \
1166- int __attribute__ ((overloadable)) NAME(short , ushort ); \
1167- int2 __attribute__ ((overloadable)) NAME(short2 , ushort2 ); \
1168- int3 __attribute__ ((overloadable)) NAME(short3 , ushort3 ); \
1169- int4 __attribute__ ((overloadable)) NAME(short4 , ushort4 ); \
1170- int8 __attribute__ ((overloadable)) NAME(short8 , ushort8 ); \
1171- int16 __attribute__ ((overloadable)) NAME(short16 , ushort16 ); \
1172- long __attribute__ ((overloadable)) NAME(int , uint ); \
1173- long2 __attribute__ ((overloadable)) NAME(int2 , uint2 ); \
1174- long3 __attribute__ ((overloadable)) NAME(int3 , uint3 ); \
1175- long4 __attribute__ ((overloadable)) NAME(int4 , uint4 ); \
1176- long8 __attribute__ ((overloadable)) NAME(int8 , uint8 ); \
1177- long16 __attribute__ ((overloadable)) NAME(int16 , uint16 ); \
1178- ushort __attribute__ ((overloadable)) NAME(uchar , uchar ); \
1179- ushort2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \
1180- ushort3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \
1181- ushort4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \
1182- ushort8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \
1183- ushort16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \
1184- uint __attribute__ ((overloadable)) NAME(ushort , ushort ); \
1185- uint2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \
1186- uint3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \
1187- uint4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \
1188- uint8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \
1189- uint16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \
1190- ulong __attribute__ ((overloadable)) NAME(uint , uint ); \
1191- ulong2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \
1192- ulong3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \
1193- ulong4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \
1194- ulong8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \
1195- ulong16 __attribute__ ((overloadable)) NAME(uint16 , uint16 );
1196-#define _CL_DECLARE_FUNC_I_IG(NAME) \
1197- int __attribute__ ((overloadable)) NAME(char ); \
1198- int __attribute__ ((overloadable)) NAME(char2 ); \
1199- int __attribute__ ((overloadable)) NAME(char3 ); \
1200- int __attribute__ ((overloadable)) NAME(char4 ); \
1201- int __attribute__ ((overloadable)) NAME(char8 ); \
1202- int __attribute__ ((overloadable)) NAME(char16 ); \
1203- int __attribute__ ((overloadable)) NAME(short ); \
1204- int __attribute__ ((overloadable)) NAME(short2 ); \
1205- int __attribute__ ((overloadable)) NAME(short3 ); \
1206- int __attribute__ ((overloadable)) NAME(short4 ); \
1207- int __attribute__ ((overloadable)) NAME(short8 ); \
1208- int __attribute__ ((overloadable)) NAME(short16); \
1209- int __attribute__ ((overloadable)) NAME(int ); \
1210- int __attribute__ ((overloadable)) NAME(int2 ); \
1211- int __attribute__ ((overloadable)) NAME(int3 ); \
1212- int __attribute__ ((overloadable)) NAME(int4 ); \
1213- int __attribute__ ((overloadable)) NAME(int8 ); \
1214- int __attribute__ ((overloadable)) NAME(int16 ); \
1215- int __attribute__ ((overloadable)) NAME(long ); \
1216- int __attribute__ ((overloadable)) NAME(long2 ); \
1217- int __attribute__ ((overloadable)) NAME(long3 ); \
1218- int __attribute__ ((overloadable)) NAME(long4 ); \
1219- int __attribute__ ((overloadable)) NAME(long8 ); \
1220- int __attribute__ ((overloadable)) NAME(long16 );
1221-#define _CL_DECLARE_FUNC_J_JJ(NAME) \
1222- int __attribute__ ((overloadable)) NAME(int , int ); \
1223- int2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \
1224- int3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \
1225- int4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \
1226- int8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \
1227- int16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \
1228- uint __attribute__ ((overloadable)) NAME(uint , uint ); \
1229- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \
1230- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \
1231- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \
1232- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \
1233- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 );
1234-#define _CL_DECLARE_FUNC_J_JJJ(NAME) \
1235- int __attribute__ ((overloadable)) NAME(int , int , int ); \
1236- int2 __attribute__ ((overloadable)) NAME(int2 , int2 , int2 ); \
1237- int3 __attribute__ ((overloadable)) NAME(int3 , int3 , int3 ); \
1238- int4 __attribute__ ((overloadable)) NAME(int4 , int4 , int4 ); \
1239- int8 __attribute__ ((overloadable)) NAME(int8 , int8 , int8 ); \
1240- int16 __attribute__ ((overloadable)) NAME(int16 , int16 , int16 ); \
1241- uint __attribute__ ((overloadable)) NAME(uint , uint , uint ); \
1242- uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 , uint2 ); \
1243- uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 , uint3 ); \
1244- uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 , uint4 ); \
1245- uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 , uint8 ); \
1246- uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 , uint16 );
1247+#define _CL_DECLARE_FUNC_G_G(NAME) \
1248+ char _cl_overloadable NAME(char ); \
1249+ char2 _cl_overloadable NAME(char2 ); \
1250+ char3 _cl_overloadable NAME(char3 ); \
1251+ char4 _cl_overloadable NAME(char4 ); \
1252+ char8 _cl_overloadable NAME(char8 ); \
1253+ char16 _cl_overloadable NAME(char16 ); \
1254+ short _cl_overloadable NAME(short ); \
1255+ short2 _cl_overloadable NAME(short2 ); \
1256+ short3 _cl_overloadable NAME(short3 ); \
1257+ short4 _cl_overloadable NAME(short4 ); \
1258+ short8 _cl_overloadable NAME(short8 ); \
1259+ short16 _cl_overloadable NAME(short16 ); \
1260+ int _cl_overloadable NAME(int ); \
1261+ int2 _cl_overloadable NAME(int2 ); \
1262+ int3 _cl_overloadable NAME(int3 ); \
1263+ int4 _cl_overloadable NAME(int4 ); \
1264+ int8 _cl_overloadable NAME(int8 ); \
1265+ int16 _cl_overloadable NAME(int16 ); \
1266+ long _cl_overloadable NAME(long ); \
1267+ long2 _cl_overloadable NAME(long2 ); \
1268+ long3 _cl_overloadable NAME(long3 ); \
1269+ long4 _cl_overloadable NAME(long4 ); \
1270+ long8 _cl_overloadable NAME(long8 ); \
1271+ long16 _cl_overloadable NAME(long16 ); \
1272+ uchar _cl_overloadable NAME(uchar ); \
1273+ uchar2 _cl_overloadable NAME(uchar2 ); \
1274+ uchar3 _cl_overloadable NAME(uchar3 ); \
1275+ uchar4 _cl_overloadable NAME(uchar4 ); \
1276+ uchar8 _cl_overloadable NAME(uchar8 ); \
1277+ uchar16 _cl_overloadable NAME(uchar16 ); \
1278+ ushort _cl_overloadable NAME(ushort ); \
1279+ ushort2 _cl_overloadable NAME(ushort2 ); \
1280+ ushort3 _cl_overloadable NAME(ushort3 ); \
1281+ ushort4 _cl_overloadable NAME(ushort4 ); \
1282+ ushort8 _cl_overloadable NAME(ushort8 ); \
1283+ ushort16 _cl_overloadable NAME(ushort16); \
1284+ uint _cl_overloadable NAME(uint ); \
1285+ uint2 _cl_overloadable NAME(uint2 ); \
1286+ uint3 _cl_overloadable NAME(uint3 ); \
1287+ uint4 _cl_overloadable NAME(uint4 ); \
1288+ uint8 _cl_overloadable NAME(uint8 ); \
1289+ uint16 _cl_overloadable NAME(uint16 ); \
1290+ ulong _cl_overloadable NAME(ulong ); \
1291+ ulong2 _cl_overloadable NAME(ulong2 ); \
1292+ ulong3 _cl_overloadable NAME(ulong3 ); \
1293+ ulong4 _cl_overloadable NAME(ulong4 ); \
1294+ ulong8 _cl_overloadable NAME(ulong8 ); \
1295+ ulong16 _cl_overloadable NAME(ulong16 );
1296+#define _CL_DECLARE_FUNC_G_GG(NAME) \
1297+ char _cl_overloadable NAME(char , char ); \
1298+ char2 _cl_overloadable NAME(char2 , char2 ); \
1299+ char3 _cl_overloadable NAME(char3 , char3 ); \
1300+ char4 _cl_overloadable NAME(char4 , char4 ); \
1301+ char8 _cl_overloadable NAME(char8 , char8 ); \
1302+ char16 _cl_overloadable NAME(char16 , char16 ); \
1303+ short _cl_overloadable NAME(short , short ); \
1304+ short2 _cl_overloadable NAME(short2 , short2 ); \
1305+ short3 _cl_overloadable NAME(short3 , short3 ); \
1306+ short4 _cl_overloadable NAME(short4 , short4 ); \
1307+ short8 _cl_overloadable NAME(short8 , short8 ); \
1308+ short16 _cl_overloadable NAME(short16 , short16 ); \
1309+ int _cl_overloadable NAME(int , int ); \
1310+ int2 _cl_overloadable NAME(int2 , int2 ); \
1311+ int3 _cl_overloadable NAME(int3 , int3 ); \
1312+ int4 _cl_overloadable NAME(int4 , int4 ); \
1313+ int8 _cl_overloadable NAME(int8 , int8 ); \
1314+ int16 _cl_overloadable NAME(int16 , int16 ); \
1315+ long _cl_overloadable NAME(long , long ); \
1316+ long2 _cl_overloadable NAME(long2 , long2 ); \
1317+ long3 _cl_overloadable NAME(long3 , long3 ); \
1318+ long4 _cl_overloadable NAME(long4 , long4 ); \
1319+ long8 _cl_overloadable NAME(long8 , long8 ); \
1320+ long16 _cl_overloadable NAME(long16 , long16 ); \
1321+ uchar _cl_overloadable NAME(uchar , uchar ); \
1322+ uchar2 _cl_overloadable NAME(uchar2 , uchar2 ); \
1323+ uchar3 _cl_overloadable NAME(uchar3 , uchar3 ); \
1324+ uchar4 _cl_overloadable NAME(uchar4 , uchar4 ); \
1325+ uchar8 _cl_overloadable NAME(uchar8 , uchar8 ); \
1326+ uchar16 _cl_overloadable NAME(uchar16 , uchar16 ); \
1327+ ushort _cl_overloadable NAME(ushort , ushort ); \
1328+ ushort2 _cl_overloadable NAME(ushort2 , ushort2 ); \
1329+ ushort3 _cl_overloadable NAME(ushort3 , ushort3 ); \
1330+ ushort4 _cl_overloadable NAME(ushort4 , ushort4 ); \
1331+ ushort8 _cl_overloadable NAME(ushort8 , ushort8 ); \
1332+ ushort16 _cl_overloadable NAME(ushort16, ushort16); \
1333+ uint _cl_overloadable NAME(uint , uint ); \
1334+ uint2 _cl_overloadable NAME(uint2 , uint2 ); \
1335+ uint3 _cl_overloadable NAME(uint3 , uint3 ); \
1336+ uint4 _cl_overloadable NAME(uint4 , uint4 ); \
1337+ uint8 _cl_overloadable NAME(uint8 , uint8 ); \
1338+ uint16 _cl_overloadable NAME(uint16 , uint16 ); \
1339+ ulong _cl_overloadable NAME(ulong , ulong ); \
1340+ ulong2 _cl_overloadable NAME(ulong2 , ulong2 ); \
1341+ ulong3 _cl_overloadable NAME(ulong3 , ulong3 ); \
1342+ ulong4 _cl_overloadable NAME(ulong4 , ulong4 ); \
1343+ ulong8 _cl_overloadable NAME(ulong8 , ulong8 ); \
1344+ ulong16 _cl_overloadable NAME(ulong16 , ulong16 );
1345+#define _CL_DECLARE_FUNC_G_GGG(NAME) \
1346+ char _cl_overloadable NAME(char , char , char ); \
1347+ char2 _cl_overloadable NAME(char2 , char2 , char2 ); \
1348+ char3 _cl_overloadable NAME(char3 , char3 , char3 ); \
1349+ char4 _cl_overloadable NAME(char4 , char4 , char4 ); \
1350+ char8 _cl_overloadable NAME(char8 , char8 , char8 ); \
1351+ char16 _cl_overloadable NAME(char16 , char16 , char16 ); \
1352+ short _cl_overloadable NAME(short , short , short ); \
1353+ short2 _cl_overloadable NAME(short2 , short2 , short2 ); \
1354+ short3 _cl_overloadable NAME(short3 , short3 , short3 ); \
1355+ short4 _cl_overloadable NAME(short4 , short4 , short4 ); \
1356+ short8 _cl_overloadable NAME(short8 , short8 , short8 ); \
1357+ short16 _cl_overloadable NAME(short16 , short16 , short16 ); \
1358+ int _cl_overloadable NAME(int , int , int ); \
1359+ int2 _cl_overloadable NAME(int2 , int2 , int2 ); \
1360+ int3 _cl_overloadable NAME(int3 , int3 , int3 ); \
1361+ int4 _cl_overloadable NAME(int4 , int4 , int4 ); \
1362+ int8 _cl_overloadable NAME(int8 , int8 , int8 ); \
1363+ int16 _cl_overloadable NAME(int16 , int16 , int16 ); \
1364+ long _cl_overloadable NAME(long , long , long ); \
1365+ long2 _cl_overloadable NAME(long2 , long2 , long2 ); \
1366+ long3 _cl_overloadable NAME(long3 , long3 , long3 ); \
1367+ long4 _cl_overloadable NAME(long4 , long4 , long4 ); \
1368+ long8 _cl_overloadable NAME(long8 , long8 , long8 ); \
1369+ long16 _cl_overloadable NAME(long16 , long16 , long16 ); \
1370+ uchar _cl_overloadable NAME(uchar , uchar , uchar ); \
1371+ uchar2 _cl_overloadable NAME(uchar2 , uchar2 , uchar2 ); \
1372+ uchar3 _cl_overloadable NAME(uchar3 , uchar3 , uchar3 ); \
1373+ uchar4 _cl_overloadable NAME(uchar4 , uchar4 , uchar4 ); \
1374+ uchar8 _cl_overloadable NAME(uchar8 , uchar8 , uchar8 ); \
1375+ uchar16 _cl_overloadable NAME(uchar16 , uchar16 , uchar16 ); \
1376+ ushort _cl_overloadable NAME(ushort , ushort , ushort ); \
1377+ ushort2 _cl_overloadable NAME(ushort2 , ushort2 , ushort2 ); \
1378+ ushort3 _cl_overloadable NAME(ushort3 , ushort3 , ushort3 ); \
1379+ ushort4 _cl_overloadable NAME(ushort4 , ushort4 , ushort4 ); \
1380+ ushort8 _cl_overloadable NAME(ushort8 , ushort8 , ushort8 ); \
1381+ ushort16 _cl_overloadable NAME(ushort16, ushort16, ushort16); \
1382+ uint _cl_overloadable NAME(uint , uint , uint ); \
1383+ uint2 _cl_overloadable NAME(uint2 , uint2 , uint2 ); \
1384+ uint3 _cl_overloadable NAME(uint3 , uint3 , uint3 ); \
1385+ uint4 _cl_overloadable NAME(uint4 , uint4 , uint4 ); \
1386+ uint8 _cl_overloadable NAME(uint8 , uint8 , uint8 ); \
1387+ uint16 _cl_overloadable NAME(uint16 , uint16 , uint16 ); \
1388+ ulong _cl_overloadable NAME(ulong , ulong , ulong ); \
1389+ ulong2 _cl_overloadable NAME(ulong2 , ulong2 , ulong2 ); \
1390+ ulong3 _cl_overloadable NAME(ulong3 , ulong3 , ulong3 ); \
1391+ ulong4 _cl_overloadable NAME(ulong4 , ulong4 , ulong4 ); \
1392+ ulong8 _cl_overloadable NAME(ulong8 , ulong8 , ulong8 ); \
1393+ ulong16 _cl_overloadable NAME(ulong16 , ulong16 , ulong16 );
1394+#define _CL_DECLARE_FUNC_G_GS(NAME) \
1395+ char2 _cl_overloadable NAME(char2 , char ); \
1396+ char3 _cl_overloadable NAME(char3 , char ); \
1397+ char4 _cl_overloadable NAME(char4 , char ); \
1398+ char8 _cl_overloadable NAME(char8 , char ); \
1399+ char16 _cl_overloadable NAME(char16 , char ); \
1400+ short2 _cl_overloadable NAME(short2 , short ); \
1401+ short3 _cl_overloadable NAME(short3 , short ); \
1402+ short4 _cl_overloadable NAME(short4 , short ); \
1403+ short8 _cl_overloadable NAME(short8 , short ); \
1404+ short16 _cl_overloadable NAME(short16 , short ); \
1405+ int2 _cl_overloadable NAME(int2 , int ); \
1406+ int3 _cl_overloadable NAME(int3 , int ); \
1407+ int4 _cl_overloadable NAME(int4 , int ); \
1408+ int8 _cl_overloadable NAME(int8 , int ); \
1409+ int16 _cl_overloadable NAME(int16 , int ); \
1410+ long2 _cl_overloadable NAME(long2 , long ); \
1411+ long3 _cl_overloadable NAME(long3 , long ); \
1412+ long4 _cl_overloadable NAME(long4 , long ); \
1413+ long8 _cl_overloadable NAME(long8 , long ); \
1414+ long16 _cl_overloadable NAME(long16 , long ); \
1415+ uchar2 _cl_overloadable NAME(uchar2 , uchar ); \
1416+ uchar3 _cl_overloadable NAME(uchar3 , uchar ); \
1417+ uchar4 _cl_overloadable NAME(uchar4 , uchar ); \
1418+ uchar8 _cl_overloadable NAME(uchar8 , uchar ); \
1419+ uchar16 _cl_overloadable NAME(uchar16 , uchar ); \
1420+ ushort2 _cl_overloadable NAME(ushort2 , ushort); \
1421+ ushort3 _cl_overloadable NAME(ushort3 , ushort); \
1422+ ushort4 _cl_overloadable NAME(ushort4 , ushort); \
1423+ ushort8 _cl_overloadable NAME(ushort8 , ushort); \
1424+ ushort16 _cl_overloadable NAME(ushort16, ushort); \
1425+ uint2 _cl_overloadable NAME(uint2 , uint ); \
1426+ uint3 _cl_overloadable NAME(uint3 , uint ); \
1427+ uint4 _cl_overloadable NAME(uint4 , uint ); \
1428+ uint8 _cl_overloadable NAME(uint8 , uint ); \
1429+ uint16 _cl_overloadable NAME(uint16 , uint ); \
1430+ ulong2 _cl_overloadable NAME(ulong2 , ulong ); \
1431+ ulong3 _cl_overloadable NAME(ulong3 , ulong ); \
1432+ ulong4 _cl_overloadable NAME(ulong4 , ulong ); \
1433+ ulong8 _cl_overloadable NAME(ulong8 , ulong ); \
1434+ ulong16 _cl_overloadable NAME(ulong16 , ulong );
1435+#define _CL_DECLARE_FUNC_UG_G(NAME) \
1436+ uchar _cl_overloadable NAME(char ); \
1437+ uchar2 _cl_overloadable NAME(char2 ); \
1438+ uchar3 _cl_overloadable NAME(char3 ); \
1439+ uchar4 _cl_overloadable NAME(char4 ); \
1440+ uchar8 _cl_overloadable NAME(char8 ); \
1441+ uchar16 _cl_overloadable NAME(char16 ); \
1442+ ushort _cl_overloadable NAME(short ); \
1443+ ushort2 _cl_overloadable NAME(short2 ); \
1444+ ushort3 _cl_overloadable NAME(short3 ); \
1445+ ushort4 _cl_overloadable NAME(short4 ); \
1446+ ushort8 _cl_overloadable NAME(short8 ); \
1447+ ushort16 _cl_overloadable NAME(short16 ); \
1448+ uint _cl_overloadable NAME(int ); \
1449+ uint2 _cl_overloadable NAME(int2 ); \
1450+ uint3 _cl_overloadable NAME(int3 ); \
1451+ uint4 _cl_overloadable NAME(int4 ); \
1452+ uint8 _cl_overloadable NAME(int8 ); \
1453+ uint16 _cl_overloadable NAME(int16 ); \
1454+ ulong _cl_overloadable NAME(long ); \
1455+ ulong2 _cl_overloadable NAME(long2 ); \
1456+ ulong3 _cl_overloadable NAME(long3 ); \
1457+ ulong4 _cl_overloadable NAME(long4 ); \
1458+ ulong8 _cl_overloadable NAME(long8 ); \
1459+ ulong16 _cl_overloadable NAME(long16 ); \
1460+ uchar _cl_overloadable NAME(uchar ); \
1461+ uchar2 _cl_overloadable NAME(uchar2 ); \
1462+ uchar3 _cl_overloadable NAME(uchar3 ); \
1463+ uchar4 _cl_overloadable NAME(uchar4 ); \
1464+ uchar8 _cl_overloadable NAME(uchar8 ); \
1465+ uchar16 _cl_overloadable NAME(uchar16 ); \
1466+ ushort _cl_overloadable NAME(ushort ); \
1467+ ushort2 _cl_overloadable NAME(ushort2 ); \
1468+ ushort3 _cl_overloadable NAME(ushort3 ); \
1469+ ushort4 _cl_overloadable NAME(ushort4 ); \
1470+ ushort8 _cl_overloadable NAME(ushort8 ); \
1471+ ushort16 _cl_overloadable NAME(ushort16); \
1472+ uint _cl_overloadable NAME(uint ); \
1473+ uint2 _cl_overloadable NAME(uint2 ); \
1474+ uint3 _cl_overloadable NAME(uint3 ); \
1475+ uint4 _cl_overloadable NAME(uint4 ); \
1476+ uint8 _cl_overloadable NAME(uint8 ); \
1477+ uint16 _cl_overloadable NAME(uint16 ); \
1478+ ulong _cl_overloadable NAME(ulong ); \
1479+ ulong2 _cl_overloadable NAME(ulong2 ); \
1480+ ulong3 _cl_overloadable NAME(ulong3 ); \
1481+ ulong4 _cl_overloadable NAME(ulong4 ); \
1482+ ulong8 _cl_overloadable NAME(ulong8 ); \
1483+ ulong16 _cl_overloadable NAME(ulong16 );
1484+#define _CL_DECLARE_FUNC_UG_GG(NAME) \
1485+ uchar _cl_overloadable NAME(char , char ); \
1486+ uchar2 _cl_overloadable NAME(char2 , char2 ); \
1487+ uchar3 _cl_overloadable NAME(char3 , char3 ); \
1488+ uchar4 _cl_overloadable NAME(char4 , char4 ); \
1489+ uchar8 _cl_overloadable NAME(char8 , char8 ); \
1490+ uchar16 _cl_overloadable NAME(char16 , char16 ); \
1491+ ushort _cl_overloadable NAME(short , short ); \
1492+ ushort2 _cl_overloadable NAME(short2 , short2 ); \
1493+ ushort3 _cl_overloadable NAME(short3 , short3 ); \
1494+ ushort4 _cl_overloadable NAME(short4 , short4 ); \
1495+ ushort8 _cl_overloadable NAME(short8 , short8 ); \
1496+ ushort16 _cl_overloadable NAME(short16 , short16 ); \
1497+ uint _cl_overloadable NAME(int , int ); \
1498+ uint2 _cl_overloadable NAME(int2 , int2 ); \
1499+ uint3 _cl_overloadable NAME(int3 , int3 ); \
1500+ uint4 _cl_overloadable NAME(int4 , int4 ); \
1501+ uint8 _cl_overloadable NAME(int8 , int8 ); \
1502+ uint16 _cl_overloadable NAME(int16 , int16 ); \
1503+ ulong _cl_overloadable NAME(long , long ); \
1504+ ulong2 _cl_overloadable NAME(long2 , long2 ); \
1505+ ulong3 _cl_overloadable NAME(long3 , long3 ); \
1506+ ulong4 _cl_overloadable NAME(long4 , long4 ); \
1507+ ulong8 _cl_overloadable NAME(long8 , long8 ); \
1508+ ulong16 _cl_overloadable NAME(long16 , long16 ); \
1509+ uchar _cl_overloadable NAME(uchar , uchar ); \
1510+ uchar2 _cl_overloadable NAME(uchar2 , uchar2 ); \
1511+ uchar3 _cl_overloadable NAME(uchar3 , uchar3 ); \
1512+ uchar4 _cl_overloadable NAME(uchar4 , uchar4 ); \
1513+ uchar8 _cl_overloadable NAME(uchar8 , uchar8 ); \
1514+ uchar16 _cl_overloadable NAME(uchar16 , uchar16 ); \
1515+ ushort _cl_overloadable NAME(ushort , ushort ); \
1516+ ushort2 _cl_overloadable NAME(ushort2 , ushort2 ); \
1517+ ushort3 _cl_overloadable NAME(ushort3 , ushort3 ); \
1518+ ushort4 _cl_overloadable NAME(ushort4 , ushort4 ); \
1519+ ushort8 _cl_overloadable NAME(ushort8 , ushort8 ); \
1520+ ushort16 _cl_overloadable NAME(ushort16, ushort16); \
1521+ uint _cl_overloadable NAME(uint , uint ); \
1522+ uint2 _cl_overloadable NAME(uint2 , uint2 ); \
1523+ uint3 _cl_overloadable NAME(uint3 , uint3 ); \
1524+ uint4 _cl_overloadable NAME(uint4 , uint4 ); \
1525+ uint8 _cl_overloadable NAME(uint8 , uint8 ); \
1526+ uint16 _cl_overloadable NAME(uint16 , uint16 ); \
1527+ ulong _cl_overloadable NAME(ulong , ulong ); \
1528+ ulong2 _cl_overloadable NAME(ulong2 , ulong2 ); \
1529+ ulong3 _cl_overloadable NAME(ulong3 , ulong3 ); \
1530+ ulong4 _cl_overloadable NAME(ulong4 , ulong4 ); \
1531+ ulong8 _cl_overloadable NAME(ulong8 , ulong8 ); \
1532+ ulong16 _cl_overloadable NAME(ulong16 , ulong16 );
1533+#define _CL_DECLARE_FUNC_LG_GUG(NAME) \
1534+ short _cl_overloadable NAME(char , uchar ); \
1535+ short2 _cl_overloadable NAME(char2 , uchar2 ); \
1536+ short3 _cl_overloadable NAME(char3 , uchar3 ); \
1537+ short4 _cl_overloadable NAME(char4 , uchar4 ); \
1538+ short8 _cl_overloadable NAME(char8 , uchar8 ); \
1539+ short16 _cl_overloadable NAME(char16 , uchar16 ); \
1540+ int _cl_overloadable NAME(short , ushort ); \
1541+ int2 _cl_overloadable NAME(short2 , ushort2 ); \
1542+ int3 _cl_overloadable NAME(short3 , ushort3 ); \
1543+ int4 _cl_overloadable NAME(short4 , ushort4 ); \
1544+ int8 _cl_overloadable NAME(short8 , ushort8 ); \
1545+ int16 _cl_overloadable NAME(short16 , ushort16 ); \
1546+ long _cl_overloadable NAME(int , uint ); \
1547+ long2 _cl_overloadable NAME(int2 , uint2 ); \
1548+ long3 _cl_overloadable NAME(int3 , uint3 ); \
1549+ long4 _cl_overloadable NAME(int4 , uint4 ); \
1550+ long8 _cl_overloadable NAME(int8 , uint8 ); \
1551+ long16 _cl_overloadable NAME(int16 , uint16 ); \
1552+ ushort _cl_overloadable NAME(uchar , uchar ); \
1553+ ushort2 _cl_overloadable NAME(uchar2 , uchar2 ); \
1554+ ushort3 _cl_overloadable NAME(uchar3 , uchar3 ); \
1555+ ushort4 _cl_overloadable NAME(uchar4 , uchar4 ); \
1556+ ushort8 _cl_overloadable NAME(uchar8 , uchar8 ); \
1557+ ushort16 _cl_overloadable NAME(uchar16 , uchar16 ); \
1558+ uint _cl_overloadable NAME(ushort , ushort ); \
1559+ uint2 _cl_overloadable NAME(ushort2 , ushort2 ); \
1560+ uint3 _cl_overloadable NAME(ushort3 , ushort3 ); \
1561+ uint4 _cl_overloadable NAME(ushort4 , ushort4 ); \
1562+ uint8 _cl_overloadable NAME(ushort8 , ushort8 ); \
1563+ uint16 _cl_overloadable NAME(ushort16, ushort16); \
1564+ ulong _cl_overloadable NAME(uint , uint ); \
1565+ ulong2 _cl_overloadable NAME(uint2 , uint2 ); \
1566+ ulong3 _cl_overloadable NAME(uint3 , uint3 ); \
1567+ ulong4 _cl_overloadable NAME(uint4 , uint4 ); \
1568+ ulong8 _cl_overloadable NAME(uint8 , uint8 ); \
1569+ ulong16 _cl_overloadable NAME(uint16 , uint16 );
1570+#define _CL_DECLARE_FUNC_I_IG(NAME) \
1571+ int _cl_overloadable NAME(char ); \
1572+ int _cl_overloadable NAME(char2 ); \
1573+ int _cl_overloadable NAME(char3 ); \
1574+ int _cl_overloadable NAME(char4 ); \
1575+ int _cl_overloadable NAME(char8 ); \
1576+ int _cl_overloadable NAME(char16 ); \
1577+ int _cl_overloadable NAME(short ); \
1578+ int _cl_overloadable NAME(short2 ); \
1579+ int _cl_overloadable NAME(short3 ); \
1580+ int _cl_overloadable NAME(short4 ); \
1581+ int _cl_overloadable NAME(short8 ); \
1582+ int _cl_overloadable NAME(short16); \
1583+ int _cl_overloadable NAME(int ); \
1584+ int _cl_overloadable NAME(int2 ); \
1585+ int _cl_overloadable NAME(int3 ); \
1586+ int _cl_overloadable NAME(int4 ); \
1587+ int _cl_overloadable NAME(int8 ); \
1588+ int _cl_overloadable NAME(int16 ); \
1589+ int _cl_overloadable NAME(long ); \
1590+ int _cl_overloadable NAME(long2 ); \
1591+ int _cl_overloadable NAME(long3 ); \
1592+ int _cl_overloadable NAME(long4 ); \
1593+ int _cl_overloadable NAME(long8 ); \
1594+ int _cl_overloadable NAME(long16 );
1595+#define _CL_DECLARE_FUNC_J_JJ(NAME) \
1596+ int _cl_overloadable NAME(int , int ); \
1597+ int2 _cl_overloadable NAME(int2 , int2 ); \
1598+ int3 _cl_overloadable NAME(int3 , int3 ); \
1599+ int4 _cl_overloadable NAME(int4 , int4 ); \
1600+ int8 _cl_overloadable NAME(int8 , int8 ); \
1601+ int16 _cl_overloadable NAME(int16 , int16 ); \
1602+ uint _cl_overloadable NAME(uint , uint ); \
1603+ uint2 _cl_overloadable NAME(uint2 , uint2 ); \
1604+ uint3 _cl_overloadable NAME(uint3 , uint3 ); \
1605+ uint4 _cl_overloadable NAME(uint4 , uint4 ); \
1606+ uint8 _cl_overloadable NAME(uint8 , uint8 ); \
1607+ uint16 _cl_overloadable NAME(uint16 , uint16 );
1608+#define _CL_DECLARE_FUNC_J_JJJ(NAME) \
1609+ int _cl_overloadable NAME(int , int , int ); \
1610+ int2 _cl_overloadable NAME(int2 , int2 , int2 ); \
1611+ int3 _cl_overloadable NAME(int3 , int3 , int3 ); \
1612+ int4 _cl_overloadable NAME(int4 , int4 , int4 ); \
1613+ int8 _cl_overloadable NAME(int8 , int8 , int8 ); \
1614+ int16 _cl_overloadable NAME(int16 , int16 , int16 ); \
1615+ uint _cl_overloadable NAME(uint , uint , uint ); \
1616+ uint2 _cl_overloadable NAME(uint2 , uint2 , uint2 ); \
1617+ uint3 _cl_overloadable NAME(uint3 , uint3 , uint3 ); \
1618+ uint4 _cl_overloadable NAME(uint4 , uint4 , uint4 ); \
1619+ uint8 _cl_overloadable NAME(uint8 , uint8 , uint8 ); \
1620+ uint16 _cl_overloadable NAME(uint16 , uint16 , uint16 );
1621
1622 _CL_DECLARE_FUNC_UG_G(abs)
1623 _CL_DECLARE_FUNC_UG_GG(abs_diff)
1624@@ -1269,10 +1423,10 @@
1625
1626 /* Geometric Functions */
1627
1628-float4 __attribute__ ((overloadable)) cross(float4, float4);
1629-float3 __attribute__ ((overloadable)) cross(float3, float3);
1630-double4 __attribute__ ((overloadable)) cross(double4, double4);
1631-double3 __attribute__ ((overloadable)) cross(double3, double3);
1632+float4 _cl_overloadable cross(float4, float4);
1633+float3 _cl_overloadable cross(float3, float3);
1634+double4 _cl_overloadable cross(double4, double4);
1635+double3 _cl_overloadable cross(double3, double3);
1636 _CL_DECLARE_FUNC_S_VV(dot)
1637 _CL_DECLARE_FUNC_S_VV(distance)
1638 _CL_DECLARE_FUNC_S_V(length)
1639@@ -1306,3 +1460,228 @@
1640 _CL_DECLARE_FUNC_V_VVV(bitselect)
1641 _CL_DECLARE_FUNC_G_GGG(select)
1642 _CL_DECLARE_FUNC_V_VVJ(select)
1643+
1644+
1645+
1646+/* Vector Functions */
1647+
1648+#define _CL_DECLARE_VLOAD(TYPE, MOD) \
1649+ TYPE##2 _cl_overloadable vload2 (size_t offset, const MOD TYPE *p); \
1650+ TYPE##3 _cl_overloadable vload3 (size_t offset, const MOD TYPE *p); \
1651+ TYPE##4 _cl_overloadable vload4 (size_t offset, const MOD TYPE *p); \
1652+ TYPE##8 _cl_overloadable vload8 (size_t offset, const MOD TYPE *p); \
1653+ TYPE##16 _cl_overloadable vload16(size_t offset, const MOD TYPE *p);
1654+
1655+_CL_DECLARE_VLOAD(char , __global)
1656+_CL_DECLARE_VLOAD(short , __global)
1657+_CL_DECLARE_VLOAD(int , __global)
1658+_CL_DECLARE_VLOAD(long , __global)
1659+_CL_DECLARE_VLOAD(uchar , __global)
1660+_CL_DECLARE_VLOAD(ushort, __global)
1661+_CL_DECLARE_VLOAD(uint , __global)
1662+_CL_DECLARE_VLOAD(ulong , __global)
1663+_CL_DECLARE_VLOAD(float , __global)
1664+_CL_DECLARE_VLOAD(double, __global)
1665+
1666+_CL_DECLARE_VLOAD(char , __local)
1667+_CL_DECLARE_VLOAD(short , __local)
1668+_CL_DECLARE_VLOAD(int , __local)
1669+_CL_DECLARE_VLOAD(long , __local)
1670+_CL_DECLARE_VLOAD(uchar , __local)
1671+_CL_DECLARE_VLOAD(ushort, __local)
1672+_CL_DECLARE_VLOAD(uint , __local)
1673+_CL_DECLARE_VLOAD(ulong , __local)
1674+_CL_DECLARE_VLOAD(float , __local)
1675+_CL_DECLARE_VLOAD(double, __local)
1676+
1677+_CL_DECLARE_VLOAD(char , __constant)
1678+_CL_DECLARE_VLOAD(short , __constant)
1679+_CL_DECLARE_VLOAD(int , __constant)
1680+_CL_DECLARE_VLOAD(long , __constant)
1681+_CL_DECLARE_VLOAD(uchar , __constant)
1682+_CL_DECLARE_VLOAD(ushort, __constant)
1683+_CL_DECLARE_VLOAD(uint , __constant)
1684+_CL_DECLARE_VLOAD(ulong , __constant)
1685+_CL_DECLARE_VLOAD(float , __constant)
1686+_CL_DECLARE_VLOAD(double, __constant)
1687+
1688+/* __private is not supported yet \
1689+_CL_DECLARE_VLOAD(char , __private)
1690+_CL_DECLARE_VLOAD(short , __private)
1691+_CL_DECLARE_VLOAD(int , __private)
1692+_CL_DECLARE_VLOAD(long , __private)
1693+_CL_DECLARE_VLOAD(uchar , __private)
1694+_CL_DECLARE_VLOAD(ushort, __private)
1695+_CL_DECLARE_VLOAD(uint , __private)
1696+_CL_DECLARE_VLOAD(ulong , __private)
1697+_CL_DECLARE_VLOAD(float , __private)
1698+_CL_DECLARE_VLOAD(double, __private)
1699+*/
1700+
1701+#define _CL_DECLARE_VSTORE(TYPE, MOD) \
1702+ void _cl_overloadable vstore2 (TYPE##2 data, size_t offset, MOD TYPE *p); \
1703+ void _cl_overloadable vstore3 (TYPE##3 data, size_t offset, MOD TYPE *p); \
1704+ void _cl_overloadable vstore4 (TYPE##4 data, size_t offset, MOD TYPE *p); \
1705+ void _cl_overloadable vstore8 (TYPE##8 data, size_t offset, MOD TYPE *p); \
1706+ void _cl_overloadable vstore16(TYPE##16 data, size_t offset, MOD TYPE *p);
1707+
1708+_CL_DECLARE_VSTORE(char , __global)
1709+_CL_DECLARE_VSTORE(short , __global)
1710+_CL_DECLARE_VSTORE(int , __global)
1711+_CL_DECLARE_VSTORE(long , __global)
1712+_CL_DECLARE_VSTORE(uchar , __global)
1713+_CL_DECLARE_VSTORE(ushort, __global)
1714+_CL_DECLARE_VSTORE(uint , __global)
1715+_CL_DECLARE_VSTORE(ulong , __global)
1716+_CL_DECLARE_VSTORE(float , __global)
1717+_CL_DECLARE_VSTORE(double, __global)
1718+
1719+_CL_DECLARE_VSTORE(char , __local)
1720+_CL_DECLARE_VSTORE(short , __local)
1721+_CL_DECLARE_VSTORE(int , __local)
1722+_CL_DECLARE_VSTORE(long , __local)
1723+_CL_DECLARE_VSTORE(uchar , __local)
1724+_CL_DECLARE_VSTORE(ushort, __local)
1725+_CL_DECLARE_VSTORE(uint , __local)
1726+_CL_DECLARE_VSTORE(ulong , __local)
1727+_CL_DECLARE_VSTORE(float , __local)
1728+_CL_DECLARE_VSTORE(double, __local)
1729+
1730+/* __private is not supported yet
1731+_CL_DECLARE_VSTORE(char , __private)
1732+_CL_DECLARE_VSTORE(short , __private)
1733+_CL_DECLARE_VSTORE(int , __private)
1734+_CL_DECLARE_VSTORE(long , __private)
1735+_CL_DECLARE_VSTORE(uchar , __private)
1736+_CL_DECLARE_VSTORE(ushort, __private)
1737+_CL_DECLARE_VSTORE(uint , __private)
1738+_CL_DECLARE_VSTORE(ulong , __private)
1739+_CL_DECLARE_VSTORE(float , __private)
1740+_CL_DECLARE_VSTORE(double, __private)
1741+*/
1742+
1743+
1744+
1745+/* Miscellaneous Vector Functions */
1746+
1747+// convert a vector type to a scalar type
1748+_CL_DECLARE_FUNC_I_IG(_cl_scalar)
1749+_CL_DECLARE_FUNC_S_V(_cl_scalar)
1750+#define vec_step(a) (sizeof(a) / sizeof(_cl_scalar(a)))
1751+
1752+
1753+
1754+// This code leads to an ICE in Clang
1755+
1756+// #define _CL_DECLARE_SHUFFLE_2(GTYPE, UGTYPE, STYPE, M) \
1757+// GTYPE##2 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##2 mask) \
1758+// { \
1759+// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \
1760+// UGTYPE bmask = bits - (UGTYPE)1; \
1761+// return __builtin_shufflevector(x, x, \
1762+// mask.s0 & bmask, mask.s1 & bmask); \
1763+// }
1764+// #define _CL_DECLARE_SHUFFLE_3(GTYPE, UGTYPE, STYPE, M) \
1765+// GTYPE##3 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##3 mask) \
1766+// { \
1767+// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \
1768+// UGTYPE bmask = bits - (UGTYPE)1; \
1769+// return __builtin_shufflevector(x, x, \
1770+// mask.s0 & bmask, mask.s1 & bmask, \
1771+// mask.s2 & bmask); \
1772+// }
1773+// #define _CL_DECLARE_SHUFFLE_4(GTYPE, UGTYPE, STYPE, M) \
1774+// GTYPE##4 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##4 mask) \
1775+// { \
1776+// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \
1777+// UGTYPE bmask = bits - (UGTYPE)1; \
1778+// return __builtin_shufflevector(x, x, \
1779+// mask.s0 & bmask, mask.s1 & bmask, \
1780+// mask.s2 & bmask, mask.s3 & bmask); \
1781+// }
1782+// #define _CL_DECLARE_SHUFFLE_8(GTYPE, UGTYPE, STYPE, M) \
1783+// GTYPE##8 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##8 mask) \
1784+// { \
1785+// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \
1786+// UGTYPE bmask = bits - (UGTYPE)1; \
1787+// return __builtin_shufflevector(x, x, \
1788+// mask.s0 & bmask, mask.s1 & bmask, \
1789+// mask.s2 & bmask, mask.s3 & bmask, \
1790+// mask.s4 & bmask, mask.s5 & bmask, \
1791+// mask.s6 & bmask, mask.s7 & bmask); \
1792+// }
1793+// #define _CL_DECLARE_SHUFFLE_16(GTYPE, UGTYPE, STYPE, M) \
1794+// GTYPE##16 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##16 mask) \
1795+// { \
1796+// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \
1797+// UGTYPE bmask = bits - (UGTYPE)1; \
1798+// return __builtin_shufflevector(x, x, \
1799+// mask.s0 & bmask, mask.s1 & bmask, \
1800+// mask.s2 & bmask, mask.s3 & bmask, \
1801+// mask.s4 & bmask, mask.s5 & bmask, \
1802+// mask.s6 & bmask, mask.s7 & bmask, \
1803+// mask.s8 & bmask, mask.s9 & bmask, \
1804+// mask.sa & bmask, mask.sb & bmask, \
1805+// mask.sc & bmask, mask.sd & bmask, \
1806+// mask.se & bmask, mask.sf & bmask); \
1807+// }
1808+//
1809+// #define _CL_DECLARE_SHUFFLE(GTYPE, UGTYPE, STYPE, M) \
1810+// _CL_DECLARE_SHUFFLE_2 (GTYPE, UGTYPE, STYPE, M) \
1811+// _CL_DECLARE_SHUFFLE_3 (GTYPE, UGTYPE, STYPE, M) \
1812+// _CL_DECLARE_SHUFFLE_4 (GTYPE, UGTYPE, STYPE, M) \
1813+// _CL_DECLARE_SHUFFLE_8 (GTYPE, UGTYPE, STYPE, M) \
1814+// _CL_DECLARE_SHUFFLE_16(GTYPE, UGTYPE, STYPE, M)
1815+//
1816+// _CL_DECLARE_SHUFFLE(char , uchar , char , 2 )
1817+// _CL_DECLARE_SHUFFLE(char , uchar , char , 3 )
1818+// _CL_DECLARE_SHUFFLE(char , uchar , char , 4 )
1819+// _CL_DECLARE_SHUFFLE(char , uchar , char , 8 )
1820+// _CL_DECLARE_SHUFFLE(char , uchar , char , 16)
1821+// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 2 )
1822+// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 3 )
1823+// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 4 )
1824+// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 8 )
1825+// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 16)
1826+// _CL_DECLARE_SHUFFLE(short , ushort, short , 2 )
1827+// _CL_DECLARE_SHUFFLE(short , ushort, short , 3 )
1828+// _CL_DECLARE_SHUFFLE(short , ushort, short , 4 )
1829+// _CL_DECLARE_SHUFFLE(short , ushort, short , 8 )
1830+// _CL_DECLARE_SHUFFLE(short , ushort, short , 16)
1831+// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 2 )
1832+// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 3 )
1833+// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 4 )
1834+// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 8 )
1835+// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 16)
1836+// _CL_DECLARE_SHUFFLE(int , uint , int , 2 )
1837+// _CL_DECLARE_SHUFFLE(int , uint , int , 3 )
1838+// _CL_DECLARE_SHUFFLE(int , uint , int , 4 )
1839+// _CL_DECLARE_SHUFFLE(int , uint , int , 8 )
1840+// _CL_DECLARE_SHUFFLE(int , uint , int , 16)
1841+// _CL_DECLARE_SHUFFLE(uint , uint , int , 2 )
1842+// _CL_DECLARE_SHUFFLE(uint , uint , int , 3 )
1843+// _CL_DECLARE_SHUFFLE(uint , uint , int , 4 )
1844+// _CL_DECLARE_SHUFFLE(uint , uint , int , 8 )
1845+// _CL_DECLARE_SHUFFLE(uint , uint , int , 16)
1846+// _CL_DECLARE_SHUFFLE(long , ulong , long , 2 )
1847+// _CL_DECLARE_SHUFFLE(long , ulong , long , 3 )
1848+// _CL_DECLARE_SHUFFLE(long , ulong , long , 4 )
1849+// _CL_DECLARE_SHUFFLE(long , ulong , long , 8 )
1850+// _CL_DECLARE_SHUFFLE(long , ulong , long , 16)
1851+// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 2 )
1852+// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 3 )
1853+// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 4 )
1854+// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 8 )
1855+// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 16)
1856+// _CL_DECLARE_SHUFFLE(float , uint , float , 2 )
1857+// _CL_DECLARE_SHUFFLE(float , uint , float , 3 )
1858+// _CL_DECLARE_SHUFFLE(float , uint , float , 4 )
1859+// _CL_DECLARE_SHUFFLE(float , uint , float , 8 )
1860+// _CL_DECLARE_SHUFFLE(float , uint , float , 16)
1861+// _CL_DECLARE_SHUFFLE(double, ulong , double, 2 )
1862+// _CL_DECLARE_SHUFFLE(double, ulong , double, 3 )
1863+// _CL_DECLARE_SHUFFLE(double, ulong , double, 4 )
1864+// _CL_DECLARE_SHUFFLE(double, ulong , double, 8 )
1865+// _CL_DECLARE_SHUFFLE(double, ulong , double, 16)
1866+
1867+// shuffle2
1868
1869=== modified file 'lib/kernel/Makefile.am'
1870--- lib/kernel/Makefile.am 2011-10-31 16:58:40 +0000
1871+++ lib/kernel/Makefile.am 2011-10-31 17:03:23 +0000
1872@@ -22,9 +22,13 @@
1873 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
1874 # THE SOFTWARE.
1875
1876-LEX=
1877+LEX =
1878
1879+<<<<<<< TREE
1880 SUBDIRS = tce
1881+=======
1882+SUBDIRS = dummy x86 # ppc
1883+>>>>>>> MERGE-SOURCE
1884
1885 pkglib_LIBRARIES = libkernel.a
1886
1887@@ -145,7 +149,9 @@
1888 any.cl \
1889 all.cl \
1890 bitselect.cl \
1891- select.cl
1892+ select.cl \
1893+ vload.cl \
1894+ vstore.cl
1895
1896 libkernel_a_LIBADD = barrier.o
1897 EXTRA_DIST = barrier.ll
1898@@ -159,7 +165,7 @@
1899 .c.o:
1900 $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $<
1901
1902-barrier.o: barrier.ll
1903+.ll.o:
1904 $(LLVM_AS) -o $@ $<
1905
1906 $(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
1907
1908=== modified file 'lib/kernel/all.cl'
1909--- lib/kernel/all.cl 2011-10-27 01:35:56 +0000
1910+++ lib/kernel/all.cl 2011-10-31 17:03:23 +0000
1911@@ -21,122 +21,122 @@
1912 THE SOFTWARE.
1913 */
1914
1915-int __attribute__((overloadable)) all(char a)
1916+int __attribute__((__overloadable__)) all(char a)
1917 {
1918 return a < (char)0;
1919 }
1920
1921-int __attribute__((overloadable)) all(char2 a)
1922+int __attribute__((__overloadable__)) all(char2 a)
1923 {
1924 return all(a.lo) && all(a.hi);
1925 }
1926
1927-int __attribute__((overloadable)) all(char3 a)
1928+int __attribute__((__overloadable__)) all(char3 a)
1929 {
1930 return all(a.s01) && all(a.s2);
1931 }
1932
1933-int __attribute__((overloadable)) all(char4 a)
1934-{
1935- return all(a.lo) && all(a.hi);
1936-}
1937-
1938-int __attribute__((overloadable)) all(char8 a)
1939-{
1940- return all(a.lo) && all(a.hi);
1941-}
1942-
1943-int __attribute__((overloadable)) all(char16 a)
1944-{
1945- return all(a.lo) && all(a.hi);
1946-}
1947-
1948-int __attribute__((overloadable)) all(short a)
1949+int __attribute__((__overloadable__)) all(char4 a)
1950+{
1951+ return all(a.lo) && all(a.hi);
1952+}
1953+
1954+int __attribute__((__overloadable__)) all(char8 a)
1955+{
1956+ return all(a.lo) && all(a.hi);
1957+}
1958+
1959+int __attribute__((__overloadable__)) all(char16 a)
1960+{
1961+ return all(a.lo) && all(a.hi);
1962+}
1963+
1964+int __attribute__((__overloadable__)) all(short a)
1965 {
1966 return a < (short)0;
1967 }
1968
1969-int __attribute__((overloadable)) all(short2 a)
1970+int __attribute__((__overloadable__)) all(short2 a)
1971 {
1972 return all(a.lo) && all(a.hi);
1973 }
1974
1975-int __attribute__((overloadable)) all(short3 a)
1976+int __attribute__((__overloadable__)) all(short3 a)
1977 {
1978 return all(a.s01) && all(a.s2);
1979 }
1980
1981-int __attribute__((overloadable)) all(short4 a)
1982-{
1983- return all(a.lo) && all(a.hi);
1984-}
1985-
1986-int __attribute__((overloadable)) all(short8 a)
1987-{
1988- return all(a.lo) && all(a.hi);
1989-}
1990-
1991-int __attribute__((overloadable)) all(short16 a)
1992-{
1993- return all(a.lo) && all(a.hi);
1994-}
1995-
1996-int __attribute__((overloadable)) all(int a)
1997+int __attribute__((__overloadable__)) all(short4 a)
1998+{
1999+ return all(a.lo) && all(a.hi);
2000+}
2001+
2002+int __attribute__((__overloadable__)) all(short8 a)
2003+{
2004+ return all(a.lo) && all(a.hi);
2005+}
2006+
2007+int __attribute__((__overloadable__)) all(short16 a)
2008+{
2009+ return all(a.lo) && all(a.hi);
2010+}
2011+
2012+int __attribute__((__overloadable__)) all(int a)
2013 {
2014 return a < 0;
2015 }
2016
2017-int __attribute__((overloadable)) all(int2 a)
2018+int __attribute__((__overloadable__)) all(int2 a)
2019 {
2020 return all(a.lo) && all(a.hi);
2021 }
2022
2023-int __attribute__((overloadable)) all(int3 a)
2024+int __attribute__((__overloadable__)) all(int3 a)
2025 {
2026 return all(a.s01) && all(a.s2);
2027 }
2028
2029-int __attribute__((overloadable)) all(int4 a)
2030-{
2031- return all(a.lo) && all(a.hi);
2032-}
2033-
2034-int __attribute__((overloadable)) all(int8 a)
2035-{
2036- return all(a.lo) && all(a.hi);
2037-}
2038-
2039-int __attribute__((overloadable)) all(int16 a)
2040-{
2041- return all(a.lo) && all(a.hi);
2042-}
2043-
2044-int __attribute__((overloadable)) all(long a)
2045+int __attribute__((__overloadable__)) all(int4 a)
2046+{
2047+ return all(a.lo) && all(a.hi);
2048+}
2049+
2050+int __attribute__((__overloadable__)) all(int8 a)
2051+{
2052+ return all(a.lo) && all(a.hi);
2053+}
2054+
2055+int __attribute__((__overloadable__)) all(int16 a)
2056+{
2057+ return all(a.lo) && all(a.hi);
2058+}
2059+
2060+int __attribute__((__overloadable__)) all(long a)
2061 {
2062 return a < 0L;
2063 }
2064
2065-int __attribute__((overloadable)) all(long2 a)
2066+int __attribute__((__overloadable__)) all(long2 a)
2067 {
2068 return all(a.lo) && all(a.hi);
2069 }
2070
2071-int __attribute__((overloadable)) all(long3 a)
2072+int __attribute__((__overloadable__)) all(long3 a)
2073 {
2074 return all(a.s01) && all(a.s2);
2075 }
2076
2077-int __attribute__((overloadable)) all(long4 a)
2078-{
2079- return all(a.lo) && all(a.hi);
2080-}
2081-
2082-int __attribute__((overloadable)) all(long8 a)
2083-{
2084- return all(a.lo) && all(a.hi);
2085-}
2086-
2087-int __attribute__((overloadable)) all(long16 a)
2088+int __attribute__((__overloadable__)) all(long4 a)
2089+{
2090+ return all(a.lo) && all(a.hi);
2091+}
2092+
2093+int __attribute__((__overloadable__)) all(long8 a)
2094+{
2095+ return all(a.lo) && all(a.hi);
2096+}
2097+
2098+int __attribute__((__overloadable__)) all(long16 a)
2099 {
2100 return all(a.lo) && all(a.hi);
2101 }
2102
2103=== modified file 'lib/kernel/any.cl'
2104--- lib/kernel/any.cl 2011-10-27 01:35:56 +0000
2105+++ lib/kernel/any.cl 2011-10-31 17:03:23 +0000
2106@@ -21,122 +21,122 @@
2107 THE SOFTWARE.
2108 */
2109
2110-int __attribute__((overloadable)) any(char a)
2111+int __attribute__((__overloadable__)) any(char a)
2112 {
2113 return a < (char)0;
2114 }
2115
2116-int __attribute__((overloadable)) any(char2 a)
2117+int __attribute__((__overloadable__)) any(char2 a)
2118 {
2119 return any(a.lo) || any(a.hi);
2120 }
2121
2122-int __attribute__((overloadable)) any(char3 a)
2123+int __attribute__((__overloadable__)) any(char3 a)
2124 {
2125 return any(a.s01) || any(a.s2);
2126 }
2127
2128-int __attribute__((overloadable)) any(char4 a)
2129-{
2130- return any(a.lo) || any(a.hi);
2131-}
2132-
2133-int __attribute__((overloadable)) any(char8 a)
2134-{
2135- return any(a.lo) || any(a.hi);
2136-}
2137-
2138-int __attribute__((overloadable)) any(char16 a)
2139-{
2140- return any(a.lo) || any(a.hi);
2141-}
2142-
2143-int __attribute__((overloadable)) any(short a)
2144+int __attribute__((__overloadable__)) any(char4 a)
2145+{
2146+ return any(a.lo) || any(a.hi);
2147+}
2148+
2149+int __attribute__((__overloadable__)) any(char8 a)
2150+{
2151+ return any(a.lo) || any(a.hi);
2152+}
2153+
2154+int __attribute__((__overloadable__)) any(char16 a)
2155+{
2156+ return any(a.lo) || any(a.hi);
2157+}
2158+
2159+int __attribute__((__overloadable__)) any(short a)
2160 {
2161 return a < (short)0;
2162 }
2163
2164-int __attribute__((overloadable)) any(short2 a)
2165+int __attribute__((__overloadable__)) any(short2 a)
2166 {
2167 return any(a.lo) || any(a.hi);
2168 }
2169
2170-int __attribute__((overloadable)) any(short3 a)
2171+int __attribute__((__overloadable__)) any(short3 a)
2172 {
2173 return any(a.s01) || any(a.s2);
2174 }
2175
2176-int __attribute__((overloadable)) any(short4 a)
2177-{
2178- return any(a.lo) || any(a.hi);
2179-}
2180-
2181-int __attribute__((overloadable)) any(short8 a)
2182-{
2183- return any(a.lo) || any(a.hi);
2184-}
2185-
2186-int __attribute__((overloadable)) any(short16 a)
2187-{
2188- return any(a.lo) || any(a.hi);
2189-}
2190-
2191-int __attribute__((overloadable)) any(int a)
2192+int __attribute__((__overloadable__)) any(short4 a)
2193+{
2194+ return any(a.lo) || any(a.hi);
2195+}
2196+
2197+int __attribute__((__overloadable__)) any(short8 a)
2198+{
2199+ return any(a.lo) || any(a.hi);
2200+}
2201+
2202+int __attribute__((__overloadable__)) any(short16 a)
2203+{
2204+ return any(a.lo) || any(a.hi);
2205+}
2206+
2207+int __attribute__((__overloadable__)) any(int a)
2208 {
2209 return a < 0;
2210 }
2211
2212-int __attribute__((overloadable)) any(int2 a)
2213+int __attribute__((__overloadable__)) any(int2 a)
2214 {
2215 return any(a.lo) || any(a.hi);
2216 }
2217
2218-int __attribute__((overloadable)) any(int3 a)
2219+int __attribute__((__overloadable__)) any(int3 a)
2220 {
2221 return any(a.s01) || any(a.s2);
2222 }
2223
2224-int __attribute__((overloadable)) any(int4 a)
2225-{
2226- return any(a.lo) || any(a.hi);
2227-}
2228-
2229-int __attribute__((overloadable)) any(int8 a)
2230-{
2231- return any(a.lo) || any(a.hi);
2232-}
2233-
2234-int __attribute__((overloadable)) any(int16 a)
2235-{
2236- return any(a.lo) || any(a.hi);
2237-}
2238-
2239-int __attribute__((overloadable)) any(long a)
2240+int __attribute__((__overloadable__)) any(int4 a)
2241+{
2242+ return any(a.lo) || any(a.hi);
2243+}
2244+
2245+int __attribute__((__overloadable__)) any(int8 a)
2246+{
2247+ return any(a.lo) || any(a.hi);
2248+}
2249+
2250+int __attribute__((__overloadable__)) any(int16 a)
2251+{
2252+ return any(a.lo) || any(a.hi);
2253+}
2254+
2255+int __attribute__((__overloadable__)) any(long a)
2256 {
2257 return a < 0L;
2258 }
2259
2260-int __attribute__((overloadable)) any(long2 a)
2261+int __attribute__((__overloadable__)) any(long2 a)
2262 {
2263 return any(a.lo) || any(a.hi);
2264 }
2265
2266-int __attribute__((overloadable)) any(long3 a)
2267+int __attribute__((__overloadable__)) any(long3 a)
2268 {
2269 return any(a.s01) || any(a.s2);
2270 }
2271
2272-int __attribute__((overloadable)) any(long4 a)
2273-{
2274- return any(a.lo) || any(a.hi);
2275-}
2276-
2277-int __attribute__((overloadable)) any(long8 a)
2278-{
2279- return any(a.lo) || any(a.hi);
2280-}
2281-
2282-int __attribute__((overloadable)) any(long16 a)
2283+int __attribute__((__overloadable__)) any(long4 a)
2284+{
2285+ return any(a.lo) || any(a.hi);
2286+}
2287+
2288+int __attribute__((__overloadable__)) any(long8 a)
2289+{
2290+ return any(a.lo) || any(a.hi);
2291+}
2292+
2293+int __attribute__((__overloadable__)) any(long16 a)
2294 {
2295 return any(a.lo) || any(a.hi);
2296 }
2297
2298=== modified file 'lib/kernel/as_type.cl'
2299--- lib/kernel/as_type.cl 2011-10-20 19:46:45 +0000
2300+++ lib/kernel/as_type.cl 2011-10-31 17:03:23 +0000
2301@@ -22,7 +22,7 @@
2302 */
2303
2304 #define DEFINE_AS_TYPE(SRC, DST) \
2305- DST __attribute__ ((overloadable)) \
2306+ DST __attribute__ ((__overloadable__)) \
2307 as_##DST(SRC a) \
2308 { \
2309 return *(DST*)&a; \
2310
2311=== modified file 'lib/kernel/ceil.cl'
2312--- lib/kernel/ceil.cl 2011-10-25 18:52:31 +0000
2313+++ lib/kernel/ceil.cl 2011-10-31 17:03:23 +0000
2314@@ -21,134 +21,6 @@
2315 THE SOFTWARE.
2316 */
2317
2318-
2319-
2320-#define _MM_FROUND_TO_NEAREST_INT 0x00
2321-#define _MM_FROUND_TO_NEG_INF 0x01
2322-#define _MM_FROUND_TO_POS_INF 0x02
2323-#define _MM_FROUND_TO_ZERO 0x03
2324-#define _MM_FROUND_CUR_DIRECTION 0x04
2325-
2326-#define _MM_FROUND_RAISE_EXC 0x00
2327-#define _MM_FROUND_NO_EXC 0x08
2328-
2329-#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
2330-#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
2331-#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
2332-#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
2333-#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
2334-#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
2335-
2336-
2337-
2338-float __attribute__ ((overloadable))
2339-cl_ceil(float a)
2340-{
2341-#ifdef __SSE4_1__
2342- // LLVM does not optimise this on its own
2343- return ((float4)__builtin_ia32_roundss(*(float4*)&a, *(float4*)&a,
2344- _MM_FROUND_CEIL)).s0;
2345-#else
2346- return __builtin_ceilf(a);
2347-#endif
2348-}
2349-
2350-float2 __attribute__ ((overloadable))
2351-cl_ceil(float2 a)
2352-{
2353-#ifdef __SSE4_1__
2354- return ((float4)cl_ceil(*(float4)&a)).s01;
2355-#else
2356- return (float2)(cl_ceil(a.lo), cl_ceil(a.hi));
2357-#endif
2358-}
2359-
2360-float3 __attribute__ ((overloadable))
2361-cl_ceil(float3 a)
2362-{
2363-#ifdef __SSE4_1__
2364- return ((float4)cl_ceil(*(float4)&a)).s012;
2365-#else
2366- return (float3)(cl_ceil(a.s01), cl_ceil(a.s2));
2367-#endif
2368-}
2369-
2370-float4 __attribute__ ((overloadable))
2371-cl_ceil(float4 a)
2372-{
2373-#ifdef __SSE4_1__
2374- return __builtin_ia32_roundps(a, _MM_FROUND_CEIL);
2375-#else
2376- return (float4)(cl_ceil(a.lo), cl_ceil(a.hi));
2377-#endif
2378-}
2379-
2380-float8 __attribute__ ((overloadable))
2381-cl_ceil(float8 a)
2382-{
2383-#ifdef __AVX__
2384- return __builtin_ia32_roundps256(a, _MM_FROUND_CEIL);
2385-#else
2386- return (float8)(cl_ceil(a.lo), cl_ceil(a.hi));
2387-#endif
2388-}
2389-
2390-float16 __attribute__ ((overloadable))
2391-cl_ceil(float16 a)
2392-{
2393- return (float16)(cl_ceil(a.lo), cl_ceil(a.hi));
2394-}
2395-
2396-double __attribute__ ((overloadable))
2397-cl_ceil(double a)
2398-{
2399-#ifdef __SSE4_1__
2400- // LLVM does not optimise this on its own
2401- return ((double2)__builtin_ia32_roundss(*(double2*)&a, *(double2*)&a,
2402- _MM_FROUND_CEIL)).s0;
2403-#else
2404- return __builtin_ceil(a);
2405-#endif
2406-}
2407-
2408-double2 __attribute__ ((overloadable))
2409-cl_ceil(double2 a)
2410-{
2411-#ifdef __SSE4_1__
2412- return __builtin_ia32_roundpd(a, _MM_FROUND_CEIL);
2413-#else
2414- return (double2)(cl_ceil(a.lo), cl_ceil(a.hi));
2415-#endif
2416-}
2417-
2418-double3 __attribute__ ((overloadable))
2419-cl_ceil(double3 a)
2420-{
2421-#ifdef __AVX__
2422- return ((double4)cl_ceil(*(double4)&a)).s012;
2423-#else
2424- return (double3)(cl_ceil(a.s01), cl_ceil(a.s2));
2425-#endif
2426-}
2427-
2428-double4 __attribute__ ((overloadable))
2429-cl_ceil(double4 a)
2430-{
2431-#ifdef __AVX__
2432- return __builtin_ia32_roundpd256(a, _MM_FROUND_CEIL);
2433-#else
2434- return (double4)(cl_ceil(a.lo), cl_ceil(a.hi));
2435-#endif
2436-}
2437-
2438-double8 __attribute__ ((overloadable))
2439-cl_ceil(double8 a)
2440-{
2441- return (double8)(cl_ceil(a.lo), cl_ceil(a.hi));
2442-}
2443-
2444-double16 __attribute__ ((overloadable))
2445-cl_ceil(double16 a)
2446-{
2447- return (double16)(cl_ceil(a.lo), cl_ceil(a.hi));
2448-}
2449+#include "templates.h"
2450+
2451+DEFINE_BUILTIN_V_V(ceil)
2452
2453=== modified file 'lib/kernel/convert_type.cl'
2454--- lib/kernel/convert_type.cl 2011-10-26 19:49:23 +0000
2455+++ lib/kernel/convert_type.cl 2011-10-31 17:03:23 +0000
2456@@ -24,19 +24,19 @@
2457 #include "templates.h"
2458
2459 #define DEFINE_CONVERT_TYPE(SRC, DST) \
2460- DST __attribute__ ((overloadable)) convert_##DST(SRC a) \
2461+ DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \
2462 { \
2463 return (DST)a; \
2464 }
2465
2466 #define DEFINE_CONVERT_TYPE_HALF(SRC, DST, HALFDST) \
2467- DST __attribute__ ((overloadable)) convert_##DST(SRC a) \
2468+ DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \
2469 { \
2470 return (DST)(convert_##HALFDST(a.lo), convert_##HALFDST(a.hi)); \
2471 }
2472
2473 #define DEFINE_CONVERT_TYPE_012(SRC, DST, DST01, DST2) \
2474- DST __attribute__ ((overloadable)) convert_##DST(SRC a) \
2475+ DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \
2476 { \
2477 return (DST)(convert_##DST01(a.s01), convert_##DST2(a.s2)); \
2478 }
2479
2480=== modified file 'lib/kernel/copysign.cl'
2481--- lib/kernel/copysign.cl 2011-10-25 16:28:54 +0000
2482+++ lib/kernel/copysign.cl 2011-10-31 17:03:23 +0000
2483@@ -21,110 +21,6 @@
2484 THE SOFTWARE.
2485 */
2486
2487-float __attribute__ ((overloadable))
2488-copysign(float a, float b)
2489-{
2490- return __builtin_copysignf(a, b);
2491-}
2492-
2493-float2 __attribute__ ((overloadable))
2494-copysign(float2 a, float2 b)
2495-{
2496-#ifdef __SSE__
2497- return copysign(*(float4*)&a, *(float4*)&b).s01;
2498-#else
2499- return (float2)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2500-#endif
2501-}
2502-
2503-float3 __attribute__ ((overloadable))
2504-copysign(float3 a, float3 b)
2505-{
2506-#ifdef __SSE__
2507- return copysign(*(float4*)&a, *(float4*)&b).s012;
2508-#else
2509- return (float3)(copysign(a.s01, b.s01), copysign(a.s2, b.s2));
2510-#endif
2511-}
2512-
2513-float4 __attribute__ ((overloadable))
2514-copysign(float4 a, float4 b)
2515-{
2516-#ifdef __SSE__
2517- const uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U};
2518- return as_float4((~sign_mask & as_uint4(a)) | (sign_mask & as_uint4(b)));
2519-#else
2520- return (float4)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2521-#endif
2522-}
2523-
2524-float8 __attribute__ ((overloadable))
2525-copysign(float8 a, float8 b)
2526-{
2527-#ifdef __AVX__
2528- const uint8 sign_mask =
2529- {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U,
2530- 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U};
2531- return as_float8((~sign_mask & as_uint8(a)) | (sign_mask & as_uint8(b)));
2532-#else
2533- return (float8)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2534-#endif
2535-}
2536-
2537-float16 __attribute__ ((overloadable))
2538-copysign(float16 a, float16 b)
2539-{
2540- return (float16)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2541-}
2542-
2543-double __attribute__ ((overloadable))
2544-copysign(double a, double b)
2545-{
2546- return __builtin_copysign(a, b);
2547-}
2548-
2549-double2 __attribute__ ((overloadable))
2550-copysign(double2 a, double2 b)
2551-{
2552-#ifdef __SSE2__
2553- const ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL};
2554- return as_double2((~sign_mask & as_ulong2(a)) | (sign_mask & as_ulong2(b)));
2555-#else
2556- return (double2)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2557-#endif
2558-}
2559-
2560-double3 __attribute__ ((overloadable))
2561-copysign(double3 a, double3 b)
2562-{
2563-#ifdef __AVX__
2564- return copysign(*(double4*)&a, *(double4*)&b).s012;
2565-#else
2566- return (double3)(copysign(a.s01, b.s01), copysign(a.s2, b.s2));
2567-#endif
2568-}
2569-
2570-double4 __attribute__ ((overloadable))
2571-copysign(double4 a, double4 b)
2572-{
2573-#ifdef __AVX__
2574- const ulong4 sign_mask =
2575- {0x8000000000000000UL, 0x8000000000000000UL,
2576- 0x8000000000000000UL, 0x8000000000000000UL};
2577- return as_double4((~sign_mask & as_ulong4(a)) | (sign_mask & as_ulong4(b)));
2578-#else
2579- return (double4)(copysign(a.lo, b.hi), copysign(a.lo, b.hi));
2580-#endif
2581-}
2582-
2583-double8 __attribute__ ((overloadable))
2584-copysign(double8 a, double8 b)
2585-{
2586- return (double8)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2587-}
2588-
2589-double16 __attribute__ ((overloadable))
2590-copysign(double16 a, double16 b)
2591-{
2592- return (double16)(copysign(a.lo, b.lo), copysign(a.hi, b.hi));
2593-}
2594+#include "templates.h"
2595+
2596+DEFINE_BUILTIN_V_VV(copysign)
2597
2598=== modified file 'lib/kernel/cross.cl'
2599--- lib/kernel/cross.cl 2011-10-27 01:35:56 +0000
2600+++ lib/kernel/cross.cl 2011-10-31 17:03:23 +0000
2601@@ -21,24 +21,24 @@
2602 THE SOFTWARE.
2603 */
2604
2605-float4 __attribute__ ((overloadable)) cross(float4 a, float4 b)
2606+float4 __attribute__ ((__overloadable__)) cross(float4 a, float4 b)
2607 {
2608 return (float4)(cross(a.xyz, b.xyz), 0.0f);
2609 }
2610
2611-float3 __attribute__ ((overloadable)) cross(float3 a, float3 b)
2612+float3 __attribute__ ((__overloadable__)) cross(float3 a, float3 b)
2613 {
2614 return (float3)(a.y * b.z - a.z * b.y,
2615 a.z * b.x - a.x * b.z,
2616 a.x * b.y - a.y * b.x);
2617 }
2618
2619-double4 __attribute__ ((overloadable)) cross(double4 a, double4 b)
2620+double4 __attribute__ ((__overloadable__)) cross(double4 a, double4 b)
2621 {
2622 return (double4)(cross(a.xyz, b.xyz), 0.0f);
2623 }
2624
2625-double3 __attribute__ ((overloadable)) cross(double3 a, double3 b)
2626+double3 __attribute__ ((__overloadable__)) cross(double3 a, double3 b)
2627 {
2628 return (double3)(a.y * b.z - a.z * b.y,
2629 a.z * b.x - a.x * b.z,
2630
2631=== modified file 'lib/kernel/dot.cl'
2632--- lib/kernel/dot.cl 2011-10-27 01:35:56 +0000
2633+++ lib/kernel/dot.cl 2011-10-31 17:03:23 +0000
2634@@ -21,62 +21,62 @@
2635 THE SOFTWARE.
2636 */
2637
2638-float __attribute__ ((overloadable)) dot(float a, float b)
2639-{
2640- return a * b;
2641-}
2642-
2643-float __attribute__ ((overloadable)) dot(float2 a, float2 b)
2644-{
2645- return a.lo * b.lo + a.hi * b.hi;
2646-}
2647-
2648-float __attribute__ ((overloadable)) dot(float3 a, float3 b)
2649-{
2650- return dot(a.s01, b.s01) + a.s2 * b.s2;
2651-}
2652-
2653-float __attribute__ ((overloadable)) dot(float4 a, float4 b)
2654-{
2655- return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2656-}
2657-
2658-float __attribute__ ((overloadable)) dot(float8 a, float8 b)
2659-{
2660- return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2661-}
2662-
2663-float __attribute__ ((overloadable)) dot(float16 a, float16 b)
2664-{
2665- return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2666-}
2667-
2668-double __attribute__ ((overloadable)) dot(double a, double b)
2669-{
2670- return a * b;
2671-}
2672-
2673-double __attribute__ ((overloadable)) dot(double2 a, double2 b)
2674-{
2675- return a.lo * b.lo + a.hi * b.hi;
2676-}
2677-
2678-double __attribute__ ((overloadable)) dot(double3 a, double3 b)
2679-{
2680- return dot(a.s01, b.s01) + a.s2 * b.s2;
2681-}
2682-
2683-double __attribute__ ((overloadable)) dot(double4 a, double4 b)
2684-{
2685- return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2686-}
2687-
2688-double __attribute__ ((overloadable)) dot(double8 a, double8 b)
2689-{
2690- return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2691-}
2692-
2693-double __attribute__ ((overloadable)) dot(double16 a, double16 b)
2694+float __attribute__ ((__overloadable__)) dot(float a, float b)
2695+{
2696+ return a * b;
2697+}
2698+
2699+float __attribute__ ((__overloadable__)) dot(float2 a, float2 b)
2700+{
2701+ return a.lo * b.lo + a.hi * b.hi;
2702+}
2703+
2704+float __attribute__ ((__overloadable__)) dot(float3 a, float3 b)
2705+{
2706+ return dot(a.s01, b.s01) + a.s2 * b.s2;
2707+}
2708+
2709+float __attribute__ ((__overloadable__)) dot(float4 a, float4 b)
2710+{
2711+ return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2712+}
2713+
2714+float __attribute__ ((__overloadable__)) dot(float8 a, float8 b)
2715+{
2716+ return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2717+}
2718+
2719+float __attribute__ ((__overloadable__)) dot(float16 a, float16 b)
2720+{
2721+ return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2722+}
2723+
2724+double __attribute__ ((__overloadable__)) dot(double a, double b)
2725+{
2726+ return a * b;
2727+}
2728+
2729+double __attribute__ ((__overloadable__)) dot(double2 a, double2 b)
2730+{
2731+ return a.lo * b.lo + a.hi * b.hi;
2732+}
2733+
2734+double __attribute__ ((__overloadable__)) dot(double3 a, double3 b)
2735+{
2736+ return dot(a.s01, b.s01) + a.s2 * b.s2;
2737+}
2738+
2739+double __attribute__ ((__overloadable__)) dot(double4 a, double4 b)
2740+{
2741+ return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2742+}
2743+
2744+double __attribute__ ((__overloadable__)) dot(double8 a, double8 b)
2745+{
2746+ return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2747+}
2748+
2749+double __attribute__ ((__overloadable__)) dot(double16 a, double16 b)
2750 {
2751 return dot(a.lo, b.lo) + dot(a.hi, b.hi);
2752 }
2753
2754=== modified file 'lib/kernel/fabs.cl'
2755--- lib/kernel/fabs.cl 2011-10-25 16:28:54 +0000
2756+++ lib/kernel/fabs.cl 2011-10-31 17:03:23 +0000
2757@@ -21,111 +21,6 @@
2758 THE SOFTWARE.
2759 */
2760
2761-float __attribute__ ((overloadable))
2762-fabs(float a)
2763-{
2764- return __builtin_fabsf(a);
2765-}
2766-
2767-float2 __attribute__ ((overloadable))
2768-fabs(float2 a)
2769-{
2770-#ifdef __SSE__
2771- const uint2 sign_mask = {0x80000000U, 0x80000000U};
2772- return as_float2(~sign_mask & as_uint2(a));
2773-#else
2774- return (float2)(fabs(a.lo), fabs(a.hi));
2775-#endif
2776-}
2777-
2778-float3 __attribute__ ((overloadable))
2779-fabs(float3 a)
2780-{
2781-#ifdef __SSE__
2782- return fabs(*(float4*)&a).s012;
2783-#else
2784- return (float3)(fabs(a.s01), fabs(a.s2));
2785-#endif
2786-}
2787-
2788-float4 __attribute__ ((overloadable))
2789-fabs(float4 a)
2790-{
2791-#ifdef __SSE__
2792- const uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U};
2793- return as_float4(~sign_mask & as_uint4(a));
2794-#else
2795- return (float4)(fabs(a.lo), fabs(a.hi));
2796-#endif
2797-}
2798-
2799-float8 __attribute__ ((overloadable))
2800-fabs(float8 a)
2801-{
2802-#ifdef __AVX__
2803- const uint8 sign_mask =
2804- {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U,
2805- 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U};
2806- return as_float8(~sign_mask & as_uint8(a));
2807-#else
2808- return (float8)(fabs(a.lo), fabs(a.hi));
2809-#endif
2810-}
2811-
2812-float16 __attribute__ ((overloadable))
2813-fabs(float16 a)
2814-{
2815- return (float16)(fabs(a.lo), fabs(a.hi));
2816-}
2817-
2818-double __attribute__ ((overloadable))
2819-fabs(double a)
2820-{
2821- return __builtin_fabs(a);
2822-}
2823-
2824-double2 __attribute__ ((overloadable))
2825-fabs(double2 a)
2826-{
2827-#ifdef __SSE2__
2828- const ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL};
2829- return as_double2(~sign_mask & as_ulong2(a));
2830-#else
2831- return (double2)(fabs(a.lo), fabs(a.hi));
2832-#endif
2833-}
2834-
2835-double3 __attribute__ ((overloadable))
2836-fabs(double3 a)
2837-{
2838-#ifdef __AVX__
2839- return fabs(*(double4*)&a).s012;
2840-#else
2841- return (double3)(fabs(a.s01), fabs(a.s2));
2842-#endif
2843-}
2844-
2845-double4 __attribute__ ((overloadable))
2846-fabs(double4 a)
2847-{
2848-#ifdef __AVX__
2849- const ulong4 sign_mask =
2850- {0x8000000000000000UL, 0x8000000000000000UL,
2851- 0x8000000000000000UL, 0x8000000000000000UL};
2852- return as_double4(~sign_mask & as_ulong4(a));
2853-#else
2854- return (double4)(fabs(a.lo), fabs(a.hi));
2855-#endif
2856-}
2857-
2858-double8 __attribute__ ((overloadable))
2859-fabs(double8 a)
2860-{
2861- return (double8)(fabs(a.lo), fabs(a.hi));
2862-}
2863-
2864-double16 __attribute__ ((overloadable))
2865-fabs(double16 a)
2866-{
2867- return (double16)(fabs(a.lo), fabs(a.hi));
2868-}
2869+#include "templates.h"
2870+
2871+DEFINE_BUILTIN_V_V(fabs)
2872
2873=== modified file 'lib/kernel/floor.cl'
2874--- lib/kernel/floor.cl 2011-10-25 18:52:31 +0000
2875+++ lib/kernel/floor.cl 2011-10-31 17:03:23 +0000
2876@@ -21,132 +21,6 @@
2877 THE SOFTWARE.
2878 */
2879
2880-#define _MM_FROUND_TO_NEAREST_INT 0x00
2881-#define _MM_FROUND_TO_NEG_INF 0x01
2882-#define _MM_FROUND_TO_POS_INF 0x02
2883-#define _MM_FROUND_TO_ZERO 0x03
2884-#define _MM_FROUND_CUR_DIRECTION 0x04
2885-
2886-#define _MM_FROUND_RAISE_EXC 0x00
2887-#define _MM_FROUND_NO_EXC 0x08
2888-
2889-#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
2890-#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
2891-#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
2892-#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
2893-#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
2894-#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
2895-
2896-
2897-
2898-float __attribute__ ((overloadable))
2899-floor(float a)
2900-{
2901-#ifdef __SSE4_1__
2902- // LLVM does not optimise this on its own
2903- return ((float4)__builtin_ia32_roundss(*(float4*)&a, *(float4*)&a,
2904- _MM_FROUND_FLOOR)).s0;
2905-#else
2906- return __builtin_floorf(a);
2907-#endif
2908-}
2909-
2910-float2 __attribute__ ((overloadable))
2911-floor(float2 a)
2912-{
2913-#ifdef __SSE4_1__
2914- return ((float4)floor(*(float4)&a)).s01;
2915-#else
2916- return (float2)(floor(a.lo), floor(a.hi));
2917-#endif
2918-}
2919-
2920-float3 __attribute__ ((overloadable))
2921-floor(float3 a)
2922-{
2923-#ifdef __SSE4_1__
2924- return ((float4)floor(*(float4)&a)).s012;
2925-#else
2926- return (float3)(floor(a.s01), floor(a.s2));
2927-#endif
2928-}
2929-
2930-float4 __attribute__ ((overloadable))
2931-floor(float4 a)
2932-{
2933-#ifdef __SSE4_1__
2934- return __builtin_ia32_roundps(a, _MM_FROUND_FLOOR);
2935-#else
2936- return (float4)(floor(a.lo), floor(a.hi));
2937-#endif
2938-}
2939-
2940-float8 __attribute__ ((overloadable))
2941-floor(float8 a)
2942-{
2943-#ifdef __AVX__
2944- return __builtin_ia32_roundps256(a, _MM_FROUND_FLOOR);
2945-#else
2946- return (float8)(floor(a.lo), floor(a.hi));
2947-#endif
2948-}
2949-
2950-float16 __attribute__ ((overloadable))
2951-floor(float16 a)
2952-{
2953- return (float16)(floor(a.lo), floor(a.hi));
2954-}
2955-
2956-double __attribute__ ((overloadable))
2957-floor(double a)
2958-{
2959-#ifdef __SSE4_1__
2960- // LLVM does not optimise this on its own
2961- return ((double2)__builtin_ia32_roundss(*(double2*)&a, *(double2*)&a,
2962- _MM_FROUND_FLOOR)).s0;
2963-#else
2964- return __builtin_floor(a);
2965-#endif
2966-}
2967-
2968-double2 __attribute__ ((overloadable))
2969-floor(double2 a)
2970-{
2971-#ifdef __SSE4_1__
2972- return __builtin_ia32_roundpd(a, _MM_FROUND_FLOOR);
2973-#else
2974- return (double2)(floor(a.lo), floor(a.hi));
2975-#endif
2976-}
2977-
2978-double3 __attribute__ ((overloadable))
2979-floor(double3 a)
2980-{
2981-#ifdef __AVX__
2982- return ((double4)floor(*(double4)&a)).s012;
2983-#else
2984- return (double3)(floor(a.s01), floor(a.s2));
2985-#endif
2986-}
2987-
2988-double4 __attribute__ ((overloadable))
2989-floor(double4 a)
2990-{
2991-#ifdef __AVX__
2992- return __builtin_ia32_roundpd256(a, _MM_FROUND_FLOOR);
2993-#else
2994- return (double4)(floor(a.lo), floor(a.hi));
2995-#endif
2996-}
2997-
2998-double8 __attribute__ ((overloadable))
2999-floor(double8 a)
3000-{
3001- return (double8)(floor(a.lo), floor(a.hi));
3002-}
3003-
3004-double16 __attribute__ ((overloadable))
3005-floor(double16 a)
3006-{
3007- return (double16)(floor(a.lo), floor(a.hi));
3008-}
3009+#include "templates.h"
3010+
3011+DEFINE_BUILTIN_V_V(floor)
3012
3013=== modified file 'lib/kernel/fma.cl'
3014--- lib/kernel/fma.cl 2011-10-26 03:01:29 +0000
3015+++ lib/kernel/fma.cl 2011-10-31 17:03:23 +0000
3016@@ -23,5 +23,7 @@
3017
3018 #include "templates.h"
3019
3020+#define __builtin__cl_std_fmaf __builtin_fmaf
3021+#define __builtin__cl_std_fma __builtin_fma
3022+
3023 DEFINE_BUILTIN_V_VVV(fma)
3024-// DEFINE_EXPR_V_VVV(fma, a*b+c)
3025
3026=== modified file 'lib/kernel/fmax.cl'
3027--- lib/kernel/fmax.cl 2011-10-27 02:59:34 +0000
3028+++ lib/kernel/fmax.cl 2011-10-31 17:03:23 +0000
3029@@ -21,134 +21,12 @@
3030 THE SOFTWARE.
3031 */
3032
3033+#undef fmax
3034+
3035 #include "templates.h"
3036
3037-DEFINE_EXPR_V_VS(fmax, fmax(a, (vtype)b))
3038-
3039-
3040-
3041-float4 _cl_fmax_ensure_float4(float4 a)
3042-{
3043- return a;
3044-}
3045-
3046-double2 _cl_fmax_ensure_double2(double2 a)
3047-{
3048- return a;
3049-}
3050-
3051-
3052-
3053-float __attribute__ ((overloadable))
3054-fmax(float a, float b)
3055-{
3056-#ifdef __SSE__
3057- // LLVM does not optimise this on its own
3058- // Can't convert to float4 (why?)
3059- // return ((float4)__builtin_ia32_maxss(*(float4*)&a, *(float4*)&b)).s0;
3060- return _cl_fmax_ensure_float4(__builtin_ia32_maxss(*(float4*)&a, *(float4*)&b)).s0;
3061-#else
3062- return __builtin_fmaxf(a, b);
3063-#endif
3064-}
3065-
3066-float2 __attribute__ ((overloadable))
3067-fmax(float2 a, float2 b)
3068-{
3069-#ifdef __SSE__
3070- return ((float4)fmax(*(float4*)&a, *(float4*)&b)).s01;
3071-#else
3072- return (float2)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3073-#endif
3074-}
3075-
3076-float3 __attribute__ ((overloadable))
3077-fmax(float3 a, float3 b)
3078-{
3079-#ifdef __SSE__
3080- return ((float4)fmax(*(float4*)&a, *(float4*)&b)).s012;
3081-#else
3082- return (float3)(fmax(a.s01, b.s01), fmax(a.s2, b.s2));
3083-#endif
3084-}
3085-
3086-float4 __attribute__ ((overloadable))
3087-fmax(float4 a, float4 b)
3088-{
3089-#ifdef __SSE__
3090- return __builtin_ia32_maxps(a, b);
3091-#else
3092- return (float4)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3093-#endif
3094-}
3095-
3096-float8 __attribute__ ((overloadable))
3097-fmax(float8 a, float8 b)
3098-{
3099-#ifdef __AVX__
3100- return __builtin_ia32_maxps256(a, b);
3101-#else
3102- return (float8)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3103-#endif
3104-}
3105-
3106-float16 __attribute__ ((overloadable))
3107-fmax(float16 a, float16 b)
3108-{
3109- return (float16)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3110-}
3111-
3112-double __attribute__ ((overloadable))
3113-fmax(double a, double b)
3114-{
3115-#ifdef __SSE2__
3116- // LLVM does not optimise this on its own
3117- // Can't convert to double2 (why?)
3118- // return ((double2)__builtin_ia32_maxsd(*(double2*)&a, *(double2*)&b)).s0;
3119- return _cl_fmax_ensure_double2(__builtin_ia32_maxsd(*(double2*)&a, *(double2*)&b)).s0;
3120-#else
3121- return __builtin_fmax(a, b);
3122-#endif
3123-}
3124-
3125-double2 __attribute__ ((overloadable))
3126-fmax(double2 a, double2 b)
3127-{
3128-#ifdef __SSE2__
3129- return __builtin_ia32_maxpd(a, b);
3130-#else
3131- return (double2)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3132-#endif
3133-}
3134-
3135-double3 __attribute__ ((overloadable))
3136-fmax(double3 a, double3 b)
3137-{
3138-#ifdef __AVX__
3139- return ((double4)fmax(*(double4*)&a, *(double4*)&b)).s012;
3140-#else
3141- return (double3)(fmax(a.s01, b.s01), fmax(a.s2, b.s2));
3142-#endif
3143-}
3144-
3145-double4 __attribute__ ((overloadable))
3146-fmax(double4 a, double4 b)
3147-{
3148-#ifdef __AVX__
3149- return __builtin_ia32_maxpd256(a, b);
3150-#else
3151- return (double4)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3152-#endif
3153-}
3154-
3155-double8 __attribute__ ((overloadable))
3156-fmax(double8 a, double8 b)
3157-{
3158- return (double8)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3159-}
3160-
3161-double16 __attribute__ ((overloadable))
3162-fmax(double16 a, double16 b)
3163-{
3164- return (double16)(fmax(a.lo, b.lo), fmax(a.hi, b.hi));
3165-}
3166+#define __builtin__cl_std_fmaxf __builtin_fmaxf
3167+#define __builtin__cl_std_fmax __builtin_fmax
3168+DEFINE_BUILTIN_V_VV(_cl_std_fmax)
3169+
3170+DEFINE_EXPR_V_VS(_cl_std_fmax, _cl_std_fmax(a, (vtype)b))
3171
3172=== modified file 'lib/kernel/fmin.cl'
3173--- lib/kernel/fmin.cl 2011-10-27 02:59:34 +0000
3174+++ lib/kernel/fmin.cl 2011-10-31 17:03:23 +0000
3175@@ -21,134 +21,12 @@
3176 THE SOFTWARE.
3177 */
3178
3179+#undef fmin
3180+
3181 #include "templates.h"
3182
3183-DEFINE_EXPR_V_VS(fmin, fmin(a, (vtype)b))
3184-
3185-
3186-
3187-float4 _cl_fmin_ensure_float4(float4 a)
3188-{
3189- return a;
3190-}
3191-
3192-double2 _cl_fmin_ensure_double2(double2 a)
3193-{
3194- return a;
3195-}
3196-
3197-
3198-
3199-float __attribute__ ((overloadable))
3200-fmin(float a, float b)
3201-{
3202-#ifdef __SSE__
3203- // LLVM does not optimise this on its own
3204- // Can't convert to float4 (why?)
3205- // return ((float4)__builtin_ia32_minss(*(float4*)&a, *(float4*)&b)).s0;
3206- return _cl_fmin_ensure_float4(__builtin_ia32_minss(*(float4*)&a, *(float4*)&b)).s0;
3207-#else
3208- return __builtin_fminf(a, b);
3209-#endif
3210-}
3211-
3212-float2 __attribute__ ((overloadable))
3213-fmin(float2 a, float2 b)
3214-{
3215-#ifdef __SSE__
3216- return ((float4)fmin(*(float4*)&a, *(float4*)&b)).s01;
3217-#else
3218- return (float2)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3219-#endif
3220-}
3221-
3222-float3 __attribute__ ((overloadable))
3223-fmin(float3 a, float3 b)
3224-{
3225-#ifdef __SSE__
3226- return ((float4)fmin(*(float4*)&a, *(float4*)&b)).s012;
3227-#else
3228- return (float3)(fmin(a.s01, b.s01), fmin(a.s2, b.s2));
3229-#endif
3230-}
3231-
3232-float4 __attribute__ ((overloadable))
3233-fmin(float4 a, float4 b)
3234-{
3235-#ifdef __SSE__
3236- return __builtin_ia32_minps(a, b);
3237-#else
3238- return (float4)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3239-#endif
3240-}
3241-
3242-float8 __attribute__ ((overloadable))
3243-fmin(float8 a, float8 b)
3244-{
3245-#ifdef __AVX__
3246- return __builtin_ia32_minps256(a, b);
3247-#else
3248- return (float8)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3249-#endif
3250-}
3251-
3252-float16 __attribute__ ((overloadable))
3253-fmin(float16 a, float16 b)
3254-{
3255- return (float16)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3256-}
3257-
3258-double __attribute__ ((overloadable))
3259-fmin(double a, double b)
3260-{
3261-#ifdef __SSE2__
3262- // LLVM does not optimise this on its own
3263- // Can't convert to double2 (why?)
3264- // return ((double2)__builtin_ia32_minsd(*(double2*)&a, *(double2*)&b)).s0;
3265- return _cl_fmin_ensure_double2(__builtin_ia32_minsd(*(double2*)&a, *(double2*)&b)).s0;
3266-#else
3267- return __builtin_fmin(a, b);
3268-#endif
3269-}
3270-
3271-double2 __attribute__ ((overloadable))
3272-fmin(double2 a, double2 b)
3273-{
3274-#ifdef __SSE2__
3275- return __builtin_ia32_minpd(a, b);
3276-#else
3277- return (double2)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3278-#endif
3279-}
3280-
3281-double3 __attribute__ ((overloadable))
3282-fmin(double3 a, double3 b)
3283-{
3284-#ifdef __AVX__
3285- return ((double4)fmin(*(double4*)&a, *(double4*)&b)).s012;
3286-#else
3287- return (double3)(fmin(a.s01, b.s01), fmin(a.s2, b.s2));
3288-#endif
3289-}
3290-
3291-double4 __attribute__ ((overloadable))
3292-fmin(double4 a, double4 b)
3293-{
3294-#ifdef __AVX__
3295- return __builtin_ia32_minpd256(a, b);
3296-#else
3297- return (double4)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3298-#endif
3299-}
3300-
3301-double8 __attribute__ ((overloadable))
3302-fmin(double8 a, double8 b)
3303-{
3304- return (double8)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3305-}
3306-
3307-double16 __attribute__ ((overloadable))
3308-fmin(double16 a, double16 b)
3309-{
3310- return (double16)(fmin(a.lo, b.lo), fmin(a.hi, b.hi));
3311-}
3312+#define __builtin__cl_std_fminf __builtin_fminf
3313+#define __builtin__cl_std_fmin __builtin_fmin
3314+DEFINE_BUILTIN_V_VV(_cl_std_fmin)
3315+
3316+DEFINE_EXPR_V_VS(_cl_std_fmin, _cl_std_fmin(a, (vtype)b))
3317
3318=== modified file 'lib/kernel/max.cl'
3319--- lib/kernel/max.cl 2011-10-26 21:01:40 +0000
3320+++ lib/kernel/max.cl 2011-10-31 17:03:23 +0000
3321@@ -27,5 +27,5 @@
3322 DEFINE_EXPR_G_GS(max, max(a, (gtype)b))
3323
3324 // Note: max() has no special semantics for inf/nan, even if fmax does
3325-DEFINE_EXPR_V_VV(max, fmax(a, b))
3326+DEFINE_EXPR_V_VV(max, select(b, a, (jtype)(a>=b)))
3327 DEFINE_EXPR_V_VS(max, max(a, (vtype)b))
3328
3329=== modified file 'lib/kernel/maxmag.cl'
3330--- lib/kernel/maxmag.cl 2011-10-26 03:01:29 +0000
3331+++ lib/kernel/maxmag.cl 2011-10-31 17:03:23 +0000
3332@@ -23,4 +23,18 @@
3333
3334 #include "templates.h"
3335
3336-DEFINE_EXPR_V_VV(maxmag, fmax(fabs(a), fabs(b)))
3337+float __builtin_maxmagf(float x, float y)
3338+{
3339+ if (fabs(x) > fabs(y)) return x;
3340+ if (fabs(y) > fabs(x)) return y;
3341+ return fmax(x, y);
3342+}
3343+
3344+double __builtin_maxmag(double x, double y)
3345+{
3346+ if (fabs(x) > fabs(y)) return x;
3347+ if (fabs(y) > fabs(x)) return y;
3348+ return fmax(x, y);
3349+}
3350+
3351+DEFINE_BUILTIN_V_VV(maxmag)
3352
3353=== modified file 'lib/kernel/min.cl'
3354--- lib/kernel/min.cl 2011-10-26 21:01:40 +0000
3355+++ lib/kernel/min.cl 2011-10-31 17:03:23 +0000
3356@@ -23,9 +23,9 @@
3357
3358 #include "templates.h"
3359
3360-DEFINE_EXPR_G_GG(min, a<b ? a : b)
3361+DEFINE_EXPR_G_GG(min, a<=b ? a : b)
3362 DEFINE_EXPR_G_GS(min, min(a, (gtype)b))
3363
3364 // Note: min() has no special semantics for inf/nan, even if fmin does
3365-DEFINE_EXPR_V_VV(min, fmin(a, b))
3366+DEFINE_EXPR_V_VV(min, select(b, a, (jtype)(a<=b)))
3367 DEFINE_EXPR_V_VS(min, min(a, (vtype)b))
3368
3369=== modified file 'lib/kernel/minmag.cl'
3370--- lib/kernel/minmag.cl 2011-10-26 03:01:29 +0000
3371+++ lib/kernel/minmag.cl 2011-10-31 17:03:23 +0000
3372@@ -23,4 +23,18 @@
3373
3374 #include "templates.h"
3375
3376-DEFINE_EXPR_V_VV(minmag, fmin(fabs(a), fabs(b)))
3377+float __builtin_minmagf(float x, float y)
3378+{
3379+ if (fabs(x) < fabs(y)) return x;
3380+ if (fabs(y) < fabs(x)) return y;
3381+ return fmin(x, y);
3382+}
3383+
3384+double __builtin_minmag(double x, double y)
3385+{
3386+ if (fabs(x) < fabs(y)) return x;
3387+ if (fabs(y) < fabs(x)) return y;
3388+ return fmin(x, y);
3389+}
3390+
3391+DEFINE_BUILTIN_V_VV(minmag)
3392
3393=== modified file 'lib/kernel/select.cl'
3394--- lib/kernel/select.cl 2011-10-27 01:35:56 +0000
3395+++ lib/kernel/select.cl 2011-10-31 17:03:23 +0000
3396@@ -26,9 +26,9 @@
3397 DEFINE_EXPR_G_GGG(select, c>=(gtype)0 ? a : b)
3398
3399 // This segfaults Clang 3.0, so we work around
3400-// DEFINE_EXPR_V_VVJ(select, c>=(jtype)0 ? a : b)
3401+// DEFINE_EXPR_V_VVJ(select, c ? b : a)
3402 DEFINE_EXPR_V_VVJ(select,
3403 ({
3404- jtype result = c>=(jtype)0 ? *(jtype*)&a : *(jtype*)&b;
3405+ jtype result = c ? *(jtype*)&b : *(jtype*)&a;
3406 *(vtype*)&result;
3407 }))
3408
3409=== modified file 'lib/kernel/sqrt.cl'
3410--- lib/kernel/sqrt.cl 2011-10-25 16:28:54 +0000
3411+++ lib/kernel/sqrt.cl 2011-10-31 17:03:23 +0000
3412@@ -21,102 +21,6 @@
3413 THE SOFTWARE.
3414 */
3415
3416-float __attribute__ ((overloadable))
3417-sqrt(float a)
3418-{
3419- return __builtin_sqrtf(a);
3420-}
3421-
3422-float2 __attribute__ ((overloadable))
3423-sqrt(float2 a)
3424-{
3425-#ifdef __SSE__
3426- return ((float4)sqrt(*(float4*)&a)).s01;
3427-#else
3428- return (float2)(sqrt(a.lo), sqrt(a.hi));
3429-#endif
3430-}
3431-
3432-float3 __attribute__ ((overloadable))
3433-sqrt(float3 a)
3434-{
3435-#ifdef __SSE__
3436- return ((float4)sqrt(*(float4*)&a)).s012;
3437-#else
3438- return (float3)(sqrt(a.s01), sqrt(a.s2));
3439-#endif
3440-}
3441-
3442-float4 __attribute__ ((overloadable))
3443-sqrt(float4 a)
3444-{
3445-#ifdef __SSE__
3446- return __builtin_ia32_sqrtps(a);
3447-#else
3448- return (float4)(sqrt(a.lo), sqrt(a.hi));
3449-#endif
3450-}
3451-
3452-float8 __attribute__ ((overloadable))
3453-sqrt(float8 a)
3454-{
3455-#ifdef __AVX__
3456- return __builtin_ia32_sqrtps256(a);
3457-#else
3458- return (float8)(sqrt(a.lo), sqrt(a.hi));
3459-#endif
3460-}
3461-
3462-float16 __attribute__ ((overloadable))
3463-sqrt(float16 a)
3464-{
3465- return (float16)(sqrt(a.lo), sqrt(a.hi));
3466-}
3467-
3468-double __attribute__ ((overloadable))
3469-sqrt(double a)
3470-{
3471- return __builtin_sqrt(a);
3472-}
3473-
3474-double2 __attribute__ ((overloadable))
3475-sqrt(double2 a)
3476-{
3477-#ifdef __SSE2__
3478- return __builtin_ia32_sqrtpd(a);
3479-#else
3480- return (double2)(sqrt(a.lo), sqrt(a.hi));
3481-#endif
3482-}
3483-
3484-double3 __attribute__ ((overloadable))
3485-sqrt(double3 a)
3486-{
3487-#ifdef __AVX__
3488- return ((double4)sqrt(*(double4*)&a)).s012;
3489-#else
3490- return (double3)(sqrt(a.s01), sqrt(a.s2));
3491-#endif
3492-}
3493-
3494-double4 __attribute__ ((overloadable))
3495-sqrt(double4 a)
3496-{
3497-#ifdef __AVX__
3498- return __builtin_ia32_pd256(a);
3499-#else
3500- return (double4)(sqrt(a.lo), sqrt(a.hi));
3501-#endif
3502-}
3503-
3504-double8 __attribute__ ((overloadable))
3505-sqrt(double8 a)
3506-{
3507- return (double8)(sqrt(a.lo), sqrt(a.hi));
3508-}
3509-
3510-double16 __attribute__ ((overloadable))
3511-sqrt(double16 a)
3512-{
3513- return (double16)(sqrt(a.lo), sqrt(a.hi));
3514-}
3515+#include "templates.h"
3516+
3517+DEFINE_BUILTIN_V_V(sqrt)
3518
3519=== modified file 'lib/kernel/templates.h'
3520--- lib/kernel/templates.h 2011-10-27 01:35:56 +0000
3521+++ lib/kernel/templates.h 2011-10-31 17:03:23 +0000
3522@@ -24,18 +24,18 @@
3523
3524
3525 #define IMPLEMENT_BUILTIN_V_V(NAME, VTYPE, LO, HI) \
3526- VTYPE __attribute__ ((overloadable)) \
3527+ VTYPE _cl_overloadable \
3528 NAME(VTYPE a) \
3529 { \
3530 return (VTYPE)(NAME(a.LO), NAME(a.HI)); \
3531 }
3532 #define DEFINE_BUILTIN_V_V(NAME) \
3533- float __attribute__ ((overloadable)) \
3534+ float _cl_overloadable \
3535 NAME(float a) \
3536 { \
3537 return __builtin_##NAME##f(a); \
3538 } \
3539- double __attribute__ ((overloadable)) \
3540+ double _cl_overloadable \
3541 NAME(double a) \
3542 { \
3543 return __builtin_##NAME(a); \
3544@@ -52,18 +52,18 @@
3545 IMPLEMENT_BUILTIN_V_V(NAME, double16, lo, hi)
3546
3547 #define IMPLEMENT_BUILTIN_V_VV(NAME, VTYPE, LO, HI) \
3548- VTYPE __attribute__ ((overloadable)) \
3549+ VTYPE _cl_overloadable \
3550 NAME(VTYPE a, VTYPE b) \
3551 { \
3552 return (VTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3553 }
3554 #define DEFINE_BUILTIN_V_VV(NAME) \
3555- float __attribute__ ((overloadable)) \
3556+ float _cl_overloadable \
3557 NAME(float a, float b) \
3558 { \
3559 return __builtin_##NAME##f(a, b); \
3560 } \
3561- double __attribute__ ((overloadable)) \
3562+ double _cl_overloadable \
3563 NAME(double a, double b) \
3564 { \
3565 return __builtin_##NAME(a, b); \
3566@@ -80,18 +80,18 @@
3567 IMPLEMENT_BUILTIN_V_VV(NAME, double16, lo, hi)
3568
3569 #define IMPLEMENT_BUILTIN_V_VVV(NAME, VTYPE, LO, HI) \
3570- VTYPE __attribute__ ((overloadable)) \
3571+ VTYPE _cl_overloadable \
3572 NAME(VTYPE a, VTYPE b, VTYPE c) \
3573 { \
3574 return (VTYPE)(NAME(a.LO, b.LO, c.LO), NAME(a.HI, b.HI, c.HI)); \
3575 }
3576 #define DEFINE_BUILTIN_V_VVV(NAME) \
3577- float __attribute__ ((overloadable)) \
3578+ float _cl_overloadable \
3579 NAME(float a, float b, float c) \
3580 { \
3581 return __builtin_##NAME##f(a, b, c); \
3582 } \
3583- double __attribute__ ((overloadable)) \
3584+ double _cl_overloadable \
3585 NAME(double a, double b, double c) \
3586 { \
3587 return __builtin_##NAME(a, b, c); \
3588@@ -108,74 +108,86 @@
3589 IMPLEMENT_BUILTIN_V_VVV(NAME, double16, lo, hi)
3590
3591 #define IMPLEMENT_BUILTIN_V_U(NAME, VTYPE, UTYPE, LO, HI) \
3592- VTYPE __attribute__ ((overloadable)) \
3593+ VTYPE _cl_overloadable \
3594 NAME(UTYPE a) \
3595 { \
3596 return (VTYPE)(NAME(a.LO), NAME(a.HI)); \
3597 }
3598-#define DEFINE_BUILTIN_V_U(NAME) \
3599- float __attribute__ ((overloadable)) \
3600- NAME(uint a) \
3601- { \
3602- return __builtin_##NAME##f(a); \
3603- } \
3604- double __attribute__ ((overloadable)) \
3605- NAME(ulong a) \
3606- { \
3607- return __builtin_##NAME(a); \
3608- } \
3609- IMPLEMENT_BUILTIN_V_U(NAME, float2 , uint2 , lo, hi) \
3610- IMPLEMENT_BUILTIN_V_U(NAME, float3 , uint3 , lo, s2) \
3611- IMPLEMENT_BUILTIN_V_U(NAME, float4 , uint4 , lo, hi) \
3612- IMPLEMENT_BUILTIN_V_U(NAME, float8 , uint8 , lo, hi) \
3613- IMPLEMENT_BUILTIN_V_U(NAME, float16 , uint16 , lo, hi) \
3614- IMPLEMENT_BUILTIN_V_U(NAME, double2 , ulong2 , lo, hi) \
3615- IMPLEMENT_BUILTIN_V_U(NAME, double3 , ulong3 , lo, s2) \
3616- IMPLEMENT_BUILTIN_V_U(NAME, double4 , ulong4 , lo, hi) \
3617- IMPLEMENT_BUILTIN_V_U(NAME, double8 , ulong8 , lo, hi) \
3618+#define DEFINE_BUILTIN_V_U(NAME) \
3619+ float _cl_overloadable \
3620+ NAME(uint a) \
3621+ { \
3622+ return __builtin_##NAME##f(a); \
3623+ } \
3624+ double _cl_overloadable \
3625+ NAME(ulong a) \
3626+ { \
3627+ return __builtin_##NAME(a); \
3628+ } \
3629+ IMPLEMENT_BUILTIN_V_U(NAME, float2 , uint2 , lo, hi) \
3630+ IMPLEMENT_BUILTIN_V_U(NAME, float3 , uint3 , lo, s2) \
3631+ IMPLEMENT_BUILTIN_V_U(NAME, float4 , uint4 , lo, hi) \
3632+ IMPLEMENT_BUILTIN_V_U(NAME, float8 , uint8 , lo, hi) \
3633+ IMPLEMENT_BUILTIN_V_U(NAME, float16 , uint16 , lo, hi) \
3634+ IMPLEMENT_BUILTIN_V_U(NAME, double2 , ulong2 , lo, hi) \
3635+ IMPLEMENT_BUILTIN_V_U(NAME, double3 , ulong3 , lo, s2) \
3636+ IMPLEMENT_BUILTIN_V_U(NAME, double4 , ulong4 , lo, hi) \
3637+ IMPLEMENT_BUILTIN_V_U(NAME, double8 , ulong8 , lo, hi) \
3638 IMPLEMENT_BUILTIN_V_U(NAME, double16, ulong16, lo, hi)
3639
3640-#define IMPLEMENT_BUILTIN_J_VV(NAME, VTYPE, JTYPE, LO, HI) \
3641- JTYPE __attribute__ ((overloadable)) \
3642- NAME(VTYPE a, VTYPE b) \
3643- { \
3644- return (JTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3645+#define IMPLEMENT_BUILTIN_J_VV(NAME, VTYPE, STYPE, JTYPE, LO, HI) \
3646+ JTYPE _cl_overloadable \
3647+ NAME(VTYPE a, VTYPE b) \
3648+ { \
3649+ if (sizeof(a.LO) == sizeof(STYPE)) { \
3650+ if (sizeof(a.HI) == sizeof(STYPE)) { \
3651+ return (JTYPE)(-NAME(a.LO, b.LO), -NAME(a.HI, b.HI)); \
3652+ } else { \
3653+ return (JTYPE)(-NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3654+ } \
3655+ } else { \
3656+ if (sizeof(a.HI) == sizeof(STYPE)) { \
3657+ return (JTYPE)( NAME(a.LO, b.LO), -NAME(a.HI, b.HI)); \
3658+ } else { \
3659+ return (JTYPE)( NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3660+ } \
3661+ } \
3662 }
3663-#define DEFINE_BUILTIN_J_VV(NAME) \
3664- int __attribute__ ((overloadable)) \
3665- NAME(float a, float b) \
3666- { \
3667- return __builtin_##NAME##f(a, b); \
3668- } \
3669- int __attribute__ ((overloadable)) \
3670- NAME(double a, double b) \
3671- { \
3672- return __builtin_##NAME(a, b); \
3673- } \
3674- IMPLEMENT_BUILTIN_J_VV(NAME, float2 , int2 , lo, hi) \
3675- IMPLEMENT_BUILTIN_J_VV(NAME, float3 , int3 , lo, s2) \
3676- IMPLEMENT_BUILTIN_J_VV(NAME, float4 , int4 , lo, hi) \
3677- IMPLEMENT_BUILTIN_J_VV(NAME, float8 , int8 , lo, hi) \
3678- IMPLEMENT_BUILTIN_J_VV(NAME, float16 , int16 , lo, hi) \
3679- IMPLEMENT_BUILTIN_J_VV(NAME, double2 , long2 , lo, hi) \
3680- IMPLEMENT_BUILTIN_J_VV(NAME, double3 , long3 , lo, s2) \
3681- IMPLEMENT_BUILTIN_J_VV(NAME, double4 , long4 , lo, hi) \
3682- IMPLEMENT_BUILTIN_J_VV(NAME, double8 , long8 , lo, hi) \
3683- IMPLEMENT_BUILTIN_J_VV(NAME, double16, long16, lo, hi)
3684+#define DEFINE_BUILTIN_J_VV(NAME) \
3685+ int _cl_overloadable \
3686+ NAME(float a, float b) \
3687+ { \
3688+ return __builtin_##NAME##f(a, b); \
3689+ } \
3690+ int _cl_overloadable \
3691+ NAME(double a, double b) \
3692+ { \
3693+ return __builtin_##NAME(a, b); \
3694+ } \
3695+ IMPLEMENT_BUILTIN_J_VV(NAME, float2 , float , int2 , lo, hi) \
3696+ IMPLEMENT_BUILTIN_J_VV(NAME, float3 , float , int3 , lo, s2) \
3697+ IMPLEMENT_BUILTIN_J_VV(NAME, float4 , float , int4 , lo, hi) \
3698+ IMPLEMENT_BUILTIN_J_VV(NAME, float8 , float , int8 , lo, hi) \
3699+ IMPLEMENT_BUILTIN_J_VV(NAME, float16 , float , int16 , lo, hi) \
3700+ IMPLEMENT_BUILTIN_J_VV(NAME, double2 , double, long2 , lo, hi) \
3701+ IMPLEMENT_BUILTIN_J_VV(NAME, double3 , double, long3 , lo, s2) \
3702+ IMPLEMENT_BUILTIN_J_VV(NAME, double4 , double, long4 , lo, hi) \
3703+ IMPLEMENT_BUILTIN_J_VV(NAME, double8 , double, long8 , lo, hi) \
3704+ IMPLEMENT_BUILTIN_J_VV(NAME, double16, double, long16, lo, hi)
3705
3706 #define IMPLEMENT_BUILTIN_V_VJ(NAME, VTYPE, JTYPE, LO, HI) \
3707- VTYPE __attribute__ ((overloadable)) \
3708+ VTYPE _cl_overloadable \
3709 NAME(VTYPE a, JTYPE b) \
3710 { \
3711 return (VTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3712 }
3713 #define DEFINE_BUILTIN_V_VJ(NAME) \
3714- float __attribute__ ((overloadable)) \
3715+ float _cl_overloadable \
3716 NAME(float a, int b) \
3717 { \
3718 return __builtin_##NAME##f(a, b); \
3719 } \
3720- double __attribute__ ((overloadable)) \
3721+ double _cl_overloadable \
3722 NAME(double a, int b) \
3723 { \
3724 return __builtin_##NAME(a, b); \
3725@@ -192,7 +204,7 @@
3726 IMPLEMENT_BUILTIN_V_VJ(NAME, double16, int16, lo, hi)
3727
3728 #define IMPLEMENT_BUILTIN_V_VI(NAME, VTYPE, ITYPE, LO, HI) \
3729- VTYPE __attribute__ ((overloadable)) \
3730+ VTYPE _cl_overloadable \
3731 NAME(VTYPE a, ITYPE b) \
3732 { \
3733 return (VTYPE)(NAME(a.LO, b), NAME(a.HI, b)); \
3734@@ -210,18 +222,18 @@
3735 IMPLEMENT_BUILTIN_V_VI(NAME, double16, int, lo, hi)
3736
3737 #define IMPLEMENT_BUILTIN_J_V(NAME, JTYPE, VTYPE, LO, HI) \
3738- JTYPE __attribute__ ((overloadable)) \
3739+ JTYPE _cl_overloadable \
3740 NAME(VTYPE a) \
3741 { \
3742 return (JTYPE)(NAME(a.LO), NAME(a.HI)); \
3743 }
3744 #define DEFINE_BUILTIN_J_V(NAME) \
3745- int __attribute__ ((overloadable)) \
3746+ int _cl_overloadable \
3747 NAME(float a) \
3748 { \
3749 return __builtin_##NAME##f(a); \
3750 } \
3751- int __attribute__ ((overloadable)) \
3752+ int _cl_overloadable \
3753 NAME(double a) \
3754 { \
3755 return __builtin_##NAME(a); \
3756@@ -239,30 +251,31 @@
3757
3758
3759
3760-#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE) \
3761- VTYPE __attribute__ ((overloadable)) \
3762- NAME(VTYPE a, VTYPE b) \
3763- { \
3764- typedef VTYPE vtype; \
3765- typedef STYPE stype; \
3766- return EXPR; \
3767+#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3768+ VTYPE _cl_overloadable \
3769+ NAME(VTYPE a) \
3770+ { \
3771+ typedef VTYPE vtype; \
3772+ typedef STYPE stype; \
3773+ typedef JTYPE jtype; \
3774+ return EXPR; \
3775 }
3776-#define DEFINE_EXPR_V_V(NAME, EXPR) \
3777- IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float ) \
3778- IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float ) \
3779- IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float ) \
3780- IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float ) \
3781- IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float ) \
3782- IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float ) \
3783- IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double) \
3784- IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double) \
3785- IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double) \
3786- IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double) \
3787- IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double) \
3788- IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double)
3789+#define DEFINE_EXPR_V_V(NAME, EXPR) \
3790+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float , int ) \
3791+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float , int2 ) \
3792+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float , int3 ) \
3793+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float , int4 ) \
3794+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float , int8 ) \
3795+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float , int16 ) \
3796+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double, long ) \
3797+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double, long2 ) \
3798+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double, long3 ) \
3799+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double, long4 ) \
3800+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double, long8 ) \
3801+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double, long16)
3802
3803 #define IMPLEMENT_EXPR_V_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3804- VTYPE __attribute__ ((overloadable)) \
3805+ VTYPE _cl_overloadable \
3806 NAME(VTYPE a, VTYPE b) \
3807 { \
3808 typedef VTYPE vtype; \
3809@@ -285,7 +298,7 @@
3810 IMPLEMENT_EXPR_V_VV(NAME, EXPR, double16, double, long16)
3811
3812 #define IMPLEMENT_EXPR_V_VVV(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3813- VTYPE __attribute__ ((overloadable)) \
3814+ VTYPE _cl_overloadable \
3815 NAME(VTYPE a, VTYPE b, VTYPE c) \
3816 { \
3817 typedef VTYPE vtype; \
3818@@ -308,7 +321,7 @@
3819 IMPLEMENT_EXPR_V_VVV(NAME, EXPR, double16, double, long16)
3820
3821 #define IMPLEMENT_EXPR_S_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3822- STYPE __attribute__ ((overloadable)) \
3823+ STYPE _cl_overloadable \
3824 NAME(VTYPE a, VTYPE b) \
3825 { \
3826 typedef VTYPE vtype; \
3827@@ -331,7 +344,7 @@
3828 IMPLEMENT_EXPR_S_VV(NAME, EXPR, double16, double, long16)
3829
3830 #define IMPLEMENT_EXPR_V_VVS(NAME, EXPR, VTYPE, STYPE) \
3831- VTYPE __attribute__ ((overloadable)) \
3832+ VTYPE _cl_overloadable \
3833 NAME(VTYPE a, VTYPE b, STYPE c) \
3834 { \
3835 typedef VTYPE vtype; \
3836@@ -351,7 +364,7 @@
3837 IMPLEMENT_EXPR_V_VVS(NAME, EXPR, double16, double)
3838
3839 #define IMPLEMENT_EXPR_V_VSS(NAME, EXPR, VTYPE, STYPE) \
3840- VTYPE __attribute__ ((overloadable)) \
3841+ VTYPE _cl_overloadable \
3842 NAME(VTYPE a, STYPE b, STYPE c) \
3843 { \
3844 typedef VTYPE vtype; \
3845@@ -371,7 +384,7 @@
3846 IMPLEMENT_EXPR_V_VSS(NAME, EXPR, double16, double)
3847
3848 #define IMPLEMENT_EXPR_V_SSV(NAME, EXPR, VTYPE, STYPE) \
3849- VTYPE __attribute__ ((overloadable)) \
3850+ VTYPE _cl_overloadable \
3851 NAME(STYPE a, STYPE b, VTYPE c) \
3852 { \
3853 typedef VTYPE vtype; \
3854@@ -391,7 +404,7 @@
3855 IMPLEMENT_EXPR_V_SSV(NAME, EXPR, double16, double)
3856
3857 #define IMPLEMENT_EXPR_V_VVJ(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3858- VTYPE __attribute__ ((overloadable)) \
3859+ VTYPE _cl_overloadable \
3860 NAME(VTYPE a, VTYPE b, JTYPE c) \
3861 { \
3862 typedef VTYPE vtype; \
3863@@ -414,7 +427,7 @@
3864 IMPLEMENT_EXPR_V_VVJ(NAME, EXPR, double16, double, long16)
3865
3866 #define IMPLEMENT_EXPR_V_U(NAME, EXPR, VTYPE, STYPE, UTYPE) \
3867- VTYPE __attribute__ ((overloadable)) \
3868+ VTYPE _cl_overloadable \
3869 NAME(UTYPE a) \
3870 { \
3871 typedef VTYPE vtype; \
3872@@ -437,7 +450,7 @@
3873 IMPLEMENT_EXPR_V_U(NAME, EXPR, double16, double, ulong16)
3874
3875 #define IMPLEMENT_EXPR_V_VS(NAME, EXPR, VTYPE, STYPE) \
3876- VTYPE __attribute__ ((overloadable)) \
3877+ VTYPE _cl_overloadable \
3878 NAME(VTYPE a, STYPE b) \
3879 { \
3880 typedef VTYPE vtype; \
3881@@ -457,7 +470,7 @@
3882 IMPLEMENT_EXPR_V_VS(NAME, EXPR, double16, double)
3883
3884 #define IMPLEMENT_EXPR_V_VJ(NAME, EXPR, VTYPE, STYPE, JTYPE) \
3885- VTYPE __attribute__ ((overloadable)) \
3886+ VTYPE _cl_overloadable \
3887 NAME(VTYPE a, JTYPE b) \
3888 { \
3889 typedef VTYPE vtype; \
3890@@ -480,7 +493,7 @@
3891 IMPLEMENT_EXPR_V_VJ(NAME, EXPR, double16, double, int16)
3892
3893 #define IMPLEMENT_EXPR_V_VI(NAME, EXPR, VTYPE, STYPE, ITYPE) \
3894- VTYPE __attribute__ ((overloadable)) \
3895+ VTYPE _cl_overloadable \
3896 NAME(VTYPE a, ITYPE b) \
3897 { \
3898 typedef VTYPE vtype; \
3899@@ -501,14 +514,14 @@
3900 IMPLEMENT_EXPR_V_VI(NAME, EXPR, double16, double, int)
3901
3902 #define IMPLEMENT_EXPR_V_VPV(NAME, EXPR, VTYPE, STYPE) \
3903- VTYPE __attribute__ ((overloadable)) \
3904+ VTYPE _cl_overloadable \
3905 NAME(VTYPE a, __global VTYPE *b) \
3906 { \
3907 typedef VTYPE vtype; \
3908 typedef STYPE stype; \
3909 return EXPR; \
3910 } \
3911- VTYPE __attribute__ ((overloadable)) \
3912+ VTYPE _cl_overloadable \
3913 NAME(VTYPE a, __local VTYPE *b) \
3914 { \
3915 typedef VTYPE vtype; \
3916@@ -516,7 +529,7 @@
3917 return EXPR; \
3918 } \
3919 /* __private is not supported yet \
3920- VTYPE __attribute__ ((overloadable)) \
3921+ VTYPE _cl_overloadable \
3922 NAME(VTYPE a, __private VTYPE *b) \
3923 { \
3924 typedef VTYPE vtype; \
3925@@ -539,7 +552,7 @@
3926 IMPLEMENT_EXPR_V_VPV(NAME, EXPR, double16, double)
3927
3928 #define IMPLEMENT_EXPR_V_SV(NAME, EXPR, VTYPE, STYPE) \
3929- VTYPE __attribute__ ((overloadable)) \
3930+ VTYPE _cl_overloadable \
3931 NAME(STYPE a, VTYPE b) \
3932 { \
3933 typedef VTYPE vtype; \
3934@@ -561,48 +574,48 @@
3935
3936
3937 #define IMPLEMENT_BUILTIN_G_G(NAME, GTYPE, UGTYPE, LO, HI) \
3938- GTYPE __attribute__ ((overloadable)) \
3939+ GTYPE _cl_overloadable \
3940 NAME(GTYPE a) \
3941 { \
3942 return (GTYPE)(NAME(a.LO), NAME(a.HI)); \
3943 }
3944 #define DEFINE_BUILTIN_G_G(NAME) \
3945- char __attribute__ ((overloadable)) \
3946+ char _cl_overloadable \
3947 NAME(char a) \
3948 { \
3949 return __builtin_##NAME##hh(a); \
3950 } \
3951- short __attribute__ ((overloadable)) \
3952+ short _cl_overloadable \
3953 NAME(short a) \
3954 { \
3955 return __builtin_##NAME##h(a); \
3956 } \
3957- int __attribute__ ((overloadable)) \
3958+ int _cl_overloadable \
3959 NAME(int a) \
3960 { \
3961 return __builtin_##NAME(a); \
3962 } \
3963- long __attribute__ ((overloadable)) \
3964+ long _cl_overloadable \
3965 NAME(long a) \
3966 { \
3967 return __builtin_##NAME##l(a); \
3968 } \
3969- uchar __attribute__ ((overloadable)) \
3970+ uchar _cl_overloadable \
3971 NAME(uchar a) \
3972 { \
3973 return __builtin_##NAME##uhh(a); \
3974 } \
3975- ushort __attribute__ ((overloadable)) \
3976+ ushort _cl_overloadable \
3977 NAME(ushort a) \
3978 { \
3979 return __builtin_##NAME##uh(a); \
3980 } \
3981- uint __attribute__ ((overloadable)) \
3982+ uint _cl_overloadable \
3983 NAME(uint a) \
3984 { \
3985 return __builtin_##NAME##u(a); \
3986 } \
3987- ulong __attribute__ ((overloadable)) \
3988+ ulong _cl_overloadable \
3989 NAME(ulong a) \
3990 { \
3991 return __builtin_##NAME##ul(a); \
3992@@ -649,48 +662,48 @@
3993 IMPLEMENT_BUILTIN_G_G(NAME, ulong16 , ulong16 , lo, hi)
3994
3995 #define IMPLEMENT_BUILTIN_UG_G(NAME, GTYPE, UGTYPE, LO, HI) \
3996- UGTYPE __attribute__ ((overloadable)) \
3997+ UGTYPE _cl_overloadable \
3998 NAME(GTYPE a) \
3999 { \
4000 return (UGTYPE)(NAME(a.LO), NAME(a.HI)); \
4001 }
4002 #define DEFINE_BUILTIN_UG_G(NAME) \
4003- uchar __attribute__ ((overloadable)) \
4004+ uchar _cl_overloadable \
4005 NAME(char a) \
4006 { \
4007 return __builtin_##NAME##h(a); \
4008 } \
4009- ushort __attribute__ ((overloadable)) \
4010+ ushort _cl_overloadable \
4011 NAME(short a) \
4012 { \
4013 return __builtin_##NAME##h(a); \
4014 } \
4015- uint __attribute__ ((overloadable)) \
4016+ uint _cl_overloadable \
4017 NAME(int a) \
4018 { \
4019 return __builtin_##NAME(a); \
4020 } \
4021- ulong __attribute__ ((overloadable)) \
4022+ ulong _cl_overloadable \
4023 NAME(long a) \
4024 { \
4025 return __builtin_##NAME##l(a); \
4026 } \
4027- uchar __attribute__ ((overloadable)) \
4028+ uchar _cl_overloadable \
4029 NAME(uchar a) \
4030 { \
4031 return __builtin_##NAME##uhh(a); \
4032 } \
4033- ushort __attribute__ ((overloadable)) \
4034+ ushort _cl_overloadable \
4035 NAME(ushort a) \
4036 { \
4037 return __builtin_##NAME##uh(a); \
4038 } \
4039- uint __attribute__ ((overloadable)) \
4040+ uint _cl_overloadable \
4041 NAME(uint a) \
4042 { \
4043 return __builtin_##NAME##u(a); \
4044 } \
4045- ulong __attribute__ ((overloadable)) \
4046+ ulong _cl_overloadable \
4047 NAME(ulong a) \
4048 { \
4049 return __builtin_##NAME##ul(a); \
4050@@ -739,7 +752,7 @@
4051
4052
4053 #define IMPLEMENT_EXPR_G_G(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4054- GTYPE __attribute__ ((overloadable)) \
4055+ GTYPE _cl_overloadable \
4056 NAME(GTYPE a) \
4057 { \
4058 typedef GTYPE gtype; \
4059@@ -799,7 +812,7 @@
4060 IMPLEMENT_EXPR_G_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4061
4062 #define IMPLEMENT_EXPR_UG_G(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4063- UGTYPE __attribute__ ((overloadable)) \
4064+ UGTYPE _cl_overloadable \
4065 NAME(GTYPE a) \
4066 { \
4067 typedef GTYPE gtype; \
4068@@ -859,7 +872,7 @@
4069 IMPLEMENT_EXPR_UG_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4070
4071 #define IMPLEMENT_EXPR_G_GG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4072- GTYPE __attribute__ ((overloadable)) \
4073+ GTYPE _cl_overloadable \
4074 NAME(GTYPE a, GTYPE b) \
4075 { \
4076 typedef GTYPE gtype; \
4077@@ -918,7 +931,7 @@
4078 IMPLEMENT_EXPR_G_GG(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \
4079 IMPLEMENT_EXPR_G_GG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4080 #define IMPLEMENT_EXPR_G_GGG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4081- GTYPE __attribute__ ((overloadable)) \
4082+ GTYPE _cl_overloadable \
4083 NAME(GTYPE a, GTYPE b, GTYPE c) \
4084 { \
4085 typedef GTYPE gtype; \
4086@@ -978,7 +991,7 @@
4087 IMPLEMENT_EXPR_G_GGG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4088
4089 #define IMPLEMENT_EXPR_G_GS(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4090- GTYPE __attribute__ ((overloadable)) \
4091+ GTYPE _cl_overloadable \
4092 NAME(GTYPE a, SGTYPE b) \
4093 { \
4094 typedef GTYPE gtype; \
4095@@ -1030,7 +1043,7 @@
4096 IMPLEMENT_EXPR_G_GS(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4097
4098 #define IMPLEMENT_EXPR_UG_GG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
4099- UGTYPE __attribute__ ((overloadable)) \
4100+ UGTYPE _cl_overloadable \
4101 NAME(GTYPE a, GTYPE b) \
4102 { \
4103 typedef GTYPE gtype; \
4104@@ -1090,7 +1103,7 @@
4105 IMPLEMENT_EXPR_UG_GG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )
4106
4107 #define IMPLEMENT_EXPR_LG_GUG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, LGTYPE) \
4108- LGTYPE __attribute__ ((overloadable)) \
4109+ LGTYPE _cl_overloadable \
4110 NAME(GTYPE a, UGTYPE b) \
4111 { \
4112 typedef GTYPE gtype; \
4113@@ -1138,7 +1151,7 @@
4114 IMPLEMENT_EXPR_LG_GUG(NAME, EXPR, uint16 , uint , uint16 , ulong16 )
4115
4116 #define IMPLEMENT_EXPR_J_JJ(NAME, EXPR, JTYPE, SJTYPE, UJTYPE, SUJTYPE) \
4117- JTYPE __attribute__ ((overloadable)) \
4118+ JTYPE _cl_overloadable \
4119 NAME(JTYPE a, JTYPE b) \
4120 { \
4121 typedef JTYPE gtype; \
4122@@ -1161,7 +1174,7 @@
4123 IMPLEMENT_EXPR_J_JJ(NAME, EXPR, uint8 , uint , uint8 , uint ) \
4124 IMPLEMENT_EXPR_J_JJ(NAME, EXPR, uint16 , uint , uint16 , uint )
4125 #define IMPLEMENT_EXPR_J_JJJ(NAME, EXPR, JTYPE, SJTYPE, UJTYPE, SUJTYPE) \
4126- JTYPE __attribute__ ((overloadable)) \
4127+ JTYPE _cl_overloadable \
4128 NAME(JTYPE a, JTYPE b, JTYPE c) \
4129 { \
4130 typedef JTYPE gtype; \
4131
4132=== modified file 'lib/kernel/upsample.cl'
4133--- lib/kernel/upsample.cl 2011-10-26 19:49:23 +0000
4134+++ lib/kernel/upsample.cl 2011-10-31 17:03:23 +0000
4135@@ -25,7 +25,7 @@
4136 // convert_* function calls
4137
4138 #define IMPLEMENT_UPSAMPLE_LG_GUG(GTYPE, SGTYPE, UGTYPE, LGTYPE) \
4139- LGTYPE __attribute__ ((overloadable)) \
4140+ LGTYPE __attribute__ ((__overloadable__)) \
4141 upsample(GTYPE a, UGTYPE b) \
4142 { \
4143 int bits = CHAR_BIT * sizeof(SGTYPE); \
4144
4145=== added file 'lib/kernel/vload.cl'
4146--- lib/kernel/vload.cl 1970-01-01 00:00:00 +0000
4147+++ lib/kernel/vload.cl 2011-10-31 17:03:23 +0000
4148@@ -0,0 +1,106 @@
4149+/* OpenCL built-in library: vloa()
4150+
4151+ Copyright (c) 2011 Universidad Rey Juan Carlos
4152+
4153+ Permission is hereby granted, free of charge, to any person obtaining a copy
4154+ of this software and associated documentation files (the "Software"), to deal
4155+ in the Software without restriction, including without limitation the rights
4156+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4157+ copies of the Software, and to permit persons to whom the Software is
4158+ furnished to do so, subject to the following conditions:
4159+
4160+ The above copyright notice and this permission notice shall be included in
4161+ all copies or substantial portions of the Software.
4162+
4163+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4164+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4165+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4166+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4167+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4168+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4169+ THE SOFTWARE.
4170+*/
4171+
4172+#include "templates.h"
4173+
4174+
4175+
4176+#define IMPLEMENT_VLOAD(TYPE, MOD) \
4177+ \
4178+ TYPE##2 __attribute__ ((__overloadable__)) \
4179+ vload2(size_t offset, const MOD TYPE *p) \
4180+ { \
4181+ return (TYPE##2)(p[offset*2], p[offset*2+1]); \
4182+ } \
4183+ \
4184+ TYPE##3 __attribute__ ((__overloadable__)) \
4185+ vload3(size_t offset, const MOD TYPE *p) \
4186+ { \
4187+ return (TYPE##3)(vload2(0, &p[offset*3]), p[offset*3+2]); \
4188+ } \
4189+ \
4190+ TYPE##4 __attribute__ ((__overloadable__)) \
4191+ vload4(size_t offset, const MOD TYPE *p) \
4192+ { \
4193+ return (TYPE##4)(vload2(0, &p[offset*4]), vload2(0, &p[offset*4+2])); \
4194+ } \
4195+ \
4196+ TYPE##8 __attribute__ ((__overloadable__)) \
4197+ vload8(size_t offset, const MOD TYPE *p) \
4198+ { \
4199+ return (TYPE##8)(vload4(0, &p[offset*8]), vload4(0, &p[offset*8+4])); \
4200+ } \
4201+ \
4202+ TYPE##16 __attribute__ ((__overloadable__)) \
4203+ vload16(size_t offset, const MOD TYPE *p) \
4204+ { \
4205+ return (TYPE##16)(vload8(0, &p[offset*16]), vload8(0, &p[offset*16+8])); \
4206+ }
4207+
4208+
4209+
4210+IMPLEMENT_VLOAD(char , __global)
4211+IMPLEMENT_VLOAD(short , __global)
4212+IMPLEMENT_VLOAD(int , __global)
4213+IMPLEMENT_VLOAD(long , __global)
4214+IMPLEMENT_VLOAD(uchar , __global)
4215+IMPLEMENT_VLOAD(ushort, __global)
4216+IMPLEMENT_VLOAD(uint , __global)
4217+IMPLEMENT_VLOAD(ulong , __global)
4218+IMPLEMENT_VLOAD(float , __global)
4219+IMPLEMENT_VLOAD(double, __global)
4220+
4221+IMPLEMENT_VLOAD(char , __local)
4222+IMPLEMENT_VLOAD(short , __local)
4223+IMPLEMENT_VLOAD(int , __local)
4224+IMPLEMENT_VLOAD(long , __local)
4225+IMPLEMENT_VLOAD(uchar , __local)
4226+IMPLEMENT_VLOAD(ushort, __local)
4227+IMPLEMENT_VLOAD(uint , __local)
4228+IMPLEMENT_VLOAD(ulong , __local)
4229+IMPLEMENT_VLOAD(float , __local)
4230+IMPLEMENT_VLOAD(double, __local)
4231+
4232+IMPLEMENT_VLOAD(char , __constant)
4233+IMPLEMENT_VLOAD(short , __constant)
4234+IMPLEMENT_VLOAD(int , __constant)
4235+IMPLEMENT_VLOAD(long , __constant)
4236+IMPLEMENT_VLOAD(uchar , __constant)
4237+IMPLEMENT_VLOAD(ushort, __constant)
4238+IMPLEMENT_VLOAD(uint , __constant)
4239+IMPLEMENT_VLOAD(ulong , __constant)
4240+IMPLEMENT_VLOAD(float , __constant)
4241+IMPLEMENT_VLOAD(double, __constant)
4242+
4243+/* __private is not supported yet
4244+IMPLEMENT_VLOAD(char , __private)
4245+IMPLEMENT_VLOAD(short , __private)
4246+IMPLEMENT_VLOAD(int , __private)
4247+IMPLEMENT_VLOAD(long , __private)
4248+IMPLEMENT_VLOAD(uchar , __private)
4249+IMPLEMENT_VLOAD(ushort, __private)
4250+IMPLEMENT_VLOAD(uint , __private)
4251+IMPLEMENT_VLOAD(ulong , __private)
4252+IMPLEMENT_VLOAD(float , __private)
4253+IMPLEMENT_VLOAD(double, __private)
4254+*/
4255
4256=== added file 'lib/kernel/vstore.cl'
4257--- lib/kernel/vstore.cl 1970-01-01 00:00:00 +0000
4258+++ lib/kernel/vstore.cl 2011-10-31 17:03:23 +0000
4259@@ -0,0 +1,100 @@
4260+/* OpenCL built-in library: vstore()
4261+
4262+ Copyright (c) 2011 Universidad Rey Juan Carlos
4263+
4264+ Permission is hereby granted, free of charge, to any person obtaining a copy
4265+ of this software and associated documentation files (the "Software"), to deal
4266+ in the Software without restriction, including without limitation the rights
4267+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4268+ copies of the Software, and to permit persons to whom the Software is
4269+ furnished to do so, subject to the following conditions:
4270+
4271+ The above copyright notice and this permission notice shall be included in
4272+ all copies or substantial portions of the Software.
4273+
4274+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4275+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4276+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4277+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4278+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4279+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4280+ THE SOFTWARE.
4281+*/
4282+
4283+#include "templates.h"
4284+
4285+
4286+
4287+#define IMPLEMENT_VSTORE(TYPE, MOD) \
4288+ \
4289+ void __attribute__ ((__overloadable__)) \
4290+ vstore2(TYPE##2 data, size_t offset, MOD TYPE *p) \
4291+ { \
4292+ p[offset*2] = data.lo; \
4293+ p[offset*2+1] = data.hi; \
4294+ } \
4295+ \
4296+ void __attribute__ ((__overloadable__)) \
4297+ vstore3(TYPE##3 data, size_t offset, MOD TYPE *p) \
4298+ { \
4299+ vstore2(data.lo, 0, &p[offset*3]); \
4300+ p[offset*3+2] = data.s2; \
4301+ } \
4302+ \
4303+ void __attribute__ ((__overloadable__)) \
4304+ vstore4(TYPE##4 data, size_t offset, MOD TYPE *p) \
4305+ { \
4306+ vstore2(data.lo, 0, &p[offset*4]); \
4307+ vstore2(data.hi, 0, &p[offset*4+2]); \
4308+ } \
4309+ \
4310+ void __attribute__ ((__overloadable__)) \
4311+ vstore8(TYPE##8 data, size_t offset, MOD TYPE *p) \
4312+ { \
4313+ vstore4(data.lo, 0, &p[offset*8]); \
4314+ vstore4(data.hi, 0, &p[offset*8+4]); \
4315+ } \
4316+ \
4317+ void __attribute__ ((__overloadable__)) \
4318+ vstore16(TYPE##16 data, size_t offset, MOD TYPE *p) \
4319+ { \
4320+ vstore8(data.lo, 0, &p[offset*16]); \
4321+ vstore8(data.hi, 0, &p[offset*16+8]); \
4322+ }
4323+
4324+
4325+
4326+IMPLEMENT_VSTORE(char , __global)
4327+IMPLEMENT_VSTORE(short , __global)
4328+IMPLEMENT_VSTORE(int , __global)
4329+IMPLEMENT_VSTORE(long , __global)
4330+IMPLEMENT_VSTORE(uchar , __global)
4331+IMPLEMENT_VSTORE(ushort, __global)
4332+IMPLEMENT_VSTORE(uint , __global)
4333+IMPLEMENT_VSTORE(ulong , __global)
4334+IMPLEMENT_VSTORE(float , __global)
4335+IMPLEMENT_VSTORE(double, __global)
4336+
4337+IMPLEMENT_VSTORE(char , __local)
4338+IMPLEMENT_VSTORE(short , __local)
4339+IMPLEMENT_VSTORE(int , __local)
4340+IMPLEMENT_VSTORE(long , __local)
4341+IMPLEMENT_VSTORE(uchar , __local)
4342+IMPLEMENT_VSTORE(ushort, __local)
4343+IMPLEMENT_VSTORE(uint , __local)
4344+IMPLEMENT_VSTORE(ulong , __local)
4345+IMPLEMENT_VSTORE(float , __local)
4346+IMPLEMENT_VSTORE(double, __local)
4347+
4348+/* __private is not supported yet
4349+IMPLEMENT_VSTORE(char , __private)
4350+IMPLEMENT_VSTORE(short , __private)
4351+IMPLEMENT_VSTORE(int , __private)
4352+IMPLEMENT_VSTORE(long , __private)
4353+IMPLEMENT_VSTORE(uchar , __private)
4354+IMPLEMENT_VSTORE(ushort, __private)
4355+IMPLEMENT_VSTORE(uint , __private)
4356+IMPLEMENT_VSTORE(ulong , __private)
4357+IMPLEMENT_VSTORE(float , __private)
4358+IMPLEMENT_VSTORE(double, __private)
4359+*/
4360
4361=== added directory 'lib/kernel/x86'
4362=== added file 'lib/kernel/x86/Makefile.am'
4363--- lib/kernel/x86/Makefile.am 1970-01-01 00:00:00 +0000
4364+++ lib/kernel/x86/Makefile.am 2011-10-31 17:03:23 +0000
4365@@ -0,0 +1,169 @@
4366+# Process this file with automake to produce Makefile.in (in this,
4367+# and all subdirectories).
4368+# Makefile.am for pocl/lib/kernel/dummy.
4369+#
4370+# Copyright (c) 2011 Universidad Rey Juan Carlos
4371+#
4372+# Permission is hereby granted, free of charge, to any person obtaining a copy
4373+# of this software and associated documentation files (the "Software"), to deal
4374+# in the Software without restriction, including without limitation the rights
4375+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4376+# copies of the Software, and to permit persons to whom the Software is
4377+# furnished to do so, subject to the following conditions:
4378+#
4379+# The above copyright notice and this permission notice shall be included in
4380+# all copies or substantial portions of the Software.
4381+#
4382+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4383+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4384+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4385+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4386+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4387+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4388+# THE SOFTWARE.
4389+
4390+targetpkglibdir = $(pkglibdir)/x86
4391+targetpkglib_LIBRARIES = libkernel.a
4392+
4393+vpath %.cl @srcdir@/..
4394+vpath %.c @srcdir@/..
4395+vpath %.ll @srcdir@/..
4396+
4397+libkernel_a_SOURCES = get_global_size.c \
4398+ get_global_id.c \
4399+ get_local_id.c \
4400+ get_num_groups.c \
4401+ get_group_id.c \
4402+ as_type.cl \
4403+ convert_type.cl \
4404+ acos.cl \
4405+ acosh.cl \
4406+ acospi.cl \
4407+ asin.cl \
4408+ asinh.cl \
4409+ asinpi.cl \
4410+ atan.cl \
4411+ atan2.cl \
4412+ atan2pi.cl \
4413+ atanh.cl \
4414+ atanpi.cl \
4415+ cbrt.cl \
4416+ ceil.cl \
4417+ copysign.cl \
4418+ cos.cl \
4419+ cosh.cl \
4420+ cospi.cl \
4421+ erfc.cl \
4422+ erf.cl \
4423+ exp.cl \
4424+ exp2.cl \
4425+ exp10.cl \
4426+ expm1.cl \
4427+ fabs.cl \
4428+ fdim.cl \
4429+ floor.cl \
4430+ fma.cl \
4431+ fmax.cl \
4432+ fmin.cl \
4433+ fmod.cl \
4434+ fract.cl \
4435+ hypot.cl \
4436+ ilogb.cl \
4437+ ldexp.cl \
4438+ lgamma.cl \
4439+ log.cl \
4440+ log2.cl \
4441+ log10.cl \
4442+ log1p.cl \
4443+ logb.cl \
4444+ mad.cl \
4445+ maxmag.cl \
4446+ minmag.cl \
4447+ nan.cl \
4448+ nextafter.cl \
4449+ pow.cl \
4450+ pown.cl \
4451+ powr.cl \
4452+ remainder.cl \
4453+ rint.cl \
4454+ rootn.cl \
4455+ round.cl \
4456+ rsqrt.cl \
4457+ sin.cl \
4458+ sinh.cl \
4459+ sinpi.cl \
4460+ sqrt.cl \
4461+ tan.cl \
4462+ tanh.cl \
4463+ tanpi.cl \
4464+ tgamma.cl \
4465+ trunc.cl \
4466+ abs.cl \
4467+ abs_diff.cl \
4468+ add_sat.cl \
4469+ hadd.cl \
4470+ rhadd.cl \
4471+ clamp.cl \
4472+ clz.cl \
4473+ mad_hi.cl \
4474+ mad_sat.cl \
4475+ max.cl \
4476+ min.cl \
4477+ mul_hi.cl \
4478+ rotate.cl \
4479+ sub_sat.cl \
4480+ upsample.cl \
4481+ mad24.cl \
4482+ mul24.cl \
4483+ degrees.cl \
4484+ mix.cl \
4485+ radians.cl \
4486+ step.cl \
4487+ smoothstep.cl \
4488+ sign.cl \
4489+ cross.cl \
4490+ dot.cl \
4491+ distance.cl \
4492+ length.cl \
4493+ normalize.cl \
4494+ fast_distance.cl \
4495+ fast_length.cl \
4496+ fast_normalize.cl \
4497+ isequal.cl \
4498+ isnotequal.cl \
4499+ isgreater.cl \
4500+ isgreaterequal.cl \
4501+ isless.cl \
4502+ islessequal.cl \
4503+ islessgreater.cl \
4504+ isfinite.cl \
4505+ isinf.cl \
4506+ isnan.cl \
4507+ isnormal.cl \
4508+ isordered.cl \
4509+ isunordered.cl \
4510+ signbit.cl \
4511+ any.cl \
4512+ all.cl \
4513+ bitselect.cl \
4514+ select.cl \
4515+ vload.cl \
4516+ vstore.cl
4517+
4518+libkernel_a_LIBADD = barrier.o
4519+EXTRA_DIST = barrier.ll
4520+
4521+RANLIB = `@LLVM_CONFIG@ --bindir`/llvm-ranlib
4522+AR = `@LLVM_CONFIG@ --bindir`/llvm-ar
4523+
4524+.cl.o:
4525+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $<
4526+
4527+.c.o:
4528+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $<
4529+
4530+.ll.o:
4531+ $(LLVM_AS) -o $@ $<
4532+
4533+$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
4534+$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h
4535
4536=== added file 'lib/kernel/x86/ceil.cl'
4537--- lib/kernel/x86/ceil.cl 1970-01-01 00:00:00 +0000
4538+++ lib/kernel/x86/ceil.cl 2011-10-31 17:03:23 +0000
4539@@ -0,0 +1,149 @@
4540+/* OpenCL built-in library: ceil()
4541+
4542+ Copyright (c) 2011 Universidad Rey Juan Carlos
4543+
4544+ Permission is hereby granted, free of charge, to any person obtaining a copy
4545+ of this software and associated documentation files (the "Software"), to deal
4546+ in the Software without restriction, including without limitation the rights
4547+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4548+ copies of the Software, and to permit persons to whom the Software is
4549+ furnished to do so, subject to the following conditions:
4550+
4551+ The above copyright notice and this permission notice shall be included in
4552+ all copies or substantial portions of the Software.
4553+
4554+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4555+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4556+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4557+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4558+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4559+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4560+ THE SOFTWARE.
4561+*/
4562+
4563+#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
4564+ TYPE _cl_overloadable NAME(TYPE a) \
4565+ { \
4566+ typedef TYPE type; \
4567+ return EXPR; \
4568+ }
4569+
4570+#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \
4571+ TYPE _cl_overloadable NAME(TYPE a) \
4572+ { \
4573+ return NAME(*(UPTYPE*)&a).LO; \
4574+ }
4575+
4576+#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \
4577+ TYPE _cl_overloadable NAME(TYPE a) \
4578+ { \
4579+ return (TYPE)(NAME(a.LO), NAME(a.HI)); \
4580+ }
4581+
4582+
4583+
4584+#define _MM_FROUND_TO_NEAREST_INT 0x00
4585+#define _MM_FROUND_TO_NEG_INF 0x01
4586+#define _MM_FROUND_TO_POS_INF 0x02
4587+#define _MM_FROUND_TO_ZERO 0x03
4588+#define _MM_FROUND_CUR_DIRECTION 0x04
4589+
4590+#define _MM_FROUND_RAISE_EXC 0x00
4591+#define _MM_FROUND_NO_EXC 0x08
4592+
4593+#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
4594+#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
4595+#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
4596+#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
4597+#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
4598+#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC )
4599+
4600+
4601+
4602+#define IMPLEMENT_CEIL_DIRECT_FLOAT __builtin_ceilf(a)
4603+#define IMPLEMENT_CEIL_DIRECT_DOUBLE __builtin_ceil(a)
4604+// Using only a single asm operand leads to better code, since LLVM
4605+// doesn't seem to allocate input and output operands to the same
4606+// register
4607+#define IMPLEMENT_CEIL_SSE41_FLOAT \
4608+ ({ \
4609+ __asm__ ("roundss %[dst], %[dst], %[mode]" : \
4610+ [dst] "+x" (a) : \
4611+ [mode] "n" (_MM_FROUND_CEIL)); \
4612+ a; \
4613+ })
4614+#define IMPLEMENT_CEIL_SSE41_FLOAT4 \
4615+ ({ \
4616+ __asm__ ("roundps %[dst], %[dst], %[mode]" : \
4617+ [dst] "+x" (a) : \
4618+ [mode] "n" (_MM_FROUND_CEIL)); \
4619+ a; \
4620+ })
4621+#define IMPLEMENT_CEIL_AVX_FLOAT8 \
4622+ ({ \
4623+ __asm__ ("roundps256 %[dst], %[dst], %[mode]" : \
4624+ [dst] "+x" (a) : \
4625+ [mode] "n" (_MM_FROUND_CEIL)); \
4626+ a; \
4627+ })
4628+#define IMPLEMENT_CEIL_SSE41_DOUBLE \
4629+ ({ \
4630+ __asm__ ("roundsd %[dst], %[dst], %[mode]" : \
4631+ [dst] "+x" (a) : \
4632+ [mode] "n" (_MM_FROUND_CEIL)); \
4633+ a; \
4634+ })
4635+#define IMPLEMENT_CEIL_SSE41_DOUBLE2 \
4636+ ({ \
4637+ __asm__ ("roundpd %[dst], %[dst], %[mode]" : \
4638+ [dst] "+x" (a) : \
4639+ [mode] "n" (_MM_FROUND_CEIL)); \
4640+ a; \
4641+ })
4642+#define IMPLEMENT_CEIL_AVX_DOUBLE4 \
4643+ ({ \
4644+ __asm__ ("roundpd256 %[dst], %[dst], %[mode]" : \
4645+ [dst] "+x" (a) : \
4646+ [mode] "n" (_MM_FROUND_CEIL)); \
4647+ a; \
4648+ })
4649+
4650+
4651+
4652+#ifdef __SSE4_1__
4653+IMPLEMENT_DIRECT(ceil, float , IMPLEMENT_CEIL_SSE41_FLOAT)
4654+IMPLEMENT_UPCAST(ceil, float2 , float4, lo)
4655+IMPLEMENT_UPCAST(ceil, float3 , float4, s012)
4656+IMPLEMENT_DIRECT(ceil, float4 , IMPLEMENT_CEIL_SSE41_FLOAT4)
4657+# ifdef __AVX__
4658+IMPLEMENT_DIRECT(ceil, float8 , IMPLEMENT_CEIL_AVX_FLOAT8)
4659+# else
4660+IMPLEMENT_SPLIT (ceil, float8 , lo, hi)
4661+# endif
4662+#else
4663+IMPLEMENT_DIRECT(ceil, float , IMPLEMENT_CEIL_DIRECT_FLOAT)
4664+IMPLEMENT_SPLIT (ceil, float2 , lo, hi)
4665+IMPLEMENT_SPLIT (ceil, float3 , lo, s2)
4666+IMPLEMENT_SPLIT (ceil, float4 , lo, hi)
4667+IMPLEMENT_SPLIT (ceil, float8 , lo, hi)
4668+#endif
4669+IMPLEMENT_SPLIT (ceil, float16, lo, hi)
4670+
4671+#ifdef __SSE4_1__
4672+IMPLEMENT_DIRECT(ceil, double , IMPLEMENT_CEIL_SSE41_DOUBLE)
4673+IMPLEMENT_DIRECT(ceil, double2 , IMPLEMENT_CEIL_SSE41_DOUBLE2)
4674+# ifdef __AVX__
4675+IMPLEMENT_UPCAST(ceil, double3 , double4, s012)
4676+IMPLEMENT_DIRECT(ceil, double4 , IMPLEMENT_CEIL_AVX_DOUBLE4)
4677+# else
4678+IMPLEMENT_SPLIT (ceil, double3 , lo, s2)
4679+IMPLEMENT_SPLIT (ceil, double4 , lo, hi)
4680+# endif
4681+#else
4682+IMPLEMENT_DIRECT(ceil, double , IMPLEMENT_CEIL_DIRECT_DOUBLE)
4683+IMPLEMENT_SPLIT (ceil, double2 , lo, hi)
4684+IMPLEMENT_SPLIT (ceil, double3 , lo, s2)
4685+IMPLEMENT_SPLIT (ceil, double4 , lo, hi)
4686+#endif
4687+IMPLEMENT_SPLIT (ceil, double8 , lo, hi)
4688+IMPLEMENT_SPLIT (ceil, double16, lo, hi)
4689
4690=== added file 'lib/kernel/x86/copysign.cl'
4691--- lib/kernel/x86/copysign.cl 1970-01-01 00:00:00 +0000
4692+++ lib/kernel/x86/copysign.cl 2011-10-31 17:03:23 +0000
4693@@ -0,0 +1,169 @@
4694+/* OpenCL built-in library: copysign()
4695+
4696+ Copyright (c) 2011 Universidad Rey Juan Carlos
4697+
4698+ Permission is hereby granted, free of charge, to any person obtaining a copy
4699+ of this software and associated documentation files (the "Software"), to deal
4700+ in the Software without restriction, including without limitation the rights
4701+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4702+ copies of the Software, and to permit persons to whom the Software is
4703+ furnished to do so, subject to the following conditions:
4704+
4705+ The above copyright notice and this permission notice shall be included in
4706+ all copies or substantial portions of the Software.
4707+
4708+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4709+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4710+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4711+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4712+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4713+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4714+ THE SOFTWARE.
4715+*/
4716+
4717+#if 0
4718+
4719+#include "../templates.h"
4720+
4721+// LLVM generates non-optimal code for this implementation
4722+DEFINE_EXPR_V_VV(copysign,
4723+ ({
4724+ int bits = CHAR_BIT * sizeof(stype);
4725+ jtype sign_mask = (jtype)1 << (jtype)(bits - 1);
4726+ jtype result = ((~sign_mask & *(jtype*)&a) |
4727+ ( sign_mask & *(jtype*)&b));
4728+ *(vtype*)&result;
4729+ }))
4730+
4731+#endif
4732+
4733+
4734+
4735+#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
4736+ TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
4737+ { \
4738+ return EXPR; \
4739+ }
4740+
4741+#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \
4742+ TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
4743+ { \
4744+ return NAME(*(UPTYPE*)&a, *(UPTYPE*)&b).LO; \
4745+ }
4746+
4747+#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \
4748+ TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
4749+ { \
4750+ return (TYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
4751+ }
4752+
4753+
4754+
4755+#define IMPLEMENT_COPYSIGN_DIRECT \
4756+ ({ \
4757+ int bits = CHAR_BIT * sizeof(stype); \
4758+ jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \
4759+ jtype result = (~sign_mask & *(jtype*)&a) | (sign_mask & *(jtype*)&b); \
4760+ *(vtype*)&result; \
4761+ })
4762+#define IMPLEMENT_COPYSIGN_SSE_FLOAT4 \
4763+ ({ \
4764+ uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
4765+ __asm__ ("andps %[src], %[dst]" : \
4766+ [dst] "+x" (a) : \
4767+ [src] "x" (~sign_mask)); \
4768+ __asm__ ("andps %[src], %[dst]" : \
4769+ [dst] "+x" (b) : \
4770+ [src] "x" (sign_mask)); \
4771+ __asm__ ("orps %[src], %[dst]" : \
4772+ [dst] "+x" (a) : \
4773+ [src] "x" (b)); \
4774+ a; \
4775+ })
4776+#define IMPLEMENT_COPYSIGN_AVX_FLOAT8 \
4777+ ({ \
4778+ uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \
4779+ 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
4780+ __asm__ ("andps256 %[src], %[dst]" : \
4781+ [dst] "+x" (a) : \
4782+ [src] "x" (~sign_mask)); \
4783+ __asm__ ("andps256 %[src], %[dst]" : \
4784+ [dst] "+x" (b) : \
4785+ [src] "x" (sign_mask)); \
4786+ __asm__ ("orps256 %[src], %[dst]" : \
4787+ [dst] "+x" (a) : \
4788+ [b] "x" (b)); \
4789+ a; \
4790+ })
4791+#define IMPLEMENT_COPYSIGN_SSE2_DOUBLE2 \
4792+ ({ \
4793+ ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \
4794+ __asm__ ("andpd %[src], %[dst]" : \
4795+ [dst] "+x" (a) : \
4796+ [src] "x" (~sign_mask)); \
4797+ __asm__ ("andpd %[src], %[dst]" : \
4798+ [dst] "+x" (b) : \
4799+ [src] "x" (sign_mask)); \
4800+ __asm__ ("orpd %[src], %[dst]" : \
4801+ [dst] "+x" (a) : \
4802+ [src] "x" (b)); \
4803+ a; \
4804+ })
4805+#define IMPLEMENT_COPYSIGN_AVX_DOUBLE4 \
4806+ ({ \
4807+ ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \
4808+ 0x8000000000000000UL, 0x8000000000000000UL}; \
4809+ __asm__ ("andpd256 %[src], %[dst]" : \
4810+ [dst] "+x" (a) : \
4811+ [src] "x" (~sign_mask)); \
4812+ __asm__ ("andpd256 %[src], %[dst]" : \
4813+ [dst] "+x" (b) : \
4814+ [src] "x" (sign_mask)); \
4815+ __asm__ ("orpd256 %[src], %[dst]" : \
4816+ [dst] "+x" (a) : \
4817+ [src] "x" (b)); \
4818+ a; \
4819+ })
4820+
4821+
4822+
4823+#ifdef __SSE__
4824+IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_SSE_FLOAT4)
4825+IMPLEMENT_UPCAST(copysign, float2 , float4, lo)
4826+IMPLEMENT_UPCAST(copysign, float3 , float4, s012)
4827+IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_SSE_FLOAT4)
4828+# ifdef __AVX__
4829+IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_AVX_FLOAT8)
4830+# else
4831+IMPLEMENT_SPLIT (copysign, float8 , lo, hi)
4832+# endif
4833+IMPLEMENT_SPLIT (copysign, float16, lo, hi)
4834+#else
4835+IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_DIRECT)
4836+IMPLEMENT_DIRECT(copysign, float2 , IMPLEMENT_COPYSIGN_DIRECT)
4837+IMPLEMENT_DIRECT(copysign, float3 , IMPLEMENT_COPYSIGN_DIRECT)
4838+IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_DIRECT)
4839+IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_DIRECT)
4840+IMPLEMENT_DIRECT(copysign, float16, IMPLEMENT_COPYSIGN_DIRECT)
4841+#endif
4842+
4843+#ifdef __SSE2__
4844+IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2)
4845+IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2)
4846+# ifdef __AVX__
4847+IMPLEMENT_UPCAST(copysign, double3 , double4, s012)
4848+IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_AVX_DOUBLE4)
4849+# else
4850+IMPLEMENT_SPLIT (copysign, double3 , lo, s2)
4851+IMPLEMENT_SPLIT (copysign, double4 , lo, hi)
4852+# endif
4853+IMPLEMENT_SPLIT (copysign, double8 , lo, hi)
4854+IMPLEMENT_SPLIT (copysign, double16, lo, hi)
4855+#else
4856+IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_DIRECT)
4857+IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_DIRECT)
4858+IMPLEMENT_DIRECT(copysign, double3 , IMPLEMENT_COPYSIGN_DIRECT)
4859+IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_DIRECT)
4860+IMPLEMENT_DIRECT(copysign, double8 , IMPLEMENT_COPYSIGN_DIRECT)
4861+IMPLEMENT_DIRECT(copysign, double16, IMPLEMENT_COPYSIGN_DIRECT)
4862+#endif
4863
4864=== added file 'lib/kernel/x86/fabs.cl'
4865--- lib/kernel/x86/fabs.cl 1970-01-01 00:00:00 +0000
4866+++ lib/kernel/x86/fabs.cl 2011-10-31 17:03:23 +0000
4867@@ -0,0 +1,144 @@
4868+/* OpenCL built-in library: fabs()
4869+
4870+ Copyright (c) 2011 Universidad Rey Juan Carlos
4871+
4872+ Permission is hereby granted, free of charge, to any person obtaining a copy
4873+ of this software and associated documentation files (the "Software"), to deal
4874+ in the Software without restriction, including without limitation the rights
4875+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4876+ copies of the Software, and to permit persons to whom the Software is
4877+ furnished to do so, subject to the following conditions:
4878+
4879+ The above copyright notice and this permission notice shall be included in
4880+ all copies or substantial portions of the Software.
4881+
4882+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4883+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4884+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4885+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4886+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4887+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4888+ THE SOFTWARE.
4889+*/
4890+
4891+#if 0
4892+
4893+#include "../templates.h"
4894+
4895+// LLVM generates non-optimal code for this implementation
4896+DEFINE_EXPR_V_V(fabs,
4897+ ({
4898+ int bits = CHAR_BIT * sizeof(stype);
4899+ jtype sign_mask = (jtype)1 << (jtype)(bits - 1);
4900+ jtype result = ~sign_mask & *(jtype*)&a;
4901+ *(vtype*)&result;
4902+ }))
4903+
4904+#endif
4905+
4906+
4907+
4908+#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
4909+ TYPE _cl_overloadable NAME(TYPE a) \
4910+ { \
4911+ return EXPR; \
4912+ }
4913+
4914+#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \
4915+ TYPE _cl_overloadable NAME(TYPE a) \
4916+ { \
4917+ return NAME(*(UPTYPE*)&a).LO; \
4918+ }
4919+
4920+#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \
4921+ TYPE _cl_overloadable NAME(TYPE a) \
4922+ { \
4923+ return (TYPE)(NAME(a.LO), NAME(a.HI)); \
4924+ }
4925+
4926+
4927+
4928+#define IMPLEMENT_FABS_DIRECT \
4929+ ({ \
4930+ int bits = CHAR_BIT * sizeof(stype); \
4931+ jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \
4932+ jtype result = ~sign_mask & *(jtype*)&a; \
4933+ *(vtype*)&result; \
4934+ })
4935+#define IMPLEMENT_FABS_SSE_FLOAT4 \
4936+ ({ \
4937+ uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
4938+ __asm__ ("andps %[src], %[dst]" : \
4939+ [dst] "+x" (a) : \
4940+ [src] "x" (~sign_mask)); \
4941+ a; \
4942+ })
4943+#define IMPLEMENT_FABS_AVX_FLOAT8 \
4944+ ({ \
4945+ uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \
4946+ 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
4947+ __asm__ ("andps256 %[src], %[dst]" : \
4948+ [dst] "=x" (a) : \
4949+ "[dst]" (a), [src] "x" (~sign_mask)); \
4950+ a; \
4951+ })
4952+#define IMPLEMENT_FABS_SSE2_DOUBLE2 \
4953+ ({ \
4954+ ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \
4955+ __asm__ ("andpd %[src], %[dst]" : \
4956+ [dst] "=x" (a) : \
4957+ "[dst]" (a), [src] "x" (~sign_mask)); \
4958+ a; \
4959+ })
4960+#define IMPLEMENT_FABS_AVX_DOUBLE4 \
4961+ ({ \
4962+ ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \
4963+ 0x8000000000000000UL, 0x8000000000000000UL}; \
4964+ __asm__ ("andpd256 %[src], %[dst]" : \
4965+ [dst] "=x" (a) : \
4966+ "[dst]" (a), [src] "x" (~sign_mask)); \
4967+ a; \
4968+ })
4969+
4970+
4971+
4972+#ifdef __SSE__
4973+IMPLEMENT_UPCAST(fabs, float , float2, lo)
4974+IMPLEMENT_UPCAST(fabs, float2 , float4, lo)
4975+IMPLEMENT_UPCAST(fabs, float3 , float4, s012)
4976+IMPLEMENT_DIRECT(fabs, float4 , IMPLEMENT_FABS_SSE_FLOAT4)
4977+# ifdef __AVX__
4978+IMPLEMENT_DIRECT(fabs, float8 , IMPLEMENT_FABS_AVX_FLOAT8)
4979+# else
4980+IMPLEMENT_SPLIT (fabs, float8 , lo, hi)
4981+# endif
4982+IMPLEMENT_SPLIT (fabs, float16, lo, hi)
4983+#else
4984+IMPLEMENT_DIRECT(fabs, float , IMPLEMENT_FABS_DIRECT)
4985+IMPLEMENT_DIRECT(fabs, float2 , IMPLEMENT_FABS_DIRECT)
4986+IMPLEMENT_DIRECT(fabs, float3 , IMPLEMENT_FABS_DIRECT)
4987+IMPLEMENT_DIRECT(fabs, float4 , IMPLEMENT_FABS_DIRECT)
4988+IMPLEMENT_DIRECT(fabs, float8 , IMPLEMENT_FABS_DIRECT)
4989+IMPLEMENT_DIRECT(fabs, float16, IMPLEMENT_FABS_DIRECT)
4990+#endif
4991+
4992+#ifdef __SSE2__
4993+IMPLEMENT_UPCAST(fabs, double , double2, lo)
4994+IMPLEMENT_DIRECT(fabs, double2 , IMPLEMENT_FABS_SSE2_DOUBLE2)
4995+# ifdef __AVX__
4996+IMPLEMENT_UPCAST(fabs, double3 , double4, s012)
4997+IMPLEMENT_DIRECT(fabs, double4 , IMPLEMENT_FABS_AVX_DOUBLE4)
4998+# else
4999+IMPLEMENT_SPLIT (fabs, double3 , lo, s2)
5000+IMPLEMENT_SPLIT (fabs, double4 , lo, hi)
The diff has been truncated for viewing.