Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk
- main
- Merge into trunk
Status: | Merged |
---|---|
Merge reported by: | Carlos Sánchez de La Lama |
Merged at revision: | not available |
Proposed branch: | lp:~schnetter/pocl/main |
Merge into: | lp:~pocl/pocl/trunk |
Diff against target: |
5915 lines (+3164/-1842) (has conflicts) 36 files modified
configure.ac (+7/-0) include/_kernel.h (+1048/-669) lib/kernel/Makefile.am (+9/-3) lib/kernel/all.cl (+68/-68) lib/kernel/any.cl (+68/-68) lib/kernel/as_type.cl (+1/-1) lib/kernel/ceil.cl (+3/-131) lib/kernel/convert_type.cl (+3/-3) lib/kernel/copysign.cl (+3/-107) lib/kernel/cross.cl (+4/-4) lib/kernel/dot.cl (+56/-56) lib/kernel/fabs.cl (+3/-108) lib/kernel/floor.cl (+3/-129) lib/kernel/fma.cl (+3/-1) lib/kernel/fmax.cl (+7/-129) lib/kernel/fmin.cl (+7/-129) lib/kernel/max.cl (+1/-1) lib/kernel/maxmag.cl (+15/-1) lib/kernel/min.cl (+2/-2) lib/kernel/minmag.cl (+15/-1) lib/kernel/select.cl (+2/-2) lib/kernel/sqrt.cl (+3/-99) lib/kernel/templates.h (+138/-125) lib/kernel/upsample.cl (+1/-1) lib/kernel/vload.cl (+106/-0) lib/kernel/vstore.cl (+100/-0) lib/kernel/x86/Makefile.am (+169/-0) lib/kernel/x86/ceil.cl (+149/-0) lib/kernel/x86/copysign.cl (+169/-0) lib/kernel/x86/fabs.cl (+144/-0) lib/kernel/x86/floor.cl (+149/-0) lib/kernel/x86/max.cl (+291/-0) lib/kernel/x86/min.cl (+291/-0) lib/kernel/x86/sqrt.cl (+122/-0) scripts/pocl-standalone.in (+2/-2) scripts/pocl-workgroup.in (+2/-2) Text conflict in configure.ac Text conflict in include/_kernel.h Text conflict in lib/kernel/Makefile.am |
To merge this branch: | bzr merge lp:~schnetter/pocl/main |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Carlos Sánchez de La Lama | Approve | ||
Pekka Jääskeläinen | Needs Fixing | ||
Erik Schnetter | Needs Resubmitting | ||
Review via email:
|
Commit message
Description of the change
I have separated x86-specific functions from generic implementations (based on libc), and have also corrected a few errors.
Note that I have also made the x86-specific version the default kernel; you may not want this. I don't know how to choose automatically.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Carlos Sánchez de La Lama (csanchezdll) wrote : | # |
- 72. By Erik Schnetter
-
Correct/Optimise some x86 functions
- 73. By Erik Schnetter
-
Merge
- 74. By Erik Schnetter
-
Correct error in fabs/copysign
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I modified the implementation of max and min.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
Builds here now and make check is OK. Looks ok for merging to me.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
Well, actually. After merging to trunk, I get the same error Carlos got. Erik, please merge from trunk and fix the issue and I can then merge it to trunk.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
Very strange. There seem to be some pesky differences between different llvm versions.
Anyway, I tried to merge from the trunk yesterday, and found substantial problems because of the tce backend, which doesn't support long or double. I'll have to either comment out all run-time functions, or disable long and double if those are not available.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Carlos Sánchez de La Lama (csanchezdll) wrote : | # |
I am merging already, taking into account cl_khr_int64 and cl_khr_fp64 when declaring / implementing the functions so long/ulong and doubles are only used when supported. Hopefully ready in couple of hours.
Carlos
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Carlos Sánchez de La Lama (csanchezdll) wrote : | # |
Marged. Please be careful when adding new calls or modifying existing ones to take cl_khr_int64 (long and ulong support) and cl_khe_fp64 (double support) into account.
Carlos
- 75. By Erik Schnetter
-
Correct prototype of signbit.
Add optimised x86 implementation of signbit.
Preview Diff
1 | === modified file 'configure.ac' |
2 | --- configure.ac 2011-10-31 16:58:40 +0000 |
3 | +++ configure.ac 2011-10-31 17:03:23 +0000 |
4 | @@ -112,7 +112,12 @@ |
5 | lib/CL/Makefile |
6 | lib/llvmopencl/Makefile |
7 | lib/kernel/Makefile |
8 | +<<<<<<< TREE |
9 | lib/kernel/tce/Makefile |
10 | +======= |
11 | + lib/kernel/dummy/Makefile |
12 | + lib/kernel/x86/Makefile |
13 | +>>>>>>> MERGE-SOURCE |
14 | examples/Makefile |
15 | examples/example1/Makefile |
16 | examples/example2/Makefile |
17 | @@ -123,5 +128,7 @@ |
18 | scripts/Makefile |
19 | tests/Makefile |
20 | tests/atlocal]) |
21 | +# lib/kernel/bgp/Makefile |
22 | +# lib/kernel/ppc/Makefile |
23 | |
24 | AC_OUTPUT |
25 | |
26 | === modified file 'include/_kernel.h' |
27 | --- include/_kernel.h 2011-10-31 16:58:40 +0000 |
28 | +++ include/_kernel.h 2011-10-31 17:03:23 +0000 |
29 | @@ -46,6 +46,14 @@ |
30 | */ |
31 | #pragma OPENCL EXTENSION cl_khr_fp64: enable |
32 | |
33 | +<<<<<<< TREE |
34 | +======= |
35 | +#define __SSE4_1__ |
36 | + |
37 | + |
38 | +#ifndef __TCE__ |
39 | +//#define __kernel __attribute__ ((noinline)) |
40 | +>>>>>>> MERGE-SOURCE |
41 | #define __global __attribute__ ((address_space(3))) |
42 | #define __local __attribute__ ((address_space(4))) |
43 | #define __constant __attribute__ ((address_space(5))) |
44 | @@ -68,6 +76,83 @@ |
45 | typedef unsigned int uint; |
46 | typedef unsigned long ulong; |
47 | |
48 | +#if 0 |
49 | +/* 32 bit systems */ |
50 | +typedef uint size_t; |
51 | +typedef int ptrdiff_t; |
52 | +typedef int intptr_t; |
53 | +typedef uint uintptr_t; |
54 | +#else |
55 | +/* 64 bit systems */ |
56 | +typedef ulong size_t; |
57 | +typedef long ptrdiff_t; |
58 | +typedef long intptr_t; |
59 | +typedef ulong uintptr_t; |
60 | +#endif |
61 | + |
62 | +// We align the 3-vectors, so that their sizeof is correct. Is there a |
63 | +// better way? Should we also align the other vectors? |
64 | + |
65 | +typedef char char2 __attribute__((__ext_vector_type__(2))); |
66 | +typedef char char3 __attribute__((__ext_vector_type__(3), __aligned__(4))); |
67 | +typedef char char4 __attribute__((__ext_vector_type__(4))); |
68 | +typedef char char8 __attribute__((__ext_vector_type__(8))); |
69 | +typedef char char16 __attribute__((__ext_vector_type__(16))); |
70 | + |
71 | +typedef uchar uchar2 __attribute__((__ext_vector_type__(2))); |
72 | +typedef uchar uchar3 __attribute__((__ext_vector_type__(3), __aligned__(4))); |
73 | +typedef uchar uchar4 __attribute__((__ext_vector_type__(4))); |
74 | +typedef uchar uchar8 __attribute__((__ext_vector_type__(8))); |
75 | +typedef uchar uchar16 __attribute__((__ext_vector_type__(16))); |
76 | + |
77 | +typedef short short2 __attribute__((__ext_vector_type__(2))); |
78 | +typedef short short3 __attribute__((__ext_vector_type__(3), __aligned__(8))); |
79 | +typedef short short4 __attribute__((__ext_vector_type__(4))); |
80 | +typedef short short8 __attribute__((__ext_vector_type__(8))); |
81 | +typedef short short16 __attribute__((__ext_vector_type__(16))); |
82 | + |
83 | +typedef ushort ushort2 __attribute__((__ext_vector_type__(2))); |
84 | +typedef ushort ushort3 __attribute__((__ext_vector_type__(3), __aligned__(8))); |
85 | +typedef ushort ushort4 __attribute__((__ext_vector_type__(4))); |
86 | +typedef ushort ushort8 __attribute__((__ext_vector_type__(8))); |
87 | +typedef ushort ushort16 __attribute__((__ext_vector_type__(16))); |
88 | + |
89 | +typedef int int2 __attribute__((__ext_vector_type__(2))); |
90 | +typedef int int3 __attribute__((__ext_vector_type__(3), __aligned__(16))); |
91 | +typedef int int4 __attribute__((__ext_vector_type__(4))); |
92 | +typedef int int8 __attribute__((__ext_vector_type__(8))); |
93 | +typedef int int16 __attribute__((__ext_vector_type__(16))); |
94 | + |
95 | +typedef uint uint2 __attribute__((__ext_vector_type__(2))); |
96 | +typedef uint uint3 __attribute__((__ext_vector_type__(3), __aligned__(16))); |
97 | +typedef uint uint4 __attribute__((__ext_vector_type__(4))); |
98 | +typedef uint uint8 __attribute__((__ext_vector_type__(8))); |
99 | +typedef uint uint16 __attribute__((__ext_vector_type__(16))); |
100 | + |
101 | +typedef long long2 __attribute__((__ext_vector_type__(2))); |
102 | +typedef long long3 __attribute__((__ext_vector_type__(3), __aligned__(32))); |
103 | +typedef long long4 __attribute__((__ext_vector_type__(4))); |
104 | +typedef long long8 __attribute__((__ext_vector_type__(8))); |
105 | +typedef long long16 __attribute__((__ext_vector_type__(16))); |
106 | + |
107 | +typedef ulong ulong2 __attribute__((__ext_vector_type__(2))); |
108 | +typedef ulong ulong3 __attribute__((__ext_vector_type__(3), __aligned__(32))); |
109 | +typedef ulong ulong4 __attribute__((__ext_vector_type__(4))); |
110 | +typedef ulong ulong8 __attribute__((__ext_vector_type__(8))); |
111 | +typedef ulong ulong16 __attribute__((__ext_vector_type__(16))); |
112 | + |
113 | +typedef float float2 __attribute__((__ext_vector_type__(2))); |
114 | +typedef float float3 __attribute__((__ext_vector_type__(3), __aligned__(16))); |
115 | +typedef float float4 __attribute__((__ext_vector_type__(4))); |
116 | +typedef float float8 __attribute__((__ext_vector_type__(8))); |
117 | +typedef float float16 __attribute__((__ext_vector_type__(16))); |
118 | + |
119 | +typedef double double2 __attribute__((__ext_vector_type__(2))); |
120 | +typedef double double3 __attribute__((__ext_vector_type__(3), __aligned__(32))); |
121 | +typedef double double4 __attribute__((__ext_vector_type__(4))); |
122 | +typedef double double8 __attribute__((__ext_vector_type__(8))); |
123 | +typedef double double16 __attribute__((__ext_vector_type__(16))); |
124 | + |
125 | /* Ensure the data types have the right sizes */ |
126 | #define _cl_static_assert(_t, _x) typedef int ai##_t[(_x) ? 1 : -1] |
127 | _cl_static_assert(char , sizeof(char ) == 1); |
128 | @@ -83,6 +168,7 @@ |
129 | _cl_static_assert(float , sizeof(float ) == 4); |
130 | #ifdef cl_khr_fp64 |
131 | _cl_static_assert(double, sizeof(double) == 8); |
132 | +<<<<<<< TREE |
133 | #endif |
134 | |
135 | typedef char char2 __attribute__((ext_vector_type(2))); |
136 | @@ -144,26 +230,79 @@ |
137 | typedef double double4 __attribute__((ext_vector_type(4))); |
138 | typedef double double8 __attribute__((ext_vector_type(8))); |
139 | typedef double double16 __attribute__((ext_vector_type(16))); |
140 | +======= |
141 | +_cl_static_assert(size_t, sizeof(size_t) == sizeof(void*)); |
142 | + |
143 | +_cl_static_assert(char2 , sizeof(char2 ) == 2 *sizeof(char)); |
144 | +_cl_static_assert(char3 , sizeof(char3 ) == 4 *sizeof(char)); |
145 | +_cl_static_assert(char4 , sizeof(char4 ) == 4 *sizeof(char)); |
146 | +_cl_static_assert(char8 , sizeof(char8 ) == 8 *sizeof(char)); |
147 | +_cl_static_assert(char16, sizeof(char16) == 16*sizeof(char)); |
148 | + |
149 | +_cl_static_assert(uchar2 , sizeof(uchar2 ) == 2 *sizeof(uchar)); |
150 | +_cl_static_assert(uchar3 , sizeof(uchar3 ) == 4 *sizeof(uchar)); |
151 | +_cl_static_assert(uchar4 , sizeof(uchar4 ) == 4 *sizeof(uchar)); |
152 | +_cl_static_assert(uchar8 , sizeof(uchar8 ) == 8 *sizeof(uchar)); |
153 | +_cl_static_assert(uchar16, sizeof(uchar16) == 16*sizeof(uchar)); |
154 | + |
155 | +_cl_static_assert(short2 , sizeof(short2 ) == 2 *sizeof(short)); |
156 | +_cl_static_assert(short3 , sizeof(short3 ) == 4 *sizeof(short)); |
157 | +_cl_static_assert(short4 , sizeof(short4 ) == 4 *sizeof(short)); |
158 | +_cl_static_assert(short8 , sizeof(short8 ) == 8 *sizeof(short)); |
159 | +_cl_static_assert(short16, sizeof(short16) == 16*sizeof(short)); |
160 | + |
161 | +_cl_static_assert(ushort2 , sizeof(ushort2 ) == 2 *sizeof(ushort)); |
162 | +_cl_static_assert(ushort3 , sizeof(ushort3 ) == 4 *sizeof(ushort)); |
163 | +_cl_static_assert(ushort4 , sizeof(ushort4 ) == 4 *sizeof(ushort)); |
164 | +_cl_static_assert(ushort8 , sizeof(ushort8 ) == 8 *sizeof(ushort)); |
165 | +_cl_static_assert(ushort16, sizeof(ushort16) == 16*sizeof(ushort)); |
166 | + |
167 | +_cl_static_assert(int2 , sizeof(int2 ) == 2 *sizeof(int)); |
168 | +_cl_static_assert(int3 , sizeof(int3 ) == 4 *sizeof(int)); |
169 | +_cl_static_assert(int4 , sizeof(int4 ) == 4 *sizeof(int)); |
170 | +_cl_static_assert(int8 , sizeof(int8 ) == 8 *sizeof(int)); |
171 | +_cl_static_assert(int16, sizeof(int16) == 16*sizeof(int)); |
172 | + |
173 | +_cl_static_assert(uint2 , sizeof(uint2 ) == 2 *sizeof(uint)); |
174 | +_cl_static_assert(uint3 , sizeof(uint3 ) == 4 *sizeof(uint)); |
175 | +_cl_static_assert(uint4 , sizeof(uint4 ) == 4 *sizeof(uint)); |
176 | +_cl_static_assert(uint8 , sizeof(uint8 ) == 8 *sizeof(uint)); |
177 | +_cl_static_assert(uint16, sizeof(uint16) == 16*sizeof(uint)); |
178 | + |
179 | +_cl_static_assert(float2 , sizeof(float2 ) == 2 *sizeof(float)); |
180 | +_cl_static_assert(float3 , sizeof(float3 ) == 4 *sizeof(float)); |
181 | +_cl_static_assert(float4 , sizeof(float4 ) == 4 *sizeof(float)); |
182 | +_cl_static_assert(float8 , sizeof(float8 ) == 8 *sizeof(float)); |
183 | +_cl_static_assert(float16, sizeof(float16) == 16*sizeof(float)); |
184 | + |
185 | +_cl_static_assert(double2 , sizeof(double2 ) == 2 *sizeof(double)); |
186 | +_cl_static_assert(double3 , sizeof(double3 ) == 4 *sizeof(double)); |
187 | +_cl_static_assert(double4 , sizeof(double4 ) == 4 *sizeof(double)); |
188 | +_cl_static_assert(double8 , sizeof(double8 ) == 8 *sizeof(double)); |
189 | +_cl_static_assert(double16, sizeof(double16) == 16*sizeof(double)); |
190 | +>>>>>>> MERGE-SOURCE |
191 | |
192 | |
193 | |
194 | /* Conversion functions */ |
195 | |
196 | -#define _CL_DECLARE_AS_TYPE(SRC, DST) \ |
197 | - DST __attribute__ ((overloadable)) as_##DST(SRC a); |
198 | +#define _cl_overloadable __attribute__ ((__overloadable__)) |
199 | + |
200 | +#define _CL_DECLARE_AS_TYPE(SRC, DST) \ |
201 | + DST _cl_overloadable as_##DST(SRC a); |
202 | |
203 | /* 1 byte */ |
204 | -#define _CL_DECLARE_AS_TYPE_1(SRC) \ |
205 | - _CL_DECLARE_AS_TYPE(SRC, char) \ |
206 | +#define _CL_DECLARE_AS_TYPE_1(SRC) \ |
207 | + _CL_DECLARE_AS_TYPE(SRC, char) \ |
208 | _CL_DECLARE_AS_TYPE(SRC, uchar) |
209 | _CL_DECLARE_AS_TYPE_1(char) |
210 | _CL_DECLARE_AS_TYPE_1(uchar) |
211 | |
212 | /* 2 bytes */ |
213 | -#define _CL_DECLARE_AS_TYPE_2(SRC) \ |
214 | - _CL_DECLARE_AS_TYPE(SRC, char2) \ |
215 | - _CL_DECLARE_AS_TYPE(SRC, uchar2) \ |
216 | - _CL_DECLARE_AS_TYPE(SRC, short) \ |
217 | +#define _CL_DECLARE_AS_TYPE_2(SRC) \ |
218 | + _CL_DECLARE_AS_TYPE(SRC, char2) \ |
219 | + _CL_DECLARE_AS_TYPE(SRC, uchar2) \ |
220 | + _CL_DECLARE_AS_TYPE(SRC, short) \ |
221 | _CL_DECLARE_AS_TYPE(SRC, ushort) |
222 | _CL_DECLARE_AS_TYPE_2(char2) |
223 | _CL_DECLARE_AS_TYPE_2(uchar2) |
224 | @@ -171,13 +310,13 @@ |
225 | _CL_DECLARE_AS_TYPE_2(ushort) |
226 | |
227 | /* 4 bytes */ |
228 | -#define _CL_DECLARE_AS_TYPE_4(SRC) \ |
229 | - _CL_DECLARE_AS_TYPE(SRC, char4) \ |
230 | - _CL_DECLARE_AS_TYPE(SRC, uchar4) \ |
231 | - _CL_DECLARE_AS_TYPE(SRC, short2) \ |
232 | - _CL_DECLARE_AS_TYPE(SRC, ushort2) \ |
233 | - _CL_DECLARE_AS_TYPE(SRC, int) \ |
234 | - _CL_DECLARE_AS_TYPE(SRC, uint) \ |
235 | +#define _CL_DECLARE_AS_TYPE_4(SRC) \ |
236 | + _CL_DECLARE_AS_TYPE(SRC, char4) \ |
237 | + _CL_DECLARE_AS_TYPE(SRC, uchar4) \ |
238 | + _CL_DECLARE_AS_TYPE(SRC, short2) \ |
239 | + _CL_DECLARE_AS_TYPE(SRC, ushort2) \ |
240 | + _CL_DECLARE_AS_TYPE(SRC, int) \ |
241 | + _CL_DECLARE_AS_TYPE(SRC, uint) \ |
242 | _CL_DECLARE_AS_TYPE(SRC, float) |
243 | _CL_DECLARE_AS_TYPE_4(char4) |
244 | _CL_DECLARE_AS_TYPE_4(uchar4) |
245 | @@ -188,16 +327,16 @@ |
246 | _CL_DECLARE_AS_TYPE_4(float) |
247 | |
248 | /* 8 bytes */ |
249 | -#define _CL_DECLARE_AS_TYPE_8(SRC) \ |
250 | - _CL_DECLARE_AS_TYPE(SRC, char8) \ |
251 | - _CL_DECLARE_AS_TYPE(SRC, uchar8) \ |
252 | - _CL_DECLARE_AS_TYPE(SRC, short4) \ |
253 | - _CL_DECLARE_AS_TYPE(SRC, ushort4) \ |
254 | - _CL_DECLARE_AS_TYPE(SRC, int2) \ |
255 | - _CL_DECLARE_AS_TYPE(SRC, uint2) \ |
256 | - _CL_DECLARE_AS_TYPE(SRC, long) \ |
257 | - _CL_DECLARE_AS_TYPE(SRC, ulong) \ |
258 | - _CL_DECLARE_AS_TYPE(SRC, float2) \ |
259 | +#define _CL_DECLARE_AS_TYPE_8(SRC) \ |
260 | + _CL_DECLARE_AS_TYPE(SRC, char8) \ |
261 | + _CL_DECLARE_AS_TYPE(SRC, uchar8) \ |
262 | + _CL_DECLARE_AS_TYPE(SRC, short4) \ |
263 | + _CL_DECLARE_AS_TYPE(SRC, ushort4) \ |
264 | + _CL_DECLARE_AS_TYPE(SRC, int2) \ |
265 | + _CL_DECLARE_AS_TYPE(SRC, uint2) \ |
266 | + _CL_DECLARE_AS_TYPE(SRC, long) \ |
267 | + _CL_DECLARE_AS_TYPE(SRC, ulong) \ |
268 | + _CL_DECLARE_AS_TYPE(SRC, float2) \ |
269 | _CL_DECLARE_AS_TYPE(SRC, double) |
270 | _CL_DECLARE_AS_TYPE_8(char8) |
271 | _CL_DECLARE_AS_TYPE_8(uchar8) |
272 | @@ -211,16 +350,16 @@ |
273 | _CL_DECLARE_AS_TYPE_8(double) |
274 | |
275 | /* 16 bytes */ |
276 | -#define _CL_DECLARE_AS_TYPE_16(SRC) \ |
277 | - _CL_DECLARE_AS_TYPE(SRC, char16) \ |
278 | - _CL_DECLARE_AS_TYPE(SRC, uchar16) \ |
279 | - _CL_DECLARE_AS_TYPE(SRC, short8) \ |
280 | - _CL_DECLARE_AS_TYPE(SRC, ushort8) \ |
281 | - _CL_DECLARE_AS_TYPE(SRC, int4) \ |
282 | - _CL_DECLARE_AS_TYPE(SRC, uint4) \ |
283 | - _CL_DECLARE_AS_TYPE(SRC, long2) \ |
284 | - _CL_DECLARE_AS_TYPE(SRC, ulong2) \ |
285 | - _CL_DECLARE_AS_TYPE(SRC, float4) \ |
286 | +#define _CL_DECLARE_AS_TYPE_16(SRC) \ |
287 | + _CL_DECLARE_AS_TYPE(SRC, char16) \ |
288 | + _CL_DECLARE_AS_TYPE(SRC, uchar16) \ |
289 | + _CL_DECLARE_AS_TYPE(SRC, short8) \ |
290 | + _CL_DECLARE_AS_TYPE(SRC, ushort8) \ |
291 | + _CL_DECLARE_AS_TYPE(SRC, int4) \ |
292 | + _CL_DECLARE_AS_TYPE(SRC, uint4) \ |
293 | + _CL_DECLARE_AS_TYPE(SRC, long2) \ |
294 | + _CL_DECLARE_AS_TYPE(SRC, ulong2) \ |
295 | + _CL_DECLARE_AS_TYPE(SRC, float4) \ |
296 | _CL_DECLARE_AS_TYPE(SRC, double2) |
297 | _CL_DECLARE_AS_TYPE_16(char16) |
298 | _CL_DECLARE_AS_TYPE_16(uchar16) |
299 | @@ -234,14 +373,14 @@ |
300 | _CL_DECLARE_AS_TYPE_16(double2) |
301 | |
302 | /* 32 bytes */ |
303 | -#define _CL_DECLARE_AS_TYPE_32(SRC) \ |
304 | - _CL_DECLARE_AS_TYPE(SRC, short16) \ |
305 | - _CL_DECLARE_AS_TYPE(SRC, ushort16) \ |
306 | - _CL_DECLARE_AS_TYPE(SRC, int8) \ |
307 | - _CL_DECLARE_AS_TYPE(SRC, uint8) \ |
308 | - _CL_DECLARE_AS_TYPE(SRC, long4) \ |
309 | - _CL_DECLARE_AS_TYPE(SRC, ulong4) \ |
310 | - _CL_DECLARE_AS_TYPE(SRC, float8) \ |
311 | +#define _CL_DECLARE_AS_TYPE_32(SRC) \ |
312 | + _CL_DECLARE_AS_TYPE(SRC, short16) \ |
313 | + _CL_DECLARE_AS_TYPE(SRC, ushort16) \ |
314 | + _CL_DECLARE_AS_TYPE(SRC, int8) \ |
315 | + _CL_DECLARE_AS_TYPE(SRC, uint8) \ |
316 | + _CL_DECLARE_AS_TYPE(SRC, long4) \ |
317 | + _CL_DECLARE_AS_TYPE(SRC, ulong4) \ |
318 | + _CL_DECLARE_AS_TYPE(SRC, float8) \ |
319 | _CL_DECLARE_AS_TYPE(SRC, double4) |
320 | _CL_DECLARE_AS_TYPE_32(short16) |
321 | _CL_DECLARE_AS_TYPE_32(ushort16) |
322 | @@ -253,12 +392,12 @@ |
323 | _CL_DECLARE_AS_TYPE_32(double4) |
324 | |
325 | /* 64 bytes */ |
326 | -#define _CL_DECLARE_AS_TYPE_64(SRC) \ |
327 | - _CL_DECLARE_AS_TYPE(SRC, int16) \ |
328 | - _CL_DECLARE_AS_TYPE(SRC, uint16) \ |
329 | - _CL_DECLARE_AS_TYPE(SRC, long8) \ |
330 | - _CL_DECLARE_AS_TYPE(SRC, ulong8) \ |
331 | - _CL_DECLARE_AS_TYPE(SRC, float16) \ |
332 | +#define _CL_DECLARE_AS_TYPE_64(SRC) \ |
333 | + _CL_DECLARE_AS_TYPE(SRC, int16) \ |
334 | + _CL_DECLARE_AS_TYPE(SRC, uint16) \ |
335 | + _CL_DECLARE_AS_TYPE(SRC, long8) \ |
336 | + _CL_DECLARE_AS_TYPE(SRC, ulong8) \ |
337 | + _CL_DECLARE_AS_TYPE(SRC, float16) \ |
338 | _CL_DECLARE_AS_TYPE(SRC, double8) |
339 | _CL_DECLARE_AS_TYPE_64(int16) |
340 | _CL_DECLARE_AS_TYPE_64(uint16) |
341 | @@ -268,16 +407,16 @@ |
342 | _CL_DECLARE_AS_TYPE_64(double8) |
343 | |
344 | /* 128 bytes */ |
345 | -#define _CL_DECLARE_AS_TYPE_128(SRC) \ |
346 | - _CL_DECLARE_AS_TYPE(SRC, long16) \ |
347 | - _CL_DECLARE_AS_TYPE(SRC, ulong16) \ |
348 | +#define _CL_DECLARE_AS_TYPE_128(SRC) \ |
349 | + _CL_DECLARE_AS_TYPE(SRC, long16) \ |
350 | + _CL_DECLARE_AS_TYPE(SRC, ulong16) \ |
351 | _CL_DECLARE_AS_TYPE(SRC, double16) |
352 | _CL_DECLARE_AS_TYPE_128(long16) |
353 | _CL_DECLARE_AS_TYPE_128(ulong16) |
354 | _CL_DECLARE_AS_TYPE_128(double16) |
355 | |
356 | -#define _CL_DECLARE_CONVERT_TYPE(SRC, DST) \ |
357 | - DST __attribute__ ((overloadable)) convert_##DST(SRC a); |
358 | +#define _CL_DECLARE_CONVERT_TYPE(SRC, DST) \ |
359 | + DST _cl_overloadable convert_##DST(SRC a); |
360 | |
361 | /* 1 element */ |
362 | #define _CL_DECLARE_CONVERT_TYPE_1(SRC) \ |
363 | @@ -506,241 +645,241 @@ |
364 | * V: vector of float or double |
365 | */ |
366 | |
367 | -#define _CL_DECLARE_FUNC_V_V(NAME) \ |
368 | - float __attribute__ ((overloadable)) NAME(float ); \ |
369 | - float2 __attribute__ ((overloadable)) NAME(float2 ); \ |
370 | - float3 __attribute__ ((overloadable)) NAME(float3 ); \ |
371 | - float4 __attribute__ ((overloadable)) NAME(float4 ); \ |
372 | - float8 __attribute__ ((overloadable)) NAME(float8 ); \ |
373 | - float16 __attribute__ ((overloadable)) NAME(float16 ); \ |
374 | - double __attribute__ ((overloadable)) NAME(double ); \ |
375 | - double2 __attribute__ ((overloadable)) NAME(double2 ); \ |
376 | - double3 __attribute__ ((overloadable)) NAME(double3 ); \ |
377 | - double4 __attribute__ ((overloadable)) NAME(double4 ); \ |
378 | - double8 __attribute__ ((overloadable)) NAME(double8 ); \ |
379 | - double16 __attribute__ ((overloadable)) NAME(double16); |
380 | -#define _CL_DECLARE_FUNC_V_VV(NAME) \ |
381 | - float __attribute__ ((overloadable)) NAME(float , float ); \ |
382 | - float2 __attribute__ ((overloadable)) NAME(float2 , float2 ); \ |
383 | - float3 __attribute__ ((overloadable)) NAME(float3 , float3 ); \ |
384 | - float4 __attribute__ ((overloadable)) NAME(float4 , float4 ); \ |
385 | - float8 __attribute__ ((overloadable)) NAME(float8 , float8 ); \ |
386 | - float16 __attribute__ ((overloadable)) NAME(float16 , float16 ); \ |
387 | - double __attribute__ ((overloadable)) NAME(double , double ); \ |
388 | - double2 __attribute__ ((overloadable)) NAME(double2 , double2 ); \ |
389 | - double3 __attribute__ ((overloadable)) NAME(double3 , double3 ); \ |
390 | - double4 __attribute__ ((overloadable)) NAME(double4 , double4 ); \ |
391 | - double8 __attribute__ ((overloadable)) NAME(double8 , double8 ); \ |
392 | - double16 __attribute__ ((overloadable)) NAME(double16, double16); |
393 | -#define _CL_DECLARE_FUNC_V_VVV(NAME) \ |
394 | - float __attribute__ ((overloadable)) NAME(float , float , float ); \ |
395 | - float2 __attribute__ ((overloadable)) NAME(float2 , float2 , float2 ); \ |
396 | - float3 __attribute__ ((overloadable)) NAME(float3 , float3 , float3 ); \ |
397 | - float4 __attribute__ ((overloadable)) NAME(float4 , float4 , float4 ); \ |
398 | - float8 __attribute__ ((overloadable)) NAME(float8 , float8 , float8 ); \ |
399 | - float16 __attribute__ ((overloadable)) NAME(float16 , float16 , float16 ); \ |
400 | - double __attribute__ ((overloadable)) NAME(double , double , double ); \ |
401 | - double2 __attribute__ ((overloadable)) NAME(double2 , double2 , double2 ); \ |
402 | - double3 __attribute__ ((overloadable)) NAME(double3 , double3 , double3 ); \ |
403 | - double4 __attribute__ ((overloadable)) NAME(double4 , double4 , double4 ); \ |
404 | - double8 __attribute__ ((overloadable)) NAME(double8 , double8 , double8 ); \ |
405 | - double16 __attribute__ ((overloadable)) NAME(double16, double16, double16); |
406 | -#define _CL_DECLARE_FUNC_V_VVS(NAME) \ |
407 | - float2 __attribute__ ((overloadable)) NAME(float2 , float2 , float ); \ |
408 | - float3 __attribute__ ((overloadable)) NAME(float3 , float3 , float ); \ |
409 | - float4 __attribute__ ((overloadable)) NAME(float4 , float4 , float ); \ |
410 | - float8 __attribute__ ((overloadable)) NAME(float8 , float8 , float ); \ |
411 | - float16 __attribute__ ((overloadable)) NAME(float16 , float16 , float ); \ |
412 | - double2 __attribute__ ((overloadable)) NAME(double2 , double2 , double); \ |
413 | - double3 __attribute__ ((overloadable)) NAME(double3 , double3 , double); \ |
414 | - double4 __attribute__ ((overloadable)) NAME(double4 , double4 , double); \ |
415 | - double8 __attribute__ ((overloadable)) NAME(double8 , double8 , double); \ |
416 | - double16 __attribute__ ((overloadable)) NAME(double16, double16, double); |
417 | -#define _CL_DECLARE_FUNC_V_VSS(NAME) \ |
418 | - float2 __attribute__ ((overloadable)) NAME(float2 , float , float ); \ |
419 | - float3 __attribute__ ((overloadable)) NAME(float3 , float , float ); \ |
420 | - float4 __attribute__ ((overloadable)) NAME(float4 , float , float ); \ |
421 | - float8 __attribute__ ((overloadable)) NAME(float8 , float , float ); \ |
422 | - float16 __attribute__ ((overloadable)) NAME(float16 , float , float ); \ |
423 | - double2 __attribute__ ((overloadable)) NAME(double2 , double, double); \ |
424 | - double3 __attribute__ ((overloadable)) NAME(double3 , double, double); \ |
425 | - double4 __attribute__ ((overloadable)) NAME(double4 , double, double); \ |
426 | - double8 __attribute__ ((overloadable)) NAME(double8 , double, double); \ |
427 | - double16 __attribute__ ((overloadable)) NAME(double16, double, double); |
428 | -#define _CL_DECLARE_FUNC_V_SSV(NAME) \ |
429 | - float2 __attribute__ ((overloadable)) NAME(float , float , float2 ); \ |
430 | - float3 __attribute__ ((overloadable)) NAME(float , float , float3 ); \ |
431 | - float4 __attribute__ ((overloadable)) NAME(float , float , float4 ); \ |
432 | - float8 __attribute__ ((overloadable)) NAME(float , float , float8 ); \ |
433 | - float16 __attribute__ ((overloadable)) NAME(float , float , float16 ); \ |
434 | - double2 __attribute__ ((overloadable)) NAME(double, double, double2 ); \ |
435 | - double3 __attribute__ ((overloadable)) NAME(double, double, double3 ); \ |
436 | - double4 __attribute__ ((overloadable)) NAME(double, double, double4 ); \ |
437 | - double8 __attribute__ ((overloadable)) NAME(double, double, double8 ); \ |
438 | - double16 __attribute__ ((overloadable)) NAME(double, double, double16); |
439 | -#define _CL_DECLARE_FUNC_V_VVJ(NAME) \ |
440 | - float __attribute__ ((overloadable)) NAME(float , float , int ); \ |
441 | - float2 __attribute__ ((overloadable)) NAME(float2 , float2 , int2 ); \ |
442 | - float3 __attribute__ ((overloadable)) NAME(float3 , float3 , int3 ); \ |
443 | - float4 __attribute__ ((overloadable)) NAME(float4 , float4 , int4 ); \ |
444 | - float8 __attribute__ ((overloadable)) NAME(float8 , float8 , int8 ); \ |
445 | - float16 __attribute__ ((overloadable)) NAME(float16 , float16 , int16 ); \ |
446 | - double __attribute__ ((overloadable)) NAME(double , double , long ); \ |
447 | - double2 __attribute__ ((overloadable)) NAME(double2 , double2 , long2 ); \ |
448 | - double3 __attribute__ ((overloadable)) NAME(double3 , double3 , long3 ); \ |
449 | - double4 __attribute__ ((overloadable)) NAME(double4 , double4 , long4 ); \ |
450 | - double8 __attribute__ ((overloadable)) NAME(double8 , double8 , long8 ); \ |
451 | - double16 __attribute__ ((overloadable)) NAME(double16, double16, long16); |
452 | -#define _CL_DECLARE_FUNC_V_U(NAME) \ |
453 | - float __attribute__ ((overloadable)) NAME(uint ); \ |
454 | - float2 __attribute__ ((overloadable)) NAME(uint2 ); \ |
455 | - float3 __attribute__ ((overloadable)) NAME(uint3 ); \ |
456 | - float4 __attribute__ ((overloadable)) NAME(uint4 ); \ |
457 | - float8 __attribute__ ((overloadable)) NAME(uint8 ); \ |
458 | - float16 __attribute__ ((overloadable)) NAME(uint16 ); \ |
459 | - double __attribute__ ((overloadable)) NAME(ulong ); \ |
460 | - double2 __attribute__ ((overloadable)) NAME(ulong2 ); \ |
461 | - double3 __attribute__ ((overloadable)) NAME(ulong3 ); \ |
462 | - double4 __attribute__ ((overloadable)) NAME(ulong4 ); \ |
463 | - double8 __attribute__ ((overloadable)) NAME(ulong8 ); \ |
464 | - double16 __attribute__ ((overloadable)) NAME(ulong16); |
465 | -#define _CL_DECLARE_FUNC_V_VS(NAME) \ |
466 | - float2 __attribute__ ((overloadable)) NAME(float2 , float ); \ |
467 | - float3 __attribute__ ((overloadable)) NAME(float3 , float ); \ |
468 | - float4 __attribute__ ((overloadable)) NAME(float4 , float ); \ |
469 | - float8 __attribute__ ((overloadable)) NAME(float8 , float ); \ |
470 | - float16 __attribute__ ((overloadable)) NAME(float16 , float ); \ |
471 | - double2 __attribute__ ((overloadable)) NAME(double2 , double); \ |
472 | - double3 __attribute__ ((overloadable)) NAME(double3 , double); \ |
473 | - double4 __attribute__ ((overloadable)) NAME(double4 , double); \ |
474 | - double8 __attribute__ ((overloadable)) NAME(double8 , double); \ |
475 | - double16 __attribute__ ((overloadable)) NAME(double16, double); |
476 | -#define _CL_DECLARE_FUNC_V_VJ(NAME) \ |
477 | - float __attribute__ ((overloadable)) NAME(float , int ); \ |
478 | - float2 __attribute__ ((overloadable)) NAME(float2 , int2 ); \ |
479 | - float3 __attribute__ ((overloadable)) NAME(float3 , int3 ); \ |
480 | - float4 __attribute__ ((overloadable)) NAME(float4 , int4 ); \ |
481 | - float8 __attribute__ ((overloadable)) NAME(float8 , int8 ); \ |
482 | - float16 __attribute__ ((overloadable)) NAME(float16 , int16); \ |
483 | - double __attribute__ ((overloadable)) NAME(double , int ); \ |
484 | - double2 __attribute__ ((overloadable)) NAME(double2 , int2 ); \ |
485 | - double3 __attribute__ ((overloadable)) NAME(double3 , int3 ); \ |
486 | - double4 __attribute__ ((overloadable)) NAME(double4 , int4 ); \ |
487 | - double8 __attribute__ ((overloadable)) NAME(double8 , int8 ); \ |
488 | - double16 __attribute__ ((overloadable)) NAME(double16, int16); |
489 | -#define _CL_DECLARE_FUNC_J_VV(NAME) \ |
490 | - int __attribute__ ((overloadable)) NAME(float , float ); \ |
491 | - int2 __attribute__ ((overloadable)) NAME(float2 , float2 ); \ |
492 | - int3 __attribute__ ((overloadable)) NAME(float3 , float3 ); \ |
493 | - int4 __attribute__ ((overloadable)) NAME(float4 , float4 ); \ |
494 | - int8 __attribute__ ((overloadable)) NAME(float8 , float8 ); \ |
495 | - int16 __attribute__ ((overloadable)) NAME(float16 , float16 ); \ |
496 | - int __attribute__ ((overloadable)) NAME(double , double ); \ |
497 | - long2 __attribute__ ((overloadable)) NAME(double2 , double2 ); \ |
498 | - long3 __attribute__ ((overloadable)) NAME(double3 , double3 ); \ |
499 | - long4 __attribute__ ((overloadable)) NAME(double4 , double4 ); \ |
500 | - long8 __attribute__ ((overloadable)) NAME(double8 , double8 ); \ |
501 | - long16 __attribute__ ((overloadable)) NAME(double16, double16); |
502 | -#define _CL_DECLARE_FUNC_V_VI(NAME) \ |
503 | - float2 __attribute__ ((overloadable)) NAME(float2 , int); \ |
504 | - float3 __attribute__ ((overloadable)) NAME(float3 , int); \ |
505 | - float4 __attribute__ ((overloadable)) NAME(float4 , int); \ |
506 | - float8 __attribute__ ((overloadable)) NAME(float8 , int); \ |
507 | - float16 __attribute__ ((overloadable)) NAME(float16 , int); \ |
508 | - double2 __attribute__ ((overloadable)) NAME(double2 , int); \ |
509 | - double3 __attribute__ ((overloadable)) NAME(double3 , int); \ |
510 | - double4 __attribute__ ((overloadable)) NAME(double4 , int); \ |
511 | - double8 __attribute__ ((overloadable)) NAME(double8 , int); \ |
512 | - double16 __attribute__ ((overloadable)) NAME(double16, int); |
513 | +#define _CL_DECLARE_FUNC_V_V(NAME) \ |
514 | + float _cl_overloadable NAME(float ); \ |
515 | + float2 _cl_overloadable NAME(float2 ); \ |
516 | + float3 _cl_overloadable NAME(float3 ); \ |
517 | + float4 _cl_overloadable NAME(float4 ); \ |
518 | + float8 _cl_overloadable NAME(float8 ); \ |
519 | + float16 _cl_overloadable NAME(float16 ); \ |
520 | + double _cl_overloadable NAME(double ); \ |
521 | + double2 _cl_overloadable NAME(double2 ); \ |
522 | + double3 _cl_overloadable NAME(double3 ); \ |
523 | + double4 _cl_overloadable NAME(double4 ); \ |
524 | + double8 _cl_overloadable NAME(double8 ); \ |
525 | + double16 _cl_overloadable NAME(double16); |
526 | +#define _CL_DECLARE_FUNC_V_VV(NAME) \ |
527 | + float _cl_overloadable NAME(float , float ); \ |
528 | + float2 _cl_overloadable NAME(float2 , float2 ); \ |
529 | + float3 _cl_overloadable NAME(float3 , float3 ); \ |
530 | + float4 _cl_overloadable NAME(float4 , float4 ); \ |
531 | + float8 _cl_overloadable NAME(float8 , float8 ); \ |
532 | + float16 _cl_overloadable NAME(float16 , float16 ); \ |
533 | + double _cl_overloadable NAME(double , double ); \ |
534 | + double2 _cl_overloadable NAME(double2 , double2 ); \ |
535 | + double3 _cl_overloadable NAME(double3 , double3 ); \ |
536 | + double4 _cl_overloadable NAME(double4 , double4 ); \ |
537 | + double8 _cl_overloadable NAME(double8 , double8 ); \ |
538 | + double16 _cl_overloadable NAME(double16, double16); |
539 | +#define _CL_DECLARE_FUNC_V_VVV(NAME) \ |
540 | + float _cl_overloadable NAME(float , float , float ); \ |
541 | + float2 _cl_overloadable NAME(float2 , float2 , float2 ); \ |
542 | + float3 _cl_overloadable NAME(float3 , float3 , float3 ); \ |
543 | + float4 _cl_overloadable NAME(float4 , float4 , float4 ); \ |
544 | + float8 _cl_overloadable NAME(float8 , float8 , float8 ); \ |
545 | + float16 _cl_overloadable NAME(float16 , float16 , float16 ); \ |
546 | + double _cl_overloadable NAME(double , double , double ); \ |
547 | + double2 _cl_overloadable NAME(double2 , double2 , double2 ); \ |
548 | + double3 _cl_overloadable NAME(double3 , double3 , double3 ); \ |
549 | + double4 _cl_overloadable NAME(double4 , double4 , double4 ); \ |
550 | + double8 _cl_overloadable NAME(double8 , double8 , double8 ); \ |
551 | + double16 _cl_overloadable NAME(double16, double16, double16); |
552 | +#define _CL_DECLARE_FUNC_V_VVS(NAME) \ |
553 | + float2 _cl_overloadable NAME(float2 , float2 , float ); \ |
554 | + float3 _cl_overloadable NAME(float3 , float3 , float ); \ |
555 | + float4 _cl_overloadable NAME(float4 , float4 , float ); \ |
556 | + float8 _cl_overloadable NAME(float8 , float8 , float ); \ |
557 | + float16 _cl_overloadable NAME(float16 , float16 , float ); \ |
558 | + double2 _cl_overloadable NAME(double2 , double2 , double); \ |
559 | + double3 _cl_overloadable NAME(double3 , double3 , double); \ |
560 | + double4 _cl_overloadable NAME(double4 , double4 , double); \ |
561 | + double8 _cl_overloadable NAME(double8 , double8 , double); \ |
562 | + double16 _cl_overloadable NAME(double16, double16, double); |
563 | +#define _CL_DECLARE_FUNC_V_VSS(NAME) \ |
564 | + float2 _cl_overloadable NAME(float2 , float , float ); \ |
565 | + float3 _cl_overloadable NAME(float3 , float , float ); \ |
566 | + float4 _cl_overloadable NAME(float4 , float , float ); \ |
567 | + float8 _cl_overloadable NAME(float8 , float , float ); \ |
568 | + float16 _cl_overloadable NAME(float16 , float , float ); \ |
569 | + double2 _cl_overloadable NAME(double2 , double, double); \ |
570 | + double3 _cl_overloadable NAME(double3 , double, double); \ |
571 | + double4 _cl_overloadable NAME(double4 , double, double); \ |
572 | + double8 _cl_overloadable NAME(double8 , double, double); \ |
573 | + double16 _cl_overloadable NAME(double16, double, double); |
574 | +#define _CL_DECLARE_FUNC_V_SSV(NAME) \ |
575 | + float2 _cl_overloadable NAME(float , float , float2 ); \ |
576 | + float3 _cl_overloadable NAME(float , float , float3 ); \ |
577 | + float4 _cl_overloadable NAME(float , float , float4 ); \ |
578 | + float8 _cl_overloadable NAME(float , float , float8 ); \ |
579 | + float16 _cl_overloadable NAME(float , float , float16 ); \ |
580 | + double2 _cl_overloadable NAME(double, double, double2 ); \ |
581 | + double3 _cl_overloadable NAME(double, double, double3 ); \ |
582 | + double4 _cl_overloadable NAME(double, double, double4 ); \ |
583 | + double8 _cl_overloadable NAME(double, double, double8 ); \ |
584 | + double16 _cl_overloadable NAME(double, double, double16); |
585 | +#define _CL_DECLARE_FUNC_V_VVJ(NAME) \ |
586 | + float _cl_overloadable NAME(float , float , int ); \ |
587 | + float2 _cl_overloadable NAME(float2 , float2 , int2 ); \ |
588 | + float3 _cl_overloadable NAME(float3 , float3 , int3 ); \ |
589 | + float4 _cl_overloadable NAME(float4 , float4 , int4 ); \ |
590 | + float8 _cl_overloadable NAME(float8 , float8 , int8 ); \ |
591 | + float16 _cl_overloadable NAME(float16 , float16 , int16 ); \ |
592 | + double _cl_overloadable NAME(double , double , long ); \ |
593 | + double2 _cl_overloadable NAME(double2 , double2 , long2 ); \ |
594 | + double3 _cl_overloadable NAME(double3 , double3 , long3 ); \ |
595 | + double4 _cl_overloadable NAME(double4 , double4 , long4 ); \ |
596 | + double8 _cl_overloadable NAME(double8 , double8 , long8 ); \ |
597 | + double16 _cl_overloadable NAME(double16, double16, long16); |
598 | +#define _CL_DECLARE_FUNC_V_U(NAME) \ |
599 | + float _cl_overloadable NAME(uint ); \ |
600 | + float2 _cl_overloadable NAME(uint2 ); \ |
601 | + float3 _cl_overloadable NAME(uint3 ); \ |
602 | + float4 _cl_overloadable NAME(uint4 ); \ |
603 | + float8 _cl_overloadable NAME(uint8 ); \ |
604 | + float16 _cl_overloadable NAME(uint16 ); \ |
605 | + double _cl_overloadable NAME(ulong ); \ |
606 | + double2 _cl_overloadable NAME(ulong2 ); \ |
607 | + double3 _cl_overloadable NAME(ulong3 ); \ |
608 | + double4 _cl_overloadable NAME(ulong4 ); \ |
609 | + double8 _cl_overloadable NAME(ulong8 ); \ |
610 | + double16 _cl_overloadable NAME(ulong16); |
611 | +#define _CL_DECLARE_FUNC_V_VS(NAME) \ |
612 | + float2 _cl_overloadable NAME(float2 , float ); \ |
613 | + float3 _cl_overloadable NAME(float3 , float ); \ |
614 | + float4 _cl_overloadable NAME(float4 , float ); \ |
615 | + float8 _cl_overloadable NAME(float8 , float ); \ |
616 | + float16 _cl_overloadable NAME(float16 , float ); \ |
617 | + double2 _cl_overloadable NAME(double2 , double); \ |
618 | + double3 _cl_overloadable NAME(double3 , double); \ |
619 | + double4 _cl_overloadable NAME(double4 , double); \ |
620 | + double8 _cl_overloadable NAME(double8 , double); \ |
621 | + double16 _cl_overloadable NAME(double16, double); |
622 | +#define _CL_DECLARE_FUNC_V_VJ(NAME) \ |
623 | + float _cl_overloadable NAME(float , int ); \ |
624 | + float2 _cl_overloadable NAME(float2 , int2 ); \ |
625 | + float3 _cl_overloadable NAME(float3 , int3 ); \ |
626 | + float4 _cl_overloadable NAME(float4 , int4 ); \ |
627 | + float8 _cl_overloadable NAME(float8 , int8 ); \ |
628 | + float16 _cl_overloadable NAME(float16 , int16); \ |
629 | + double _cl_overloadable NAME(double , int ); \ |
630 | + double2 _cl_overloadable NAME(double2 , int2 ); \ |
631 | + double3 _cl_overloadable NAME(double3 , int3 ); \ |
632 | + double4 _cl_overloadable NAME(double4 , int4 ); \ |
633 | + double8 _cl_overloadable NAME(double8 , int8 ); \ |
634 | + double16 _cl_overloadable NAME(double16, int16); |
635 | +#define _CL_DECLARE_FUNC_J_VV(NAME) \ |
636 | + int _cl_overloadable NAME(float , float ); \ |
637 | + int2 _cl_overloadable NAME(float2 , float2 ); \ |
638 | + int3 _cl_overloadable NAME(float3 , float3 ); \ |
639 | + int4 _cl_overloadable NAME(float4 , float4 ); \ |
640 | + int8 _cl_overloadable NAME(float8 , float8 ); \ |
641 | + int16 _cl_overloadable NAME(float16 , float16 ); \ |
642 | + int _cl_overloadable NAME(double , double ); \ |
643 | + long2 _cl_overloadable NAME(double2 , double2 ); \ |
644 | + long3 _cl_overloadable NAME(double3 , double3 ); \ |
645 | + long4 _cl_overloadable NAME(double4 , double4 ); \ |
646 | + long8 _cl_overloadable NAME(double8 , double8 ); \ |
647 | + long16 _cl_overloadable NAME(double16, double16); |
648 | +#define _CL_DECLARE_FUNC_V_VI(NAME) \ |
649 | + float2 _cl_overloadable NAME(float2 , int); \ |
650 | + float3 _cl_overloadable NAME(float3 , int); \ |
651 | + float4 _cl_overloadable NAME(float4 , int); \ |
652 | + float8 _cl_overloadable NAME(float8 , int); \ |
653 | + float16 _cl_overloadable NAME(float16 , int); \ |
654 | + double2 _cl_overloadable NAME(double2 , int); \ |
655 | + double3 _cl_overloadable NAME(double3 , int); \ |
656 | + double4 _cl_overloadable NAME(double4 , int); \ |
657 | + double8 _cl_overloadable NAME(double8 , int); \ |
658 | + double16 _cl_overloadable NAME(double16, int); |
659 | #define _CL_DECLARE_FUNC_V_VPV(NAME) \ |
660 | - float __attribute__ ((overloadable)) NAME(float , __global float *); \ |
661 | - float2 __attribute__ ((overloadable)) NAME(float2 , __global float2 *); \ |
662 | - float3 __attribute__ ((overloadable)) NAME(float3 , __global float3 *); \ |
663 | - float4 __attribute__ ((overloadable)) NAME(float4 , __global float4 *); \ |
664 | - float8 __attribute__ ((overloadable)) NAME(float8 , __global float8 *); \ |
665 | - float16 __attribute__ ((overloadable)) NAME(float16 , __global float16 *); \ |
666 | - double __attribute__ ((overloadable)) NAME(double , __global double *); \ |
667 | - double2 __attribute__ ((overloadable)) NAME(double2 , __global double2 *); \ |
668 | - double3 __attribute__ ((overloadable)) NAME(double3 , __global double3 *); \ |
669 | - double4 __attribute__ ((overloadable)) NAME(double4 , __global double4 *); \ |
670 | - double8 __attribute__ ((overloadable)) NAME(double8 , __global double8 *); \ |
671 | - double16 __attribute__ ((overloadable)) NAME(double16, __global double16*); \ |
672 | - float __attribute__ ((overloadable)) NAME(float , __local float *); \ |
673 | - float2 __attribute__ ((overloadable)) NAME(float2 , __local float2 *); \ |
674 | - float3 __attribute__ ((overloadable)) NAME(float3 , __local float3 *); \ |
675 | - float4 __attribute__ ((overloadable)) NAME(float4 , __local float4 *); \ |
676 | - float8 __attribute__ ((overloadable)) NAME(float8 , __local float8 *); \ |
677 | - float16 __attribute__ ((overloadable)) NAME(float16 , __local float16 *); \ |
678 | - double __attribute__ ((overloadable)) NAME(double , __local double *); \ |
679 | - double2 __attribute__ ((overloadable)) NAME(double2 , __local double2 *); \ |
680 | - double3 __attribute__ ((overloadable)) NAME(double3 , __local double3 *); \ |
681 | - double4 __attribute__ ((overloadable)) NAME(double4 , __local double4 *); \ |
682 | - double8 __attribute__ ((overloadable)) NAME(double8 , __local double8 *); \ |
683 | - double16 __attribute__ ((overloadable)) NAME(double16, __local double16*); \ |
684 | + float _cl_overloadable NAME(float , __global float *); \ |
685 | + float2 _cl_overloadable NAME(float2 , __global float2 *); \ |
686 | + float3 _cl_overloadable NAME(float3 , __global float3 *); \ |
687 | + float4 _cl_overloadable NAME(float4 , __global float4 *); \ |
688 | + float8 _cl_overloadable NAME(float8 , __global float8 *); \ |
689 | + float16 _cl_overloadable NAME(float16 , __global float16 *); \ |
690 | + double _cl_overloadable NAME(double , __global double *); \ |
691 | + double2 _cl_overloadable NAME(double2 , __global double2 *); \ |
692 | + double3 _cl_overloadable NAME(double3 , __global double3 *); \ |
693 | + double4 _cl_overloadable NAME(double4 , __global double4 *); \ |
694 | + double8 _cl_overloadable NAME(double8 , __global double8 *); \ |
695 | + double16 _cl_overloadable NAME(double16, __global double16*); \ |
696 | + float _cl_overloadable NAME(float , __local float *); \ |
697 | + float2 _cl_overloadable NAME(float2 , __local float2 *); \ |
698 | + float3 _cl_overloadable NAME(float3 , __local float3 *); \ |
699 | + float4 _cl_overloadable NAME(float4 , __local float4 *); \ |
700 | + float8 _cl_overloadable NAME(float8 , __local float8 *); \ |
701 | + float16 _cl_overloadable NAME(float16 , __local float16 *); \ |
702 | + double _cl_overloadable NAME(double , __local double *); \ |
703 | + double2 _cl_overloadable NAME(double2 , __local double2 *); \ |
704 | + double3 _cl_overloadable NAME(double3 , __local double3 *); \ |
705 | + double4 _cl_overloadable NAME(double4 , __local double4 *); \ |
706 | + double8 _cl_overloadable NAME(double8 , __local double8 *); \ |
707 | + double16 _cl_overloadable NAME(double16, __local double16*); \ |
708 | /* __private is not supported yet \ |
709 | - float __attribute__ ((overloadable)) NAME(float , __private float *); \ |
710 | - float2 __attribute__ ((overloadable)) NAME(float2 , __private float2 *); \ |
711 | - float3 __attribute__ ((overloadable)) NAME(float3 , __private float3 *); \ |
712 | - float4 __attribute__ ((overloadable)) NAME(float4 , __private float4 *); \ |
713 | - float8 __attribute__ ((overloadable)) NAME(float8 , __private float8 *); \ |
714 | - float16 __attribute__ ((overloadable)) NAME(float16 , __private float16 *); \ |
715 | - double __attribute__ ((overloadable)) NAME(double , __private double *); \ |
716 | - double2 __attribute__ ((overloadable)) NAME(double2 , __private double2 *); \ |
717 | - double3 __attribute__ ((overloadable)) NAME(double3 , __private double3 *); \ |
718 | - double4 __attribute__ ((overloadable)) NAME(double4 , __private double4 *); \ |
719 | - double8 __attribute__ ((overloadable)) NAME(double8 , __private double8 *); \ |
720 | - double16 __attribute__ ((overloadable)) NAME(double16, __private double16*); \ |
721 | + float _cl_overloadable NAME(float , __private float *); \ |
722 | + float2 _cl_overloadable NAME(float2 , __private float2 *); \ |
723 | + float3 _cl_overloadable NAME(float3 , __private float3 *); \ |
724 | + float4 _cl_overloadable NAME(float4 , __private float4 *); \ |
725 | + float8 _cl_overloadable NAME(float8 , __private float8 *); \ |
726 | + float16 _cl_overloadable NAME(float16 , __private float16 *); \ |
727 | + double _cl_overloadable NAME(double , __private double *); \ |
728 | + double2 _cl_overloadable NAME(double2 , __private double2 *); \ |
729 | + double3 _cl_overloadable NAME(double3 , __private double3 *); \ |
730 | + double4 _cl_overloadable NAME(double4 , __private double4 *); \ |
731 | + double8 _cl_overloadable NAME(double8 , __private double8 *); \ |
732 | + double16 _cl_overloadable NAME(double16, __private double16*); \ |
733 | */ |
734 | -#define _CL_DECLARE_FUNC_V_SV(NAME) \ |
735 | - float2 __attribute__ ((overloadable)) NAME(float , float2 ); \ |
736 | - float3 __attribute__ ((overloadable)) NAME(float , float3 ); \ |
737 | - float4 __attribute__ ((overloadable)) NAME(float , float4 ); \ |
738 | - float8 __attribute__ ((overloadable)) NAME(float , float8 ); \ |
739 | - float16 __attribute__ ((overloadable)) NAME(float , float16 ); \ |
740 | - double2 __attribute__ ((overloadable)) NAME(double, double2 ); \ |
741 | - double3 __attribute__ ((overloadable)) NAME(double, double3 ); \ |
742 | - double4 __attribute__ ((overloadable)) NAME(double, double4 ); \ |
743 | - double8 __attribute__ ((overloadable)) NAME(double, double8 ); \ |
744 | - double16 __attribute__ ((overloadable)) NAME(double, double16); |
745 | -#define _CL_DECLARE_FUNC_J_V(NAME) \ |
746 | - int __attribute__ ((overloadable)) NAME(float ); \ |
747 | - int2 __attribute__ ((overloadable)) NAME(float2 ); \ |
748 | - int3 __attribute__ ((overloadable)) NAME(float3 ); \ |
749 | - int4 __attribute__ ((overloadable)) NAME(float4 ); \ |
750 | - int8 __attribute__ ((overloadable)) NAME(float8 ); \ |
751 | - int16 __attribute__ ((overloadable)) NAME(float16 ); \ |
752 | - int __attribute__ ((overloadable)) NAME(double ); \ |
753 | - int2 __attribute__ ((overloadable)) NAME(double2 ); \ |
754 | - int3 __attribute__ ((overloadable)) NAME(double3 ); \ |
755 | - int4 __attribute__ ((overloadable)) NAME(double4 ); \ |
756 | - int8 __attribute__ ((overloadable)) NAME(double8 ); \ |
757 | - int16 __attribute__ ((overloadable)) NAME(double16); |
758 | -#define _CL_DECLARE_FUNC_S_V(NAME) \ |
759 | - float __attribute__ ((overloadable)) NAME(float ); \ |
760 | - float __attribute__ ((overloadable)) NAME(float2 ); \ |
761 | - float __attribute__ ((overloadable)) NAME(float3 ); \ |
762 | - float __attribute__ ((overloadable)) NAME(float4 ); \ |
763 | - float __attribute__ ((overloadable)) NAME(float8 ); \ |
764 | - float __attribute__ ((overloadable)) NAME(float16 ); \ |
765 | - double __attribute__ ((overloadable)) NAME(double ); \ |
766 | - double __attribute__ ((overloadable)) NAME(double2 ); \ |
767 | - double __attribute__ ((overloadable)) NAME(double3 ); \ |
768 | - double __attribute__ ((overloadable)) NAME(double4 ); \ |
769 | - double __attribute__ ((overloadable)) NAME(double8 ); \ |
770 | - double __attribute__ ((overloadable)) NAME(double16); |
771 | -#define _CL_DECLARE_FUNC_S_VV(NAME) \ |
772 | - float __attribute__ ((overloadable)) NAME(float , float ); \ |
773 | - float __attribute__ ((overloadable)) NAME(float2 , float2 ); \ |
774 | - float __attribute__ ((overloadable)) NAME(float3 , float3 ); \ |
775 | - float __attribute__ ((overloadable)) NAME(float4 , float4 ); \ |
776 | - float __attribute__ ((overloadable)) NAME(float8 , float8 ); \ |
777 | - float __attribute__ ((overloadable)) NAME(float16 , float16 ); \ |
778 | - double __attribute__ ((overloadable)) NAME(double , double ); \ |
779 | - double __attribute__ ((overloadable)) NAME(double2 , double2 ); \ |
780 | - double __attribute__ ((overloadable)) NAME(double3 , double3 ); \ |
781 | - double __attribute__ ((overloadable)) NAME(double4 , double4 ); \ |
782 | - double __attribute__ ((overloadable)) NAME(double8 , double8 ); \ |
783 | - double __attribute__ ((overloadable)) NAME(double16, double16); |
784 | +#define _CL_DECLARE_FUNC_V_SV(NAME) \ |
785 | + float2 _cl_overloadable NAME(float , float2 ); \ |
786 | + float3 _cl_overloadable NAME(float , float3 ); \ |
787 | + float4 _cl_overloadable NAME(float , float4 ); \ |
788 | + float8 _cl_overloadable NAME(float , float8 ); \ |
789 | + float16 _cl_overloadable NAME(float , float16 ); \ |
790 | + double2 _cl_overloadable NAME(double, double2 ); \ |
791 | + double3 _cl_overloadable NAME(double, double3 ); \ |
792 | + double4 _cl_overloadable NAME(double, double4 ); \ |
793 | + double8 _cl_overloadable NAME(double, double8 ); \ |
794 | + double16 _cl_overloadable NAME(double, double16); |
795 | +#define _CL_DECLARE_FUNC_J_V(NAME) \ |
796 | + int _cl_overloadable NAME(float ); \ |
797 | + int2 _cl_overloadable NAME(float2 ); \ |
798 | + int3 _cl_overloadable NAME(float3 ); \ |
799 | + int4 _cl_overloadable NAME(float4 ); \ |
800 | + int8 _cl_overloadable NAME(float8 ); \ |
801 | + int16 _cl_overloadable NAME(float16 ); \ |
802 | + int _cl_overloadable NAME(double ); \ |
803 | + int2 _cl_overloadable NAME(double2 ); \ |
804 | + int3 _cl_overloadable NAME(double3 ); \ |
805 | + int4 _cl_overloadable NAME(double4 ); \ |
806 | + int8 _cl_overloadable NAME(double8 ); \ |
807 | + int16 _cl_overloadable NAME(double16); |
808 | +#define _CL_DECLARE_FUNC_S_V(NAME) \ |
809 | + float _cl_overloadable NAME(float ); \ |
810 | + float _cl_overloadable NAME(float2 ); \ |
811 | + float _cl_overloadable NAME(float3 ); \ |
812 | + float _cl_overloadable NAME(float4 ); \ |
813 | + float _cl_overloadable NAME(float8 ); \ |
814 | + float _cl_overloadable NAME(float16 ); \ |
815 | + double _cl_overloadable NAME(double ); \ |
816 | + double _cl_overloadable NAME(double2 ); \ |
817 | + double _cl_overloadable NAME(double3 ); \ |
818 | + double _cl_overloadable NAME(double4 ); \ |
819 | + double _cl_overloadable NAME(double8 ); \ |
820 | + double _cl_overloadable NAME(double16); |
821 | +#define _CL_DECLARE_FUNC_S_VV(NAME) \ |
822 | + float _cl_overloadable NAME(float , float ); \ |
823 | + float _cl_overloadable NAME(float2 , float2 ); \ |
824 | + float _cl_overloadable NAME(float3 , float3 ); \ |
825 | + float _cl_overloadable NAME(float4 , float4 ); \ |
826 | + float _cl_overloadable NAME(float8 , float8 ); \ |
827 | + float _cl_overloadable NAME(float16 , float16 ); \ |
828 | + double _cl_overloadable NAME(double , double ); \ |
829 | + double _cl_overloadable NAME(double2 , double2 ); \ |
830 | + double _cl_overloadable NAME(double3 , double3 ); \ |
831 | + double _cl_overloadable NAME(double4 , double4 ); \ |
832 | + double _cl_overloadable NAME(double8 , double8 ); \ |
833 | + double _cl_overloadable NAME(double16, double16); |
834 | |
835 | /* Move built-in declarations out of the way. (There should be a |
836 | better way of doing so.) These five functions are built-in math |
837 | @@ -779,11 +918,26 @@ |
838 | _CL_DECLARE_FUNC_V_V(fabs) |
839 | _CL_DECLARE_FUNC_V_VV(fdim) |
840 | _CL_DECLARE_FUNC_V_V(floor) |
841 | -_CL_DECLARE_FUNC_V_VVV(fma) |
842 | -_CL_DECLARE_FUNC_V_VV(fmax) |
843 | -_CL_DECLARE_FUNC_V_VS(fmax) |
844 | -_CL_DECLARE_FUNC_V_VV(fmin) |
845 | -_CL_DECLARE_FUNC_V_VS(fmin) |
846 | +#if __FAST__RELAXED__MATH__ |
847 | +# define _cl_fma _cl_fast_fma |
848 | +#else |
849 | +# define _cl_fma _cl_std_fma |
850 | +#endif |
851 | +#define _cl_fast_fma mad |
852 | +_CL_DECLARE_FUNC_V_VVV(_cl_std_fma) |
853 | +#if __FAST__RELAXED__MATH__ |
854 | +# define fmax _cl_fast_fmax |
855 | +# define fmin _cl_fast_fmin |
856 | +#else |
857 | +# define fmax _cl_std_fmax |
858 | +# define fmin _cl_std_fmin |
859 | +#endif |
860 | +#define _cl_fast_fmax max |
861 | +#define _cl_fast_fmin min |
862 | +_CL_DECLARE_FUNC_V_VV(_cl_std_fmax) |
863 | +_CL_DECLARE_FUNC_V_VS(_cl_std_fmax) |
864 | +_CL_DECLARE_FUNC_V_VV(_cl_std_fmin) |
865 | +_CL_DECLARE_FUNC_V_VS(_cl_std_fmin) |
866 | _CL_DECLARE_FUNC_V_VV(fmod) |
867 | _CL_DECLARE_FUNC_V_VPV(fract) |
868 | // frexp |
869 | @@ -850,380 +1004,380 @@ |
870 | |
871 | /* Integer Functions */ |
872 | |
873 | -#define _CL_DECLARE_FUNC_G_G(NAME) \ |
874 | - char __attribute__ ((overloadable)) NAME(char ); \ |
875 | - char2 __attribute__ ((overloadable)) NAME(char2 ); \ |
876 | - char3 __attribute__ ((overloadable)) NAME(char3 ); \ |
877 | - char4 __attribute__ ((overloadable)) NAME(char4 ); \ |
878 | - char8 __attribute__ ((overloadable)) NAME(char8 ); \ |
879 | - char16 __attribute__ ((overloadable)) NAME(char16 ); \ |
880 | - short __attribute__ ((overloadable)) NAME(short ); \ |
881 | - short2 __attribute__ ((overloadable)) NAME(short2 ); \ |
882 | - short3 __attribute__ ((overloadable)) NAME(short3 ); \ |
883 | - short4 __attribute__ ((overloadable)) NAME(short4 ); \ |
884 | - short8 __attribute__ ((overloadable)) NAME(short8 ); \ |
885 | - short16 __attribute__ ((overloadable)) NAME(short16 ); \ |
886 | - int __attribute__ ((overloadable)) NAME(int ); \ |
887 | - int2 __attribute__ ((overloadable)) NAME(int2 ); \ |
888 | - int3 __attribute__ ((overloadable)) NAME(int3 ); \ |
889 | - int4 __attribute__ ((overloadable)) NAME(int4 ); \ |
890 | - int8 __attribute__ ((overloadable)) NAME(int8 ); \ |
891 | - int16 __attribute__ ((overloadable)) NAME(int16 ); \ |
892 | - long __attribute__ ((overloadable)) NAME(long ); \ |
893 | - long2 __attribute__ ((overloadable)) NAME(long2 ); \ |
894 | - long3 __attribute__ ((overloadable)) NAME(long3 ); \ |
895 | - long4 __attribute__ ((overloadable)) NAME(long4 ); \ |
896 | - long8 __attribute__ ((overloadable)) NAME(long8 ); \ |
897 | - long16 __attribute__ ((overloadable)) NAME(long16 ); \ |
898 | - uchar __attribute__ ((overloadable)) NAME(uchar ); \ |
899 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 ); \ |
900 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 ); \ |
901 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 ); \ |
902 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 ); \ |
903 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 ); \ |
904 | - ushort __attribute__ ((overloadable)) NAME(ushort ); \ |
905 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 ); \ |
906 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 ); \ |
907 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 ); \ |
908 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 ); \ |
909 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16); \ |
910 | - uint __attribute__ ((overloadable)) NAME(uint ); \ |
911 | - uint2 __attribute__ ((overloadable)) NAME(uint2 ); \ |
912 | - uint3 __attribute__ ((overloadable)) NAME(uint3 ); \ |
913 | - uint4 __attribute__ ((overloadable)) NAME(uint4 ); \ |
914 | - uint8 __attribute__ ((overloadable)) NAME(uint8 ); \ |
915 | - uint16 __attribute__ ((overloadable)) NAME(uint16 ); \ |
916 | - ulong __attribute__ ((overloadable)) NAME(ulong ); \ |
917 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 ); \ |
918 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 ); \ |
919 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 ); \ |
920 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 ); \ |
921 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 ); |
922 | -#define _CL_DECLARE_FUNC_G_GG(NAME) \ |
923 | - char __attribute__ ((overloadable)) NAME(char , char ); \ |
924 | - char2 __attribute__ ((overloadable)) NAME(char2 , char2 ); \ |
925 | - char3 __attribute__ ((overloadable)) NAME(char3 , char3 ); \ |
926 | - char4 __attribute__ ((overloadable)) NAME(char4 , char4 ); \ |
927 | - char8 __attribute__ ((overloadable)) NAME(char8 , char8 ); \ |
928 | - char16 __attribute__ ((overloadable)) NAME(char16 , char16 ); \ |
929 | - short __attribute__ ((overloadable)) NAME(short , short ); \ |
930 | - short2 __attribute__ ((overloadable)) NAME(short2 , short2 ); \ |
931 | - short3 __attribute__ ((overloadable)) NAME(short3 , short3 ); \ |
932 | - short4 __attribute__ ((overloadable)) NAME(short4 , short4 ); \ |
933 | - short8 __attribute__ ((overloadable)) NAME(short8 , short8 ); \ |
934 | - short16 __attribute__ ((overloadable)) NAME(short16 , short16 ); \ |
935 | - int __attribute__ ((overloadable)) NAME(int , int ); \ |
936 | - int2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \ |
937 | - int3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \ |
938 | - int4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \ |
939 | - int8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \ |
940 | - int16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \ |
941 | - long __attribute__ ((overloadable)) NAME(long , long ); \ |
942 | - long2 __attribute__ ((overloadable)) NAME(long2 , long2 ); \ |
943 | - long3 __attribute__ ((overloadable)) NAME(long3 , long3 ); \ |
944 | - long4 __attribute__ ((overloadable)) NAME(long4 , long4 ); \ |
945 | - long8 __attribute__ ((overloadable)) NAME(long8 , long8 ); \ |
946 | - long16 __attribute__ ((overloadable)) NAME(long16 , long16 ); \ |
947 | - uchar __attribute__ ((overloadable)) NAME(uchar , uchar ); \ |
948 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \ |
949 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \ |
950 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \ |
951 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \ |
952 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \ |
953 | - ushort __attribute__ ((overloadable)) NAME(ushort , ushort ); \ |
954 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \ |
955 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \ |
956 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \ |
957 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \ |
958 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \ |
959 | - uint __attribute__ ((overloadable)) NAME(uint , uint ); \ |
960 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \ |
961 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \ |
962 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \ |
963 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \ |
964 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); \ |
965 | - ulong __attribute__ ((overloadable)) NAME(ulong , ulong ); \ |
966 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 ); \ |
967 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 ); \ |
968 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 ); \ |
969 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 ); \ |
970 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 ); |
971 | -#define _CL_DECLARE_FUNC_G_GGG(NAME) \ |
972 | - char __attribute__ ((overloadable)) NAME(char , char , char ); \ |
973 | - char2 __attribute__ ((overloadable)) NAME(char2 , char2 , char2 ); \ |
974 | - char3 __attribute__ ((overloadable)) NAME(char3 , char3 , char3 ); \ |
975 | - char4 __attribute__ ((overloadable)) NAME(char4 , char4 , char4 ); \ |
976 | - char8 __attribute__ ((overloadable)) NAME(char8 , char8 , char8 ); \ |
977 | - char16 __attribute__ ((overloadable)) NAME(char16 , char16 , char16 ); \ |
978 | - short __attribute__ ((overloadable)) NAME(short , short , short ); \ |
979 | - short2 __attribute__ ((overloadable)) NAME(short2 , short2 , short2 ); \ |
980 | - short3 __attribute__ ((overloadable)) NAME(short3 , short3 , short3 ); \ |
981 | - short4 __attribute__ ((overloadable)) NAME(short4 , short4 , short4 ); \ |
982 | - short8 __attribute__ ((overloadable)) NAME(short8 , short8 , short8 ); \ |
983 | - short16 __attribute__ ((overloadable)) NAME(short16 , short16 , short16 ); \ |
984 | - int __attribute__ ((overloadable)) NAME(int , int , int ); \ |
985 | - int2 __attribute__ ((overloadable)) NAME(int2 , int2 , int2 ); \ |
986 | - int3 __attribute__ ((overloadable)) NAME(int3 , int3 , int3 ); \ |
987 | - int4 __attribute__ ((overloadable)) NAME(int4 , int4 , int4 ); \ |
988 | - int8 __attribute__ ((overloadable)) NAME(int8 , int8 , int8 ); \ |
989 | - int16 __attribute__ ((overloadable)) NAME(int16 , int16 , int16 ); \ |
990 | - long __attribute__ ((overloadable)) NAME(long , long , long ); \ |
991 | - long2 __attribute__ ((overloadable)) NAME(long2 , long2 , long2 ); \ |
992 | - long3 __attribute__ ((overloadable)) NAME(long3 , long3 , long3 ); \ |
993 | - long4 __attribute__ ((overloadable)) NAME(long4 , long4 , long4 ); \ |
994 | - long8 __attribute__ ((overloadable)) NAME(long8 , long8 , long8 ); \ |
995 | - long16 __attribute__ ((overloadable)) NAME(long16 , long16 , long16 ); \ |
996 | - uchar __attribute__ ((overloadable)) NAME(uchar , uchar , uchar ); \ |
997 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 , uchar2 ); \ |
998 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 , uchar3 ); \ |
999 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 , uchar4 ); \ |
1000 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 , uchar8 ); \ |
1001 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 , uchar16 ); \ |
1002 | - ushort __attribute__ ((overloadable)) NAME(ushort , ushort , ushort ); \ |
1003 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 , ushort2 ); \ |
1004 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 , ushort3 ); \ |
1005 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 , ushort4 ); \ |
1006 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 , ushort8 ); \ |
1007 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16, ushort16); \ |
1008 | - uint __attribute__ ((overloadable)) NAME(uint , uint , uint ); \ |
1009 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 , uint2 ); \ |
1010 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 , uint3 ); \ |
1011 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 , uint4 ); \ |
1012 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 , uint8 ); \ |
1013 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 , uint16 ); \ |
1014 | - ulong __attribute__ ((overloadable)) NAME(ulong , ulong , ulong ); \ |
1015 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 , ulong2 ); \ |
1016 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 , ulong3 ); \ |
1017 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 , ulong4 ); \ |
1018 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 , ulong8 ); \ |
1019 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 , ulong16 ); |
1020 | -#define _CL_DECLARE_FUNC_G_GS(NAME) \ |
1021 | - char2 __attribute__ ((overloadable)) NAME(char2 , char ); \ |
1022 | - char3 __attribute__ ((overloadable)) NAME(char3 , char ); \ |
1023 | - char4 __attribute__ ((overloadable)) NAME(char4 , char ); \ |
1024 | - char8 __attribute__ ((overloadable)) NAME(char8 , char ); \ |
1025 | - char16 __attribute__ ((overloadable)) NAME(char16 , char ); \ |
1026 | - short2 __attribute__ ((overloadable)) NAME(short2 , short ); \ |
1027 | - short3 __attribute__ ((overloadable)) NAME(short3 , short ); \ |
1028 | - short4 __attribute__ ((overloadable)) NAME(short4 , short ); \ |
1029 | - short8 __attribute__ ((overloadable)) NAME(short8 , short ); \ |
1030 | - short16 __attribute__ ((overloadable)) NAME(short16 , short ); \ |
1031 | - int2 __attribute__ ((overloadable)) NAME(int2 , int ); \ |
1032 | - int3 __attribute__ ((overloadable)) NAME(int3 , int ); \ |
1033 | - int4 __attribute__ ((overloadable)) NAME(int4 , int ); \ |
1034 | - int8 __attribute__ ((overloadable)) NAME(int8 , int ); \ |
1035 | - int16 __attribute__ ((overloadable)) NAME(int16 , int ); \ |
1036 | - long2 __attribute__ ((overloadable)) NAME(long2 , long ); \ |
1037 | - long3 __attribute__ ((overloadable)) NAME(long3 , long ); \ |
1038 | - long4 __attribute__ ((overloadable)) NAME(long4 , long ); \ |
1039 | - long8 __attribute__ ((overloadable)) NAME(long8 , long ); \ |
1040 | - long16 __attribute__ ((overloadable)) NAME(long16 , long ); \ |
1041 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar ); \ |
1042 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar ); \ |
1043 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar ); \ |
1044 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar ); \ |
1045 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar ); \ |
1046 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort); \ |
1047 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort); \ |
1048 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort); \ |
1049 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort); \ |
1050 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort); \ |
1051 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint ); \ |
1052 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint ); \ |
1053 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint ); \ |
1054 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint ); \ |
1055 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint ); \ |
1056 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong ); \ |
1057 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong ); \ |
1058 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong ); \ |
1059 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong ); \ |
1060 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong ); |
1061 | -#define _CL_DECLARE_FUNC_UG_G(NAME) \ |
1062 | - uchar __attribute__ ((overloadable)) NAME(char ); \ |
1063 | - uchar2 __attribute__ ((overloadable)) NAME(char2 ); \ |
1064 | - uchar3 __attribute__ ((overloadable)) NAME(char3 ); \ |
1065 | - uchar4 __attribute__ ((overloadable)) NAME(char4 ); \ |
1066 | - uchar8 __attribute__ ((overloadable)) NAME(char8 ); \ |
1067 | - uchar16 __attribute__ ((overloadable)) NAME(char16 ); \ |
1068 | - ushort __attribute__ ((overloadable)) NAME(short ); \ |
1069 | - ushort2 __attribute__ ((overloadable)) NAME(short2 ); \ |
1070 | - ushort3 __attribute__ ((overloadable)) NAME(short3 ); \ |
1071 | - ushort4 __attribute__ ((overloadable)) NAME(short4 ); \ |
1072 | - ushort8 __attribute__ ((overloadable)) NAME(short8 ); \ |
1073 | - ushort16 __attribute__ ((overloadable)) NAME(short16 ); \ |
1074 | - uint __attribute__ ((overloadable)) NAME(int ); \ |
1075 | - uint2 __attribute__ ((overloadable)) NAME(int2 ); \ |
1076 | - uint3 __attribute__ ((overloadable)) NAME(int3 ); \ |
1077 | - uint4 __attribute__ ((overloadable)) NAME(int4 ); \ |
1078 | - uint8 __attribute__ ((overloadable)) NAME(int8 ); \ |
1079 | - uint16 __attribute__ ((overloadable)) NAME(int16 ); \ |
1080 | - ulong __attribute__ ((overloadable)) NAME(long ); \ |
1081 | - ulong2 __attribute__ ((overloadable)) NAME(long2 ); \ |
1082 | - ulong3 __attribute__ ((overloadable)) NAME(long3 ); \ |
1083 | - ulong4 __attribute__ ((overloadable)) NAME(long4 ); \ |
1084 | - ulong8 __attribute__ ((overloadable)) NAME(long8 ); \ |
1085 | - ulong16 __attribute__ ((overloadable)) NAME(long16 ); \ |
1086 | - uchar __attribute__ ((overloadable)) NAME(uchar ); \ |
1087 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 ); \ |
1088 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 ); \ |
1089 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 ); \ |
1090 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 ); \ |
1091 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 ); \ |
1092 | - ushort __attribute__ ((overloadable)) NAME(ushort ); \ |
1093 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 ); \ |
1094 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 ); \ |
1095 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 ); \ |
1096 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 ); \ |
1097 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16); \ |
1098 | - uint __attribute__ ((overloadable)) NAME(uint ); \ |
1099 | - uint2 __attribute__ ((overloadable)) NAME(uint2 ); \ |
1100 | - uint3 __attribute__ ((overloadable)) NAME(uint3 ); \ |
1101 | - uint4 __attribute__ ((overloadable)) NAME(uint4 ); \ |
1102 | - uint8 __attribute__ ((overloadable)) NAME(uint8 ); \ |
1103 | - uint16 __attribute__ ((overloadable)) NAME(uint16 ); \ |
1104 | - ulong __attribute__ ((overloadable)) NAME(ulong ); \ |
1105 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 ); \ |
1106 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 ); \ |
1107 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 ); \ |
1108 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 ); \ |
1109 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 ); |
1110 | -#define _CL_DECLARE_FUNC_UG_GG(NAME) \ |
1111 | - uchar __attribute__ ((overloadable)) NAME(char , char ); \ |
1112 | - uchar2 __attribute__ ((overloadable)) NAME(char2 , char2 ); \ |
1113 | - uchar3 __attribute__ ((overloadable)) NAME(char3 , char3 ); \ |
1114 | - uchar4 __attribute__ ((overloadable)) NAME(char4 , char4 ); \ |
1115 | - uchar8 __attribute__ ((overloadable)) NAME(char8 , char8 ); \ |
1116 | - uchar16 __attribute__ ((overloadable)) NAME(char16 , char16 ); \ |
1117 | - ushort __attribute__ ((overloadable)) NAME(short , short ); \ |
1118 | - ushort2 __attribute__ ((overloadable)) NAME(short2 , short2 ); \ |
1119 | - ushort3 __attribute__ ((overloadable)) NAME(short3 , short3 ); \ |
1120 | - ushort4 __attribute__ ((overloadable)) NAME(short4 , short4 ); \ |
1121 | - ushort8 __attribute__ ((overloadable)) NAME(short8 , short8 ); \ |
1122 | - ushort16 __attribute__ ((overloadable)) NAME(short16 , short16 ); \ |
1123 | - uint __attribute__ ((overloadable)) NAME(int , int ); \ |
1124 | - uint2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \ |
1125 | - uint3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \ |
1126 | - uint4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \ |
1127 | - uint8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \ |
1128 | - uint16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \ |
1129 | - ulong __attribute__ ((overloadable)) NAME(long , long ); \ |
1130 | - ulong2 __attribute__ ((overloadable)) NAME(long2 , long2 ); \ |
1131 | - ulong3 __attribute__ ((overloadable)) NAME(long3 , long3 ); \ |
1132 | - ulong4 __attribute__ ((overloadable)) NAME(long4 , long4 ); \ |
1133 | - ulong8 __attribute__ ((overloadable)) NAME(long8 , long8 ); \ |
1134 | - ulong16 __attribute__ ((overloadable)) NAME(long16 , long16 ); \ |
1135 | - uchar __attribute__ ((overloadable)) NAME(uchar , uchar ); \ |
1136 | - uchar2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \ |
1137 | - uchar3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \ |
1138 | - uchar4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \ |
1139 | - uchar8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \ |
1140 | - uchar16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \ |
1141 | - ushort __attribute__ ((overloadable)) NAME(ushort , ushort ); \ |
1142 | - ushort2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \ |
1143 | - ushort3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \ |
1144 | - ushort4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \ |
1145 | - ushort8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \ |
1146 | - ushort16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \ |
1147 | - uint __attribute__ ((overloadable)) NAME(uint , uint ); \ |
1148 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \ |
1149 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \ |
1150 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \ |
1151 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \ |
1152 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); \ |
1153 | - ulong __attribute__ ((overloadable)) NAME(ulong , ulong ); \ |
1154 | - ulong2 __attribute__ ((overloadable)) NAME(ulong2 , ulong2 ); \ |
1155 | - ulong3 __attribute__ ((overloadable)) NAME(ulong3 , ulong3 ); \ |
1156 | - ulong4 __attribute__ ((overloadable)) NAME(ulong4 , ulong4 ); \ |
1157 | - ulong8 __attribute__ ((overloadable)) NAME(ulong8 , ulong8 ); \ |
1158 | - ulong16 __attribute__ ((overloadable)) NAME(ulong16 , ulong16 ); |
1159 | -#define _CL_DECLARE_FUNC_LG_GUG(NAME) \ |
1160 | - short __attribute__ ((overloadable)) NAME(char , uchar ); \ |
1161 | - short2 __attribute__ ((overloadable)) NAME(char2 , uchar2 ); \ |
1162 | - short3 __attribute__ ((overloadable)) NAME(char3 , uchar3 ); \ |
1163 | - short4 __attribute__ ((overloadable)) NAME(char4 , uchar4 ); \ |
1164 | - short8 __attribute__ ((overloadable)) NAME(char8 , uchar8 ); \ |
1165 | - short16 __attribute__ ((overloadable)) NAME(char16 , uchar16 ); \ |
1166 | - int __attribute__ ((overloadable)) NAME(short , ushort ); \ |
1167 | - int2 __attribute__ ((overloadable)) NAME(short2 , ushort2 ); \ |
1168 | - int3 __attribute__ ((overloadable)) NAME(short3 , ushort3 ); \ |
1169 | - int4 __attribute__ ((overloadable)) NAME(short4 , ushort4 ); \ |
1170 | - int8 __attribute__ ((overloadable)) NAME(short8 , ushort8 ); \ |
1171 | - int16 __attribute__ ((overloadable)) NAME(short16 , ushort16 ); \ |
1172 | - long __attribute__ ((overloadable)) NAME(int , uint ); \ |
1173 | - long2 __attribute__ ((overloadable)) NAME(int2 , uint2 ); \ |
1174 | - long3 __attribute__ ((overloadable)) NAME(int3 , uint3 ); \ |
1175 | - long4 __attribute__ ((overloadable)) NAME(int4 , uint4 ); \ |
1176 | - long8 __attribute__ ((overloadable)) NAME(int8 , uint8 ); \ |
1177 | - long16 __attribute__ ((overloadable)) NAME(int16 , uint16 ); \ |
1178 | - ushort __attribute__ ((overloadable)) NAME(uchar , uchar ); \ |
1179 | - ushort2 __attribute__ ((overloadable)) NAME(uchar2 , uchar2 ); \ |
1180 | - ushort3 __attribute__ ((overloadable)) NAME(uchar3 , uchar3 ); \ |
1181 | - ushort4 __attribute__ ((overloadable)) NAME(uchar4 , uchar4 ); \ |
1182 | - ushort8 __attribute__ ((overloadable)) NAME(uchar8 , uchar8 ); \ |
1183 | - ushort16 __attribute__ ((overloadable)) NAME(uchar16 , uchar16 ); \ |
1184 | - uint __attribute__ ((overloadable)) NAME(ushort , ushort ); \ |
1185 | - uint2 __attribute__ ((overloadable)) NAME(ushort2 , ushort2 ); \ |
1186 | - uint3 __attribute__ ((overloadable)) NAME(ushort3 , ushort3 ); \ |
1187 | - uint4 __attribute__ ((overloadable)) NAME(ushort4 , ushort4 ); \ |
1188 | - uint8 __attribute__ ((overloadable)) NAME(ushort8 , ushort8 ); \ |
1189 | - uint16 __attribute__ ((overloadable)) NAME(ushort16, ushort16); \ |
1190 | - ulong __attribute__ ((overloadable)) NAME(uint , uint ); \ |
1191 | - ulong2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \ |
1192 | - ulong3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \ |
1193 | - ulong4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \ |
1194 | - ulong8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \ |
1195 | - ulong16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); |
1196 | -#define _CL_DECLARE_FUNC_I_IG(NAME) \ |
1197 | - int __attribute__ ((overloadable)) NAME(char ); \ |
1198 | - int __attribute__ ((overloadable)) NAME(char2 ); \ |
1199 | - int __attribute__ ((overloadable)) NAME(char3 ); \ |
1200 | - int __attribute__ ((overloadable)) NAME(char4 ); \ |
1201 | - int __attribute__ ((overloadable)) NAME(char8 ); \ |
1202 | - int __attribute__ ((overloadable)) NAME(char16 ); \ |
1203 | - int __attribute__ ((overloadable)) NAME(short ); \ |
1204 | - int __attribute__ ((overloadable)) NAME(short2 ); \ |
1205 | - int __attribute__ ((overloadable)) NAME(short3 ); \ |
1206 | - int __attribute__ ((overloadable)) NAME(short4 ); \ |
1207 | - int __attribute__ ((overloadable)) NAME(short8 ); \ |
1208 | - int __attribute__ ((overloadable)) NAME(short16); \ |
1209 | - int __attribute__ ((overloadable)) NAME(int ); \ |
1210 | - int __attribute__ ((overloadable)) NAME(int2 ); \ |
1211 | - int __attribute__ ((overloadable)) NAME(int3 ); \ |
1212 | - int __attribute__ ((overloadable)) NAME(int4 ); \ |
1213 | - int __attribute__ ((overloadable)) NAME(int8 ); \ |
1214 | - int __attribute__ ((overloadable)) NAME(int16 ); \ |
1215 | - int __attribute__ ((overloadable)) NAME(long ); \ |
1216 | - int __attribute__ ((overloadable)) NAME(long2 ); \ |
1217 | - int __attribute__ ((overloadable)) NAME(long3 ); \ |
1218 | - int __attribute__ ((overloadable)) NAME(long4 ); \ |
1219 | - int __attribute__ ((overloadable)) NAME(long8 ); \ |
1220 | - int __attribute__ ((overloadable)) NAME(long16 ); |
1221 | -#define _CL_DECLARE_FUNC_J_JJ(NAME) \ |
1222 | - int __attribute__ ((overloadable)) NAME(int , int ); \ |
1223 | - int2 __attribute__ ((overloadable)) NAME(int2 , int2 ); \ |
1224 | - int3 __attribute__ ((overloadable)) NAME(int3 , int3 ); \ |
1225 | - int4 __attribute__ ((overloadable)) NAME(int4 , int4 ); \ |
1226 | - int8 __attribute__ ((overloadable)) NAME(int8 , int8 ); \ |
1227 | - int16 __attribute__ ((overloadable)) NAME(int16 , int16 ); \ |
1228 | - uint __attribute__ ((overloadable)) NAME(uint , uint ); \ |
1229 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 ); \ |
1230 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 ); \ |
1231 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 ); \ |
1232 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 ); \ |
1233 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 ); |
1234 | -#define _CL_DECLARE_FUNC_J_JJJ(NAME) \ |
1235 | - int __attribute__ ((overloadable)) NAME(int , int , int ); \ |
1236 | - int2 __attribute__ ((overloadable)) NAME(int2 , int2 , int2 ); \ |
1237 | - int3 __attribute__ ((overloadable)) NAME(int3 , int3 , int3 ); \ |
1238 | - int4 __attribute__ ((overloadable)) NAME(int4 , int4 , int4 ); \ |
1239 | - int8 __attribute__ ((overloadable)) NAME(int8 , int8 , int8 ); \ |
1240 | - int16 __attribute__ ((overloadable)) NAME(int16 , int16 , int16 ); \ |
1241 | - uint __attribute__ ((overloadable)) NAME(uint , uint , uint ); \ |
1242 | - uint2 __attribute__ ((overloadable)) NAME(uint2 , uint2 , uint2 ); \ |
1243 | - uint3 __attribute__ ((overloadable)) NAME(uint3 , uint3 , uint3 ); \ |
1244 | - uint4 __attribute__ ((overloadable)) NAME(uint4 , uint4 , uint4 ); \ |
1245 | - uint8 __attribute__ ((overloadable)) NAME(uint8 , uint8 , uint8 ); \ |
1246 | - uint16 __attribute__ ((overloadable)) NAME(uint16 , uint16 , uint16 ); |
1247 | +#define _CL_DECLARE_FUNC_G_G(NAME) \ |
1248 | + char _cl_overloadable NAME(char ); \ |
1249 | + char2 _cl_overloadable NAME(char2 ); \ |
1250 | + char3 _cl_overloadable NAME(char3 ); \ |
1251 | + char4 _cl_overloadable NAME(char4 ); \ |
1252 | + char8 _cl_overloadable NAME(char8 ); \ |
1253 | + char16 _cl_overloadable NAME(char16 ); \ |
1254 | + short _cl_overloadable NAME(short ); \ |
1255 | + short2 _cl_overloadable NAME(short2 ); \ |
1256 | + short3 _cl_overloadable NAME(short3 ); \ |
1257 | + short4 _cl_overloadable NAME(short4 ); \ |
1258 | + short8 _cl_overloadable NAME(short8 ); \ |
1259 | + short16 _cl_overloadable NAME(short16 ); \ |
1260 | + int _cl_overloadable NAME(int ); \ |
1261 | + int2 _cl_overloadable NAME(int2 ); \ |
1262 | + int3 _cl_overloadable NAME(int3 ); \ |
1263 | + int4 _cl_overloadable NAME(int4 ); \ |
1264 | + int8 _cl_overloadable NAME(int8 ); \ |
1265 | + int16 _cl_overloadable NAME(int16 ); \ |
1266 | + long _cl_overloadable NAME(long ); \ |
1267 | + long2 _cl_overloadable NAME(long2 ); \ |
1268 | + long3 _cl_overloadable NAME(long3 ); \ |
1269 | + long4 _cl_overloadable NAME(long4 ); \ |
1270 | + long8 _cl_overloadable NAME(long8 ); \ |
1271 | + long16 _cl_overloadable NAME(long16 ); \ |
1272 | + uchar _cl_overloadable NAME(uchar ); \ |
1273 | + uchar2 _cl_overloadable NAME(uchar2 ); \ |
1274 | + uchar3 _cl_overloadable NAME(uchar3 ); \ |
1275 | + uchar4 _cl_overloadable NAME(uchar4 ); \ |
1276 | + uchar8 _cl_overloadable NAME(uchar8 ); \ |
1277 | + uchar16 _cl_overloadable NAME(uchar16 ); \ |
1278 | + ushort _cl_overloadable NAME(ushort ); \ |
1279 | + ushort2 _cl_overloadable NAME(ushort2 ); \ |
1280 | + ushort3 _cl_overloadable NAME(ushort3 ); \ |
1281 | + ushort4 _cl_overloadable NAME(ushort4 ); \ |
1282 | + ushort8 _cl_overloadable NAME(ushort8 ); \ |
1283 | + ushort16 _cl_overloadable NAME(ushort16); \ |
1284 | + uint _cl_overloadable NAME(uint ); \ |
1285 | + uint2 _cl_overloadable NAME(uint2 ); \ |
1286 | + uint3 _cl_overloadable NAME(uint3 ); \ |
1287 | + uint4 _cl_overloadable NAME(uint4 ); \ |
1288 | + uint8 _cl_overloadable NAME(uint8 ); \ |
1289 | + uint16 _cl_overloadable NAME(uint16 ); \ |
1290 | + ulong _cl_overloadable NAME(ulong ); \ |
1291 | + ulong2 _cl_overloadable NAME(ulong2 ); \ |
1292 | + ulong3 _cl_overloadable NAME(ulong3 ); \ |
1293 | + ulong4 _cl_overloadable NAME(ulong4 ); \ |
1294 | + ulong8 _cl_overloadable NAME(ulong8 ); \ |
1295 | + ulong16 _cl_overloadable NAME(ulong16 ); |
1296 | +#define _CL_DECLARE_FUNC_G_GG(NAME) \ |
1297 | + char _cl_overloadable NAME(char , char ); \ |
1298 | + char2 _cl_overloadable NAME(char2 , char2 ); \ |
1299 | + char3 _cl_overloadable NAME(char3 , char3 ); \ |
1300 | + char4 _cl_overloadable NAME(char4 , char4 ); \ |
1301 | + char8 _cl_overloadable NAME(char8 , char8 ); \ |
1302 | + char16 _cl_overloadable NAME(char16 , char16 ); \ |
1303 | + short _cl_overloadable NAME(short , short ); \ |
1304 | + short2 _cl_overloadable NAME(short2 , short2 ); \ |
1305 | + short3 _cl_overloadable NAME(short3 , short3 ); \ |
1306 | + short4 _cl_overloadable NAME(short4 , short4 ); \ |
1307 | + short8 _cl_overloadable NAME(short8 , short8 ); \ |
1308 | + short16 _cl_overloadable NAME(short16 , short16 ); \ |
1309 | + int _cl_overloadable NAME(int , int ); \ |
1310 | + int2 _cl_overloadable NAME(int2 , int2 ); \ |
1311 | + int3 _cl_overloadable NAME(int3 , int3 ); \ |
1312 | + int4 _cl_overloadable NAME(int4 , int4 ); \ |
1313 | + int8 _cl_overloadable NAME(int8 , int8 ); \ |
1314 | + int16 _cl_overloadable NAME(int16 , int16 ); \ |
1315 | + long _cl_overloadable NAME(long , long ); \ |
1316 | + long2 _cl_overloadable NAME(long2 , long2 ); \ |
1317 | + long3 _cl_overloadable NAME(long3 , long3 ); \ |
1318 | + long4 _cl_overloadable NAME(long4 , long4 ); \ |
1319 | + long8 _cl_overloadable NAME(long8 , long8 ); \ |
1320 | + long16 _cl_overloadable NAME(long16 , long16 ); \ |
1321 | + uchar _cl_overloadable NAME(uchar , uchar ); \ |
1322 | + uchar2 _cl_overloadable NAME(uchar2 , uchar2 ); \ |
1323 | + uchar3 _cl_overloadable NAME(uchar3 , uchar3 ); \ |
1324 | + uchar4 _cl_overloadable NAME(uchar4 , uchar4 ); \ |
1325 | + uchar8 _cl_overloadable NAME(uchar8 , uchar8 ); \ |
1326 | + uchar16 _cl_overloadable NAME(uchar16 , uchar16 ); \ |
1327 | + ushort _cl_overloadable NAME(ushort , ushort ); \ |
1328 | + ushort2 _cl_overloadable NAME(ushort2 , ushort2 ); \ |
1329 | + ushort3 _cl_overloadable NAME(ushort3 , ushort3 ); \ |
1330 | + ushort4 _cl_overloadable NAME(ushort4 , ushort4 ); \ |
1331 | + ushort8 _cl_overloadable NAME(ushort8 , ushort8 ); \ |
1332 | + ushort16 _cl_overloadable NAME(ushort16, ushort16); \ |
1333 | + uint _cl_overloadable NAME(uint , uint ); \ |
1334 | + uint2 _cl_overloadable NAME(uint2 , uint2 ); \ |
1335 | + uint3 _cl_overloadable NAME(uint3 , uint3 ); \ |
1336 | + uint4 _cl_overloadable NAME(uint4 , uint4 ); \ |
1337 | + uint8 _cl_overloadable NAME(uint8 , uint8 ); \ |
1338 | + uint16 _cl_overloadable NAME(uint16 , uint16 ); \ |
1339 | + ulong _cl_overloadable NAME(ulong , ulong ); \ |
1340 | + ulong2 _cl_overloadable NAME(ulong2 , ulong2 ); \ |
1341 | + ulong3 _cl_overloadable NAME(ulong3 , ulong3 ); \ |
1342 | + ulong4 _cl_overloadable NAME(ulong4 , ulong4 ); \ |
1343 | + ulong8 _cl_overloadable NAME(ulong8 , ulong8 ); \ |
1344 | + ulong16 _cl_overloadable NAME(ulong16 , ulong16 ); |
1345 | +#define _CL_DECLARE_FUNC_G_GGG(NAME) \ |
1346 | + char _cl_overloadable NAME(char , char , char ); \ |
1347 | + char2 _cl_overloadable NAME(char2 , char2 , char2 ); \ |
1348 | + char3 _cl_overloadable NAME(char3 , char3 , char3 ); \ |
1349 | + char4 _cl_overloadable NAME(char4 , char4 , char4 ); \ |
1350 | + char8 _cl_overloadable NAME(char8 , char8 , char8 ); \ |
1351 | + char16 _cl_overloadable NAME(char16 , char16 , char16 ); \ |
1352 | + short _cl_overloadable NAME(short , short , short ); \ |
1353 | + short2 _cl_overloadable NAME(short2 , short2 , short2 ); \ |
1354 | + short3 _cl_overloadable NAME(short3 , short3 , short3 ); \ |
1355 | + short4 _cl_overloadable NAME(short4 , short4 , short4 ); \ |
1356 | + short8 _cl_overloadable NAME(short8 , short8 , short8 ); \ |
1357 | + short16 _cl_overloadable NAME(short16 , short16 , short16 ); \ |
1358 | + int _cl_overloadable NAME(int , int , int ); \ |
1359 | + int2 _cl_overloadable NAME(int2 , int2 , int2 ); \ |
1360 | + int3 _cl_overloadable NAME(int3 , int3 , int3 ); \ |
1361 | + int4 _cl_overloadable NAME(int4 , int4 , int4 ); \ |
1362 | + int8 _cl_overloadable NAME(int8 , int8 , int8 ); \ |
1363 | + int16 _cl_overloadable NAME(int16 , int16 , int16 ); \ |
1364 | + long _cl_overloadable NAME(long , long , long ); \ |
1365 | + long2 _cl_overloadable NAME(long2 , long2 , long2 ); \ |
1366 | + long3 _cl_overloadable NAME(long3 , long3 , long3 ); \ |
1367 | + long4 _cl_overloadable NAME(long4 , long4 , long4 ); \ |
1368 | + long8 _cl_overloadable NAME(long8 , long8 , long8 ); \ |
1369 | + long16 _cl_overloadable NAME(long16 , long16 , long16 ); \ |
1370 | + uchar _cl_overloadable NAME(uchar , uchar , uchar ); \ |
1371 | + uchar2 _cl_overloadable NAME(uchar2 , uchar2 , uchar2 ); \ |
1372 | + uchar3 _cl_overloadable NAME(uchar3 , uchar3 , uchar3 ); \ |
1373 | + uchar4 _cl_overloadable NAME(uchar4 , uchar4 , uchar4 ); \ |
1374 | + uchar8 _cl_overloadable NAME(uchar8 , uchar8 , uchar8 ); \ |
1375 | + uchar16 _cl_overloadable NAME(uchar16 , uchar16 , uchar16 ); \ |
1376 | + ushort _cl_overloadable NAME(ushort , ushort , ushort ); \ |
1377 | + ushort2 _cl_overloadable NAME(ushort2 , ushort2 , ushort2 ); \ |
1378 | + ushort3 _cl_overloadable NAME(ushort3 , ushort3 , ushort3 ); \ |
1379 | + ushort4 _cl_overloadable NAME(ushort4 , ushort4 , ushort4 ); \ |
1380 | + ushort8 _cl_overloadable NAME(ushort8 , ushort8 , ushort8 ); \ |
1381 | + ushort16 _cl_overloadable NAME(ushort16, ushort16, ushort16); \ |
1382 | + uint _cl_overloadable NAME(uint , uint , uint ); \ |
1383 | + uint2 _cl_overloadable NAME(uint2 , uint2 , uint2 ); \ |
1384 | + uint3 _cl_overloadable NAME(uint3 , uint3 , uint3 ); \ |
1385 | + uint4 _cl_overloadable NAME(uint4 , uint4 , uint4 ); \ |
1386 | + uint8 _cl_overloadable NAME(uint8 , uint8 , uint8 ); \ |
1387 | + uint16 _cl_overloadable NAME(uint16 , uint16 , uint16 ); \ |
1388 | + ulong _cl_overloadable NAME(ulong , ulong , ulong ); \ |
1389 | + ulong2 _cl_overloadable NAME(ulong2 , ulong2 , ulong2 ); \ |
1390 | + ulong3 _cl_overloadable NAME(ulong3 , ulong3 , ulong3 ); \ |
1391 | + ulong4 _cl_overloadable NAME(ulong4 , ulong4 , ulong4 ); \ |
1392 | + ulong8 _cl_overloadable NAME(ulong8 , ulong8 , ulong8 ); \ |
1393 | + ulong16 _cl_overloadable NAME(ulong16 , ulong16 , ulong16 ); |
1394 | +#define _CL_DECLARE_FUNC_G_GS(NAME) \ |
1395 | + char2 _cl_overloadable NAME(char2 , char ); \ |
1396 | + char3 _cl_overloadable NAME(char3 , char ); \ |
1397 | + char4 _cl_overloadable NAME(char4 , char ); \ |
1398 | + char8 _cl_overloadable NAME(char8 , char ); \ |
1399 | + char16 _cl_overloadable NAME(char16 , char ); \ |
1400 | + short2 _cl_overloadable NAME(short2 , short ); \ |
1401 | + short3 _cl_overloadable NAME(short3 , short ); \ |
1402 | + short4 _cl_overloadable NAME(short4 , short ); \ |
1403 | + short8 _cl_overloadable NAME(short8 , short ); \ |
1404 | + short16 _cl_overloadable NAME(short16 , short ); \ |
1405 | + int2 _cl_overloadable NAME(int2 , int ); \ |
1406 | + int3 _cl_overloadable NAME(int3 , int ); \ |
1407 | + int4 _cl_overloadable NAME(int4 , int ); \ |
1408 | + int8 _cl_overloadable NAME(int8 , int ); \ |
1409 | + int16 _cl_overloadable NAME(int16 , int ); \ |
1410 | + long2 _cl_overloadable NAME(long2 , long ); \ |
1411 | + long3 _cl_overloadable NAME(long3 , long ); \ |
1412 | + long4 _cl_overloadable NAME(long4 , long ); \ |
1413 | + long8 _cl_overloadable NAME(long8 , long ); \ |
1414 | + long16 _cl_overloadable NAME(long16 , long ); \ |
1415 | + uchar2 _cl_overloadable NAME(uchar2 , uchar ); \ |
1416 | + uchar3 _cl_overloadable NAME(uchar3 , uchar ); \ |
1417 | + uchar4 _cl_overloadable NAME(uchar4 , uchar ); \ |
1418 | + uchar8 _cl_overloadable NAME(uchar8 , uchar ); \ |
1419 | + uchar16 _cl_overloadable NAME(uchar16 , uchar ); \ |
1420 | + ushort2 _cl_overloadable NAME(ushort2 , ushort); \ |
1421 | + ushort3 _cl_overloadable NAME(ushort3 , ushort); \ |
1422 | + ushort4 _cl_overloadable NAME(ushort4 , ushort); \ |
1423 | + ushort8 _cl_overloadable NAME(ushort8 , ushort); \ |
1424 | + ushort16 _cl_overloadable NAME(ushort16, ushort); \ |
1425 | + uint2 _cl_overloadable NAME(uint2 , uint ); \ |
1426 | + uint3 _cl_overloadable NAME(uint3 , uint ); \ |
1427 | + uint4 _cl_overloadable NAME(uint4 , uint ); \ |
1428 | + uint8 _cl_overloadable NAME(uint8 , uint ); \ |
1429 | + uint16 _cl_overloadable NAME(uint16 , uint ); \ |
1430 | + ulong2 _cl_overloadable NAME(ulong2 , ulong ); \ |
1431 | + ulong3 _cl_overloadable NAME(ulong3 , ulong ); \ |
1432 | + ulong4 _cl_overloadable NAME(ulong4 , ulong ); \ |
1433 | + ulong8 _cl_overloadable NAME(ulong8 , ulong ); \ |
1434 | + ulong16 _cl_overloadable NAME(ulong16 , ulong ); |
1435 | +#define _CL_DECLARE_FUNC_UG_G(NAME) \ |
1436 | + uchar _cl_overloadable NAME(char ); \ |
1437 | + uchar2 _cl_overloadable NAME(char2 ); \ |
1438 | + uchar3 _cl_overloadable NAME(char3 ); \ |
1439 | + uchar4 _cl_overloadable NAME(char4 ); \ |
1440 | + uchar8 _cl_overloadable NAME(char8 ); \ |
1441 | + uchar16 _cl_overloadable NAME(char16 ); \ |
1442 | + ushort _cl_overloadable NAME(short ); \ |
1443 | + ushort2 _cl_overloadable NAME(short2 ); \ |
1444 | + ushort3 _cl_overloadable NAME(short3 ); \ |
1445 | + ushort4 _cl_overloadable NAME(short4 ); \ |
1446 | + ushort8 _cl_overloadable NAME(short8 ); \ |
1447 | + ushort16 _cl_overloadable NAME(short16 ); \ |
1448 | + uint _cl_overloadable NAME(int ); \ |
1449 | + uint2 _cl_overloadable NAME(int2 ); \ |
1450 | + uint3 _cl_overloadable NAME(int3 ); \ |
1451 | + uint4 _cl_overloadable NAME(int4 ); \ |
1452 | + uint8 _cl_overloadable NAME(int8 ); \ |
1453 | + uint16 _cl_overloadable NAME(int16 ); \ |
1454 | + ulong _cl_overloadable NAME(long ); \ |
1455 | + ulong2 _cl_overloadable NAME(long2 ); \ |
1456 | + ulong3 _cl_overloadable NAME(long3 ); \ |
1457 | + ulong4 _cl_overloadable NAME(long4 ); \ |
1458 | + ulong8 _cl_overloadable NAME(long8 ); \ |
1459 | + ulong16 _cl_overloadable NAME(long16 ); \ |
1460 | + uchar _cl_overloadable NAME(uchar ); \ |
1461 | + uchar2 _cl_overloadable NAME(uchar2 ); \ |
1462 | + uchar3 _cl_overloadable NAME(uchar3 ); \ |
1463 | + uchar4 _cl_overloadable NAME(uchar4 ); \ |
1464 | + uchar8 _cl_overloadable NAME(uchar8 ); \ |
1465 | + uchar16 _cl_overloadable NAME(uchar16 ); \ |
1466 | + ushort _cl_overloadable NAME(ushort ); \ |
1467 | + ushort2 _cl_overloadable NAME(ushort2 ); \ |
1468 | + ushort3 _cl_overloadable NAME(ushort3 ); \ |
1469 | + ushort4 _cl_overloadable NAME(ushort4 ); \ |
1470 | + ushort8 _cl_overloadable NAME(ushort8 ); \ |
1471 | + ushort16 _cl_overloadable NAME(ushort16); \ |
1472 | + uint _cl_overloadable NAME(uint ); \ |
1473 | + uint2 _cl_overloadable NAME(uint2 ); \ |
1474 | + uint3 _cl_overloadable NAME(uint3 ); \ |
1475 | + uint4 _cl_overloadable NAME(uint4 ); \ |
1476 | + uint8 _cl_overloadable NAME(uint8 ); \ |
1477 | + uint16 _cl_overloadable NAME(uint16 ); \ |
1478 | + ulong _cl_overloadable NAME(ulong ); \ |
1479 | + ulong2 _cl_overloadable NAME(ulong2 ); \ |
1480 | + ulong3 _cl_overloadable NAME(ulong3 ); \ |
1481 | + ulong4 _cl_overloadable NAME(ulong4 ); \ |
1482 | + ulong8 _cl_overloadable NAME(ulong8 ); \ |
1483 | + ulong16 _cl_overloadable NAME(ulong16 ); |
1484 | +#define _CL_DECLARE_FUNC_UG_GG(NAME) \ |
1485 | + uchar _cl_overloadable NAME(char , char ); \ |
1486 | + uchar2 _cl_overloadable NAME(char2 , char2 ); \ |
1487 | + uchar3 _cl_overloadable NAME(char3 , char3 ); \ |
1488 | + uchar4 _cl_overloadable NAME(char4 , char4 ); \ |
1489 | + uchar8 _cl_overloadable NAME(char8 , char8 ); \ |
1490 | + uchar16 _cl_overloadable NAME(char16 , char16 ); \ |
1491 | + ushort _cl_overloadable NAME(short , short ); \ |
1492 | + ushort2 _cl_overloadable NAME(short2 , short2 ); \ |
1493 | + ushort3 _cl_overloadable NAME(short3 , short3 ); \ |
1494 | + ushort4 _cl_overloadable NAME(short4 , short4 ); \ |
1495 | + ushort8 _cl_overloadable NAME(short8 , short8 ); \ |
1496 | + ushort16 _cl_overloadable NAME(short16 , short16 ); \ |
1497 | + uint _cl_overloadable NAME(int , int ); \ |
1498 | + uint2 _cl_overloadable NAME(int2 , int2 ); \ |
1499 | + uint3 _cl_overloadable NAME(int3 , int3 ); \ |
1500 | + uint4 _cl_overloadable NAME(int4 , int4 ); \ |
1501 | + uint8 _cl_overloadable NAME(int8 , int8 ); \ |
1502 | + uint16 _cl_overloadable NAME(int16 , int16 ); \ |
1503 | + ulong _cl_overloadable NAME(long , long ); \ |
1504 | + ulong2 _cl_overloadable NAME(long2 , long2 ); \ |
1505 | + ulong3 _cl_overloadable NAME(long3 , long3 ); \ |
1506 | + ulong4 _cl_overloadable NAME(long4 , long4 ); \ |
1507 | + ulong8 _cl_overloadable NAME(long8 , long8 ); \ |
1508 | + ulong16 _cl_overloadable NAME(long16 , long16 ); \ |
1509 | + uchar _cl_overloadable NAME(uchar , uchar ); \ |
1510 | + uchar2 _cl_overloadable NAME(uchar2 , uchar2 ); \ |
1511 | + uchar3 _cl_overloadable NAME(uchar3 , uchar3 ); \ |
1512 | + uchar4 _cl_overloadable NAME(uchar4 , uchar4 ); \ |
1513 | + uchar8 _cl_overloadable NAME(uchar8 , uchar8 ); \ |
1514 | + uchar16 _cl_overloadable NAME(uchar16 , uchar16 ); \ |
1515 | + ushort _cl_overloadable NAME(ushort , ushort ); \ |
1516 | + ushort2 _cl_overloadable NAME(ushort2 , ushort2 ); \ |
1517 | + ushort3 _cl_overloadable NAME(ushort3 , ushort3 ); \ |
1518 | + ushort4 _cl_overloadable NAME(ushort4 , ushort4 ); \ |
1519 | + ushort8 _cl_overloadable NAME(ushort8 , ushort8 ); \ |
1520 | + ushort16 _cl_overloadable NAME(ushort16, ushort16); \ |
1521 | + uint _cl_overloadable NAME(uint , uint ); \ |
1522 | + uint2 _cl_overloadable NAME(uint2 , uint2 ); \ |
1523 | + uint3 _cl_overloadable NAME(uint3 , uint3 ); \ |
1524 | + uint4 _cl_overloadable NAME(uint4 , uint4 ); \ |
1525 | + uint8 _cl_overloadable NAME(uint8 , uint8 ); \ |
1526 | + uint16 _cl_overloadable NAME(uint16 , uint16 ); \ |
1527 | + ulong _cl_overloadable NAME(ulong , ulong ); \ |
1528 | + ulong2 _cl_overloadable NAME(ulong2 , ulong2 ); \ |
1529 | + ulong3 _cl_overloadable NAME(ulong3 , ulong3 ); \ |
1530 | + ulong4 _cl_overloadable NAME(ulong4 , ulong4 ); \ |
1531 | + ulong8 _cl_overloadable NAME(ulong8 , ulong8 ); \ |
1532 | + ulong16 _cl_overloadable NAME(ulong16 , ulong16 ); |
1533 | +#define _CL_DECLARE_FUNC_LG_GUG(NAME) \ |
1534 | + short _cl_overloadable NAME(char , uchar ); \ |
1535 | + short2 _cl_overloadable NAME(char2 , uchar2 ); \ |
1536 | + short3 _cl_overloadable NAME(char3 , uchar3 ); \ |
1537 | + short4 _cl_overloadable NAME(char4 , uchar4 ); \ |
1538 | + short8 _cl_overloadable NAME(char8 , uchar8 ); \ |
1539 | + short16 _cl_overloadable NAME(char16 , uchar16 ); \ |
1540 | + int _cl_overloadable NAME(short , ushort ); \ |
1541 | + int2 _cl_overloadable NAME(short2 , ushort2 ); \ |
1542 | + int3 _cl_overloadable NAME(short3 , ushort3 ); \ |
1543 | + int4 _cl_overloadable NAME(short4 , ushort4 ); \ |
1544 | + int8 _cl_overloadable NAME(short8 , ushort8 ); \ |
1545 | + int16 _cl_overloadable NAME(short16 , ushort16 ); \ |
1546 | + long _cl_overloadable NAME(int , uint ); \ |
1547 | + long2 _cl_overloadable NAME(int2 , uint2 ); \ |
1548 | + long3 _cl_overloadable NAME(int3 , uint3 ); \ |
1549 | + long4 _cl_overloadable NAME(int4 , uint4 ); \ |
1550 | + long8 _cl_overloadable NAME(int8 , uint8 ); \ |
1551 | + long16 _cl_overloadable NAME(int16 , uint16 ); \ |
1552 | + ushort _cl_overloadable NAME(uchar , uchar ); \ |
1553 | + ushort2 _cl_overloadable NAME(uchar2 , uchar2 ); \ |
1554 | + ushort3 _cl_overloadable NAME(uchar3 , uchar3 ); \ |
1555 | + ushort4 _cl_overloadable NAME(uchar4 , uchar4 ); \ |
1556 | + ushort8 _cl_overloadable NAME(uchar8 , uchar8 ); \ |
1557 | + ushort16 _cl_overloadable NAME(uchar16 , uchar16 ); \ |
1558 | + uint _cl_overloadable NAME(ushort , ushort ); \ |
1559 | + uint2 _cl_overloadable NAME(ushort2 , ushort2 ); \ |
1560 | + uint3 _cl_overloadable NAME(ushort3 , ushort3 ); \ |
1561 | + uint4 _cl_overloadable NAME(ushort4 , ushort4 ); \ |
1562 | + uint8 _cl_overloadable NAME(ushort8 , ushort8 ); \ |
1563 | + uint16 _cl_overloadable NAME(ushort16, ushort16); \ |
1564 | + ulong _cl_overloadable NAME(uint , uint ); \ |
1565 | + ulong2 _cl_overloadable NAME(uint2 , uint2 ); \ |
1566 | + ulong3 _cl_overloadable NAME(uint3 , uint3 ); \ |
1567 | + ulong4 _cl_overloadable NAME(uint4 , uint4 ); \ |
1568 | + ulong8 _cl_overloadable NAME(uint8 , uint8 ); \ |
1569 | + ulong16 _cl_overloadable NAME(uint16 , uint16 ); |
1570 | +#define _CL_DECLARE_FUNC_I_IG(NAME) \ |
1571 | + int _cl_overloadable NAME(char ); \ |
1572 | + int _cl_overloadable NAME(char2 ); \ |
1573 | + int _cl_overloadable NAME(char3 ); \ |
1574 | + int _cl_overloadable NAME(char4 ); \ |
1575 | + int _cl_overloadable NAME(char8 ); \ |
1576 | + int _cl_overloadable NAME(char16 ); \ |
1577 | + int _cl_overloadable NAME(short ); \ |
1578 | + int _cl_overloadable NAME(short2 ); \ |
1579 | + int _cl_overloadable NAME(short3 ); \ |
1580 | + int _cl_overloadable NAME(short4 ); \ |
1581 | + int _cl_overloadable NAME(short8 ); \ |
1582 | + int _cl_overloadable NAME(short16); \ |
1583 | + int _cl_overloadable NAME(int ); \ |
1584 | + int _cl_overloadable NAME(int2 ); \ |
1585 | + int _cl_overloadable NAME(int3 ); \ |
1586 | + int _cl_overloadable NAME(int4 ); \ |
1587 | + int _cl_overloadable NAME(int8 ); \ |
1588 | + int _cl_overloadable NAME(int16 ); \ |
1589 | + int _cl_overloadable NAME(long ); \ |
1590 | + int _cl_overloadable NAME(long2 ); \ |
1591 | + int _cl_overloadable NAME(long3 ); \ |
1592 | + int _cl_overloadable NAME(long4 ); \ |
1593 | + int _cl_overloadable NAME(long8 ); \ |
1594 | + int _cl_overloadable NAME(long16 ); |
1595 | +#define _CL_DECLARE_FUNC_J_JJ(NAME) \ |
1596 | + int _cl_overloadable NAME(int , int ); \ |
1597 | + int2 _cl_overloadable NAME(int2 , int2 ); \ |
1598 | + int3 _cl_overloadable NAME(int3 , int3 ); \ |
1599 | + int4 _cl_overloadable NAME(int4 , int4 ); \ |
1600 | + int8 _cl_overloadable NAME(int8 , int8 ); \ |
1601 | + int16 _cl_overloadable NAME(int16 , int16 ); \ |
1602 | + uint _cl_overloadable NAME(uint , uint ); \ |
1603 | + uint2 _cl_overloadable NAME(uint2 , uint2 ); \ |
1604 | + uint3 _cl_overloadable NAME(uint3 , uint3 ); \ |
1605 | + uint4 _cl_overloadable NAME(uint4 , uint4 ); \ |
1606 | + uint8 _cl_overloadable NAME(uint8 , uint8 ); \ |
1607 | + uint16 _cl_overloadable NAME(uint16 , uint16 ); |
1608 | +#define _CL_DECLARE_FUNC_J_JJJ(NAME) \ |
1609 | + int _cl_overloadable NAME(int , int , int ); \ |
1610 | + int2 _cl_overloadable NAME(int2 , int2 , int2 ); \ |
1611 | + int3 _cl_overloadable NAME(int3 , int3 , int3 ); \ |
1612 | + int4 _cl_overloadable NAME(int4 , int4 , int4 ); \ |
1613 | + int8 _cl_overloadable NAME(int8 , int8 , int8 ); \ |
1614 | + int16 _cl_overloadable NAME(int16 , int16 , int16 ); \ |
1615 | + uint _cl_overloadable NAME(uint , uint , uint ); \ |
1616 | + uint2 _cl_overloadable NAME(uint2 , uint2 , uint2 ); \ |
1617 | + uint3 _cl_overloadable NAME(uint3 , uint3 , uint3 ); \ |
1618 | + uint4 _cl_overloadable NAME(uint4 , uint4 , uint4 ); \ |
1619 | + uint8 _cl_overloadable NAME(uint8 , uint8 , uint8 ); \ |
1620 | + uint16 _cl_overloadable NAME(uint16 , uint16 , uint16 ); |
1621 | |
1622 | _CL_DECLARE_FUNC_UG_G(abs) |
1623 | _CL_DECLARE_FUNC_UG_GG(abs_diff) |
1624 | @@ -1269,10 +1423,10 @@ |
1625 | |
1626 | /* Geometric Functions */ |
1627 | |
1628 | -float4 __attribute__ ((overloadable)) cross(float4, float4); |
1629 | -float3 __attribute__ ((overloadable)) cross(float3, float3); |
1630 | -double4 __attribute__ ((overloadable)) cross(double4, double4); |
1631 | -double3 __attribute__ ((overloadable)) cross(double3, double3); |
1632 | +float4 _cl_overloadable cross(float4, float4); |
1633 | +float3 _cl_overloadable cross(float3, float3); |
1634 | +double4 _cl_overloadable cross(double4, double4); |
1635 | +double3 _cl_overloadable cross(double3, double3); |
1636 | _CL_DECLARE_FUNC_S_VV(dot) |
1637 | _CL_DECLARE_FUNC_S_VV(distance) |
1638 | _CL_DECLARE_FUNC_S_V(length) |
1639 | @@ -1306,3 +1460,228 @@ |
1640 | _CL_DECLARE_FUNC_V_VVV(bitselect) |
1641 | _CL_DECLARE_FUNC_G_GGG(select) |
1642 | _CL_DECLARE_FUNC_V_VVJ(select) |
1643 | + |
1644 | + |
1645 | + |
1646 | +/* Vector Functions */ |
1647 | + |
1648 | +#define _CL_DECLARE_VLOAD(TYPE, MOD) \ |
1649 | + TYPE##2 _cl_overloadable vload2 (size_t offset, const MOD TYPE *p); \ |
1650 | + TYPE##3 _cl_overloadable vload3 (size_t offset, const MOD TYPE *p); \ |
1651 | + TYPE##4 _cl_overloadable vload4 (size_t offset, const MOD TYPE *p); \ |
1652 | + TYPE##8 _cl_overloadable vload8 (size_t offset, const MOD TYPE *p); \ |
1653 | + TYPE##16 _cl_overloadable vload16(size_t offset, const MOD TYPE *p); |
1654 | + |
1655 | +_CL_DECLARE_VLOAD(char , __global) |
1656 | +_CL_DECLARE_VLOAD(short , __global) |
1657 | +_CL_DECLARE_VLOAD(int , __global) |
1658 | +_CL_DECLARE_VLOAD(long , __global) |
1659 | +_CL_DECLARE_VLOAD(uchar , __global) |
1660 | +_CL_DECLARE_VLOAD(ushort, __global) |
1661 | +_CL_DECLARE_VLOAD(uint , __global) |
1662 | +_CL_DECLARE_VLOAD(ulong , __global) |
1663 | +_CL_DECLARE_VLOAD(float , __global) |
1664 | +_CL_DECLARE_VLOAD(double, __global) |
1665 | + |
1666 | +_CL_DECLARE_VLOAD(char , __local) |
1667 | +_CL_DECLARE_VLOAD(short , __local) |
1668 | +_CL_DECLARE_VLOAD(int , __local) |
1669 | +_CL_DECLARE_VLOAD(long , __local) |
1670 | +_CL_DECLARE_VLOAD(uchar , __local) |
1671 | +_CL_DECLARE_VLOAD(ushort, __local) |
1672 | +_CL_DECLARE_VLOAD(uint , __local) |
1673 | +_CL_DECLARE_VLOAD(ulong , __local) |
1674 | +_CL_DECLARE_VLOAD(float , __local) |
1675 | +_CL_DECLARE_VLOAD(double, __local) |
1676 | + |
1677 | +_CL_DECLARE_VLOAD(char , __constant) |
1678 | +_CL_DECLARE_VLOAD(short , __constant) |
1679 | +_CL_DECLARE_VLOAD(int , __constant) |
1680 | +_CL_DECLARE_VLOAD(long , __constant) |
1681 | +_CL_DECLARE_VLOAD(uchar , __constant) |
1682 | +_CL_DECLARE_VLOAD(ushort, __constant) |
1683 | +_CL_DECLARE_VLOAD(uint , __constant) |
1684 | +_CL_DECLARE_VLOAD(ulong , __constant) |
1685 | +_CL_DECLARE_VLOAD(float , __constant) |
1686 | +_CL_DECLARE_VLOAD(double, __constant) |
1687 | + |
1688 | +/* __private is not supported yet \ |
1689 | +_CL_DECLARE_VLOAD(char , __private) |
1690 | +_CL_DECLARE_VLOAD(short , __private) |
1691 | +_CL_DECLARE_VLOAD(int , __private) |
1692 | +_CL_DECLARE_VLOAD(long , __private) |
1693 | +_CL_DECLARE_VLOAD(uchar , __private) |
1694 | +_CL_DECLARE_VLOAD(ushort, __private) |
1695 | +_CL_DECLARE_VLOAD(uint , __private) |
1696 | +_CL_DECLARE_VLOAD(ulong , __private) |
1697 | +_CL_DECLARE_VLOAD(float , __private) |
1698 | +_CL_DECLARE_VLOAD(double, __private) |
1699 | +*/ |
1700 | + |
1701 | +#define _CL_DECLARE_VSTORE(TYPE, MOD) \ |
1702 | + void _cl_overloadable vstore2 (TYPE##2 data, size_t offset, MOD TYPE *p); \ |
1703 | + void _cl_overloadable vstore3 (TYPE##3 data, size_t offset, MOD TYPE *p); \ |
1704 | + void _cl_overloadable vstore4 (TYPE##4 data, size_t offset, MOD TYPE *p); \ |
1705 | + void _cl_overloadable vstore8 (TYPE##8 data, size_t offset, MOD TYPE *p); \ |
1706 | + void _cl_overloadable vstore16(TYPE##16 data, size_t offset, MOD TYPE *p); |
1707 | + |
1708 | +_CL_DECLARE_VSTORE(char , __global) |
1709 | +_CL_DECLARE_VSTORE(short , __global) |
1710 | +_CL_DECLARE_VSTORE(int , __global) |
1711 | +_CL_DECLARE_VSTORE(long , __global) |
1712 | +_CL_DECLARE_VSTORE(uchar , __global) |
1713 | +_CL_DECLARE_VSTORE(ushort, __global) |
1714 | +_CL_DECLARE_VSTORE(uint , __global) |
1715 | +_CL_DECLARE_VSTORE(ulong , __global) |
1716 | +_CL_DECLARE_VSTORE(float , __global) |
1717 | +_CL_DECLARE_VSTORE(double, __global) |
1718 | + |
1719 | +_CL_DECLARE_VSTORE(char , __local) |
1720 | +_CL_DECLARE_VSTORE(short , __local) |
1721 | +_CL_DECLARE_VSTORE(int , __local) |
1722 | +_CL_DECLARE_VSTORE(long , __local) |
1723 | +_CL_DECLARE_VSTORE(uchar , __local) |
1724 | +_CL_DECLARE_VSTORE(ushort, __local) |
1725 | +_CL_DECLARE_VSTORE(uint , __local) |
1726 | +_CL_DECLARE_VSTORE(ulong , __local) |
1727 | +_CL_DECLARE_VSTORE(float , __local) |
1728 | +_CL_DECLARE_VSTORE(double, __local) |
1729 | + |
1730 | +/* __private is not supported yet |
1731 | +_CL_DECLARE_VSTORE(char , __private) |
1732 | +_CL_DECLARE_VSTORE(short , __private) |
1733 | +_CL_DECLARE_VSTORE(int , __private) |
1734 | +_CL_DECLARE_VSTORE(long , __private) |
1735 | +_CL_DECLARE_VSTORE(uchar , __private) |
1736 | +_CL_DECLARE_VSTORE(ushort, __private) |
1737 | +_CL_DECLARE_VSTORE(uint , __private) |
1738 | +_CL_DECLARE_VSTORE(ulong , __private) |
1739 | +_CL_DECLARE_VSTORE(float , __private) |
1740 | +_CL_DECLARE_VSTORE(double, __private) |
1741 | +*/ |
1742 | + |
1743 | + |
1744 | + |
1745 | +/* Miscellaneous Vector Functions */ |
1746 | + |
1747 | +// convert a vector type to a scalar type |
1748 | +_CL_DECLARE_FUNC_I_IG(_cl_scalar) |
1749 | +_CL_DECLARE_FUNC_S_V(_cl_scalar) |
1750 | +#define vec_step(a) (sizeof(a) / sizeof(_cl_scalar(a))) |
1751 | + |
1752 | + |
1753 | + |
1754 | +// This code leads to an ICE in Clang |
1755 | + |
1756 | +// #define _CL_DECLARE_SHUFFLE_2(GTYPE, UGTYPE, STYPE, M) \ |
1757 | +// GTYPE##2 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##2 mask) \ |
1758 | +// { \ |
1759 | +// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \ |
1760 | +// UGTYPE bmask = bits - (UGTYPE)1; \ |
1761 | +// return __builtin_shufflevector(x, x, \ |
1762 | +// mask.s0 & bmask, mask.s1 & bmask); \ |
1763 | +// } |
1764 | +// #define _CL_DECLARE_SHUFFLE_3(GTYPE, UGTYPE, STYPE, M) \ |
1765 | +// GTYPE##3 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##3 mask) \ |
1766 | +// { \ |
1767 | +// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \ |
1768 | +// UGTYPE bmask = bits - (UGTYPE)1; \ |
1769 | +// return __builtin_shufflevector(x, x, \ |
1770 | +// mask.s0 & bmask, mask.s1 & bmask, \ |
1771 | +// mask.s2 & bmask); \ |
1772 | +// } |
1773 | +// #define _CL_DECLARE_SHUFFLE_4(GTYPE, UGTYPE, STYPE, M) \ |
1774 | +// GTYPE##4 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##4 mask) \ |
1775 | +// { \ |
1776 | +// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \ |
1777 | +// UGTYPE bmask = bits - (UGTYPE)1; \ |
1778 | +// return __builtin_shufflevector(x, x, \ |
1779 | +// mask.s0 & bmask, mask.s1 & bmask, \ |
1780 | +// mask.s2 & bmask, mask.s3 & bmask); \ |
1781 | +// } |
1782 | +// #define _CL_DECLARE_SHUFFLE_8(GTYPE, UGTYPE, STYPE, M) \ |
1783 | +// GTYPE##8 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##8 mask) \ |
1784 | +// { \ |
1785 | +// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \ |
1786 | +// UGTYPE bmask = bits - (UGTYPE)1; \ |
1787 | +// return __builtin_shufflevector(x, x, \ |
1788 | +// mask.s0 & bmask, mask.s1 & bmask, \ |
1789 | +// mask.s2 & bmask, mask.s3 & bmask, \ |
1790 | +// mask.s4 & bmask, mask.s5 & bmask, \ |
1791 | +// mask.s6 & bmask, mask.s7 & bmask); \ |
1792 | +// } |
1793 | +// #define _CL_DECLARE_SHUFFLE_16(GTYPE, UGTYPE, STYPE, M) \ |
1794 | +// GTYPE##16 _cl_overloadable shuffle(GTYPE##M x, UGTYPE##16 mask) \ |
1795 | +// { \ |
1796 | +// UGTYPE bits = (UGTYPE)1 << (UGTYPE)M; \ |
1797 | +// UGTYPE bmask = bits - (UGTYPE)1; \ |
1798 | +// return __builtin_shufflevector(x, x, \ |
1799 | +// mask.s0 & bmask, mask.s1 & bmask, \ |
1800 | +// mask.s2 & bmask, mask.s3 & bmask, \ |
1801 | +// mask.s4 & bmask, mask.s5 & bmask, \ |
1802 | +// mask.s6 & bmask, mask.s7 & bmask, \ |
1803 | +// mask.s8 & bmask, mask.s9 & bmask, \ |
1804 | +// mask.sa & bmask, mask.sb & bmask, \ |
1805 | +// mask.sc & bmask, mask.sd & bmask, \ |
1806 | +// mask.se & bmask, mask.sf & bmask); \ |
1807 | +// } |
1808 | +// |
1809 | +// #define _CL_DECLARE_SHUFFLE(GTYPE, UGTYPE, STYPE, M) \ |
1810 | +// _CL_DECLARE_SHUFFLE_2 (GTYPE, UGTYPE, STYPE, M) \ |
1811 | +// _CL_DECLARE_SHUFFLE_3 (GTYPE, UGTYPE, STYPE, M) \ |
1812 | +// _CL_DECLARE_SHUFFLE_4 (GTYPE, UGTYPE, STYPE, M) \ |
1813 | +// _CL_DECLARE_SHUFFLE_8 (GTYPE, UGTYPE, STYPE, M) \ |
1814 | +// _CL_DECLARE_SHUFFLE_16(GTYPE, UGTYPE, STYPE, M) |
1815 | +// |
1816 | +// _CL_DECLARE_SHUFFLE(char , uchar , char , 2 ) |
1817 | +// _CL_DECLARE_SHUFFLE(char , uchar , char , 3 ) |
1818 | +// _CL_DECLARE_SHUFFLE(char , uchar , char , 4 ) |
1819 | +// _CL_DECLARE_SHUFFLE(char , uchar , char , 8 ) |
1820 | +// _CL_DECLARE_SHUFFLE(char , uchar , char , 16) |
1821 | +// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 2 ) |
1822 | +// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 3 ) |
1823 | +// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 4 ) |
1824 | +// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 8 ) |
1825 | +// _CL_DECLARE_SHUFFLE(uchar , uchar , char , 16) |
1826 | +// _CL_DECLARE_SHUFFLE(short , ushort, short , 2 ) |
1827 | +// _CL_DECLARE_SHUFFLE(short , ushort, short , 3 ) |
1828 | +// _CL_DECLARE_SHUFFLE(short , ushort, short , 4 ) |
1829 | +// _CL_DECLARE_SHUFFLE(short , ushort, short , 8 ) |
1830 | +// _CL_DECLARE_SHUFFLE(short , ushort, short , 16) |
1831 | +// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 2 ) |
1832 | +// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 3 ) |
1833 | +// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 4 ) |
1834 | +// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 8 ) |
1835 | +// _CL_DECLARE_SHUFFLE(ushort, ushort, short , 16) |
1836 | +// _CL_DECLARE_SHUFFLE(int , uint , int , 2 ) |
1837 | +// _CL_DECLARE_SHUFFLE(int , uint , int , 3 ) |
1838 | +// _CL_DECLARE_SHUFFLE(int , uint , int , 4 ) |
1839 | +// _CL_DECLARE_SHUFFLE(int , uint , int , 8 ) |
1840 | +// _CL_DECLARE_SHUFFLE(int , uint , int , 16) |
1841 | +// _CL_DECLARE_SHUFFLE(uint , uint , int , 2 ) |
1842 | +// _CL_DECLARE_SHUFFLE(uint , uint , int , 3 ) |
1843 | +// _CL_DECLARE_SHUFFLE(uint , uint , int , 4 ) |
1844 | +// _CL_DECLARE_SHUFFLE(uint , uint , int , 8 ) |
1845 | +// _CL_DECLARE_SHUFFLE(uint , uint , int , 16) |
1846 | +// _CL_DECLARE_SHUFFLE(long , ulong , long , 2 ) |
1847 | +// _CL_DECLARE_SHUFFLE(long , ulong , long , 3 ) |
1848 | +// _CL_DECLARE_SHUFFLE(long , ulong , long , 4 ) |
1849 | +// _CL_DECLARE_SHUFFLE(long , ulong , long , 8 ) |
1850 | +// _CL_DECLARE_SHUFFLE(long , ulong , long , 16) |
1851 | +// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 2 ) |
1852 | +// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 3 ) |
1853 | +// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 4 ) |
1854 | +// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 8 ) |
1855 | +// _CL_DECLARE_SHUFFLE(ulong , ulong , long , 16) |
1856 | +// _CL_DECLARE_SHUFFLE(float , uint , float , 2 ) |
1857 | +// _CL_DECLARE_SHUFFLE(float , uint , float , 3 ) |
1858 | +// _CL_DECLARE_SHUFFLE(float , uint , float , 4 ) |
1859 | +// _CL_DECLARE_SHUFFLE(float , uint , float , 8 ) |
1860 | +// _CL_DECLARE_SHUFFLE(float , uint , float , 16) |
1861 | +// _CL_DECLARE_SHUFFLE(double, ulong , double, 2 ) |
1862 | +// _CL_DECLARE_SHUFFLE(double, ulong , double, 3 ) |
1863 | +// _CL_DECLARE_SHUFFLE(double, ulong , double, 4 ) |
1864 | +// _CL_DECLARE_SHUFFLE(double, ulong , double, 8 ) |
1865 | +// _CL_DECLARE_SHUFFLE(double, ulong , double, 16) |
1866 | + |
1867 | +// shuffle2 |
1868 | |
1869 | === modified file 'lib/kernel/Makefile.am' |
1870 | --- lib/kernel/Makefile.am 2011-10-31 16:58:40 +0000 |
1871 | +++ lib/kernel/Makefile.am 2011-10-31 17:03:23 +0000 |
1872 | @@ -22,9 +22,13 @@ |
1873 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
1874 | # THE SOFTWARE. |
1875 | |
1876 | -LEX= |
1877 | +LEX = |
1878 | |
1879 | +<<<<<<< TREE |
1880 | SUBDIRS = tce |
1881 | +======= |
1882 | +SUBDIRS = dummy x86 # ppc |
1883 | +>>>>>>> MERGE-SOURCE |
1884 | |
1885 | pkglib_LIBRARIES = libkernel.a |
1886 | |
1887 | @@ -145,7 +149,9 @@ |
1888 | any.cl \ |
1889 | all.cl \ |
1890 | bitselect.cl \ |
1891 | - select.cl |
1892 | + select.cl \ |
1893 | + vload.cl \ |
1894 | + vstore.cl |
1895 | |
1896 | libkernel_a_LIBADD = barrier.o |
1897 | EXTRA_DIST = barrier.ll |
1898 | @@ -159,7 +165,7 @@ |
1899 | .c.o: |
1900 | $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $< |
1901 | |
1902 | -barrier.o: barrier.ll |
1903 | +.ll.o: |
1904 | $(LLVM_AS) -o $@ $< |
1905 | |
1906 | $(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
1907 | |
1908 | === modified file 'lib/kernel/all.cl' |
1909 | --- lib/kernel/all.cl 2011-10-27 01:35:56 +0000 |
1910 | +++ lib/kernel/all.cl 2011-10-31 17:03:23 +0000 |
1911 | @@ -21,122 +21,122 @@ |
1912 | THE SOFTWARE. |
1913 | */ |
1914 | |
1915 | -int __attribute__((overloadable)) all(char a) |
1916 | +int __attribute__((__overloadable__)) all(char a) |
1917 | { |
1918 | return a < (char)0; |
1919 | } |
1920 | |
1921 | -int __attribute__((overloadable)) all(char2 a) |
1922 | +int __attribute__((__overloadable__)) all(char2 a) |
1923 | { |
1924 | return all(a.lo) && all(a.hi); |
1925 | } |
1926 | |
1927 | -int __attribute__((overloadable)) all(char3 a) |
1928 | +int __attribute__((__overloadable__)) all(char3 a) |
1929 | { |
1930 | return all(a.s01) && all(a.s2); |
1931 | } |
1932 | |
1933 | -int __attribute__((overloadable)) all(char4 a) |
1934 | -{ |
1935 | - return all(a.lo) && all(a.hi); |
1936 | -} |
1937 | - |
1938 | -int __attribute__((overloadable)) all(char8 a) |
1939 | -{ |
1940 | - return all(a.lo) && all(a.hi); |
1941 | -} |
1942 | - |
1943 | -int __attribute__((overloadable)) all(char16 a) |
1944 | -{ |
1945 | - return all(a.lo) && all(a.hi); |
1946 | -} |
1947 | - |
1948 | -int __attribute__((overloadable)) all(short a) |
1949 | +int __attribute__((__overloadable__)) all(char4 a) |
1950 | +{ |
1951 | + return all(a.lo) && all(a.hi); |
1952 | +} |
1953 | + |
1954 | +int __attribute__((__overloadable__)) all(char8 a) |
1955 | +{ |
1956 | + return all(a.lo) && all(a.hi); |
1957 | +} |
1958 | + |
1959 | +int __attribute__((__overloadable__)) all(char16 a) |
1960 | +{ |
1961 | + return all(a.lo) && all(a.hi); |
1962 | +} |
1963 | + |
1964 | +int __attribute__((__overloadable__)) all(short a) |
1965 | { |
1966 | return a < (short)0; |
1967 | } |
1968 | |
1969 | -int __attribute__((overloadable)) all(short2 a) |
1970 | +int __attribute__((__overloadable__)) all(short2 a) |
1971 | { |
1972 | return all(a.lo) && all(a.hi); |
1973 | } |
1974 | |
1975 | -int __attribute__((overloadable)) all(short3 a) |
1976 | +int __attribute__((__overloadable__)) all(short3 a) |
1977 | { |
1978 | return all(a.s01) && all(a.s2); |
1979 | } |
1980 | |
1981 | -int __attribute__((overloadable)) all(short4 a) |
1982 | -{ |
1983 | - return all(a.lo) && all(a.hi); |
1984 | -} |
1985 | - |
1986 | -int __attribute__((overloadable)) all(short8 a) |
1987 | -{ |
1988 | - return all(a.lo) && all(a.hi); |
1989 | -} |
1990 | - |
1991 | -int __attribute__((overloadable)) all(short16 a) |
1992 | -{ |
1993 | - return all(a.lo) && all(a.hi); |
1994 | -} |
1995 | - |
1996 | -int __attribute__((overloadable)) all(int a) |
1997 | +int __attribute__((__overloadable__)) all(short4 a) |
1998 | +{ |
1999 | + return all(a.lo) && all(a.hi); |
2000 | +} |
2001 | + |
2002 | +int __attribute__((__overloadable__)) all(short8 a) |
2003 | +{ |
2004 | + return all(a.lo) && all(a.hi); |
2005 | +} |
2006 | + |
2007 | +int __attribute__((__overloadable__)) all(short16 a) |
2008 | +{ |
2009 | + return all(a.lo) && all(a.hi); |
2010 | +} |
2011 | + |
2012 | +int __attribute__((__overloadable__)) all(int a) |
2013 | { |
2014 | return a < 0; |
2015 | } |
2016 | |
2017 | -int __attribute__((overloadable)) all(int2 a) |
2018 | +int __attribute__((__overloadable__)) all(int2 a) |
2019 | { |
2020 | return all(a.lo) && all(a.hi); |
2021 | } |
2022 | |
2023 | -int __attribute__((overloadable)) all(int3 a) |
2024 | +int __attribute__((__overloadable__)) all(int3 a) |
2025 | { |
2026 | return all(a.s01) && all(a.s2); |
2027 | } |
2028 | |
2029 | -int __attribute__((overloadable)) all(int4 a) |
2030 | -{ |
2031 | - return all(a.lo) && all(a.hi); |
2032 | -} |
2033 | - |
2034 | -int __attribute__((overloadable)) all(int8 a) |
2035 | -{ |
2036 | - return all(a.lo) && all(a.hi); |
2037 | -} |
2038 | - |
2039 | -int __attribute__((overloadable)) all(int16 a) |
2040 | -{ |
2041 | - return all(a.lo) && all(a.hi); |
2042 | -} |
2043 | - |
2044 | -int __attribute__((overloadable)) all(long a) |
2045 | +int __attribute__((__overloadable__)) all(int4 a) |
2046 | +{ |
2047 | + return all(a.lo) && all(a.hi); |
2048 | +} |
2049 | + |
2050 | +int __attribute__((__overloadable__)) all(int8 a) |
2051 | +{ |
2052 | + return all(a.lo) && all(a.hi); |
2053 | +} |
2054 | + |
2055 | +int __attribute__((__overloadable__)) all(int16 a) |
2056 | +{ |
2057 | + return all(a.lo) && all(a.hi); |
2058 | +} |
2059 | + |
2060 | +int __attribute__((__overloadable__)) all(long a) |
2061 | { |
2062 | return a < 0L; |
2063 | } |
2064 | |
2065 | -int __attribute__((overloadable)) all(long2 a) |
2066 | +int __attribute__((__overloadable__)) all(long2 a) |
2067 | { |
2068 | return all(a.lo) && all(a.hi); |
2069 | } |
2070 | |
2071 | -int __attribute__((overloadable)) all(long3 a) |
2072 | +int __attribute__((__overloadable__)) all(long3 a) |
2073 | { |
2074 | return all(a.s01) && all(a.s2); |
2075 | } |
2076 | |
2077 | -int __attribute__((overloadable)) all(long4 a) |
2078 | -{ |
2079 | - return all(a.lo) && all(a.hi); |
2080 | -} |
2081 | - |
2082 | -int __attribute__((overloadable)) all(long8 a) |
2083 | -{ |
2084 | - return all(a.lo) && all(a.hi); |
2085 | -} |
2086 | - |
2087 | -int __attribute__((overloadable)) all(long16 a) |
2088 | +int __attribute__((__overloadable__)) all(long4 a) |
2089 | +{ |
2090 | + return all(a.lo) && all(a.hi); |
2091 | +} |
2092 | + |
2093 | +int __attribute__((__overloadable__)) all(long8 a) |
2094 | +{ |
2095 | + return all(a.lo) && all(a.hi); |
2096 | +} |
2097 | + |
2098 | +int __attribute__((__overloadable__)) all(long16 a) |
2099 | { |
2100 | return all(a.lo) && all(a.hi); |
2101 | } |
2102 | |
2103 | === modified file 'lib/kernel/any.cl' |
2104 | --- lib/kernel/any.cl 2011-10-27 01:35:56 +0000 |
2105 | +++ lib/kernel/any.cl 2011-10-31 17:03:23 +0000 |
2106 | @@ -21,122 +21,122 @@ |
2107 | THE SOFTWARE. |
2108 | */ |
2109 | |
2110 | -int __attribute__((overloadable)) any(char a) |
2111 | +int __attribute__((__overloadable__)) any(char a) |
2112 | { |
2113 | return a < (char)0; |
2114 | } |
2115 | |
2116 | -int __attribute__((overloadable)) any(char2 a) |
2117 | +int __attribute__((__overloadable__)) any(char2 a) |
2118 | { |
2119 | return any(a.lo) || any(a.hi); |
2120 | } |
2121 | |
2122 | -int __attribute__((overloadable)) any(char3 a) |
2123 | +int __attribute__((__overloadable__)) any(char3 a) |
2124 | { |
2125 | return any(a.s01) || any(a.s2); |
2126 | } |
2127 | |
2128 | -int __attribute__((overloadable)) any(char4 a) |
2129 | -{ |
2130 | - return any(a.lo) || any(a.hi); |
2131 | -} |
2132 | - |
2133 | -int __attribute__((overloadable)) any(char8 a) |
2134 | -{ |
2135 | - return any(a.lo) || any(a.hi); |
2136 | -} |
2137 | - |
2138 | -int __attribute__((overloadable)) any(char16 a) |
2139 | -{ |
2140 | - return any(a.lo) || any(a.hi); |
2141 | -} |
2142 | - |
2143 | -int __attribute__((overloadable)) any(short a) |
2144 | +int __attribute__((__overloadable__)) any(char4 a) |
2145 | +{ |
2146 | + return any(a.lo) || any(a.hi); |
2147 | +} |
2148 | + |
2149 | +int __attribute__((__overloadable__)) any(char8 a) |
2150 | +{ |
2151 | + return any(a.lo) || any(a.hi); |
2152 | +} |
2153 | + |
2154 | +int __attribute__((__overloadable__)) any(char16 a) |
2155 | +{ |
2156 | + return any(a.lo) || any(a.hi); |
2157 | +} |
2158 | + |
2159 | +int __attribute__((__overloadable__)) any(short a) |
2160 | { |
2161 | return a < (short)0; |
2162 | } |
2163 | |
2164 | -int __attribute__((overloadable)) any(short2 a) |
2165 | +int __attribute__((__overloadable__)) any(short2 a) |
2166 | { |
2167 | return any(a.lo) || any(a.hi); |
2168 | } |
2169 | |
2170 | -int __attribute__((overloadable)) any(short3 a) |
2171 | +int __attribute__((__overloadable__)) any(short3 a) |
2172 | { |
2173 | return any(a.s01) || any(a.s2); |
2174 | } |
2175 | |
2176 | -int __attribute__((overloadable)) any(short4 a) |
2177 | -{ |
2178 | - return any(a.lo) || any(a.hi); |
2179 | -} |
2180 | - |
2181 | -int __attribute__((overloadable)) any(short8 a) |
2182 | -{ |
2183 | - return any(a.lo) || any(a.hi); |
2184 | -} |
2185 | - |
2186 | -int __attribute__((overloadable)) any(short16 a) |
2187 | -{ |
2188 | - return any(a.lo) || any(a.hi); |
2189 | -} |
2190 | - |
2191 | -int __attribute__((overloadable)) any(int a) |
2192 | +int __attribute__((__overloadable__)) any(short4 a) |
2193 | +{ |
2194 | + return any(a.lo) || any(a.hi); |
2195 | +} |
2196 | + |
2197 | +int __attribute__((__overloadable__)) any(short8 a) |
2198 | +{ |
2199 | + return any(a.lo) || any(a.hi); |
2200 | +} |
2201 | + |
2202 | +int __attribute__((__overloadable__)) any(short16 a) |
2203 | +{ |
2204 | + return any(a.lo) || any(a.hi); |
2205 | +} |
2206 | + |
2207 | +int __attribute__((__overloadable__)) any(int a) |
2208 | { |
2209 | return a < 0; |
2210 | } |
2211 | |
2212 | -int __attribute__((overloadable)) any(int2 a) |
2213 | +int __attribute__((__overloadable__)) any(int2 a) |
2214 | { |
2215 | return any(a.lo) || any(a.hi); |
2216 | } |
2217 | |
2218 | -int __attribute__((overloadable)) any(int3 a) |
2219 | +int __attribute__((__overloadable__)) any(int3 a) |
2220 | { |
2221 | return any(a.s01) || any(a.s2); |
2222 | } |
2223 | |
2224 | -int __attribute__((overloadable)) any(int4 a) |
2225 | -{ |
2226 | - return any(a.lo) || any(a.hi); |
2227 | -} |
2228 | - |
2229 | -int __attribute__((overloadable)) any(int8 a) |
2230 | -{ |
2231 | - return any(a.lo) || any(a.hi); |
2232 | -} |
2233 | - |
2234 | -int __attribute__((overloadable)) any(int16 a) |
2235 | -{ |
2236 | - return any(a.lo) || any(a.hi); |
2237 | -} |
2238 | - |
2239 | -int __attribute__((overloadable)) any(long a) |
2240 | +int __attribute__((__overloadable__)) any(int4 a) |
2241 | +{ |
2242 | + return any(a.lo) || any(a.hi); |
2243 | +} |
2244 | + |
2245 | +int __attribute__((__overloadable__)) any(int8 a) |
2246 | +{ |
2247 | + return any(a.lo) || any(a.hi); |
2248 | +} |
2249 | + |
2250 | +int __attribute__((__overloadable__)) any(int16 a) |
2251 | +{ |
2252 | + return any(a.lo) || any(a.hi); |
2253 | +} |
2254 | + |
2255 | +int __attribute__((__overloadable__)) any(long a) |
2256 | { |
2257 | return a < 0L; |
2258 | } |
2259 | |
2260 | -int __attribute__((overloadable)) any(long2 a) |
2261 | +int __attribute__((__overloadable__)) any(long2 a) |
2262 | { |
2263 | return any(a.lo) || any(a.hi); |
2264 | } |
2265 | |
2266 | -int __attribute__((overloadable)) any(long3 a) |
2267 | +int __attribute__((__overloadable__)) any(long3 a) |
2268 | { |
2269 | return any(a.s01) || any(a.s2); |
2270 | } |
2271 | |
2272 | -int __attribute__((overloadable)) any(long4 a) |
2273 | -{ |
2274 | - return any(a.lo) || any(a.hi); |
2275 | -} |
2276 | - |
2277 | -int __attribute__((overloadable)) any(long8 a) |
2278 | -{ |
2279 | - return any(a.lo) || any(a.hi); |
2280 | -} |
2281 | - |
2282 | -int __attribute__((overloadable)) any(long16 a) |
2283 | +int __attribute__((__overloadable__)) any(long4 a) |
2284 | +{ |
2285 | + return any(a.lo) || any(a.hi); |
2286 | +} |
2287 | + |
2288 | +int __attribute__((__overloadable__)) any(long8 a) |
2289 | +{ |
2290 | + return any(a.lo) || any(a.hi); |
2291 | +} |
2292 | + |
2293 | +int __attribute__((__overloadable__)) any(long16 a) |
2294 | { |
2295 | return any(a.lo) || any(a.hi); |
2296 | } |
2297 | |
2298 | === modified file 'lib/kernel/as_type.cl' |
2299 | --- lib/kernel/as_type.cl 2011-10-20 19:46:45 +0000 |
2300 | +++ lib/kernel/as_type.cl 2011-10-31 17:03:23 +0000 |
2301 | @@ -22,7 +22,7 @@ |
2302 | */ |
2303 | |
2304 | #define DEFINE_AS_TYPE(SRC, DST) \ |
2305 | - DST __attribute__ ((overloadable)) \ |
2306 | + DST __attribute__ ((__overloadable__)) \ |
2307 | as_##DST(SRC a) \ |
2308 | { \ |
2309 | return *(DST*)&a; \ |
2310 | |
2311 | === modified file 'lib/kernel/ceil.cl' |
2312 | --- lib/kernel/ceil.cl 2011-10-25 18:52:31 +0000 |
2313 | +++ lib/kernel/ceil.cl 2011-10-31 17:03:23 +0000 |
2314 | @@ -21,134 +21,6 @@ |
2315 | THE SOFTWARE. |
2316 | */ |
2317 | |
2318 | - |
2319 | - |
2320 | -#define _MM_FROUND_TO_NEAREST_INT 0x00 |
2321 | -#define _MM_FROUND_TO_NEG_INF 0x01 |
2322 | -#define _MM_FROUND_TO_POS_INF 0x02 |
2323 | -#define _MM_FROUND_TO_ZERO 0x03 |
2324 | -#define _MM_FROUND_CUR_DIRECTION 0x04 |
2325 | - |
2326 | -#define _MM_FROUND_RAISE_EXC 0x00 |
2327 | -#define _MM_FROUND_NO_EXC 0x08 |
2328 | - |
2329 | -#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) |
2330 | -#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) |
2331 | -#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) |
2332 | -#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) |
2333 | -#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) |
2334 | -#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) |
2335 | - |
2336 | - |
2337 | - |
2338 | -float __attribute__ ((overloadable)) |
2339 | -cl_ceil(float a) |
2340 | -{ |
2341 | -#ifdef __SSE4_1__ |
2342 | - // LLVM does not optimise this on its own |
2343 | - return ((float4)__builtin_ia32_roundss(*(float4*)&a, *(float4*)&a, |
2344 | - _MM_FROUND_CEIL)).s0; |
2345 | -#else |
2346 | - return __builtin_ceilf(a); |
2347 | -#endif |
2348 | -} |
2349 | - |
2350 | -float2 __attribute__ ((overloadable)) |
2351 | -cl_ceil(float2 a) |
2352 | -{ |
2353 | -#ifdef __SSE4_1__ |
2354 | - return ((float4)cl_ceil(*(float4)&a)).s01; |
2355 | -#else |
2356 | - return (float2)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2357 | -#endif |
2358 | -} |
2359 | - |
2360 | -float3 __attribute__ ((overloadable)) |
2361 | -cl_ceil(float3 a) |
2362 | -{ |
2363 | -#ifdef __SSE4_1__ |
2364 | - return ((float4)cl_ceil(*(float4)&a)).s012; |
2365 | -#else |
2366 | - return (float3)(cl_ceil(a.s01), cl_ceil(a.s2)); |
2367 | -#endif |
2368 | -} |
2369 | - |
2370 | -float4 __attribute__ ((overloadable)) |
2371 | -cl_ceil(float4 a) |
2372 | -{ |
2373 | -#ifdef __SSE4_1__ |
2374 | - return __builtin_ia32_roundps(a, _MM_FROUND_CEIL); |
2375 | -#else |
2376 | - return (float4)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2377 | -#endif |
2378 | -} |
2379 | - |
2380 | -float8 __attribute__ ((overloadable)) |
2381 | -cl_ceil(float8 a) |
2382 | -{ |
2383 | -#ifdef __AVX__ |
2384 | - return __builtin_ia32_roundps256(a, _MM_FROUND_CEIL); |
2385 | -#else |
2386 | - return (float8)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2387 | -#endif |
2388 | -} |
2389 | - |
2390 | -float16 __attribute__ ((overloadable)) |
2391 | -cl_ceil(float16 a) |
2392 | -{ |
2393 | - return (float16)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2394 | -} |
2395 | - |
2396 | -double __attribute__ ((overloadable)) |
2397 | -cl_ceil(double a) |
2398 | -{ |
2399 | -#ifdef __SSE4_1__ |
2400 | - // LLVM does not optimise this on its own |
2401 | - return ((double2)__builtin_ia32_roundss(*(double2*)&a, *(double2*)&a, |
2402 | - _MM_FROUND_CEIL)).s0; |
2403 | -#else |
2404 | - return __builtin_ceil(a); |
2405 | -#endif |
2406 | -} |
2407 | - |
2408 | -double2 __attribute__ ((overloadable)) |
2409 | -cl_ceil(double2 a) |
2410 | -{ |
2411 | -#ifdef __SSE4_1__ |
2412 | - return __builtin_ia32_roundpd(a, _MM_FROUND_CEIL); |
2413 | -#else |
2414 | - return (double2)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2415 | -#endif |
2416 | -} |
2417 | - |
2418 | -double3 __attribute__ ((overloadable)) |
2419 | -cl_ceil(double3 a) |
2420 | -{ |
2421 | -#ifdef __AVX__ |
2422 | - return ((double4)cl_ceil(*(double4)&a)).s012; |
2423 | -#else |
2424 | - return (double3)(cl_ceil(a.s01), cl_ceil(a.s2)); |
2425 | -#endif |
2426 | -} |
2427 | - |
2428 | -double4 __attribute__ ((overloadable)) |
2429 | -cl_ceil(double4 a) |
2430 | -{ |
2431 | -#ifdef __AVX__ |
2432 | - return __builtin_ia32_roundpd256(a, _MM_FROUND_CEIL); |
2433 | -#else |
2434 | - return (double4)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2435 | -#endif |
2436 | -} |
2437 | - |
2438 | -double8 __attribute__ ((overloadable)) |
2439 | -cl_ceil(double8 a) |
2440 | -{ |
2441 | - return (double8)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2442 | -} |
2443 | - |
2444 | -double16 __attribute__ ((overloadable)) |
2445 | -cl_ceil(double16 a) |
2446 | -{ |
2447 | - return (double16)(cl_ceil(a.lo), cl_ceil(a.hi)); |
2448 | -} |
2449 | +#include "templates.h" |
2450 | + |
2451 | +DEFINE_BUILTIN_V_V(ceil) |
2452 | |
2453 | === modified file 'lib/kernel/convert_type.cl' |
2454 | --- lib/kernel/convert_type.cl 2011-10-26 19:49:23 +0000 |
2455 | +++ lib/kernel/convert_type.cl 2011-10-31 17:03:23 +0000 |
2456 | @@ -24,19 +24,19 @@ |
2457 | #include "templates.h" |
2458 | |
2459 | #define DEFINE_CONVERT_TYPE(SRC, DST) \ |
2460 | - DST __attribute__ ((overloadable)) convert_##DST(SRC a) \ |
2461 | + DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \ |
2462 | { \ |
2463 | return (DST)a; \ |
2464 | } |
2465 | |
2466 | #define DEFINE_CONVERT_TYPE_HALF(SRC, DST, HALFDST) \ |
2467 | - DST __attribute__ ((overloadable)) convert_##DST(SRC a) \ |
2468 | + DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \ |
2469 | { \ |
2470 | return (DST)(convert_##HALFDST(a.lo), convert_##HALFDST(a.hi)); \ |
2471 | } |
2472 | |
2473 | #define DEFINE_CONVERT_TYPE_012(SRC, DST, DST01, DST2) \ |
2474 | - DST __attribute__ ((overloadable)) convert_##DST(SRC a) \ |
2475 | + DST __attribute__ ((__overloadable__)) convert_##DST(SRC a) \ |
2476 | { \ |
2477 | return (DST)(convert_##DST01(a.s01), convert_##DST2(a.s2)); \ |
2478 | } |
2479 | |
2480 | === modified file 'lib/kernel/copysign.cl' |
2481 | --- lib/kernel/copysign.cl 2011-10-25 16:28:54 +0000 |
2482 | +++ lib/kernel/copysign.cl 2011-10-31 17:03:23 +0000 |
2483 | @@ -21,110 +21,6 @@ |
2484 | THE SOFTWARE. |
2485 | */ |
2486 | |
2487 | -float __attribute__ ((overloadable)) |
2488 | -copysign(float a, float b) |
2489 | -{ |
2490 | - return __builtin_copysignf(a, b); |
2491 | -} |
2492 | - |
2493 | -float2 __attribute__ ((overloadable)) |
2494 | -copysign(float2 a, float2 b) |
2495 | -{ |
2496 | -#ifdef __SSE__ |
2497 | - return copysign(*(float4*)&a, *(float4*)&b).s01; |
2498 | -#else |
2499 | - return (float2)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2500 | -#endif |
2501 | -} |
2502 | - |
2503 | -float3 __attribute__ ((overloadable)) |
2504 | -copysign(float3 a, float3 b) |
2505 | -{ |
2506 | -#ifdef __SSE__ |
2507 | - return copysign(*(float4*)&a, *(float4*)&b).s012; |
2508 | -#else |
2509 | - return (float3)(copysign(a.s01, b.s01), copysign(a.s2, b.s2)); |
2510 | -#endif |
2511 | -} |
2512 | - |
2513 | -float4 __attribute__ ((overloadable)) |
2514 | -copysign(float4 a, float4 b) |
2515 | -{ |
2516 | -#ifdef __SSE__ |
2517 | - const uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; |
2518 | - return as_float4((~sign_mask & as_uint4(a)) | (sign_mask & as_uint4(b))); |
2519 | -#else |
2520 | - return (float4)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2521 | -#endif |
2522 | -} |
2523 | - |
2524 | -float8 __attribute__ ((overloadable)) |
2525 | -copysign(float8 a, float8 b) |
2526 | -{ |
2527 | -#ifdef __AVX__ |
2528 | - const uint8 sign_mask = |
2529 | - {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, |
2530 | - 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; |
2531 | - return as_float8((~sign_mask & as_uint8(a)) | (sign_mask & as_uint8(b))); |
2532 | -#else |
2533 | - return (float8)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2534 | -#endif |
2535 | -} |
2536 | - |
2537 | -float16 __attribute__ ((overloadable)) |
2538 | -copysign(float16 a, float16 b) |
2539 | -{ |
2540 | - return (float16)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2541 | -} |
2542 | - |
2543 | -double __attribute__ ((overloadable)) |
2544 | -copysign(double a, double b) |
2545 | -{ |
2546 | - return __builtin_copysign(a, b); |
2547 | -} |
2548 | - |
2549 | -double2 __attribute__ ((overloadable)) |
2550 | -copysign(double2 a, double2 b) |
2551 | -{ |
2552 | -#ifdef __SSE2__ |
2553 | - const ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; |
2554 | - return as_double2((~sign_mask & as_ulong2(a)) | (sign_mask & as_ulong2(b))); |
2555 | -#else |
2556 | - return (double2)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2557 | -#endif |
2558 | -} |
2559 | - |
2560 | -double3 __attribute__ ((overloadable)) |
2561 | -copysign(double3 a, double3 b) |
2562 | -{ |
2563 | -#ifdef __AVX__ |
2564 | - return copysign(*(double4*)&a, *(double4*)&b).s012; |
2565 | -#else |
2566 | - return (double3)(copysign(a.s01, b.s01), copysign(a.s2, b.s2)); |
2567 | -#endif |
2568 | -} |
2569 | - |
2570 | -double4 __attribute__ ((overloadable)) |
2571 | -copysign(double4 a, double4 b) |
2572 | -{ |
2573 | -#ifdef __AVX__ |
2574 | - const ulong4 sign_mask = |
2575 | - {0x8000000000000000UL, 0x8000000000000000UL, |
2576 | - 0x8000000000000000UL, 0x8000000000000000UL}; |
2577 | - return as_double4((~sign_mask & as_ulong4(a)) | (sign_mask & as_ulong4(b))); |
2578 | -#else |
2579 | - return (double4)(copysign(a.lo, b.hi), copysign(a.lo, b.hi)); |
2580 | -#endif |
2581 | -} |
2582 | - |
2583 | -double8 __attribute__ ((overloadable)) |
2584 | -copysign(double8 a, double8 b) |
2585 | -{ |
2586 | - return (double8)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2587 | -} |
2588 | - |
2589 | -double16 __attribute__ ((overloadable)) |
2590 | -copysign(double16 a, double16 b) |
2591 | -{ |
2592 | - return (double16)(copysign(a.lo, b.lo), copysign(a.hi, b.hi)); |
2593 | -} |
2594 | +#include "templates.h" |
2595 | + |
2596 | +DEFINE_BUILTIN_V_VV(copysign) |
2597 | |
2598 | === modified file 'lib/kernel/cross.cl' |
2599 | --- lib/kernel/cross.cl 2011-10-27 01:35:56 +0000 |
2600 | +++ lib/kernel/cross.cl 2011-10-31 17:03:23 +0000 |
2601 | @@ -21,24 +21,24 @@ |
2602 | THE SOFTWARE. |
2603 | */ |
2604 | |
2605 | -float4 __attribute__ ((overloadable)) cross(float4 a, float4 b) |
2606 | +float4 __attribute__ ((__overloadable__)) cross(float4 a, float4 b) |
2607 | { |
2608 | return (float4)(cross(a.xyz, b.xyz), 0.0f); |
2609 | } |
2610 | |
2611 | -float3 __attribute__ ((overloadable)) cross(float3 a, float3 b) |
2612 | +float3 __attribute__ ((__overloadable__)) cross(float3 a, float3 b) |
2613 | { |
2614 | return (float3)(a.y * b.z - a.z * b.y, |
2615 | a.z * b.x - a.x * b.z, |
2616 | a.x * b.y - a.y * b.x); |
2617 | } |
2618 | |
2619 | -double4 __attribute__ ((overloadable)) cross(double4 a, double4 b) |
2620 | +double4 __attribute__ ((__overloadable__)) cross(double4 a, double4 b) |
2621 | { |
2622 | return (double4)(cross(a.xyz, b.xyz), 0.0f); |
2623 | } |
2624 | |
2625 | -double3 __attribute__ ((overloadable)) cross(double3 a, double3 b) |
2626 | +double3 __attribute__ ((__overloadable__)) cross(double3 a, double3 b) |
2627 | { |
2628 | return (double3)(a.y * b.z - a.z * b.y, |
2629 | a.z * b.x - a.x * b.z, |
2630 | |
2631 | === modified file 'lib/kernel/dot.cl' |
2632 | --- lib/kernel/dot.cl 2011-10-27 01:35:56 +0000 |
2633 | +++ lib/kernel/dot.cl 2011-10-31 17:03:23 +0000 |
2634 | @@ -21,62 +21,62 @@ |
2635 | THE SOFTWARE. |
2636 | */ |
2637 | |
2638 | -float __attribute__ ((overloadable)) dot(float a, float b) |
2639 | -{ |
2640 | - return a * b; |
2641 | -} |
2642 | - |
2643 | -float __attribute__ ((overloadable)) dot(float2 a, float2 b) |
2644 | -{ |
2645 | - return a.lo * b.lo + a.hi * b.hi; |
2646 | -} |
2647 | - |
2648 | -float __attribute__ ((overloadable)) dot(float3 a, float3 b) |
2649 | -{ |
2650 | - return dot(a.s01, b.s01) + a.s2 * b.s2; |
2651 | -} |
2652 | - |
2653 | -float __attribute__ ((overloadable)) dot(float4 a, float4 b) |
2654 | -{ |
2655 | - return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2656 | -} |
2657 | - |
2658 | -float __attribute__ ((overloadable)) dot(float8 a, float8 b) |
2659 | -{ |
2660 | - return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2661 | -} |
2662 | - |
2663 | -float __attribute__ ((overloadable)) dot(float16 a, float16 b) |
2664 | -{ |
2665 | - return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2666 | -} |
2667 | - |
2668 | -double __attribute__ ((overloadable)) dot(double a, double b) |
2669 | -{ |
2670 | - return a * b; |
2671 | -} |
2672 | - |
2673 | -double __attribute__ ((overloadable)) dot(double2 a, double2 b) |
2674 | -{ |
2675 | - return a.lo * b.lo + a.hi * b.hi; |
2676 | -} |
2677 | - |
2678 | -double __attribute__ ((overloadable)) dot(double3 a, double3 b) |
2679 | -{ |
2680 | - return dot(a.s01, b.s01) + a.s2 * b.s2; |
2681 | -} |
2682 | - |
2683 | -double __attribute__ ((overloadable)) dot(double4 a, double4 b) |
2684 | -{ |
2685 | - return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2686 | -} |
2687 | - |
2688 | -double __attribute__ ((overloadable)) dot(double8 a, double8 b) |
2689 | -{ |
2690 | - return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2691 | -} |
2692 | - |
2693 | -double __attribute__ ((overloadable)) dot(double16 a, double16 b) |
2694 | +float __attribute__ ((__overloadable__)) dot(float a, float b) |
2695 | +{ |
2696 | + return a * b; |
2697 | +} |
2698 | + |
2699 | +float __attribute__ ((__overloadable__)) dot(float2 a, float2 b) |
2700 | +{ |
2701 | + return a.lo * b.lo + a.hi * b.hi; |
2702 | +} |
2703 | + |
2704 | +float __attribute__ ((__overloadable__)) dot(float3 a, float3 b) |
2705 | +{ |
2706 | + return dot(a.s01, b.s01) + a.s2 * b.s2; |
2707 | +} |
2708 | + |
2709 | +float __attribute__ ((__overloadable__)) dot(float4 a, float4 b) |
2710 | +{ |
2711 | + return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2712 | +} |
2713 | + |
2714 | +float __attribute__ ((__overloadable__)) dot(float8 a, float8 b) |
2715 | +{ |
2716 | + return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2717 | +} |
2718 | + |
2719 | +float __attribute__ ((__overloadable__)) dot(float16 a, float16 b) |
2720 | +{ |
2721 | + return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2722 | +} |
2723 | + |
2724 | +double __attribute__ ((__overloadable__)) dot(double a, double b) |
2725 | +{ |
2726 | + return a * b; |
2727 | +} |
2728 | + |
2729 | +double __attribute__ ((__overloadable__)) dot(double2 a, double2 b) |
2730 | +{ |
2731 | + return a.lo * b.lo + a.hi * b.hi; |
2732 | +} |
2733 | + |
2734 | +double __attribute__ ((__overloadable__)) dot(double3 a, double3 b) |
2735 | +{ |
2736 | + return dot(a.s01, b.s01) + a.s2 * b.s2; |
2737 | +} |
2738 | + |
2739 | +double __attribute__ ((__overloadable__)) dot(double4 a, double4 b) |
2740 | +{ |
2741 | + return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2742 | +} |
2743 | + |
2744 | +double __attribute__ ((__overloadable__)) dot(double8 a, double8 b) |
2745 | +{ |
2746 | + return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2747 | +} |
2748 | + |
2749 | +double __attribute__ ((__overloadable__)) dot(double16 a, double16 b) |
2750 | { |
2751 | return dot(a.lo, b.lo) + dot(a.hi, b.hi); |
2752 | } |
2753 | |
2754 | === modified file 'lib/kernel/fabs.cl' |
2755 | --- lib/kernel/fabs.cl 2011-10-25 16:28:54 +0000 |
2756 | +++ lib/kernel/fabs.cl 2011-10-31 17:03:23 +0000 |
2757 | @@ -21,111 +21,6 @@ |
2758 | THE SOFTWARE. |
2759 | */ |
2760 | |
2761 | -float __attribute__ ((overloadable)) |
2762 | -fabs(float a) |
2763 | -{ |
2764 | - return __builtin_fabsf(a); |
2765 | -} |
2766 | - |
2767 | -float2 __attribute__ ((overloadable)) |
2768 | -fabs(float2 a) |
2769 | -{ |
2770 | -#ifdef __SSE__ |
2771 | - const uint2 sign_mask = {0x80000000U, 0x80000000U}; |
2772 | - return as_float2(~sign_mask & as_uint2(a)); |
2773 | -#else |
2774 | - return (float2)(fabs(a.lo), fabs(a.hi)); |
2775 | -#endif |
2776 | -} |
2777 | - |
2778 | -float3 __attribute__ ((overloadable)) |
2779 | -fabs(float3 a) |
2780 | -{ |
2781 | -#ifdef __SSE__ |
2782 | - return fabs(*(float4*)&a).s012; |
2783 | -#else |
2784 | - return (float3)(fabs(a.s01), fabs(a.s2)); |
2785 | -#endif |
2786 | -} |
2787 | - |
2788 | -float4 __attribute__ ((overloadable)) |
2789 | -fabs(float4 a) |
2790 | -{ |
2791 | -#ifdef __SSE__ |
2792 | - const uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; |
2793 | - return as_float4(~sign_mask & as_uint4(a)); |
2794 | -#else |
2795 | - return (float4)(fabs(a.lo), fabs(a.hi)); |
2796 | -#endif |
2797 | -} |
2798 | - |
2799 | -float8 __attribute__ ((overloadable)) |
2800 | -fabs(float8 a) |
2801 | -{ |
2802 | -#ifdef __AVX__ |
2803 | - const uint8 sign_mask = |
2804 | - {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, |
2805 | - 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; |
2806 | - return as_float8(~sign_mask & as_uint8(a)); |
2807 | -#else |
2808 | - return (float8)(fabs(a.lo), fabs(a.hi)); |
2809 | -#endif |
2810 | -} |
2811 | - |
2812 | -float16 __attribute__ ((overloadable)) |
2813 | -fabs(float16 a) |
2814 | -{ |
2815 | - return (float16)(fabs(a.lo), fabs(a.hi)); |
2816 | -} |
2817 | - |
2818 | -double __attribute__ ((overloadable)) |
2819 | -fabs(double a) |
2820 | -{ |
2821 | - return __builtin_fabs(a); |
2822 | -} |
2823 | - |
2824 | -double2 __attribute__ ((overloadable)) |
2825 | -fabs(double2 a) |
2826 | -{ |
2827 | -#ifdef __SSE2__ |
2828 | - const ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; |
2829 | - return as_double2(~sign_mask & as_ulong2(a)); |
2830 | -#else |
2831 | - return (double2)(fabs(a.lo), fabs(a.hi)); |
2832 | -#endif |
2833 | -} |
2834 | - |
2835 | -double3 __attribute__ ((overloadable)) |
2836 | -fabs(double3 a) |
2837 | -{ |
2838 | -#ifdef __AVX__ |
2839 | - return fabs(*(double4*)&a).s012; |
2840 | -#else |
2841 | - return (double3)(fabs(a.s01), fabs(a.s2)); |
2842 | -#endif |
2843 | -} |
2844 | - |
2845 | -double4 __attribute__ ((overloadable)) |
2846 | -fabs(double4 a) |
2847 | -{ |
2848 | -#ifdef __AVX__ |
2849 | - const ulong4 sign_mask = |
2850 | - {0x8000000000000000UL, 0x8000000000000000UL, |
2851 | - 0x8000000000000000UL, 0x8000000000000000UL}; |
2852 | - return as_double4(~sign_mask & as_ulong4(a)); |
2853 | -#else |
2854 | - return (double4)(fabs(a.lo), fabs(a.hi)); |
2855 | -#endif |
2856 | -} |
2857 | - |
2858 | -double8 __attribute__ ((overloadable)) |
2859 | -fabs(double8 a) |
2860 | -{ |
2861 | - return (double8)(fabs(a.lo), fabs(a.hi)); |
2862 | -} |
2863 | - |
2864 | -double16 __attribute__ ((overloadable)) |
2865 | -fabs(double16 a) |
2866 | -{ |
2867 | - return (double16)(fabs(a.lo), fabs(a.hi)); |
2868 | -} |
2869 | +#include "templates.h" |
2870 | + |
2871 | +DEFINE_BUILTIN_V_V(fabs) |
2872 | |
2873 | === modified file 'lib/kernel/floor.cl' |
2874 | --- lib/kernel/floor.cl 2011-10-25 18:52:31 +0000 |
2875 | +++ lib/kernel/floor.cl 2011-10-31 17:03:23 +0000 |
2876 | @@ -21,132 +21,6 @@ |
2877 | THE SOFTWARE. |
2878 | */ |
2879 | |
2880 | -#define _MM_FROUND_TO_NEAREST_INT 0x00 |
2881 | -#define _MM_FROUND_TO_NEG_INF 0x01 |
2882 | -#define _MM_FROUND_TO_POS_INF 0x02 |
2883 | -#define _MM_FROUND_TO_ZERO 0x03 |
2884 | -#define _MM_FROUND_CUR_DIRECTION 0x04 |
2885 | - |
2886 | -#define _MM_FROUND_RAISE_EXC 0x00 |
2887 | -#define _MM_FROUND_NO_EXC 0x08 |
2888 | - |
2889 | -#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) |
2890 | -#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) |
2891 | -#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) |
2892 | -#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) |
2893 | -#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) |
2894 | -#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) |
2895 | - |
2896 | - |
2897 | - |
2898 | -float __attribute__ ((overloadable)) |
2899 | -floor(float a) |
2900 | -{ |
2901 | -#ifdef __SSE4_1__ |
2902 | - // LLVM does not optimise this on its own |
2903 | - return ((float4)__builtin_ia32_roundss(*(float4*)&a, *(float4*)&a, |
2904 | - _MM_FROUND_FLOOR)).s0; |
2905 | -#else |
2906 | - return __builtin_floorf(a); |
2907 | -#endif |
2908 | -} |
2909 | - |
2910 | -float2 __attribute__ ((overloadable)) |
2911 | -floor(float2 a) |
2912 | -{ |
2913 | -#ifdef __SSE4_1__ |
2914 | - return ((float4)floor(*(float4)&a)).s01; |
2915 | -#else |
2916 | - return (float2)(floor(a.lo), floor(a.hi)); |
2917 | -#endif |
2918 | -} |
2919 | - |
2920 | -float3 __attribute__ ((overloadable)) |
2921 | -floor(float3 a) |
2922 | -{ |
2923 | -#ifdef __SSE4_1__ |
2924 | - return ((float4)floor(*(float4)&a)).s012; |
2925 | -#else |
2926 | - return (float3)(floor(a.s01), floor(a.s2)); |
2927 | -#endif |
2928 | -} |
2929 | - |
2930 | -float4 __attribute__ ((overloadable)) |
2931 | -floor(float4 a) |
2932 | -{ |
2933 | -#ifdef __SSE4_1__ |
2934 | - return __builtin_ia32_roundps(a, _MM_FROUND_FLOOR); |
2935 | -#else |
2936 | - return (float4)(floor(a.lo), floor(a.hi)); |
2937 | -#endif |
2938 | -} |
2939 | - |
2940 | -float8 __attribute__ ((overloadable)) |
2941 | -floor(float8 a) |
2942 | -{ |
2943 | -#ifdef __AVX__ |
2944 | - return __builtin_ia32_roundps256(a, _MM_FROUND_FLOOR); |
2945 | -#else |
2946 | - return (float8)(floor(a.lo), floor(a.hi)); |
2947 | -#endif |
2948 | -} |
2949 | - |
2950 | -float16 __attribute__ ((overloadable)) |
2951 | -floor(float16 a) |
2952 | -{ |
2953 | - return (float16)(floor(a.lo), floor(a.hi)); |
2954 | -} |
2955 | - |
2956 | -double __attribute__ ((overloadable)) |
2957 | -floor(double a) |
2958 | -{ |
2959 | -#ifdef __SSE4_1__ |
2960 | - // LLVM does not optimise this on its own |
2961 | - return ((double2)__builtin_ia32_roundss(*(double2*)&a, *(double2*)&a, |
2962 | - _MM_FROUND_FLOOR)).s0; |
2963 | -#else |
2964 | - return __builtin_floor(a); |
2965 | -#endif |
2966 | -} |
2967 | - |
2968 | -double2 __attribute__ ((overloadable)) |
2969 | -floor(double2 a) |
2970 | -{ |
2971 | -#ifdef __SSE4_1__ |
2972 | - return __builtin_ia32_roundpd(a, _MM_FROUND_FLOOR); |
2973 | -#else |
2974 | - return (double2)(floor(a.lo), floor(a.hi)); |
2975 | -#endif |
2976 | -} |
2977 | - |
2978 | -double3 __attribute__ ((overloadable)) |
2979 | -floor(double3 a) |
2980 | -{ |
2981 | -#ifdef __AVX__ |
2982 | - return ((double4)floor(*(double4)&a)).s012; |
2983 | -#else |
2984 | - return (double3)(floor(a.s01), floor(a.s2)); |
2985 | -#endif |
2986 | -} |
2987 | - |
2988 | -double4 __attribute__ ((overloadable)) |
2989 | -floor(double4 a) |
2990 | -{ |
2991 | -#ifdef __AVX__ |
2992 | - return __builtin_ia32_roundpd256(a, _MM_FROUND_FLOOR); |
2993 | -#else |
2994 | - return (double4)(floor(a.lo), floor(a.hi)); |
2995 | -#endif |
2996 | -} |
2997 | - |
2998 | -double8 __attribute__ ((overloadable)) |
2999 | -floor(double8 a) |
3000 | -{ |
3001 | - return (double8)(floor(a.lo), floor(a.hi)); |
3002 | -} |
3003 | - |
3004 | -double16 __attribute__ ((overloadable)) |
3005 | -floor(double16 a) |
3006 | -{ |
3007 | - return (double16)(floor(a.lo), floor(a.hi)); |
3008 | -} |
3009 | +#include "templates.h" |
3010 | + |
3011 | +DEFINE_BUILTIN_V_V(floor) |
3012 | |
3013 | === modified file 'lib/kernel/fma.cl' |
3014 | --- lib/kernel/fma.cl 2011-10-26 03:01:29 +0000 |
3015 | +++ lib/kernel/fma.cl 2011-10-31 17:03:23 +0000 |
3016 | @@ -23,5 +23,7 @@ |
3017 | |
3018 | #include "templates.h" |
3019 | |
3020 | +#define __builtin__cl_std_fmaf __builtin_fmaf |
3021 | +#define __builtin__cl_std_fma __builtin_fma |
3022 | + |
3023 | DEFINE_BUILTIN_V_VVV(fma) |
3024 | -// DEFINE_EXPR_V_VVV(fma, a*b+c) |
3025 | |
3026 | === modified file 'lib/kernel/fmax.cl' |
3027 | --- lib/kernel/fmax.cl 2011-10-27 02:59:34 +0000 |
3028 | +++ lib/kernel/fmax.cl 2011-10-31 17:03:23 +0000 |
3029 | @@ -21,134 +21,12 @@ |
3030 | THE SOFTWARE. |
3031 | */ |
3032 | |
3033 | +#undef fmax |
3034 | + |
3035 | #include "templates.h" |
3036 | |
3037 | -DEFINE_EXPR_V_VS(fmax, fmax(a, (vtype)b)) |
3038 | - |
3039 | - |
3040 | - |
3041 | -float4 _cl_fmax_ensure_float4(float4 a) |
3042 | -{ |
3043 | - return a; |
3044 | -} |
3045 | - |
3046 | -double2 _cl_fmax_ensure_double2(double2 a) |
3047 | -{ |
3048 | - return a; |
3049 | -} |
3050 | - |
3051 | - |
3052 | - |
3053 | -float __attribute__ ((overloadable)) |
3054 | -fmax(float a, float b) |
3055 | -{ |
3056 | -#ifdef __SSE__ |
3057 | - // LLVM does not optimise this on its own |
3058 | - // Can't convert to float4 (why?) |
3059 | - // return ((float4)__builtin_ia32_maxss(*(float4*)&a, *(float4*)&b)).s0; |
3060 | - return _cl_fmax_ensure_float4(__builtin_ia32_maxss(*(float4*)&a, *(float4*)&b)).s0; |
3061 | -#else |
3062 | - return __builtin_fmaxf(a, b); |
3063 | -#endif |
3064 | -} |
3065 | - |
3066 | -float2 __attribute__ ((overloadable)) |
3067 | -fmax(float2 a, float2 b) |
3068 | -{ |
3069 | -#ifdef __SSE__ |
3070 | - return ((float4)fmax(*(float4*)&a, *(float4*)&b)).s01; |
3071 | -#else |
3072 | - return (float2)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3073 | -#endif |
3074 | -} |
3075 | - |
3076 | -float3 __attribute__ ((overloadable)) |
3077 | -fmax(float3 a, float3 b) |
3078 | -{ |
3079 | -#ifdef __SSE__ |
3080 | - return ((float4)fmax(*(float4*)&a, *(float4*)&b)).s012; |
3081 | -#else |
3082 | - return (float3)(fmax(a.s01, b.s01), fmax(a.s2, b.s2)); |
3083 | -#endif |
3084 | -} |
3085 | - |
3086 | -float4 __attribute__ ((overloadable)) |
3087 | -fmax(float4 a, float4 b) |
3088 | -{ |
3089 | -#ifdef __SSE__ |
3090 | - return __builtin_ia32_maxps(a, b); |
3091 | -#else |
3092 | - return (float4)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3093 | -#endif |
3094 | -} |
3095 | - |
3096 | -float8 __attribute__ ((overloadable)) |
3097 | -fmax(float8 a, float8 b) |
3098 | -{ |
3099 | -#ifdef __AVX__ |
3100 | - return __builtin_ia32_maxps256(a, b); |
3101 | -#else |
3102 | - return (float8)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3103 | -#endif |
3104 | -} |
3105 | - |
3106 | -float16 __attribute__ ((overloadable)) |
3107 | -fmax(float16 a, float16 b) |
3108 | -{ |
3109 | - return (float16)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3110 | -} |
3111 | - |
3112 | -double __attribute__ ((overloadable)) |
3113 | -fmax(double a, double b) |
3114 | -{ |
3115 | -#ifdef __SSE2__ |
3116 | - // LLVM does not optimise this on its own |
3117 | - // Can't convert to double2 (why?) |
3118 | - // return ((double2)__builtin_ia32_maxsd(*(double2*)&a, *(double2*)&b)).s0; |
3119 | - return _cl_fmax_ensure_double2(__builtin_ia32_maxsd(*(double2*)&a, *(double2*)&b)).s0; |
3120 | -#else |
3121 | - return __builtin_fmax(a, b); |
3122 | -#endif |
3123 | -} |
3124 | - |
3125 | -double2 __attribute__ ((overloadable)) |
3126 | -fmax(double2 a, double2 b) |
3127 | -{ |
3128 | -#ifdef __SSE2__ |
3129 | - return __builtin_ia32_maxpd(a, b); |
3130 | -#else |
3131 | - return (double2)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3132 | -#endif |
3133 | -} |
3134 | - |
3135 | -double3 __attribute__ ((overloadable)) |
3136 | -fmax(double3 a, double3 b) |
3137 | -{ |
3138 | -#ifdef __AVX__ |
3139 | - return ((double4)fmax(*(double4*)&a, *(double4*)&b)).s012; |
3140 | -#else |
3141 | - return (double3)(fmax(a.s01, b.s01), fmax(a.s2, b.s2)); |
3142 | -#endif |
3143 | -} |
3144 | - |
3145 | -double4 __attribute__ ((overloadable)) |
3146 | -fmax(double4 a, double4 b) |
3147 | -{ |
3148 | -#ifdef __AVX__ |
3149 | - return __builtin_ia32_maxpd256(a, b); |
3150 | -#else |
3151 | - return (double4)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3152 | -#endif |
3153 | -} |
3154 | - |
3155 | -double8 __attribute__ ((overloadable)) |
3156 | -fmax(double8 a, double8 b) |
3157 | -{ |
3158 | - return (double8)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3159 | -} |
3160 | - |
3161 | -double16 __attribute__ ((overloadable)) |
3162 | -fmax(double16 a, double16 b) |
3163 | -{ |
3164 | - return (double16)(fmax(a.lo, b.lo), fmax(a.hi, b.hi)); |
3165 | -} |
3166 | +#define __builtin__cl_std_fmaxf __builtin_fmaxf |
3167 | +#define __builtin__cl_std_fmax __builtin_fmax |
3168 | +DEFINE_BUILTIN_V_VV(_cl_std_fmax) |
3169 | + |
3170 | +DEFINE_EXPR_V_VS(_cl_std_fmax, _cl_std_fmax(a, (vtype)b)) |
3171 | |
3172 | === modified file 'lib/kernel/fmin.cl' |
3173 | --- lib/kernel/fmin.cl 2011-10-27 02:59:34 +0000 |
3174 | +++ lib/kernel/fmin.cl 2011-10-31 17:03:23 +0000 |
3175 | @@ -21,134 +21,12 @@ |
3176 | THE SOFTWARE. |
3177 | */ |
3178 | |
3179 | +#undef fmin |
3180 | + |
3181 | #include "templates.h" |
3182 | |
3183 | -DEFINE_EXPR_V_VS(fmin, fmin(a, (vtype)b)) |
3184 | - |
3185 | - |
3186 | - |
3187 | -float4 _cl_fmin_ensure_float4(float4 a) |
3188 | -{ |
3189 | - return a; |
3190 | -} |
3191 | - |
3192 | -double2 _cl_fmin_ensure_double2(double2 a) |
3193 | -{ |
3194 | - return a; |
3195 | -} |
3196 | - |
3197 | - |
3198 | - |
3199 | -float __attribute__ ((overloadable)) |
3200 | -fmin(float a, float b) |
3201 | -{ |
3202 | -#ifdef __SSE__ |
3203 | - // LLVM does not optimise this on its own |
3204 | - // Can't convert to float4 (why?) |
3205 | - // return ((float4)__builtin_ia32_minss(*(float4*)&a, *(float4*)&b)).s0; |
3206 | - return _cl_fmin_ensure_float4(__builtin_ia32_minss(*(float4*)&a, *(float4*)&b)).s0; |
3207 | -#else |
3208 | - return __builtin_fminf(a, b); |
3209 | -#endif |
3210 | -} |
3211 | - |
3212 | -float2 __attribute__ ((overloadable)) |
3213 | -fmin(float2 a, float2 b) |
3214 | -{ |
3215 | -#ifdef __SSE__ |
3216 | - return ((float4)fmin(*(float4*)&a, *(float4*)&b)).s01; |
3217 | -#else |
3218 | - return (float2)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3219 | -#endif |
3220 | -} |
3221 | - |
3222 | -float3 __attribute__ ((overloadable)) |
3223 | -fmin(float3 a, float3 b) |
3224 | -{ |
3225 | -#ifdef __SSE__ |
3226 | - return ((float4)fmin(*(float4*)&a, *(float4*)&b)).s012; |
3227 | -#else |
3228 | - return (float3)(fmin(a.s01, b.s01), fmin(a.s2, b.s2)); |
3229 | -#endif |
3230 | -} |
3231 | - |
3232 | -float4 __attribute__ ((overloadable)) |
3233 | -fmin(float4 a, float4 b) |
3234 | -{ |
3235 | -#ifdef __SSE__ |
3236 | - return __builtin_ia32_minps(a, b); |
3237 | -#else |
3238 | - return (float4)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3239 | -#endif |
3240 | -} |
3241 | - |
3242 | -float8 __attribute__ ((overloadable)) |
3243 | -fmin(float8 a, float8 b) |
3244 | -{ |
3245 | -#ifdef __AVX__ |
3246 | - return __builtin_ia32_minps256(a, b); |
3247 | -#else |
3248 | - return (float8)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3249 | -#endif |
3250 | -} |
3251 | - |
3252 | -float16 __attribute__ ((overloadable)) |
3253 | -fmin(float16 a, float16 b) |
3254 | -{ |
3255 | - return (float16)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3256 | -} |
3257 | - |
3258 | -double __attribute__ ((overloadable)) |
3259 | -fmin(double a, double b) |
3260 | -{ |
3261 | -#ifdef __SSE2__ |
3262 | - // LLVM does not optimise this on its own |
3263 | - // Can't convert to double2 (why?) |
3264 | - // return ((double2)__builtin_ia32_minsd(*(double2*)&a, *(double2*)&b)).s0; |
3265 | - return _cl_fmin_ensure_double2(__builtin_ia32_minsd(*(double2*)&a, *(double2*)&b)).s0; |
3266 | -#else |
3267 | - return __builtin_fmin(a, b); |
3268 | -#endif |
3269 | -} |
3270 | - |
3271 | -double2 __attribute__ ((overloadable)) |
3272 | -fmin(double2 a, double2 b) |
3273 | -{ |
3274 | -#ifdef __SSE2__ |
3275 | - return __builtin_ia32_minpd(a, b); |
3276 | -#else |
3277 | - return (double2)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3278 | -#endif |
3279 | -} |
3280 | - |
3281 | -double3 __attribute__ ((overloadable)) |
3282 | -fmin(double3 a, double3 b) |
3283 | -{ |
3284 | -#ifdef __AVX__ |
3285 | - return ((double4)fmin(*(double4*)&a, *(double4*)&b)).s012; |
3286 | -#else |
3287 | - return (double3)(fmin(a.s01, b.s01), fmin(a.s2, b.s2)); |
3288 | -#endif |
3289 | -} |
3290 | - |
3291 | -double4 __attribute__ ((overloadable)) |
3292 | -fmin(double4 a, double4 b) |
3293 | -{ |
3294 | -#ifdef __AVX__ |
3295 | - return __builtin_ia32_minpd256(a, b); |
3296 | -#else |
3297 | - return (double4)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3298 | -#endif |
3299 | -} |
3300 | - |
3301 | -double8 __attribute__ ((overloadable)) |
3302 | -fmin(double8 a, double8 b) |
3303 | -{ |
3304 | - return (double8)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3305 | -} |
3306 | - |
3307 | -double16 __attribute__ ((overloadable)) |
3308 | -fmin(double16 a, double16 b) |
3309 | -{ |
3310 | - return (double16)(fmin(a.lo, b.lo), fmin(a.hi, b.hi)); |
3311 | -} |
3312 | +#define __builtin__cl_std_fminf __builtin_fminf |
3313 | +#define __builtin__cl_std_fmin __builtin_fmin |
3314 | +DEFINE_BUILTIN_V_VV(_cl_std_fmin) |
3315 | + |
3316 | +DEFINE_EXPR_V_VS(_cl_std_fmin, _cl_std_fmin(a, (vtype)b)) |
3317 | |
3318 | === modified file 'lib/kernel/max.cl' |
3319 | --- lib/kernel/max.cl 2011-10-26 21:01:40 +0000 |
3320 | +++ lib/kernel/max.cl 2011-10-31 17:03:23 +0000 |
3321 | @@ -27,5 +27,5 @@ |
3322 | DEFINE_EXPR_G_GS(max, max(a, (gtype)b)) |
3323 | |
3324 | // Note: max() has no special semantics for inf/nan, even if fmax does |
3325 | -DEFINE_EXPR_V_VV(max, fmax(a, b)) |
3326 | +DEFINE_EXPR_V_VV(max, select(b, a, (jtype)(a>=b))) |
3327 | DEFINE_EXPR_V_VS(max, max(a, (vtype)b)) |
3328 | |
3329 | === modified file 'lib/kernel/maxmag.cl' |
3330 | --- lib/kernel/maxmag.cl 2011-10-26 03:01:29 +0000 |
3331 | +++ lib/kernel/maxmag.cl 2011-10-31 17:03:23 +0000 |
3332 | @@ -23,4 +23,18 @@ |
3333 | |
3334 | #include "templates.h" |
3335 | |
3336 | -DEFINE_EXPR_V_VV(maxmag, fmax(fabs(a), fabs(b))) |
3337 | +float __builtin_maxmagf(float x, float y) |
3338 | +{ |
3339 | + if (fabs(x) > fabs(y)) return x; |
3340 | + if (fabs(y) > fabs(x)) return y; |
3341 | + return fmax(x, y); |
3342 | +} |
3343 | + |
3344 | +double __builtin_maxmag(double x, double y) |
3345 | +{ |
3346 | + if (fabs(x) > fabs(y)) return x; |
3347 | + if (fabs(y) > fabs(x)) return y; |
3348 | + return fmax(x, y); |
3349 | +} |
3350 | + |
3351 | +DEFINE_BUILTIN_V_VV(maxmag) |
3352 | |
3353 | === modified file 'lib/kernel/min.cl' |
3354 | --- lib/kernel/min.cl 2011-10-26 21:01:40 +0000 |
3355 | +++ lib/kernel/min.cl 2011-10-31 17:03:23 +0000 |
3356 | @@ -23,9 +23,9 @@ |
3357 | |
3358 | #include "templates.h" |
3359 | |
3360 | -DEFINE_EXPR_G_GG(min, a<b ? a : b) |
3361 | +DEFINE_EXPR_G_GG(min, a<=b ? a : b) |
3362 | DEFINE_EXPR_G_GS(min, min(a, (gtype)b)) |
3363 | |
3364 | // Note: min() has no special semantics for inf/nan, even if fmin does |
3365 | -DEFINE_EXPR_V_VV(min, fmin(a, b)) |
3366 | +DEFINE_EXPR_V_VV(min, select(b, a, (jtype)(a<=b))) |
3367 | DEFINE_EXPR_V_VS(min, min(a, (vtype)b)) |
3368 | |
3369 | === modified file 'lib/kernel/minmag.cl' |
3370 | --- lib/kernel/minmag.cl 2011-10-26 03:01:29 +0000 |
3371 | +++ lib/kernel/minmag.cl 2011-10-31 17:03:23 +0000 |
3372 | @@ -23,4 +23,18 @@ |
3373 | |
3374 | #include "templates.h" |
3375 | |
3376 | -DEFINE_EXPR_V_VV(minmag, fmin(fabs(a), fabs(b))) |
3377 | +float __builtin_minmagf(float x, float y) |
3378 | +{ |
3379 | + if (fabs(x) < fabs(y)) return x; |
3380 | + if (fabs(y) < fabs(x)) return y; |
3381 | + return fmin(x, y); |
3382 | +} |
3383 | + |
3384 | +double __builtin_minmag(double x, double y) |
3385 | +{ |
3386 | + if (fabs(x) < fabs(y)) return x; |
3387 | + if (fabs(y) < fabs(x)) return y; |
3388 | + return fmin(x, y); |
3389 | +} |
3390 | + |
3391 | +DEFINE_BUILTIN_V_VV(minmag) |
3392 | |
3393 | === modified file 'lib/kernel/select.cl' |
3394 | --- lib/kernel/select.cl 2011-10-27 01:35:56 +0000 |
3395 | +++ lib/kernel/select.cl 2011-10-31 17:03:23 +0000 |
3396 | @@ -26,9 +26,9 @@ |
3397 | DEFINE_EXPR_G_GGG(select, c>=(gtype)0 ? a : b) |
3398 | |
3399 | // This segfaults Clang 3.0, so we work around |
3400 | -// DEFINE_EXPR_V_VVJ(select, c>=(jtype)0 ? a : b) |
3401 | +// DEFINE_EXPR_V_VVJ(select, c ? b : a) |
3402 | DEFINE_EXPR_V_VVJ(select, |
3403 | ({ |
3404 | - jtype result = c>=(jtype)0 ? *(jtype*)&a : *(jtype*)&b; |
3405 | + jtype result = c ? *(jtype*)&b : *(jtype*)&a; |
3406 | *(vtype*)&result; |
3407 | })) |
3408 | |
3409 | === modified file 'lib/kernel/sqrt.cl' |
3410 | --- lib/kernel/sqrt.cl 2011-10-25 16:28:54 +0000 |
3411 | +++ lib/kernel/sqrt.cl 2011-10-31 17:03:23 +0000 |
3412 | @@ -21,102 +21,6 @@ |
3413 | THE SOFTWARE. |
3414 | */ |
3415 | |
3416 | -float __attribute__ ((overloadable)) |
3417 | -sqrt(float a) |
3418 | -{ |
3419 | - return __builtin_sqrtf(a); |
3420 | -} |
3421 | - |
3422 | -float2 __attribute__ ((overloadable)) |
3423 | -sqrt(float2 a) |
3424 | -{ |
3425 | -#ifdef __SSE__ |
3426 | - return ((float4)sqrt(*(float4*)&a)).s01; |
3427 | -#else |
3428 | - return (float2)(sqrt(a.lo), sqrt(a.hi)); |
3429 | -#endif |
3430 | -} |
3431 | - |
3432 | -float3 __attribute__ ((overloadable)) |
3433 | -sqrt(float3 a) |
3434 | -{ |
3435 | -#ifdef __SSE__ |
3436 | - return ((float4)sqrt(*(float4*)&a)).s012; |
3437 | -#else |
3438 | - return (float3)(sqrt(a.s01), sqrt(a.s2)); |
3439 | -#endif |
3440 | -} |
3441 | - |
3442 | -float4 __attribute__ ((overloadable)) |
3443 | -sqrt(float4 a) |
3444 | -{ |
3445 | -#ifdef __SSE__ |
3446 | - return __builtin_ia32_sqrtps(a); |
3447 | -#else |
3448 | - return (float4)(sqrt(a.lo), sqrt(a.hi)); |
3449 | -#endif |
3450 | -} |
3451 | - |
3452 | -float8 __attribute__ ((overloadable)) |
3453 | -sqrt(float8 a) |
3454 | -{ |
3455 | -#ifdef __AVX__ |
3456 | - return __builtin_ia32_sqrtps256(a); |
3457 | -#else |
3458 | - return (float8)(sqrt(a.lo), sqrt(a.hi)); |
3459 | -#endif |
3460 | -} |
3461 | - |
3462 | -float16 __attribute__ ((overloadable)) |
3463 | -sqrt(float16 a) |
3464 | -{ |
3465 | - return (float16)(sqrt(a.lo), sqrt(a.hi)); |
3466 | -} |
3467 | - |
3468 | -double __attribute__ ((overloadable)) |
3469 | -sqrt(double a) |
3470 | -{ |
3471 | - return __builtin_sqrt(a); |
3472 | -} |
3473 | - |
3474 | -double2 __attribute__ ((overloadable)) |
3475 | -sqrt(double2 a) |
3476 | -{ |
3477 | -#ifdef __SSE2__ |
3478 | - return __builtin_ia32_sqrtpd(a); |
3479 | -#else |
3480 | - return (double2)(sqrt(a.lo), sqrt(a.hi)); |
3481 | -#endif |
3482 | -} |
3483 | - |
3484 | -double3 __attribute__ ((overloadable)) |
3485 | -sqrt(double3 a) |
3486 | -{ |
3487 | -#ifdef __AVX__ |
3488 | - return ((double4)sqrt(*(double4*)&a)).s012; |
3489 | -#else |
3490 | - return (double3)(sqrt(a.s01), sqrt(a.s2)); |
3491 | -#endif |
3492 | -} |
3493 | - |
3494 | -double4 __attribute__ ((overloadable)) |
3495 | -sqrt(double4 a) |
3496 | -{ |
3497 | -#ifdef __AVX__ |
3498 | - return __builtin_ia32_pd256(a); |
3499 | -#else |
3500 | - return (double4)(sqrt(a.lo), sqrt(a.hi)); |
3501 | -#endif |
3502 | -} |
3503 | - |
3504 | -double8 __attribute__ ((overloadable)) |
3505 | -sqrt(double8 a) |
3506 | -{ |
3507 | - return (double8)(sqrt(a.lo), sqrt(a.hi)); |
3508 | -} |
3509 | - |
3510 | -double16 __attribute__ ((overloadable)) |
3511 | -sqrt(double16 a) |
3512 | -{ |
3513 | - return (double16)(sqrt(a.lo), sqrt(a.hi)); |
3514 | -} |
3515 | +#include "templates.h" |
3516 | + |
3517 | +DEFINE_BUILTIN_V_V(sqrt) |
3518 | |
3519 | === modified file 'lib/kernel/templates.h' |
3520 | --- lib/kernel/templates.h 2011-10-27 01:35:56 +0000 |
3521 | +++ lib/kernel/templates.h 2011-10-31 17:03:23 +0000 |
3522 | @@ -24,18 +24,18 @@ |
3523 | |
3524 | |
3525 | #define IMPLEMENT_BUILTIN_V_V(NAME, VTYPE, LO, HI) \ |
3526 | - VTYPE __attribute__ ((overloadable)) \ |
3527 | + VTYPE _cl_overloadable \ |
3528 | NAME(VTYPE a) \ |
3529 | { \ |
3530 | return (VTYPE)(NAME(a.LO), NAME(a.HI)); \ |
3531 | } |
3532 | #define DEFINE_BUILTIN_V_V(NAME) \ |
3533 | - float __attribute__ ((overloadable)) \ |
3534 | + float _cl_overloadable \ |
3535 | NAME(float a) \ |
3536 | { \ |
3537 | return __builtin_##NAME##f(a); \ |
3538 | } \ |
3539 | - double __attribute__ ((overloadable)) \ |
3540 | + double _cl_overloadable \ |
3541 | NAME(double a) \ |
3542 | { \ |
3543 | return __builtin_##NAME(a); \ |
3544 | @@ -52,18 +52,18 @@ |
3545 | IMPLEMENT_BUILTIN_V_V(NAME, double16, lo, hi) |
3546 | |
3547 | #define IMPLEMENT_BUILTIN_V_VV(NAME, VTYPE, LO, HI) \ |
3548 | - VTYPE __attribute__ ((overloadable)) \ |
3549 | + VTYPE _cl_overloadable \ |
3550 | NAME(VTYPE a, VTYPE b) \ |
3551 | { \ |
3552 | return (VTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3553 | } |
3554 | #define DEFINE_BUILTIN_V_VV(NAME) \ |
3555 | - float __attribute__ ((overloadable)) \ |
3556 | + float _cl_overloadable \ |
3557 | NAME(float a, float b) \ |
3558 | { \ |
3559 | return __builtin_##NAME##f(a, b); \ |
3560 | } \ |
3561 | - double __attribute__ ((overloadable)) \ |
3562 | + double _cl_overloadable \ |
3563 | NAME(double a, double b) \ |
3564 | { \ |
3565 | return __builtin_##NAME(a, b); \ |
3566 | @@ -80,18 +80,18 @@ |
3567 | IMPLEMENT_BUILTIN_V_VV(NAME, double16, lo, hi) |
3568 | |
3569 | #define IMPLEMENT_BUILTIN_V_VVV(NAME, VTYPE, LO, HI) \ |
3570 | - VTYPE __attribute__ ((overloadable)) \ |
3571 | + VTYPE _cl_overloadable \ |
3572 | NAME(VTYPE a, VTYPE b, VTYPE c) \ |
3573 | { \ |
3574 | return (VTYPE)(NAME(a.LO, b.LO, c.LO), NAME(a.HI, b.HI, c.HI)); \ |
3575 | } |
3576 | #define DEFINE_BUILTIN_V_VVV(NAME) \ |
3577 | - float __attribute__ ((overloadable)) \ |
3578 | + float _cl_overloadable \ |
3579 | NAME(float a, float b, float c) \ |
3580 | { \ |
3581 | return __builtin_##NAME##f(a, b, c); \ |
3582 | } \ |
3583 | - double __attribute__ ((overloadable)) \ |
3584 | + double _cl_overloadable \ |
3585 | NAME(double a, double b, double c) \ |
3586 | { \ |
3587 | return __builtin_##NAME(a, b, c); \ |
3588 | @@ -108,74 +108,86 @@ |
3589 | IMPLEMENT_BUILTIN_V_VVV(NAME, double16, lo, hi) |
3590 | |
3591 | #define IMPLEMENT_BUILTIN_V_U(NAME, VTYPE, UTYPE, LO, HI) \ |
3592 | - VTYPE __attribute__ ((overloadable)) \ |
3593 | + VTYPE _cl_overloadable \ |
3594 | NAME(UTYPE a) \ |
3595 | { \ |
3596 | return (VTYPE)(NAME(a.LO), NAME(a.HI)); \ |
3597 | } |
3598 | -#define DEFINE_BUILTIN_V_U(NAME) \ |
3599 | - float __attribute__ ((overloadable)) \ |
3600 | - NAME(uint a) \ |
3601 | - { \ |
3602 | - return __builtin_##NAME##f(a); \ |
3603 | - } \ |
3604 | - double __attribute__ ((overloadable)) \ |
3605 | - NAME(ulong a) \ |
3606 | - { \ |
3607 | - return __builtin_##NAME(a); \ |
3608 | - } \ |
3609 | - IMPLEMENT_BUILTIN_V_U(NAME, float2 , uint2 , lo, hi) \ |
3610 | - IMPLEMENT_BUILTIN_V_U(NAME, float3 , uint3 , lo, s2) \ |
3611 | - IMPLEMENT_BUILTIN_V_U(NAME, float4 , uint4 , lo, hi) \ |
3612 | - IMPLEMENT_BUILTIN_V_U(NAME, float8 , uint8 , lo, hi) \ |
3613 | - IMPLEMENT_BUILTIN_V_U(NAME, float16 , uint16 , lo, hi) \ |
3614 | - IMPLEMENT_BUILTIN_V_U(NAME, double2 , ulong2 , lo, hi) \ |
3615 | - IMPLEMENT_BUILTIN_V_U(NAME, double3 , ulong3 , lo, s2) \ |
3616 | - IMPLEMENT_BUILTIN_V_U(NAME, double4 , ulong4 , lo, hi) \ |
3617 | - IMPLEMENT_BUILTIN_V_U(NAME, double8 , ulong8 , lo, hi) \ |
3618 | +#define DEFINE_BUILTIN_V_U(NAME) \ |
3619 | + float _cl_overloadable \ |
3620 | + NAME(uint a) \ |
3621 | + { \ |
3622 | + return __builtin_##NAME##f(a); \ |
3623 | + } \ |
3624 | + double _cl_overloadable \ |
3625 | + NAME(ulong a) \ |
3626 | + { \ |
3627 | + return __builtin_##NAME(a); \ |
3628 | + } \ |
3629 | + IMPLEMENT_BUILTIN_V_U(NAME, float2 , uint2 , lo, hi) \ |
3630 | + IMPLEMENT_BUILTIN_V_U(NAME, float3 , uint3 , lo, s2) \ |
3631 | + IMPLEMENT_BUILTIN_V_U(NAME, float4 , uint4 , lo, hi) \ |
3632 | + IMPLEMENT_BUILTIN_V_U(NAME, float8 , uint8 , lo, hi) \ |
3633 | + IMPLEMENT_BUILTIN_V_U(NAME, float16 , uint16 , lo, hi) \ |
3634 | + IMPLEMENT_BUILTIN_V_U(NAME, double2 , ulong2 , lo, hi) \ |
3635 | + IMPLEMENT_BUILTIN_V_U(NAME, double3 , ulong3 , lo, s2) \ |
3636 | + IMPLEMENT_BUILTIN_V_U(NAME, double4 , ulong4 , lo, hi) \ |
3637 | + IMPLEMENT_BUILTIN_V_U(NAME, double8 , ulong8 , lo, hi) \ |
3638 | IMPLEMENT_BUILTIN_V_U(NAME, double16, ulong16, lo, hi) |
3639 | |
3640 | -#define IMPLEMENT_BUILTIN_J_VV(NAME, VTYPE, JTYPE, LO, HI) \ |
3641 | - JTYPE __attribute__ ((overloadable)) \ |
3642 | - NAME(VTYPE a, VTYPE b) \ |
3643 | - { \ |
3644 | - return (JTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3645 | +#define IMPLEMENT_BUILTIN_J_VV(NAME, VTYPE, STYPE, JTYPE, LO, HI) \ |
3646 | + JTYPE _cl_overloadable \ |
3647 | + NAME(VTYPE a, VTYPE b) \ |
3648 | + { \ |
3649 | + if (sizeof(a.LO) == sizeof(STYPE)) { \ |
3650 | + if (sizeof(a.HI) == sizeof(STYPE)) { \ |
3651 | + return (JTYPE)(-NAME(a.LO, b.LO), -NAME(a.HI, b.HI)); \ |
3652 | + } else { \ |
3653 | + return (JTYPE)(-NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3654 | + } \ |
3655 | + } else { \ |
3656 | + if (sizeof(a.HI) == sizeof(STYPE)) { \ |
3657 | + return (JTYPE)( NAME(a.LO, b.LO), -NAME(a.HI, b.HI)); \ |
3658 | + } else { \ |
3659 | + return (JTYPE)( NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3660 | + } \ |
3661 | + } \ |
3662 | } |
3663 | -#define DEFINE_BUILTIN_J_VV(NAME) \ |
3664 | - int __attribute__ ((overloadable)) \ |
3665 | - NAME(float a, float b) \ |
3666 | - { \ |
3667 | - return __builtin_##NAME##f(a, b); \ |
3668 | - } \ |
3669 | - int __attribute__ ((overloadable)) \ |
3670 | - NAME(double a, double b) \ |
3671 | - { \ |
3672 | - return __builtin_##NAME(a, b); \ |
3673 | - } \ |
3674 | - IMPLEMENT_BUILTIN_J_VV(NAME, float2 , int2 , lo, hi) \ |
3675 | - IMPLEMENT_BUILTIN_J_VV(NAME, float3 , int3 , lo, s2) \ |
3676 | - IMPLEMENT_BUILTIN_J_VV(NAME, float4 , int4 , lo, hi) \ |
3677 | - IMPLEMENT_BUILTIN_J_VV(NAME, float8 , int8 , lo, hi) \ |
3678 | - IMPLEMENT_BUILTIN_J_VV(NAME, float16 , int16 , lo, hi) \ |
3679 | - IMPLEMENT_BUILTIN_J_VV(NAME, double2 , long2 , lo, hi) \ |
3680 | - IMPLEMENT_BUILTIN_J_VV(NAME, double3 , long3 , lo, s2) \ |
3681 | - IMPLEMENT_BUILTIN_J_VV(NAME, double4 , long4 , lo, hi) \ |
3682 | - IMPLEMENT_BUILTIN_J_VV(NAME, double8 , long8 , lo, hi) \ |
3683 | - IMPLEMENT_BUILTIN_J_VV(NAME, double16, long16, lo, hi) |
3684 | +#define DEFINE_BUILTIN_J_VV(NAME) \ |
3685 | + int _cl_overloadable \ |
3686 | + NAME(float a, float b) \ |
3687 | + { \ |
3688 | + return __builtin_##NAME##f(a, b); \ |
3689 | + } \ |
3690 | + int _cl_overloadable \ |
3691 | + NAME(double a, double b) \ |
3692 | + { \ |
3693 | + return __builtin_##NAME(a, b); \ |
3694 | + } \ |
3695 | + IMPLEMENT_BUILTIN_J_VV(NAME, float2 , float , int2 , lo, hi) \ |
3696 | + IMPLEMENT_BUILTIN_J_VV(NAME, float3 , float , int3 , lo, s2) \ |
3697 | + IMPLEMENT_BUILTIN_J_VV(NAME, float4 , float , int4 , lo, hi) \ |
3698 | + IMPLEMENT_BUILTIN_J_VV(NAME, float8 , float , int8 , lo, hi) \ |
3699 | + IMPLEMENT_BUILTIN_J_VV(NAME, float16 , float , int16 , lo, hi) \ |
3700 | + IMPLEMENT_BUILTIN_J_VV(NAME, double2 , double, long2 , lo, hi) \ |
3701 | + IMPLEMENT_BUILTIN_J_VV(NAME, double3 , double, long3 , lo, s2) \ |
3702 | + IMPLEMENT_BUILTIN_J_VV(NAME, double4 , double, long4 , lo, hi) \ |
3703 | + IMPLEMENT_BUILTIN_J_VV(NAME, double8 , double, long8 , lo, hi) \ |
3704 | + IMPLEMENT_BUILTIN_J_VV(NAME, double16, double, long16, lo, hi) |
3705 | |
3706 | #define IMPLEMENT_BUILTIN_V_VJ(NAME, VTYPE, JTYPE, LO, HI) \ |
3707 | - VTYPE __attribute__ ((overloadable)) \ |
3708 | + VTYPE _cl_overloadable \ |
3709 | NAME(VTYPE a, JTYPE b) \ |
3710 | { \ |
3711 | return (VTYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3712 | } |
3713 | #define DEFINE_BUILTIN_V_VJ(NAME) \ |
3714 | - float __attribute__ ((overloadable)) \ |
3715 | + float _cl_overloadable \ |
3716 | NAME(float a, int b) \ |
3717 | { \ |
3718 | return __builtin_##NAME##f(a, b); \ |
3719 | } \ |
3720 | - double __attribute__ ((overloadable)) \ |
3721 | + double _cl_overloadable \ |
3722 | NAME(double a, int b) \ |
3723 | { \ |
3724 | return __builtin_##NAME(a, b); \ |
3725 | @@ -192,7 +204,7 @@ |
3726 | IMPLEMENT_BUILTIN_V_VJ(NAME, double16, int16, lo, hi) |
3727 | |
3728 | #define IMPLEMENT_BUILTIN_V_VI(NAME, VTYPE, ITYPE, LO, HI) \ |
3729 | - VTYPE __attribute__ ((overloadable)) \ |
3730 | + VTYPE _cl_overloadable \ |
3731 | NAME(VTYPE a, ITYPE b) \ |
3732 | { \ |
3733 | return (VTYPE)(NAME(a.LO, b), NAME(a.HI, b)); \ |
3734 | @@ -210,18 +222,18 @@ |
3735 | IMPLEMENT_BUILTIN_V_VI(NAME, double16, int, lo, hi) |
3736 | |
3737 | #define IMPLEMENT_BUILTIN_J_V(NAME, JTYPE, VTYPE, LO, HI) \ |
3738 | - JTYPE __attribute__ ((overloadable)) \ |
3739 | + JTYPE _cl_overloadable \ |
3740 | NAME(VTYPE a) \ |
3741 | { \ |
3742 | return (JTYPE)(NAME(a.LO), NAME(a.HI)); \ |
3743 | } |
3744 | #define DEFINE_BUILTIN_J_V(NAME) \ |
3745 | - int __attribute__ ((overloadable)) \ |
3746 | + int _cl_overloadable \ |
3747 | NAME(float a) \ |
3748 | { \ |
3749 | return __builtin_##NAME##f(a); \ |
3750 | } \ |
3751 | - int __attribute__ ((overloadable)) \ |
3752 | + int _cl_overloadable \ |
3753 | NAME(double a) \ |
3754 | { \ |
3755 | return __builtin_##NAME(a); \ |
3756 | @@ -239,30 +251,31 @@ |
3757 | |
3758 | |
3759 | |
3760 | -#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE) \ |
3761 | - VTYPE __attribute__ ((overloadable)) \ |
3762 | - NAME(VTYPE a, VTYPE b) \ |
3763 | - { \ |
3764 | - typedef VTYPE vtype; \ |
3765 | - typedef STYPE stype; \ |
3766 | - return EXPR; \ |
3767 | +#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3768 | + VTYPE _cl_overloadable \ |
3769 | + NAME(VTYPE a) \ |
3770 | + { \ |
3771 | + typedef VTYPE vtype; \ |
3772 | + typedef STYPE stype; \ |
3773 | + typedef JTYPE jtype; \ |
3774 | + return EXPR; \ |
3775 | } |
3776 | -#define DEFINE_EXPR_V_V(NAME, EXPR) \ |
3777 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float ) \ |
3778 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float ) \ |
3779 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float ) \ |
3780 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float ) \ |
3781 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float ) \ |
3782 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float ) \ |
3783 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double) \ |
3784 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double) \ |
3785 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double) \ |
3786 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double) \ |
3787 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double) \ |
3788 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double) |
3789 | +#define DEFINE_EXPR_V_V(NAME, EXPR) \ |
3790 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float , int ) \ |
3791 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float , int2 ) \ |
3792 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float , int3 ) \ |
3793 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float , int4 ) \ |
3794 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float , int8 ) \ |
3795 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float , int16 ) \ |
3796 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double, long ) \ |
3797 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double, long2 ) \ |
3798 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double, long3 ) \ |
3799 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double, long4 ) \ |
3800 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double, long8 ) \ |
3801 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double, long16) |
3802 | |
3803 | #define IMPLEMENT_EXPR_V_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3804 | - VTYPE __attribute__ ((overloadable)) \ |
3805 | + VTYPE _cl_overloadable \ |
3806 | NAME(VTYPE a, VTYPE b) \ |
3807 | { \ |
3808 | typedef VTYPE vtype; \ |
3809 | @@ -285,7 +298,7 @@ |
3810 | IMPLEMENT_EXPR_V_VV(NAME, EXPR, double16, double, long16) |
3811 | |
3812 | #define IMPLEMENT_EXPR_V_VVV(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3813 | - VTYPE __attribute__ ((overloadable)) \ |
3814 | + VTYPE _cl_overloadable \ |
3815 | NAME(VTYPE a, VTYPE b, VTYPE c) \ |
3816 | { \ |
3817 | typedef VTYPE vtype; \ |
3818 | @@ -308,7 +321,7 @@ |
3819 | IMPLEMENT_EXPR_V_VVV(NAME, EXPR, double16, double, long16) |
3820 | |
3821 | #define IMPLEMENT_EXPR_S_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3822 | - STYPE __attribute__ ((overloadable)) \ |
3823 | + STYPE _cl_overloadable \ |
3824 | NAME(VTYPE a, VTYPE b) \ |
3825 | { \ |
3826 | typedef VTYPE vtype; \ |
3827 | @@ -331,7 +344,7 @@ |
3828 | IMPLEMENT_EXPR_S_VV(NAME, EXPR, double16, double, long16) |
3829 | |
3830 | #define IMPLEMENT_EXPR_V_VVS(NAME, EXPR, VTYPE, STYPE) \ |
3831 | - VTYPE __attribute__ ((overloadable)) \ |
3832 | + VTYPE _cl_overloadable \ |
3833 | NAME(VTYPE a, VTYPE b, STYPE c) \ |
3834 | { \ |
3835 | typedef VTYPE vtype; \ |
3836 | @@ -351,7 +364,7 @@ |
3837 | IMPLEMENT_EXPR_V_VVS(NAME, EXPR, double16, double) |
3838 | |
3839 | #define IMPLEMENT_EXPR_V_VSS(NAME, EXPR, VTYPE, STYPE) \ |
3840 | - VTYPE __attribute__ ((overloadable)) \ |
3841 | + VTYPE _cl_overloadable \ |
3842 | NAME(VTYPE a, STYPE b, STYPE c) \ |
3843 | { \ |
3844 | typedef VTYPE vtype; \ |
3845 | @@ -371,7 +384,7 @@ |
3846 | IMPLEMENT_EXPR_V_VSS(NAME, EXPR, double16, double) |
3847 | |
3848 | #define IMPLEMENT_EXPR_V_SSV(NAME, EXPR, VTYPE, STYPE) \ |
3849 | - VTYPE __attribute__ ((overloadable)) \ |
3850 | + VTYPE _cl_overloadable \ |
3851 | NAME(STYPE a, STYPE b, VTYPE c) \ |
3852 | { \ |
3853 | typedef VTYPE vtype; \ |
3854 | @@ -391,7 +404,7 @@ |
3855 | IMPLEMENT_EXPR_V_SSV(NAME, EXPR, double16, double) |
3856 | |
3857 | #define IMPLEMENT_EXPR_V_VVJ(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3858 | - VTYPE __attribute__ ((overloadable)) \ |
3859 | + VTYPE _cl_overloadable \ |
3860 | NAME(VTYPE a, VTYPE b, JTYPE c) \ |
3861 | { \ |
3862 | typedef VTYPE vtype; \ |
3863 | @@ -414,7 +427,7 @@ |
3864 | IMPLEMENT_EXPR_V_VVJ(NAME, EXPR, double16, double, long16) |
3865 | |
3866 | #define IMPLEMENT_EXPR_V_U(NAME, EXPR, VTYPE, STYPE, UTYPE) \ |
3867 | - VTYPE __attribute__ ((overloadable)) \ |
3868 | + VTYPE _cl_overloadable \ |
3869 | NAME(UTYPE a) \ |
3870 | { \ |
3871 | typedef VTYPE vtype; \ |
3872 | @@ -437,7 +450,7 @@ |
3873 | IMPLEMENT_EXPR_V_U(NAME, EXPR, double16, double, ulong16) |
3874 | |
3875 | #define IMPLEMENT_EXPR_V_VS(NAME, EXPR, VTYPE, STYPE) \ |
3876 | - VTYPE __attribute__ ((overloadable)) \ |
3877 | + VTYPE _cl_overloadable \ |
3878 | NAME(VTYPE a, STYPE b) \ |
3879 | { \ |
3880 | typedef VTYPE vtype; \ |
3881 | @@ -457,7 +470,7 @@ |
3882 | IMPLEMENT_EXPR_V_VS(NAME, EXPR, double16, double) |
3883 | |
3884 | #define IMPLEMENT_EXPR_V_VJ(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
3885 | - VTYPE __attribute__ ((overloadable)) \ |
3886 | + VTYPE _cl_overloadable \ |
3887 | NAME(VTYPE a, JTYPE b) \ |
3888 | { \ |
3889 | typedef VTYPE vtype; \ |
3890 | @@ -480,7 +493,7 @@ |
3891 | IMPLEMENT_EXPR_V_VJ(NAME, EXPR, double16, double, int16) |
3892 | |
3893 | #define IMPLEMENT_EXPR_V_VI(NAME, EXPR, VTYPE, STYPE, ITYPE) \ |
3894 | - VTYPE __attribute__ ((overloadable)) \ |
3895 | + VTYPE _cl_overloadable \ |
3896 | NAME(VTYPE a, ITYPE b) \ |
3897 | { \ |
3898 | typedef VTYPE vtype; \ |
3899 | @@ -501,14 +514,14 @@ |
3900 | IMPLEMENT_EXPR_V_VI(NAME, EXPR, double16, double, int) |
3901 | |
3902 | #define IMPLEMENT_EXPR_V_VPV(NAME, EXPR, VTYPE, STYPE) \ |
3903 | - VTYPE __attribute__ ((overloadable)) \ |
3904 | + VTYPE _cl_overloadable \ |
3905 | NAME(VTYPE a, __global VTYPE *b) \ |
3906 | { \ |
3907 | typedef VTYPE vtype; \ |
3908 | typedef STYPE stype; \ |
3909 | return EXPR; \ |
3910 | } \ |
3911 | - VTYPE __attribute__ ((overloadable)) \ |
3912 | + VTYPE _cl_overloadable \ |
3913 | NAME(VTYPE a, __local VTYPE *b) \ |
3914 | { \ |
3915 | typedef VTYPE vtype; \ |
3916 | @@ -516,7 +529,7 @@ |
3917 | return EXPR; \ |
3918 | } \ |
3919 | /* __private is not supported yet \ |
3920 | - VTYPE __attribute__ ((overloadable)) \ |
3921 | + VTYPE _cl_overloadable \ |
3922 | NAME(VTYPE a, __private VTYPE *b) \ |
3923 | { \ |
3924 | typedef VTYPE vtype; \ |
3925 | @@ -539,7 +552,7 @@ |
3926 | IMPLEMENT_EXPR_V_VPV(NAME, EXPR, double16, double) |
3927 | |
3928 | #define IMPLEMENT_EXPR_V_SV(NAME, EXPR, VTYPE, STYPE) \ |
3929 | - VTYPE __attribute__ ((overloadable)) \ |
3930 | + VTYPE _cl_overloadable \ |
3931 | NAME(STYPE a, VTYPE b) \ |
3932 | { \ |
3933 | typedef VTYPE vtype; \ |
3934 | @@ -561,48 +574,48 @@ |
3935 | |
3936 | |
3937 | #define IMPLEMENT_BUILTIN_G_G(NAME, GTYPE, UGTYPE, LO, HI) \ |
3938 | - GTYPE __attribute__ ((overloadable)) \ |
3939 | + GTYPE _cl_overloadable \ |
3940 | NAME(GTYPE a) \ |
3941 | { \ |
3942 | return (GTYPE)(NAME(a.LO), NAME(a.HI)); \ |
3943 | } |
3944 | #define DEFINE_BUILTIN_G_G(NAME) \ |
3945 | - char __attribute__ ((overloadable)) \ |
3946 | + char _cl_overloadable \ |
3947 | NAME(char a) \ |
3948 | { \ |
3949 | return __builtin_##NAME##hh(a); \ |
3950 | } \ |
3951 | - short __attribute__ ((overloadable)) \ |
3952 | + short _cl_overloadable \ |
3953 | NAME(short a) \ |
3954 | { \ |
3955 | return __builtin_##NAME##h(a); \ |
3956 | } \ |
3957 | - int __attribute__ ((overloadable)) \ |
3958 | + int _cl_overloadable \ |
3959 | NAME(int a) \ |
3960 | { \ |
3961 | return __builtin_##NAME(a); \ |
3962 | } \ |
3963 | - long __attribute__ ((overloadable)) \ |
3964 | + long _cl_overloadable \ |
3965 | NAME(long a) \ |
3966 | { \ |
3967 | return __builtin_##NAME##l(a); \ |
3968 | } \ |
3969 | - uchar __attribute__ ((overloadable)) \ |
3970 | + uchar _cl_overloadable \ |
3971 | NAME(uchar a) \ |
3972 | { \ |
3973 | return __builtin_##NAME##uhh(a); \ |
3974 | } \ |
3975 | - ushort __attribute__ ((overloadable)) \ |
3976 | + ushort _cl_overloadable \ |
3977 | NAME(ushort a) \ |
3978 | { \ |
3979 | return __builtin_##NAME##uh(a); \ |
3980 | } \ |
3981 | - uint __attribute__ ((overloadable)) \ |
3982 | + uint _cl_overloadable \ |
3983 | NAME(uint a) \ |
3984 | { \ |
3985 | return __builtin_##NAME##u(a); \ |
3986 | } \ |
3987 | - ulong __attribute__ ((overloadable)) \ |
3988 | + ulong _cl_overloadable \ |
3989 | NAME(ulong a) \ |
3990 | { \ |
3991 | return __builtin_##NAME##ul(a); \ |
3992 | @@ -649,48 +662,48 @@ |
3993 | IMPLEMENT_BUILTIN_G_G(NAME, ulong16 , ulong16 , lo, hi) |
3994 | |
3995 | #define IMPLEMENT_BUILTIN_UG_G(NAME, GTYPE, UGTYPE, LO, HI) \ |
3996 | - UGTYPE __attribute__ ((overloadable)) \ |
3997 | + UGTYPE _cl_overloadable \ |
3998 | NAME(GTYPE a) \ |
3999 | { \ |
4000 | return (UGTYPE)(NAME(a.LO), NAME(a.HI)); \ |
4001 | } |
4002 | #define DEFINE_BUILTIN_UG_G(NAME) \ |
4003 | - uchar __attribute__ ((overloadable)) \ |
4004 | + uchar _cl_overloadable \ |
4005 | NAME(char a) \ |
4006 | { \ |
4007 | return __builtin_##NAME##h(a); \ |
4008 | } \ |
4009 | - ushort __attribute__ ((overloadable)) \ |
4010 | + ushort _cl_overloadable \ |
4011 | NAME(short a) \ |
4012 | { \ |
4013 | return __builtin_##NAME##h(a); \ |
4014 | } \ |
4015 | - uint __attribute__ ((overloadable)) \ |
4016 | + uint _cl_overloadable \ |
4017 | NAME(int a) \ |
4018 | { \ |
4019 | return __builtin_##NAME(a); \ |
4020 | } \ |
4021 | - ulong __attribute__ ((overloadable)) \ |
4022 | + ulong _cl_overloadable \ |
4023 | NAME(long a) \ |
4024 | { \ |
4025 | return __builtin_##NAME##l(a); \ |
4026 | } \ |
4027 | - uchar __attribute__ ((overloadable)) \ |
4028 | + uchar _cl_overloadable \ |
4029 | NAME(uchar a) \ |
4030 | { \ |
4031 | return __builtin_##NAME##uhh(a); \ |
4032 | } \ |
4033 | - ushort __attribute__ ((overloadable)) \ |
4034 | + ushort _cl_overloadable \ |
4035 | NAME(ushort a) \ |
4036 | { \ |
4037 | return __builtin_##NAME##uh(a); \ |
4038 | } \ |
4039 | - uint __attribute__ ((overloadable)) \ |
4040 | + uint _cl_overloadable \ |
4041 | NAME(uint a) \ |
4042 | { \ |
4043 | return __builtin_##NAME##u(a); \ |
4044 | } \ |
4045 | - ulong __attribute__ ((overloadable)) \ |
4046 | + ulong _cl_overloadable \ |
4047 | NAME(ulong a) \ |
4048 | { \ |
4049 | return __builtin_##NAME##ul(a); \ |
4050 | @@ -739,7 +752,7 @@ |
4051 | |
4052 | |
4053 | #define IMPLEMENT_EXPR_G_G(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4054 | - GTYPE __attribute__ ((overloadable)) \ |
4055 | + GTYPE _cl_overloadable \ |
4056 | NAME(GTYPE a) \ |
4057 | { \ |
4058 | typedef GTYPE gtype; \ |
4059 | @@ -799,7 +812,7 @@ |
4060 | IMPLEMENT_EXPR_G_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4061 | |
4062 | #define IMPLEMENT_EXPR_UG_G(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4063 | - UGTYPE __attribute__ ((overloadable)) \ |
4064 | + UGTYPE _cl_overloadable \ |
4065 | NAME(GTYPE a) \ |
4066 | { \ |
4067 | typedef GTYPE gtype; \ |
4068 | @@ -859,7 +872,7 @@ |
4069 | IMPLEMENT_EXPR_UG_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4070 | |
4071 | #define IMPLEMENT_EXPR_G_GG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4072 | - GTYPE __attribute__ ((overloadable)) \ |
4073 | + GTYPE _cl_overloadable \ |
4074 | NAME(GTYPE a, GTYPE b) \ |
4075 | { \ |
4076 | typedef GTYPE gtype; \ |
4077 | @@ -918,7 +931,7 @@ |
4078 | IMPLEMENT_EXPR_G_GG(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \ |
4079 | IMPLEMENT_EXPR_G_GG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4080 | #define IMPLEMENT_EXPR_G_GGG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4081 | - GTYPE __attribute__ ((overloadable)) \ |
4082 | + GTYPE _cl_overloadable \ |
4083 | NAME(GTYPE a, GTYPE b, GTYPE c) \ |
4084 | { \ |
4085 | typedef GTYPE gtype; \ |
4086 | @@ -978,7 +991,7 @@ |
4087 | IMPLEMENT_EXPR_G_GGG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4088 | |
4089 | #define IMPLEMENT_EXPR_G_GS(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4090 | - GTYPE __attribute__ ((overloadable)) \ |
4091 | + GTYPE _cl_overloadable \ |
4092 | NAME(GTYPE a, SGTYPE b) \ |
4093 | { \ |
4094 | typedef GTYPE gtype; \ |
4095 | @@ -1030,7 +1043,7 @@ |
4096 | IMPLEMENT_EXPR_G_GS(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4097 | |
4098 | #define IMPLEMENT_EXPR_UG_GG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
4099 | - UGTYPE __attribute__ ((overloadable)) \ |
4100 | + UGTYPE _cl_overloadable \ |
4101 | NAME(GTYPE a, GTYPE b) \ |
4102 | { \ |
4103 | typedef GTYPE gtype; \ |
4104 | @@ -1090,7 +1103,7 @@ |
4105 | IMPLEMENT_EXPR_UG_GG(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ) |
4106 | |
4107 | #define IMPLEMENT_EXPR_LG_GUG(NAME, EXPR, GTYPE, SGTYPE, UGTYPE, LGTYPE) \ |
4108 | - LGTYPE __attribute__ ((overloadable)) \ |
4109 | + LGTYPE _cl_overloadable \ |
4110 | NAME(GTYPE a, UGTYPE b) \ |
4111 | { \ |
4112 | typedef GTYPE gtype; \ |
4113 | @@ -1138,7 +1151,7 @@ |
4114 | IMPLEMENT_EXPR_LG_GUG(NAME, EXPR, uint16 , uint , uint16 , ulong16 ) |
4115 | |
4116 | #define IMPLEMENT_EXPR_J_JJ(NAME, EXPR, JTYPE, SJTYPE, UJTYPE, SUJTYPE) \ |
4117 | - JTYPE __attribute__ ((overloadable)) \ |
4118 | + JTYPE _cl_overloadable \ |
4119 | NAME(JTYPE a, JTYPE b) \ |
4120 | { \ |
4121 | typedef JTYPE gtype; \ |
4122 | @@ -1161,7 +1174,7 @@ |
4123 | IMPLEMENT_EXPR_J_JJ(NAME, EXPR, uint8 , uint , uint8 , uint ) \ |
4124 | IMPLEMENT_EXPR_J_JJ(NAME, EXPR, uint16 , uint , uint16 , uint ) |
4125 | #define IMPLEMENT_EXPR_J_JJJ(NAME, EXPR, JTYPE, SJTYPE, UJTYPE, SUJTYPE) \ |
4126 | - JTYPE __attribute__ ((overloadable)) \ |
4127 | + JTYPE _cl_overloadable \ |
4128 | NAME(JTYPE a, JTYPE b, JTYPE c) \ |
4129 | { \ |
4130 | typedef JTYPE gtype; \ |
4131 | |
4132 | === modified file 'lib/kernel/upsample.cl' |
4133 | --- lib/kernel/upsample.cl 2011-10-26 19:49:23 +0000 |
4134 | +++ lib/kernel/upsample.cl 2011-10-31 17:03:23 +0000 |
4135 | @@ -25,7 +25,7 @@ |
4136 | // convert_* function calls |
4137 | |
4138 | #define IMPLEMENT_UPSAMPLE_LG_GUG(GTYPE, SGTYPE, UGTYPE, LGTYPE) \ |
4139 | - LGTYPE __attribute__ ((overloadable)) \ |
4140 | + LGTYPE __attribute__ ((__overloadable__)) \ |
4141 | upsample(GTYPE a, UGTYPE b) \ |
4142 | { \ |
4143 | int bits = CHAR_BIT * sizeof(SGTYPE); \ |
4144 | |
4145 | === added file 'lib/kernel/vload.cl' |
4146 | --- lib/kernel/vload.cl 1970-01-01 00:00:00 +0000 |
4147 | +++ lib/kernel/vload.cl 2011-10-31 17:03:23 +0000 |
4148 | @@ -0,0 +1,106 @@ |
4149 | +/* OpenCL built-in library: vloa() |
4150 | + |
4151 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
4152 | + |
4153 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
4154 | + of this software and associated documentation files (the "Software"), to deal |
4155 | + in the Software without restriction, including without limitation the rights |
4156 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4157 | + copies of the Software, and to permit persons to whom the Software is |
4158 | + furnished to do so, subject to the following conditions: |
4159 | + |
4160 | + The above copyright notice and this permission notice shall be included in |
4161 | + all copies or substantial portions of the Software. |
4162 | + |
4163 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4164 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4165 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4166 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4167 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4168 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4169 | + THE SOFTWARE. |
4170 | +*/ |
4171 | + |
4172 | +#include "templates.h" |
4173 | + |
4174 | + |
4175 | + |
4176 | +#define IMPLEMENT_VLOAD(TYPE, MOD) \ |
4177 | + \ |
4178 | + TYPE##2 __attribute__ ((__overloadable__)) \ |
4179 | + vload2(size_t offset, const MOD TYPE *p) \ |
4180 | + { \ |
4181 | + return (TYPE##2)(p[offset*2], p[offset*2+1]); \ |
4182 | + } \ |
4183 | + \ |
4184 | + TYPE##3 __attribute__ ((__overloadable__)) \ |
4185 | + vload3(size_t offset, const MOD TYPE *p) \ |
4186 | + { \ |
4187 | + return (TYPE##3)(vload2(0, &p[offset*3]), p[offset*3+2]); \ |
4188 | + } \ |
4189 | + \ |
4190 | + TYPE##4 __attribute__ ((__overloadable__)) \ |
4191 | + vload4(size_t offset, const MOD TYPE *p) \ |
4192 | + { \ |
4193 | + return (TYPE##4)(vload2(0, &p[offset*4]), vload2(0, &p[offset*4+2])); \ |
4194 | + } \ |
4195 | + \ |
4196 | + TYPE##8 __attribute__ ((__overloadable__)) \ |
4197 | + vload8(size_t offset, const MOD TYPE *p) \ |
4198 | + { \ |
4199 | + return (TYPE##8)(vload4(0, &p[offset*8]), vload4(0, &p[offset*8+4])); \ |
4200 | + } \ |
4201 | + \ |
4202 | + TYPE##16 __attribute__ ((__overloadable__)) \ |
4203 | + vload16(size_t offset, const MOD TYPE *p) \ |
4204 | + { \ |
4205 | + return (TYPE##16)(vload8(0, &p[offset*16]), vload8(0, &p[offset*16+8])); \ |
4206 | + } |
4207 | + |
4208 | + |
4209 | + |
4210 | +IMPLEMENT_VLOAD(char , __global) |
4211 | +IMPLEMENT_VLOAD(short , __global) |
4212 | +IMPLEMENT_VLOAD(int , __global) |
4213 | +IMPLEMENT_VLOAD(long , __global) |
4214 | +IMPLEMENT_VLOAD(uchar , __global) |
4215 | +IMPLEMENT_VLOAD(ushort, __global) |
4216 | +IMPLEMENT_VLOAD(uint , __global) |
4217 | +IMPLEMENT_VLOAD(ulong , __global) |
4218 | +IMPLEMENT_VLOAD(float , __global) |
4219 | +IMPLEMENT_VLOAD(double, __global) |
4220 | + |
4221 | +IMPLEMENT_VLOAD(char , __local) |
4222 | +IMPLEMENT_VLOAD(short , __local) |
4223 | +IMPLEMENT_VLOAD(int , __local) |
4224 | +IMPLEMENT_VLOAD(long , __local) |
4225 | +IMPLEMENT_VLOAD(uchar , __local) |
4226 | +IMPLEMENT_VLOAD(ushort, __local) |
4227 | +IMPLEMENT_VLOAD(uint , __local) |
4228 | +IMPLEMENT_VLOAD(ulong , __local) |
4229 | +IMPLEMENT_VLOAD(float , __local) |
4230 | +IMPLEMENT_VLOAD(double, __local) |
4231 | + |
4232 | +IMPLEMENT_VLOAD(char , __constant) |
4233 | +IMPLEMENT_VLOAD(short , __constant) |
4234 | +IMPLEMENT_VLOAD(int , __constant) |
4235 | +IMPLEMENT_VLOAD(long , __constant) |
4236 | +IMPLEMENT_VLOAD(uchar , __constant) |
4237 | +IMPLEMENT_VLOAD(ushort, __constant) |
4238 | +IMPLEMENT_VLOAD(uint , __constant) |
4239 | +IMPLEMENT_VLOAD(ulong , __constant) |
4240 | +IMPLEMENT_VLOAD(float , __constant) |
4241 | +IMPLEMENT_VLOAD(double, __constant) |
4242 | + |
4243 | +/* __private is not supported yet |
4244 | +IMPLEMENT_VLOAD(char , __private) |
4245 | +IMPLEMENT_VLOAD(short , __private) |
4246 | +IMPLEMENT_VLOAD(int , __private) |
4247 | +IMPLEMENT_VLOAD(long , __private) |
4248 | +IMPLEMENT_VLOAD(uchar , __private) |
4249 | +IMPLEMENT_VLOAD(ushort, __private) |
4250 | +IMPLEMENT_VLOAD(uint , __private) |
4251 | +IMPLEMENT_VLOAD(ulong , __private) |
4252 | +IMPLEMENT_VLOAD(float , __private) |
4253 | +IMPLEMENT_VLOAD(double, __private) |
4254 | +*/ |
4255 | |
4256 | === added file 'lib/kernel/vstore.cl' |
4257 | --- lib/kernel/vstore.cl 1970-01-01 00:00:00 +0000 |
4258 | +++ lib/kernel/vstore.cl 2011-10-31 17:03:23 +0000 |
4259 | @@ -0,0 +1,100 @@ |
4260 | +/* OpenCL built-in library: vstore() |
4261 | + |
4262 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
4263 | + |
4264 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
4265 | + of this software and associated documentation files (the "Software"), to deal |
4266 | + in the Software without restriction, including without limitation the rights |
4267 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4268 | + copies of the Software, and to permit persons to whom the Software is |
4269 | + furnished to do so, subject to the following conditions: |
4270 | + |
4271 | + The above copyright notice and this permission notice shall be included in |
4272 | + all copies or substantial portions of the Software. |
4273 | + |
4274 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4275 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4276 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4277 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4278 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4279 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4280 | + THE SOFTWARE. |
4281 | +*/ |
4282 | + |
4283 | +#include "templates.h" |
4284 | + |
4285 | + |
4286 | + |
4287 | +#define IMPLEMENT_VSTORE(TYPE, MOD) \ |
4288 | + \ |
4289 | + void __attribute__ ((__overloadable__)) \ |
4290 | + vstore2(TYPE##2 data, size_t offset, MOD TYPE *p) \ |
4291 | + { \ |
4292 | + p[offset*2] = data.lo; \ |
4293 | + p[offset*2+1] = data.hi; \ |
4294 | + } \ |
4295 | + \ |
4296 | + void __attribute__ ((__overloadable__)) \ |
4297 | + vstore3(TYPE##3 data, size_t offset, MOD TYPE *p) \ |
4298 | + { \ |
4299 | + vstore2(data.lo, 0, &p[offset*3]); \ |
4300 | + p[offset*3+2] = data.s2; \ |
4301 | + } \ |
4302 | + \ |
4303 | + void __attribute__ ((__overloadable__)) \ |
4304 | + vstore4(TYPE##4 data, size_t offset, MOD TYPE *p) \ |
4305 | + { \ |
4306 | + vstore2(data.lo, 0, &p[offset*4]); \ |
4307 | + vstore2(data.hi, 0, &p[offset*4+2]); \ |
4308 | + } \ |
4309 | + \ |
4310 | + void __attribute__ ((__overloadable__)) \ |
4311 | + vstore8(TYPE##8 data, size_t offset, MOD TYPE *p) \ |
4312 | + { \ |
4313 | + vstore4(data.lo, 0, &p[offset*8]); \ |
4314 | + vstore4(data.hi, 0, &p[offset*8+4]); \ |
4315 | + } \ |
4316 | + \ |
4317 | + void __attribute__ ((__overloadable__)) \ |
4318 | + vstore16(TYPE##16 data, size_t offset, MOD TYPE *p) \ |
4319 | + { \ |
4320 | + vstore8(data.lo, 0, &p[offset*16]); \ |
4321 | + vstore8(data.hi, 0, &p[offset*16+8]); \ |
4322 | + } |
4323 | + |
4324 | + |
4325 | + |
4326 | +IMPLEMENT_VSTORE(char , __global) |
4327 | +IMPLEMENT_VSTORE(short , __global) |
4328 | +IMPLEMENT_VSTORE(int , __global) |
4329 | +IMPLEMENT_VSTORE(long , __global) |
4330 | +IMPLEMENT_VSTORE(uchar , __global) |
4331 | +IMPLEMENT_VSTORE(ushort, __global) |
4332 | +IMPLEMENT_VSTORE(uint , __global) |
4333 | +IMPLEMENT_VSTORE(ulong , __global) |
4334 | +IMPLEMENT_VSTORE(float , __global) |
4335 | +IMPLEMENT_VSTORE(double, __global) |
4336 | + |
4337 | +IMPLEMENT_VSTORE(char , __local) |
4338 | +IMPLEMENT_VSTORE(short , __local) |
4339 | +IMPLEMENT_VSTORE(int , __local) |
4340 | +IMPLEMENT_VSTORE(long , __local) |
4341 | +IMPLEMENT_VSTORE(uchar , __local) |
4342 | +IMPLEMENT_VSTORE(ushort, __local) |
4343 | +IMPLEMENT_VSTORE(uint , __local) |
4344 | +IMPLEMENT_VSTORE(ulong , __local) |
4345 | +IMPLEMENT_VSTORE(float , __local) |
4346 | +IMPLEMENT_VSTORE(double, __local) |
4347 | + |
4348 | +/* __private is not supported yet |
4349 | +IMPLEMENT_VSTORE(char , __private) |
4350 | +IMPLEMENT_VSTORE(short , __private) |
4351 | +IMPLEMENT_VSTORE(int , __private) |
4352 | +IMPLEMENT_VSTORE(long , __private) |
4353 | +IMPLEMENT_VSTORE(uchar , __private) |
4354 | +IMPLEMENT_VSTORE(ushort, __private) |
4355 | +IMPLEMENT_VSTORE(uint , __private) |
4356 | +IMPLEMENT_VSTORE(ulong , __private) |
4357 | +IMPLEMENT_VSTORE(float , __private) |
4358 | +IMPLEMENT_VSTORE(double, __private) |
4359 | +*/ |
4360 | |
4361 | === added directory 'lib/kernel/x86' |
4362 | === added file 'lib/kernel/x86/Makefile.am' |
4363 | --- lib/kernel/x86/Makefile.am 1970-01-01 00:00:00 +0000 |
4364 | +++ lib/kernel/x86/Makefile.am 2011-10-31 17:03:23 +0000 |
4365 | @@ -0,0 +1,169 @@ |
4366 | +# Process this file with automake to produce Makefile.in (in this, |
4367 | +# and all subdirectories). |
4368 | +# Makefile.am for pocl/lib/kernel/dummy. |
4369 | +# |
4370 | +# Copyright (c) 2011 Universidad Rey Juan Carlos |
4371 | +# |
4372 | +# Permission is hereby granted, free of charge, to any person obtaining a copy |
4373 | +# of this software and associated documentation files (the "Software"), to deal |
4374 | +# in the Software without restriction, including without limitation the rights |
4375 | +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4376 | +# copies of the Software, and to permit persons to whom the Software is |
4377 | +# furnished to do so, subject to the following conditions: |
4378 | +# |
4379 | +# The above copyright notice and this permission notice shall be included in |
4380 | +# all copies or substantial portions of the Software. |
4381 | +# |
4382 | +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4383 | +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4384 | +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4385 | +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4386 | +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4387 | +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4388 | +# THE SOFTWARE. |
4389 | + |
4390 | +targetpkglibdir = $(pkglibdir)/x86 |
4391 | +targetpkglib_LIBRARIES = libkernel.a |
4392 | + |
4393 | +vpath %.cl @srcdir@/.. |
4394 | +vpath %.c @srcdir@/.. |
4395 | +vpath %.ll @srcdir@/.. |
4396 | + |
4397 | +libkernel_a_SOURCES = get_global_size.c \ |
4398 | + get_global_id.c \ |
4399 | + get_local_id.c \ |
4400 | + get_num_groups.c \ |
4401 | + get_group_id.c \ |
4402 | + as_type.cl \ |
4403 | + convert_type.cl \ |
4404 | + acos.cl \ |
4405 | + acosh.cl \ |
4406 | + acospi.cl \ |
4407 | + asin.cl \ |
4408 | + asinh.cl \ |
4409 | + asinpi.cl \ |
4410 | + atan.cl \ |
4411 | + atan2.cl \ |
4412 | + atan2pi.cl \ |
4413 | + atanh.cl \ |
4414 | + atanpi.cl \ |
4415 | + cbrt.cl \ |
4416 | + ceil.cl \ |
4417 | + copysign.cl \ |
4418 | + cos.cl \ |
4419 | + cosh.cl \ |
4420 | + cospi.cl \ |
4421 | + erfc.cl \ |
4422 | + erf.cl \ |
4423 | + exp.cl \ |
4424 | + exp2.cl \ |
4425 | + exp10.cl \ |
4426 | + expm1.cl \ |
4427 | + fabs.cl \ |
4428 | + fdim.cl \ |
4429 | + floor.cl \ |
4430 | + fma.cl \ |
4431 | + fmax.cl \ |
4432 | + fmin.cl \ |
4433 | + fmod.cl \ |
4434 | + fract.cl \ |
4435 | + hypot.cl \ |
4436 | + ilogb.cl \ |
4437 | + ldexp.cl \ |
4438 | + lgamma.cl \ |
4439 | + log.cl \ |
4440 | + log2.cl \ |
4441 | + log10.cl \ |
4442 | + log1p.cl \ |
4443 | + logb.cl \ |
4444 | + mad.cl \ |
4445 | + maxmag.cl \ |
4446 | + minmag.cl \ |
4447 | + nan.cl \ |
4448 | + nextafter.cl \ |
4449 | + pow.cl \ |
4450 | + pown.cl \ |
4451 | + powr.cl \ |
4452 | + remainder.cl \ |
4453 | + rint.cl \ |
4454 | + rootn.cl \ |
4455 | + round.cl \ |
4456 | + rsqrt.cl \ |
4457 | + sin.cl \ |
4458 | + sinh.cl \ |
4459 | + sinpi.cl \ |
4460 | + sqrt.cl \ |
4461 | + tan.cl \ |
4462 | + tanh.cl \ |
4463 | + tanpi.cl \ |
4464 | + tgamma.cl \ |
4465 | + trunc.cl \ |
4466 | + abs.cl \ |
4467 | + abs_diff.cl \ |
4468 | + add_sat.cl \ |
4469 | + hadd.cl \ |
4470 | + rhadd.cl \ |
4471 | + clamp.cl \ |
4472 | + clz.cl \ |
4473 | + mad_hi.cl \ |
4474 | + mad_sat.cl \ |
4475 | + max.cl \ |
4476 | + min.cl \ |
4477 | + mul_hi.cl \ |
4478 | + rotate.cl \ |
4479 | + sub_sat.cl \ |
4480 | + upsample.cl \ |
4481 | + mad24.cl \ |
4482 | + mul24.cl \ |
4483 | + degrees.cl \ |
4484 | + mix.cl \ |
4485 | + radians.cl \ |
4486 | + step.cl \ |
4487 | + smoothstep.cl \ |
4488 | + sign.cl \ |
4489 | + cross.cl \ |
4490 | + dot.cl \ |
4491 | + distance.cl \ |
4492 | + length.cl \ |
4493 | + normalize.cl \ |
4494 | + fast_distance.cl \ |
4495 | + fast_length.cl \ |
4496 | + fast_normalize.cl \ |
4497 | + isequal.cl \ |
4498 | + isnotequal.cl \ |
4499 | + isgreater.cl \ |
4500 | + isgreaterequal.cl \ |
4501 | + isless.cl \ |
4502 | + islessequal.cl \ |
4503 | + islessgreater.cl \ |
4504 | + isfinite.cl \ |
4505 | + isinf.cl \ |
4506 | + isnan.cl \ |
4507 | + isnormal.cl \ |
4508 | + isordered.cl \ |
4509 | + isunordered.cl \ |
4510 | + signbit.cl \ |
4511 | + any.cl \ |
4512 | + all.cl \ |
4513 | + bitselect.cl \ |
4514 | + select.cl \ |
4515 | + vload.cl \ |
4516 | + vstore.cl |
4517 | + |
4518 | +libkernel_a_LIBADD = barrier.o |
4519 | +EXTRA_DIST = barrier.ll |
4520 | + |
4521 | +RANLIB = `@LLVM_CONFIG@ --bindir`/llvm-ranlib |
4522 | +AR = `@LLVM_CONFIG@ --bindir`/llvm-ar |
4523 | + |
4524 | +.cl.o: |
4525 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $< |
4526 | + |
4527 | +.c.o: |
4528 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/_kernel.h -o $@ $< |
4529 | + |
4530 | +.ll.o: |
4531 | + $(LLVM_AS) -o $@ $< |
4532 | + |
4533 | +$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
4534 | +$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h |
4535 | |
4536 | === added file 'lib/kernel/x86/ceil.cl' |
4537 | --- lib/kernel/x86/ceil.cl 1970-01-01 00:00:00 +0000 |
4538 | +++ lib/kernel/x86/ceil.cl 2011-10-31 17:03:23 +0000 |
4539 | @@ -0,0 +1,149 @@ |
4540 | +/* OpenCL built-in library: ceil() |
4541 | + |
4542 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
4543 | + |
4544 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
4545 | + of this software and associated documentation files (the "Software"), to deal |
4546 | + in the Software without restriction, including without limitation the rights |
4547 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4548 | + copies of the Software, and to permit persons to whom the Software is |
4549 | + furnished to do so, subject to the following conditions: |
4550 | + |
4551 | + The above copyright notice and this permission notice shall be included in |
4552 | + all copies or substantial portions of the Software. |
4553 | + |
4554 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4555 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4556 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4557 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4558 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4559 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4560 | + THE SOFTWARE. |
4561 | +*/ |
4562 | + |
4563 | +#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \ |
4564 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4565 | + { \ |
4566 | + typedef TYPE type; \ |
4567 | + return EXPR; \ |
4568 | + } |
4569 | + |
4570 | +#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \ |
4571 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4572 | + { \ |
4573 | + return NAME(*(UPTYPE*)&a).LO; \ |
4574 | + } |
4575 | + |
4576 | +#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \ |
4577 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4578 | + { \ |
4579 | + return (TYPE)(NAME(a.LO), NAME(a.HI)); \ |
4580 | + } |
4581 | + |
4582 | + |
4583 | + |
4584 | +#define _MM_FROUND_TO_NEAREST_INT 0x00 |
4585 | +#define _MM_FROUND_TO_NEG_INF 0x01 |
4586 | +#define _MM_FROUND_TO_POS_INF 0x02 |
4587 | +#define _MM_FROUND_TO_ZERO 0x03 |
4588 | +#define _MM_FROUND_CUR_DIRECTION 0x04 |
4589 | + |
4590 | +#define _MM_FROUND_RAISE_EXC 0x00 |
4591 | +#define _MM_FROUND_NO_EXC 0x08 |
4592 | + |
4593 | +#define _MM_FROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) |
4594 | +#define _MM_FROUND_FLOOR (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) |
4595 | +#define _MM_FROUND_CEIL (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) |
4596 | +#define _MM_FROUND_TRUNC (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) |
4597 | +#define _MM_FROUND_RINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) |
4598 | +#define _MM_FROUND_NEARBYINT (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC ) |
4599 | + |
4600 | + |
4601 | + |
4602 | +#define IMPLEMENT_CEIL_DIRECT_FLOAT __builtin_ceilf(a) |
4603 | +#define IMPLEMENT_CEIL_DIRECT_DOUBLE __builtin_ceil(a) |
4604 | +// Using only a single asm operand leads to better code, since LLVM |
4605 | +// doesn't seem to allocate input and output operands to the same |
4606 | +// register |
4607 | +#define IMPLEMENT_CEIL_SSE41_FLOAT \ |
4608 | + ({ \ |
4609 | + __asm__ ("roundss %[dst], %[dst], %[mode]" : \ |
4610 | + [dst] "+x" (a) : \ |
4611 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4612 | + a; \ |
4613 | + }) |
4614 | +#define IMPLEMENT_CEIL_SSE41_FLOAT4 \ |
4615 | + ({ \ |
4616 | + __asm__ ("roundps %[dst], %[dst], %[mode]" : \ |
4617 | + [dst] "+x" (a) : \ |
4618 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4619 | + a; \ |
4620 | + }) |
4621 | +#define IMPLEMENT_CEIL_AVX_FLOAT8 \ |
4622 | + ({ \ |
4623 | + __asm__ ("roundps256 %[dst], %[dst], %[mode]" : \ |
4624 | + [dst] "+x" (a) : \ |
4625 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4626 | + a; \ |
4627 | + }) |
4628 | +#define IMPLEMENT_CEIL_SSE41_DOUBLE \ |
4629 | + ({ \ |
4630 | + __asm__ ("roundsd %[dst], %[dst], %[mode]" : \ |
4631 | + [dst] "+x" (a) : \ |
4632 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4633 | + a; \ |
4634 | + }) |
4635 | +#define IMPLEMENT_CEIL_SSE41_DOUBLE2 \ |
4636 | + ({ \ |
4637 | + __asm__ ("roundpd %[dst], %[dst], %[mode]" : \ |
4638 | + [dst] "+x" (a) : \ |
4639 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4640 | + a; \ |
4641 | + }) |
4642 | +#define IMPLEMENT_CEIL_AVX_DOUBLE4 \ |
4643 | + ({ \ |
4644 | + __asm__ ("roundpd256 %[dst], %[dst], %[mode]" : \ |
4645 | + [dst] "+x" (a) : \ |
4646 | + [mode] "n" (_MM_FROUND_CEIL)); \ |
4647 | + a; \ |
4648 | + }) |
4649 | + |
4650 | + |
4651 | + |
4652 | +#ifdef __SSE4_1__ |
4653 | +IMPLEMENT_DIRECT(ceil, float , IMPLEMENT_CEIL_SSE41_FLOAT) |
4654 | +IMPLEMENT_UPCAST(ceil, float2 , float4, lo) |
4655 | +IMPLEMENT_UPCAST(ceil, float3 , float4, s012) |
4656 | +IMPLEMENT_DIRECT(ceil, float4 , IMPLEMENT_CEIL_SSE41_FLOAT4) |
4657 | +# ifdef __AVX__ |
4658 | +IMPLEMENT_DIRECT(ceil, float8 , IMPLEMENT_CEIL_AVX_FLOAT8) |
4659 | +# else |
4660 | +IMPLEMENT_SPLIT (ceil, float8 , lo, hi) |
4661 | +# endif |
4662 | +#else |
4663 | +IMPLEMENT_DIRECT(ceil, float , IMPLEMENT_CEIL_DIRECT_FLOAT) |
4664 | +IMPLEMENT_SPLIT (ceil, float2 , lo, hi) |
4665 | +IMPLEMENT_SPLIT (ceil, float3 , lo, s2) |
4666 | +IMPLEMENT_SPLIT (ceil, float4 , lo, hi) |
4667 | +IMPLEMENT_SPLIT (ceil, float8 , lo, hi) |
4668 | +#endif |
4669 | +IMPLEMENT_SPLIT (ceil, float16, lo, hi) |
4670 | + |
4671 | +#ifdef __SSE4_1__ |
4672 | +IMPLEMENT_DIRECT(ceil, double , IMPLEMENT_CEIL_SSE41_DOUBLE) |
4673 | +IMPLEMENT_DIRECT(ceil, double2 , IMPLEMENT_CEIL_SSE41_DOUBLE2) |
4674 | +# ifdef __AVX__ |
4675 | +IMPLEMENT_UPCAST(ceil, double3 , double4, s012) |
4676 | +IMPLEMENT_DIRECT(ceil, double4 , IMPLEMENT_CEIL_AVX_DOUBLE4) |
4677 | +# else |
4678 | +IMPLEMENT_SPLIT (ceil, double3 , lo, s2) |
4679 | +IMPLEMENT_SPLIT (ceil, double4 , lo, hi) |
4680 | +# endif |
4681 | +#else |
4682 | +IMPLEMENT_DIRECT(ceil, double , IMPLEMENT_CEIL_DIRECT_DOUBLE) |
4683 | +IMPLEMENT_SPLIT (ceil, double2 , lo, hi) |
4684 | +IMPLEMENT_SPLIT (ceil, double3 , lo, s2) |
4685 | +IMPLEMENT_SPLIT (ceil, double4 , lo, hi) |
4686 | +#endif |
4687 | +IMPLEMENT_SPLIT (ceil, double8 , lo, hi) |
4688 | +IMPLEMENT_SPLIT (ceil, double16, lo, hi) |
4689 | |
4690 | === added file 'lib/kernel/x86/copysign.cl' |
4691 | --- lib/kernel/x86/copysign.cl 1970-01-01 00:00:00 +0000 |
4692 | +++ lib/kernel/x86/copysign.cl 2011-10-31 17:03:23 +0000 |
4693 | @@ -0,0 +1,169 @@ |
4694 | +/* OpenCL built-in library: copysign() |
4695 | + |
4696 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
4697 | + |
4698 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
4699 | + of this software and associated documentation files (the "Software"), to deal |
4700 | + in the Software without restriction, including without limitation the rights |
4701 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4702 | + copies of the Software, and to permit persons to whom the Software is |
4703 | + furnished to do so, subject to the following conditions: |
4704 | + |
4705 | + The above copyright notice and this permission notice shall be included in |
4706 | + all copies or substantial portions of the Software. |
4707 | + |
4708 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4709 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4710 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4711 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4712 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4713 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4714 | + THE SOFTWARE. |
4715 | +*/ |
4716 | + |
4717 | +#if 0 |
4718 | + |
4719 | +#include "../templates.h" |
4720 | + |
4721 | +// LLVM generates non-optimal code for this implementation |
4722 | +DEFINE_EXPR_V_VV(copysign, |
4723 | + ({ |
4724 | + int bits = CHAR_BIT * sizeof(stype); |
4725 | + jtype sign_mask = (jtype)1 << (jtype)(bits - 1); |
4726 | + jtype result = ((~sign_mask & *(jtype*)&a) | |
4727 | + ( sign_mask & *(jtype*)&b)); |
4728 | + *(vtype*)&result; |
4729 | + })) |
4730 | + |
4731 | +#endif |
4732 | + |
4733 | + |
4734 | + |
4735 | +#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \ |
4736 | + TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
4737 | + { \ |
4738 | + return EXPR; \ |
4739 | + } |
4740 | + |
4741 | +#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \ |
4742 | + TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
4743 | + { \ |
4744 | + return NAME(*(UPTYPE*)&a, *(UPTYPE*)&b).LO; \ |
4745 | + } |
4746 | + |
4747 | +#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \ |
4748 | + TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
4749 | + { \ |
4750 | + return (TYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
4751 | + } |
4752 | + |
4753 | + |
4754 | + |
4755 | +#define IMPLEMENT_COPYSIGN_DIRECT \ |
4756 | + ({ \ |
4757 | + int bits = CHAR_BIT * sizeof(stype); \ |
4758 | + jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \ |
4759 | + jtype result = (~sign_mask & *(jtype*)&a) | (sign_mask & *(jtype*)&b); \ |
4760 | + *(vtype*)&result; \ |
4761 | + }) |
4762 | +#define IMPLEMENT_COPYSIGN_SSE_FLOAT4 \ |
4763 | + ({ \ |
4764 | + uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
4765 | + __asm__ ("andps %[src], %[dst]" : \ |
4766 | + [dst] "+x" (a) : \ |
4767 | + [src] "x" (~sign_mask)); \ |
4768 | + __asm__ ("andps %[src], %[dst]" : \ |
4769 | + [dst] "+x" (b) : \ |
4770 | + [src] "x" (sign_mask)); \ |
4771 | + __asm__ ("orps %[src], %[dst]" : \ |
4772 | + [dst] "+x" (a) : \ |
4773 | + [src] "x" (b)); \ |
4774 | + a; \ |
4775 | + }) |
4776 | +#define IMPLEMENT_COPYSIGN_AVX_FLOAT8 \ |
4777 | + ({ \ |
4778 | + uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \ |
4779 | + 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
4780 | + __asm__ ("andps256 %[src], %[dst]" : \ |
4781 | + [dst] "+x" (a) : \ |
4782 | + [src] "x" (~sign_mask)); \ |
4783 | + __asm__ ("andps256 %[src], %[dst]" : \ |
4784 | + [dst] "+x" (b) : \ |
4785 | + [src] "x" (sign_mask)); \ |
4786 | + __asm__ ("orps256 %[src], %[dst]" : \ |
4787 | + [dst] "+x" (a) : \ |
4788 | + [b] "x" (b)); \ |
4789 | + a; \ |
4790 | + }) |
4791 | +#define IMPLEMENT_COPYSIGN_SSE2_DOUBLE2 \ |
4792 | + ({ \ |
4793 | + ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \ |
4794 | + __asm__ ("andpd %[src], %[dst]" : \ |
4795 | + [dst] "+x" (a) : \ |
4796 | + [src] "x" (~sign_mask)); \ |
4797 | + __asm__ ("andpd %[src], %[dst]" : \ |
4798 | + [dst] "+x" (b) : \ |
4799 | + [src] "x" (sign_mask)); \ |
4800 | + __asm__ ("orpd %[src], %[dst]" : \ |
4801 | + [dst] "+x" (a) : \ |
4802 | + [src] "x" (b)); \ |
4803 | + a; \ |
4804 | + }) |
4805 | +#define IMPLEMENT_COPYSIGN_AVX_DOUBLE4 \ |
4806 | + ({ \ |
4807 | + ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \ |
4808 | + 0x8000000000000000UL, 0x8000000000000000UL}; \ |
4809 | + __asm__ ("andpd256 %[src], %[dst]" : \ |
4810 | + [dst] "+x" (a) : \ |
4811 | + [src] "x" (~sign_mask)); \ |
4812 | + __asm__ ("andpd256 %[src], %[dst]" : \ |
4813 | + [dst] "+x" (b) : \ |
4814 | + [src] "x" (sign_mask)); \ |
4815 | + __asm__ ("orpd256 %[src], %[dst]" : \ |
4816 | + [dst] "+x" (a) : \ |
4817 | + [src] "x" (b)); \ |
4818 | + a; \ |
4819 | + }) |
4820 | + |
4821 | + |
4822 | + |
4823 | +#ifdef __SSE__ |
4824 | +IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_SSE_FLOAT4) |
4825 | +IMPLEMENT_UPCAST(copysign, float2 , float4, lo) |
4826 | +IMPLEMENT_UPCAST(copysign, float3 , float4, s012) |
4827 | +IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_SSE_FLOAT4) |
4828 | +# ifdef __AVX__ |
4829 | +IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_AVX_FLOAT8) |
4830 | +# else |
4831 | +IMPLEMENT_SPLIT (copysign, float8 , lo, hi) |
4832 | +# endif |
4833 | +IMPLEMENT_SPLIT (copysign, float16, lo, hi) |
4834 | +#else |
4835 | +IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_DIRECT) |
4836 | +IMPLEMENT_DIRECT(copysign, float2 , IMPLEMENT_COPYSIGN_DIRECT) |
4837 | +IMPLEMENT_DIRECT(copysign, float3 , IMPLEMENT_COPYSIGN_DIRECT) |
4838 | +IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_DIRECT) |
4839 | +IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_DIRECT) |
4840 | +IMPLEMENT_DIRECT(copysign, float16, IMPLEMENT_COPYSIGN_DIRECT) |
4841 | +#endif |
4842 | + |
4843 | +#ifdef __SSE2__ |
4844 | +IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2) |
4845 | +IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2) |
4846 | +# ifdef __AVX__ |
4847 | +IMPLEMENT_UPCAST(copysign, double3 , double4, s012) |
4848 | +IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_AVX_DOUBLE4) |
4849 | +# else |
4850 | +IMPLEMENT_SPLIT (copysign, double3 , lo, s2) |
4851 | +IMPLEMENT_SPLIT (copysign, double4 , lo, hi) |
4852 | +# endif |
4853 | +IMPLEMENT_SPLIT (copysign, double8 , lo, hi) |
4854 | +IMPLEMENT_SPLIT (copysign, double16, lo, hi) |
4855 | +#else |
4856 | +IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_DIRECT) |
4857 | +IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_DIRECT) |
4858 | +IMPLEMENT_DIRECT(copysign, double3 , IMPLEMENT_COPYSIGN_DIRECT) |
4859 | +IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_DIRECT) |
4860 | +IMPLEMENT_DIRECT(copysign, double8 , IMPLEMENT_COPYSIGN_DIRECT) |
4861 | +IMPLEMENT_DIRECT(copysign, double16, IMPLEMENT_COPYSIGN_DIRECT) |
4862 | +#endif |
4863 | |
4864 | === added file 'lib/kernel/x86/fabs.cl' |
4865 | --- lib/kernel/x86/fabs.cl 1970-01-01 00:00:00 +0000 |
4866 | +++ lib/kernel/x86/fabs.cl 2011-10-31 17:03:23 +0000 |
4867 | @@ -0,0 +1,144 @@ |
4868 | +/* OpenCL built-in library: fabs() |
4869 | + |
4870 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
4871 | + |
4872 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
4873 | + of this software and associated documentation files (the "Software"), to deal |
4874 | + in the Software without restriction, including without limitation the rights |
4875 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
4876 | + copies of the Software, and to permit persons to whom the Software is |
4877 | + furnished to do so, subject to the following conditions: |
4878 | + |
4879 | + The above copyright notice and this permission notice shall be included in |
4880 | + all copies or substantial portions of the Software. |
4881 | + |
4882 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
4883 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
4884 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
4885 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
4886 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
4887 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
4888 | + THE SOFTWARE. |
4889 | +*/ |
4890 | + |
4891 | +#if 0 |
4892 | + |
4893 | +#include "../templates.h" |
4894 | + |
4895 | +// LLVM generates non-optimal code for this implementation |
4896 | +DEFINE_EXPR_V_V(fabs, |
4897 | + ({ |
4898 | + int bits = CHAR_BIT * sizeof(stype); |
4899 | + jtype sign_mask = (jtype)1 << (jtype)(bits - 1); |
4900 | + jtype result = ~sign_mask & *(jtype*)&a; |
4901 | + *(vtype*)&result; |
4902 | + })) |
4903 | + |
4904 | +#endif |
4905 | + |
4906 | + |
4907 | + |
4908 | +#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \ |
4909 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4910 | + { \ |
4911 | + return EXPR; \ |
4912 | + } |
4913 | + |
4914 | +#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \ |
4915 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4916 | + { \ |
4917 | + return NAME(*(UPTYPE*)&a).LO; \ |
4918 | + } |
4919 | + |
4920 | +#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \ |
4921 | + TYPE _cl_overloadable NAME(TYPE a) \ |
4922 | + { \ |
4923 | + return (TYPE)(NAME(a.LO), NAME(a.HI)); \ |
4924 | + } |
4925 | + |
4926 | + |
4927 | + |
4928 | +#define IMPLEMENT_FABS_DIRECT \ |
4929 | + ({ \ |
4930 | + int bits = CHAR_BIT * sizeof(stype); \ |
4931 | + jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \ |
4932 | + jtype result = ~sign_mask & *(jtype*)&a; \ |
4933 | + *(vtype*)&result; \ |
4934 | + }) |
4935 | +#define IMPLEMENT_FABS_SSE_FLOAT4 \ |
4936 | + ({ \ |
4937 | + uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
4938 | + __asm__ ("andps %[src], %[dst]" : \ |
4939 | + [dst] "+x" (a) : \ |
4940 | + [src] "x" (~sign_mask)); \ |
4941 | + a; \ |
4942 | + }) |
4943 | +#define IMPLEMENT_FABS_AVX_FLOAT8 \ |
4944 | + ({ \ |
4945 | + uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \ |
4946 | + 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
4947 | + __asm__ ("andps256 %[src], %[dst]" : \ |
4948 | + [dst] "=x" (a) : \ |
4949 | + "[dst]" (a), [src] "x" (~sign_mask)); \ |
4950 | + a; \ |
4951 | + }) |
4952 | +#define IMPLEMENT_FABS_SSE2_DOUBLE2 \ |
4953 | + ({ \ |
4954 | + ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \ |
4955 | + __asm__ ("andpd %[src], %[dst]" : \ |
4956 | + [dst] "=x" (a) : \ |
4957 | + "[dst]" (a), [src] "x" (~sign_mask)); \ |
4958 | + a; \ |
4959 | + }) |
4960 | +#define IMPLEMENT_FABS_AVX_DOUBLE4 \ |
4961 | + ({ \ |
4962 | + ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \ |
4963 | + 0x8000000000000000UL, 0x8000000000000000UL}; \ |
4964 | + __asm__ ("andpd256 %[src], %[dst]" : \ |
4965 | + [dst] "=x" (a) : \ |
4966 | + "[dst]" (a), [src] "x" (~sign_mask)); \ |
4967 | + a; \ |
4968 | + }) |
4969 | + |
4970 | + |
4971 | + |
4972 | +#ifdef __SSE__ |
4973 | +IMPLEMENT_UPCAST(fabs, float , float2, lo) |
4974 | +IMPLEMENT_UPCAST(fabs, float2 , float4, lo) |
4975 | +IMPLEMENT_UPCAST(fabs, float3 , float4, s012) |
4976 | +IMPLEMENT_DIRECT(fabs, float4 , IMPLEMENT_FABS_SSE_FLOAT4) |
4977 | +# ifdef __AVX__ |
4978 | +IMPLEMENT_DIRECT(fabs, float8 , IMPLEMENT_FABS_AVX_FLOAT8) |
4979 | +# else |
4980 | +IMPLEMENT_SPLIT (fabs, float8 , lo, hi) |
4981 | +# endif |
4982 | +IMPLEMENT_SPLIT (fabs, float16, lo, hi) |
4983 | +#else |
4984 | +IMPLEMENT_DIRECT(fabs, float , IMPLEMENT_FABS_DIRECT) |
4985 | +IMPLEMENT_DIRECT(fabs, float2 , IMPLEMENT_FABS_DIRECT) |
4986 | +IMPLEMENT_DIRECT(fabs, float3 , IMPLEMENT_FABS_DIRECT) |
4987 | +IMPLEMENT_DIRECT(fabs, float4 , IMPLEMENT_FABS_DIRECT) |
4988 | +IMPLEMENT_DIRECT(fabs, float8 , IMPLEMENT_FABS_DIRECT) |
4989 | +IMPLEMENT_DIRECT(fabs, float16, IMPLEMENT_FABS_DIRECT) |
4990 | +#endif |
4991 | + |
4992 | +#ifdef __SSE2__ |
4993 | +IMPLEMENT_UPCAST(fabs, double , double2, lo) |
4994 | +IMPLEMENT_DIRECT(fabs, double2 , IMPLEMENT_FABS_SSE2_DOUBLE2) |
4995 | +# ifdef __AVX__ |
4996 | +IMPLEMENT_UPCAST(fabs, double3 , double4, s012) |
4997 | +IMPLEMENT_DIRECT(fabs, double4 , IMPLEMENT_FABS_AVX_DOUBLE4) |
4998 | +# else |
4999 | +IMPLEMENT_SPLIT (fabs, double3 , lo, s2) |
5000 | +IMPLEMENT_SPLIT (fabs, double4 , lo, hi) |
I am getting some erros buiding your branch:
----
../../. ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:156: 32: error: invalid conversion between ext-vector type 'long2' and 'ulong2' DIRECT( max, ulong2 , (long2 )(a>=b) ? a : b) ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~^~~~ ~~~~~~~ ~~~~~~~ ~~~~~ ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:29:12: note: expanded from: ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:157: 32: error: invalid conversion between ext-vector type 'long3' and 'ulong3' DIRECT( max, ulong3 , (long3 )(a>=b) ? a : b) ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~^~~~ ~~~~~~~ ~~~~~~~ ~~~~~ ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:29:12: note: expanded from: ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:158: 32: error: invalid conversion between ext-vector type 'long4' and 'ulong4' DIRECT( max, ulong4 , (long4 )(a>=b) ? a : b) ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~^~~~ ~~~~~~~ ~~~~~~~ ~~~~~ ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:29:12: note: expanded from: ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:159: 32: error: invalid conversion between ext-vector type 'long8' and 'ulong8' DIRECT( max, ulong8 , (long8 )(a>=b) ? a : b) ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~^~~~ ~~~~~~~ ~~~~~~~ ~~~~~ ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:29:12: note: expanded from: ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:160: 32: error: invalid conversion between ext-vector type 'long16' and 'ulong16' DIRECT( max, ulong16, (long16)(a>=b) ? a : b) ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~^~~~ ~~~~~~~ ~~~~~~~ ~~~~~ ./../.. /src/pocl. schnetter/ lib/kernel/ x86/max. cl:29:12: note: expanded from:
IMPLEMENT_
~~~~~~~
../../.
return EXPR; \
^
../../.
IMPLEMENT_
~~~~~~~
../../.
return EXPR; \
^
../../.
IMPLEMENT_
~~~~~~~
../../.
return EXPR; \
^
../../.
IMPLEMENT_
~~~~~~~
../../.
return EXPR; \
^
../../.
IMPLEMENT_
~~~~~~~
../../.
return EXPR; \
^
5 errors generated.
----
Can you have a look at those?
BR
Carlos