Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merge reported by: Pekka Jääskeläinen
Merged at revision: not available
Proposed branch: lp:~schnetter/pocl/main
Merge into: lp:~pocl/pocl/trunk
Diff against target: 921 lines (+540/-42)
28 files modified
.bzrignore (+2/-0)
clconfig.h.in (+8/-3)
configure.ac (+1/-0)
include/_kernel.h (+115/-0)
include/arm/types.h (+1/-0)
include/tce/types.h (+1/-0)
include/types.h (+13/-7)
include/x86_64/types.h (+1/-0)
lib/kernel/cos.cl (+3/-0)
lib/kernel/divide.cl (+27/-0)
lib/kernel/exp.cl (+3/-0)
lib/kernel/exp10.cl (+3/-0)
lib/kernel/exp2.cl (+3/-0)
lib/kernel/log.cl (+3/-0)
lib/kernel/log10.cl (+3/-0)
lib/kernel/log2.cl (+3/-0)
lib/kernel/powr.cl (+3/-0)
lib/kernel/recip.cl (+27/-0)
lib/kernel/rsqrt.cl (+3/-0)
lib/kernel/sin.cl (+3/-0)
lib/kernel/sources.mk (+5/-1)
lib/kernel/sqrt.cl (+3/-0)
lib/kernel/tan.cl (+3/-0)
lib/kernel/templates.h (+56/-21)
lib/kernel/vload.cl (+1/-1)
lib/kernel/vload_half.cl (+113/-0)
lib/kernel/vstore_half.cl (+124/-0)
lib/kernel/x86_64/fabs.cl (+9/-9)
To merge this branch: bzr merge lp:~schnetter/pocl/main
Reviewer          Review Type    Date Requested    Status
pocl maintaners                                    Pending
Review via email: mp+85761@code.launchpad.net

Description of the change

I added support for the half datatype, protected by #ifdef cl_khr_fp16, analogous to cl_khr_fp64. I don't know which targets support this datatype (presumably all, since llvm supports them?), so I enabled this for all targets -- this will break things if this is wrong.
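
The guard pattern described above can be sketched in plain C. This is only an illustration: `DEMO_HAVE_FP16`, `DEMO_IF_FP16` and `demo_float_widths` are hypothetical names standing in for `cl_khr_fp16` and the `__IF_FP16` helper that the diff adds next to `__IF_FP64`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the cl_khr_fp16 feature macro. */
#define DEMO_HAVE_FP16 1

/* Expands to its argument only when the feature is enabled,
   mirroring the shape of __IF_FP16 / __IF_FP64 in the diff. */
#if DEMO_HAVE_FP16
#  define DEMO_IF_FP16(x) x
#else
#  define DEMO_IF_FP16(x)
#endif

/* Writes the storage widths (in bytes) of the enabled floating-point
   types into out and returns how many entries were written. */
static size_t demo_float_widths(size_t *out, size_t cap)
{
    size_t n = 0;
    DEMO_IF_FP16(if (n < cap) out[n++] = 2;)  /* half, only if enabled */
    if (n < cap) out[n++] = 4;                /* float, always present */
    return n;
}
```

With `DEMO_HAVE_FP16` set to 0 the half entry disappears at preprocessing time, which is the effect the `#ifdef cl_khr_fp16` protection is meant to have on the half declarations.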

lp:~schnetter/pocl/main updated
134. By Erik Schnetter

Correct alternative (unused) fabs implementation

135. By Erik Schnetter

Auto-detect whether the half type is supported

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
>
> Requested reviews: pocl maintaners (pocl)
>
> For more details, see:
> https://code.launchpad.net/~schnetter/pocl/main/+merge/85761
>
> I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> analogous to cl_khr_fp64. I don't know which targets support this datatype
> (presumably all, since llvm supports them?), so I enabled this for all
> targets -- this will break things if this is wrong.

Just curious...

How does LLVM/Clang support half by default nowadays? I've heard that on
NVIDIA GPUs, for example, half is supported only as a storage format. That
is, values are kept in memory in a 16-bit format, but whenever you compute
with them they are converted to single-precision floats, which avoids
the need for separate floating-point units for halfs.
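
As a rough illustration of the storage-only model described above, the following C sketch keeps values as 16-bit patterns, widens them to float for the arithmetic, and narrows the result again. All function names are hypothetical. The narrowing step rounds toward zero and flushes results below the normal half range to zero, which is a simplification chosen to keep the sketch short, not a statement about what any particular GPU or compiler does.

```c
#include <stdint.h>
#include <string.h>

/* Decode IEEE 754 binary16 bits into a float (exact for every half value). */
static float demo_half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalise first */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; e--; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Encode a float as binary16 bits, rounding toward zero.
   Simplification: results below the normal half range become zero. */
static uint16_t demo_float_to_half_rtz(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint16_t sign  = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t field = (bits >> 23) & 0xFFu;
    int32_t  exp   = (int32_t)field - 127 + 15;
    uint32_t man   = bits & 0x7FFFFFu;

    if (field == 0xFFu)                 /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x200u : 0u));
    if (exp >= 0x1F)                    /* too large: largest finite half */
        return (uint16_t)(sign | 0x7BFFu);
    if (exp <= 0)                       /* too small: flush to signed zero */
        return sign;
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (man >> 13));
}

/* "Storage-only" addition: load as half, compute as float, store as half. */
static uint16_t demo_half_add(uint16_t a, uint16_t b)
{
    float sum = demo_half_to_float(a) + demo_half_to_float(b);
    return demo_float_to_half_rtz(sum);
}
```

For example, `demo_half_add(0x3E00, 0x4080)` adds 1.5 and 2.25 in single precision and stores 3.75 as the bit pattern `0x4380`.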

What happens when you use half floats in LLVM/Clang now? Are they converted
to single-precision floats whenever computation occurs?
The last time I checked, 'half' was not a datatype in the LLVM IR, so half
operations could not be selected nicely (that is, mapped to target ISA instructions).

It seems there are only two intrinsics for halfs available:
http://llvm.org/docs/LangRef.html#int_fp16

Does Clang generate those automatically for halfs in OpenCL C now? For example
if you perform a basic operation halfA + halfB, what happens?

I'm interested in proper half support because for embedded/mobile targets it
offers more than reduced memory bandwidth: if half floats suffice for your
computations, you can also reduce the area of the FPU, improve the speed,
lower the energy consumption, etc. But I think LLVM will not accept it as a
proper datatype before there is a real (read: off-the-shelf) target in LLVM
that supports it natively.

--
--Pekka

Erik Schnetter (schnetter) wrote :

OpenCL supports only two operations for halfs: vload_half, which converts a
half to a float, and vstore_half, which converts a float to a half. Nothing
else exists explicitly, not even vectors of halfs. Essentially the only thing
one can do with the half type is to pass a half* to these load/store routines.
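
The vector variants of these routines differ mainly in how they index the half array. The sketch below models only that indexing, following the pattern visible in vload_half.cl in the diff: the packed loads start at element `offset*n`, while the aligned 3-component load uses a stride of 4. The function names are hypothetical, and the raw 16-bit values are copied without any half-to-float conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* First element read by a packed vload_half<n>(offset, p). */
static size_t demo_vload_half_first(size_t offset, unsigned n)
{
    return offset * n;
}

/* First element read by an aligned vloada_half<n>(offset, p):
   3-component vectors are padded to a stride of 4 elements. */
static size_t demo_vloada_half_first(size_t offset, unsigned n)
{
    unsigned stride = (n == 3) ? 4u : n;
    return offset * stride;
}

/* Copies the n raw 16-bit values that a packed load would read. */
static void demo_gather_packed(const uint16_t *p, size_t offset,
                               unsigned n, uint16_t *out)
{
    size_t first = demo_vload_half_first(offset, n);
    for (unsigned i = 0; i < n; i++)
        out[i] = p[first + i];
}
```

So a packed 3-vector at offset 2 starts at element 6, whereas the aligned variant starts at element 8.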

There are routines such as float half_sin(float) that are only required to
have the precision offered by datatype half (allowing optimisations), but
the API is via float. There is text in the standard presumably allowing
this to be optimised to use operations that act directly on half values,
but this is not required.

I added code to detect whether clang supports half (called __fp16 in C),
and if so, these vload_half/vstore_half routines are available. half_sin
and friends are always available, forwarding to their float counterparts by
default -- I assume that target-specific optimisations can do better.
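
The forwarding can be pictured with a scalar-only C sketch. The macros and function names here are hypothetical; they imitate the shape of the DEFINE_EXPR_F_F and DEFINE_EXPR_F_FF templates in the diff, which additionally generate the float2 through float16 overloads.

```c
/* Hypothetical scalar analogues of DEFINE_EXPR_F_F / DEFINE_EXPR_F_FF:
   each expands to a function whose body is the given float expression. */
#define DEMO_DEFINE_F_F(NAME, EXPR) \
    static float NAME(float a) { return (EXPR); }
#define DEMO_DEFINE_F_FF(NAME, EXPR) \
    static float NAME(float a, float b) { return (EXPR); }

/* The half_ and native_ variants forward to the same full-precision
   float expression; a target-specific build could substitute cheaper code. */
DEMO_DEFINE_F_FF(demo_half_divide,   a / b)
DEMO_DEFINE_F_FF(demo_native_divide, a / b)
DEMO_DEFINE_F_F(demo_half_recip,     1.0f / a)
DEMO_DEFINE_F_F(demo_native_recip,   1.0f / a)
```

Forwarding to the float expression is always at least as precise as required, so it is a safe default even though it gives up the speed that the relaxed precision would allow.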

-erik


Preview Diff

1=== modified file '.bzrignore'
2--- .bzrignore 2011-12-14 01:58:32 +0000
3+++ .bzrignore 2011-12-15 04:21:23 +0000
4@@ -14,6 +14,7 @@
5 # builddir
6
7 Makefile
8+pocl.pc
9 .deps
10 .libs
11 *.lo
12@@ -21,6 +22,7 @@
13 *.la
14
15 ./libtool
16+./clconfig.h
17 ./config.h
18 ./config.log
19 ./config.status
20
21=== modified file 'clconfig.h.in'
22--- clconfig.h.in 2011-11-29 16:59:14 +0000
23+++ clconfig.h.in 2011-12-15 04:21:23 +0000
24@@ -1,8 +1,13 @@
25+/* The size of `long', as computed by sizeof. */
26+#undef SIZEOF_LONG
27+
28+/* The size of `half', as computed by sizeof. */
29+#undef SIZEOF___FP16
30+/* The OpenCL type `half' is called `__fp16' in C */
31+#define SIZEOF_HALF SIZEOF___FP16
32+
33 /* The size of `double', as computed by sizeof. */
34 #undef SIZEOF_DOUBLE
35
36-/* The size of `long', as computed by sizeof. */
37-#undef SIZEOF_LONG
38-
39 /* The size of `void *', as computed by sizeof. */
40 #undef SIZEOF_VOID_P
41
42=== modified file 'configure.ac'
43--- configure.ac 2011-12-14 17:59:11 +0000
44+++ configure.ac 2011-12-15 04:21:23 +0000
45@@ -222,6 +222,7 @@
46
47 # Checks for typedefs, structures, and compiler characteristics.
48 AC_CHECK_SIZEOF([long])
49+AC_CHECK_SIZEOF([__fp16])
50 AC_CHECK_SIZEOF([double])
51 AC_CHECK_SIZEOF([void *])
52 AC_CHECK_ALIGNOF([float16], [typedef float float16 __attribute__((__ext_vector_type__(16)));])
53
54=== modified file 'include/_kernel.h'
55--- include/_kernel.h 2011-12-14 01:11:29 +0000
56+++ include/_kernel.h 2011-12-15 04:21:23 +0000
57@@ -42,12 +42,22 @@
58 #else
59 # define __IF_INT64(x)
60 #endif
61+#ifdef cl_khr_fp16
62+# define __IF_FP16(x) x
63+#else
64+# define __IF_FP16(x)
65+#endif
66 #ifdef cl_khr_fp64
67 # define __IF_FP64(x) x
68 #else
69 # define __IF_FP64(x)
70 #endif
71
72+#if defined(cl_khr_fp64) && !defined(cles_khr_int64)
73+# error "cl_khr_fp64 requires cles_khr_int64"
74+#endif
75+
76+
77
78 /* A static assert statement to catch inconsistencies at build time */
79 #define _cl_static_assert(_t, _x) typedef int ai##_t[(_x) ? 1 : -1]
80@@ -79,6 +89,10 @@
81 typedef struct error_undefined_type_ulong error_undefined_type_ulong;
82 # define ulong error_undefined_type_ulong
83 #endif
84+#ifndef cl_khr_fp16
85+typedef struct error_undefined_type_half error_undefined_type_half;
86+# define half error_undefined_type_half
87+#endif
88 #ifndef cl_khr_fp64
89 typedef struct error_undefined_type_double error_undefined_type_double;
90 # define double error_undefined_type_double
91@@ -210,6 +224,11 @@
92 _cl_static_assert(ulong16, sizeof(ulong16) == 16*sizeof(ulong));
93 #endif
94
95+#ifdef cl_khr_fp16
96+_cl_static_assert(half, sizeof(half) == 2);
97+/* There are no vectors of type half */
98+#endif
99+
100 _cl_static_assert(float , sizeof(float ) == 4);
101 _cl_static_assert(float2 , sizeof(float2 ) == 2 *sizeof(float));
102 _cl_static_assert(float3 , sizeof(float3 ) == 4 *sizeof(float));
103@@ -506,6 +525,7 @@
104 * J: vector of int
105 * U: vector of uint or ulong
106 * S: scalar (float or double)
107+ * F: vector of float
108 * V: vector of float or double
109 */
110
111@@ -777,6 +797,20 @@
112 double _cl_overloadable NAME(double4 , double4 ); \
113 double _cl_overloadable NAME(double8 , double8 ); \
114 double _cl_overloadable NAME(double16, double16);)
115+#define _CL_DECLARE_FUNC_F_F(NAME) \
116+ float _cl_overloadable NAME(float ); \
117+ float2 _cl_overloadable NAME(float2 ); \
118+ float3 _cl_overloadable NAME(float3 ); \
119+ float4 _cl_overloadable NAME(float4 ); \
120+ float8 _cl_overloadable NAME(float8 ); \
121+ float16 _cl_overloadable NAME(float16 );
122+#define _CL_DECLARE_FUNC_F_FF(NAME) \
123+ float _cl_overloadable NAME(float , float ); \
124+ float2 _cl_overloadable NAME(float2 , float2 ); \
125+ float3 _cl_overloadable NAME(float3 , float3 ); \
126+ float4 _cl_overloadable NAME(float4 , float4 ); \
127+ float8 _cl_overloadable NAME(float8 , float8 ); \
128+ float16 _cl_overloadable NAME(float16 , float16 );
129
130 /* Move built-in declarations out of the way. (There should be a
131 better way of doing so.) These five functions are built-in math
132@@ -877,6 +911,35 @@
133 _CL_DECLARE_FUNC_V_V(tgamma)
134 _CL_DECLARE_FUNC_V_V(trunc)
135
136+_CL_DECLARE_FUNC_F_F(half_cos)
137+_CL_DECLARE_FUNC_F_FF(half_divide)
138+_CL_DECLARE_FUNC_F_F(half_exp)
139+_CL_DECLARE_FUNC_F_F(half_exp2)
140+_CL_DECLARE_FUNC_F_F(half_exp10)
141+_CL_DECLARE_FUNC_F_F(half_log)
142+_CL_DECLARE_FUNC_F_F(half_log2)
143+_CL_DECLARE_FUNC_F_F(half_log10)
144+_CL_DECLARE_FUNC_F_FF(half_powr)
145+_CL_DECLARE_FUNC_F_F(half_recip)
146+_CL_DECLARE_FUNC_F_F(half_rsqrt)
147+_CL_DECLARE_FUNC_F_F(half_sin)
148+_CL_DECLARE_FUNC_F_F(half_sqrt)
149+_CL_DECLARE_FUNC_F_F(half_tan)
150+_CL_DECLARE_FUNC_F_F(native_cos)
151+_CL_DECLARE_FUNC_F_FF(native_divide)
152+_CL_DECLARE_FUNC_F_F(native_exp)
153+_CL_DECLARE_FUNC_F_F(native_exp2)
154+_CL_DECLARE_FUNC_F_F(native_exp10)
155+_CL_DECLARE_FUNC_F_F(native_log)
156+_CL_DECLARE_FUNC_F_F(native_log2)
157+_CL_DECLARE_FUNC_F_F(native_log10)
158+_CL_DECLARE_FUNC_F_FF(native_powr)
159+_CL_DECLARE_FUNC_F_F(native_recip)
160+_CL_DECLARE_FUNC_F_F(native_rsqrt)
161+_CL_DECLARE_FUNC_F_F(native_sin)
162+_CL_DECLARE_FUNC_F_F(native_sqrt)
163+_CL_DECLARE_FUNC_F_F(native_tan)
164+
165
166
167 /* Integer Constants */
168
169@@ -1495,6 +1558,58 @@
170 #endif
171 */
172
173+#ifdef cl_khr_fp16
174+
175+#define _CL_DECLARE_VLOAD_HALF(MOD) \
176+ float _cl_overloadable vload_half (size_t offset, const MOD half *p); \
177+ float2 _cl_overloadable vload_half2 (size_t offset, const MOD half *p); \
178+ float3 _cl_overloadable vload_half3 (size_t offset, const MOD half *p); \
179+ float4 _cl_overloadable vload_half4 (size_t offset, const MOD half *p); \
180+ float8 _cl_overloadable vload_half8 (size_t offset, const MOD half *p); \
181+ float16 _cl_overloadable vload_half16 (size_t offset, const MOD half *p); \
182+ float2 _cl_overloadable vloada_half2 (size_t offset, const MOD half *p); \
183+ float3 _cl_overloadable vloada_half3 (size_t offset, const MOD half *p); \
184+ float4 _cl_overloadable vloada_half4 (size_t offset, const MOD half *p); \
185+ float8 _cl_overloadable vloada_half8 (size_t offset, const MOD half *p); \
186+ float16 _cl_overloadable vloada_half16(size_t offset, const MOD half *p);
187+
188+_CL_DECLARE_VLOAD_HALF(__global)
189+_CL_DECLARE_VLOAD_HALF(__local)
190+_CL_DECLARE_VLOAD_HALF(__constant)
191+/* _CL_DECLARE_VLOAD_HALF(__private) */
192+
193+/* stores to half may have a suffix: _rte _rtz _rtp _rtn */
194+#define _CL_DECLARE_VSTORE_HALF(MOD, SUFFIX) \
195+ void _cl_overloadable vstore_half##SUFFIX (float data, size_t offset, MOD half *p); \
196+ void _cl_overloadable vstore_half2##SUFFIX (float2 data, size_t offset, MOD half *p); \
197+ void _cl_overloadable vstore_half3##SUFFIX (float3 data, size_t offset, MOD half *p); \
198+ void _cl_overloadable vstore_half4##SUFFIX (float4 data, size_t offset, MOD half *p); \
199+ void _cl_overloadable vstore_half8##SUFFIX (float8 data, size_t offset, MOD half *p); \
200+ void _cl_overloadable vstore_half16##SUFFIX (float16 data, size_t offset, MOD half *p); \
201+ void _cl_overloadable vstorea_half2##SUFFIX (float2 data, size_t offset, MOD half *p); \
202+ void _cl_overloadable vstorea_half3##SUFFIX (float3 data, size_t offset, MOD half *p); \
203+ void _cl_overloadable vstorea_half4##SUFFIX (float4 data, size_t offset, MOD half *p); \
204+ void _cl_overloadable vstorea_half8##SUFFIX (float8 data, size_t offset, MOD half *p); \
205+ void _cl_overloadable vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p);
206+
207+_CL_DECLARE_VSTORE_HALF(__global , )
208+_CL_DECLARE_VSTORE_HALF(__global , _rte)
209+_CL_DECLARE_VSTORE_HALF(__global , _rtz)
210+_CL_DECLARE_VSTORE_HALF(__global , _rtp)
211+_CL_DECLARE_VSTORE_HALF(__global , _rtn)
212+_CL_DECLARE_VSTORE_HALF(__local , )
213+_CL_DECLARE_VSTORE_HALF(__local , _rte)
214+_CL_DECLARE_VSTORE_HALF(__local , _rtz)
215+_CL_DECLARE_VSTORE_HALF(__local , _rtp)
216+_CL_DECLARE_VSTORE_HALF(__local , _rtn)
217+/* _CL_DECLARE_VSTORE_HALF(__private , ) */
218+/* _CL_DECLARE_VSTORE_HALF(__private , _rte) */
219+/* _CL_DECLARE_VSTORE_HALF(__private , _rtz) */
220+/* _CL_DECLARE_VSTORE_HALF(__private , _rtp) */
221+/* _CL_DECLARE_VSTORE_HALF(__private , _rtn) */
222+
223+#endif
224+
225
226
227 /* Miscellaneous Vector Functions */
228
229
230=== modified file 'include/arm/types.h'
231--- include/arm/types.h 2011-12-01 17:21:52 +0000
232+++ include/arm/types.h 2011-12-15 04:21:23 +0000
233@@ -4,6 +4,7 @@
234
235 #define __EMBEDDED_PROFILE__ 1
236 #undef cles_khr_int64
237+#define cl_khr_fp16 /* ES: is this correct? */
238 #undef cl_khr_fp64
239
240 typedef uint size_t;
241
242=== modified file 'include/tce/types.h'
243--- include/tce/types.h 2011-12-01 17:21:52 +0000
244+++ include/tce/types.h 2011-12-15 04:21:23 +0000
245@@ -4,6 +4,7 @@
246
247 #define __EMBEDDED_PROFILE__ 1
248 #undef cles_khr_int64
249+#define cl_khr_fp16 /* ES: is this correct? */
250 #undef cl_khr_fp64
251
252 typedef uint size_t;
253
254=== modified file 'include/types.h'
255--- include/types.h 2011-12-01 17:21:52 +0000
256+++ include/types.h 2011-12-15 04:21:23 +0000
257@@ -13,16 +13,22 @@
258
259 #if SIZEOF_LONG == 8
260 # define cles_khr_int64
261-# if SIZEOF_DOUBLE == 8
262-# define cl_khr_fp64
263-# else
264-# undef cl_khr_fp64
265-# endif
266-#else /* SIZEOF_LONG != 8 */
267+#else
268 # define __EMBEDDED_PROFILE__ 1
269 # undef cles_khr_int64
270+#endif
271+
272+#if SIZEOF_HALF == 2
273+# define cl_khr_fp16
274+#else
275+# undef cl_khr_fp16
276+#endif
277+
278+#if SIZEOF_DOUBLE == 8
279+# define cl_khr_fp64
280+#else
281 # undef cl_khr_fp64
282-#endif /* SIZEOF_LONG != 8 */
283+#endif
284
285 #if SIZEOF_VOID_P == 8
286 typedef ulong size_t;
287
288=== modified file 'include/x86_64/types.h'
289--- include/x86_64/types.h 2011-12-09 16:01:21 +0000
290+++ include/x86_64/types.h 2011-12-15 04:21:23 +0000
291@@ -4,6 +4,7 @@
292 typedef unsigned long ulong;
293
294 #define cles_khr_int64
295+#define cl_khr_fp16
296 #define cl_khr_fp64
297
298 typedef ulong size_t;
299
300=== modified file 'lib/kernel/cos.cl'
301--- lib/kernel/cos.cl 2011-10-26 03:01:29 +0000
302+++ lib/kernel/cos.cl 2011-12-15 04:21:23 +0000
303@@ -24,3 +24,6 @@
304 #include "templates.h"
305
306 DEFINE_BUILTIN_V_V(cos)
307+
308+DEFINE_EXPR_F_F(half_cos, cos(a))
309+DEFINE_EXPR_F_F(native_cos, cos(a))
310
311=== added file 'lib/kernel/divide.cl'
312--- lib/kernel/divide.cl 1970-01-01 00:00:00 +0000
313+++ lib/kernel/divide.cl 2011-12-15 04:21:23 +0000
314@@ -0,0 +1,27 @@
315+/* OpenCL built-in library: divide()
316+
317+ Copyright (c) 2011 Universidad Rey Juan Carlos
318+
319+ Permission is hereby granted, free of charge, to any person obtaining a copy
320+ of this software and associated documentation files (the "Software"), to deal
321+ in the Software without restriction, including without limitation the rights
322+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
323+ copies of the Software, and to permit persons to whom the Software is
324+ furnished to do so, subject to the following conditions:
325+
326+ The above copyright notice and this permission notice shall be included in
327+ all copies or substantial portions of the Software.
328+
329+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
330+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
331+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
332+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
333+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
334+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
335+ THE SOFTWARE.
336+*/
337+
338+#include "templates.h"
339+
340+DEFINE_EXPR_F_FF(half_divide, a/b)
341+DEFINE_EXPR_F_FF(native_divide, a/b)
342
343=== modified file 'lib/kernel/exp.cl'
344--- lib/kernel/exp.cl 2011-10-26 03:01:29 +0000
345+++ lib/kernel/exp.cl 2011-12-15 04:21:23 +0000
346@@ -24,3 +24,6 @@
347 #include "templates.h"
348
349 DEFINE_BUILTIN_V_V(exp)
350+
351+DEFINE_EXPR_F_F(half_exp, exp(a))
352+DEFINE_EXPR_F_F(native_exp, exp(a))
353
354=== modified file 'lib/kernel/exp10.cl'
355--- lib/kernel/exp10.cl 2011-11-05 00:10:25 +0000
356+++ lib/kernel/exp10.cl 2011-12-15 04:21:23 +0000
357@@ -28,3 +28,6 @@
358 #else
359 DEFINE_EXPR_V_V(exp10, exp(M_LN10_F*a))
360 #endif
361+
362+DEFINE_EXPR_F_F(half_exp10, exp10(a))
363+DEFINE_EXPR_F_F(native_exp10, exp10(a))
364
365=== modified file 'lib/kernel/exp2.cl'
366--- lib/kernel/exp2.cl 2011-10-26 03:01:29 +0000
367+++ lib/kernel/exp2.cl 2011-12-15 04:21:23 +0000
368@@ -24,3 +24,6 @@
369 #include "templates.h"
370
371 DEFINE_BUILTIN_V_V(exp2)
372+
373+DEFINE_EXPR_F_F(half_exp2, exp2(a))
374+DEFINE_EXPR_F_F(native_exp2, exp2(a))
375
376=== modified file 'lib/kernel/log.cl'
377--- lib/kernel/log.cl 2011-10-26 03:01:29 +0000
378+++ lib/kernel/log.cl 2011-12-15 04:21:23 +0000
379@@ -24,3 +24,6 @@
380 #include "templates.h"
381
382 DEFINE_BUILTIN_V_V(log)
383+
384+DEFINE_EXPR_F_F(half_log, log(a))
385+DEFINE_EXPR_F_F(native_log, log(a))
386
387=== modified file 'lib/kernel/log10.cl'
388--- lib/kernel/log10.cl 2011-10-26 03:01:29 +0000
389+++ lib/kernel/log10.cl 2011-12-15 04:21:23 +0000
390@@ -24,3 +24,6 @@
391 #include "templates.h"
392
393 DEFINE_BUILTIN_V_V(log10)
394+
395+DEFINE_EXPR_F_F(half_log10, log10(a))
396+DEFINE_EXPR_F_F(native_log10, log10(a))
397
398=== modified file 'lib/kernel/log2.cl'
399--- lib/kernel/log2.cl 2011-10-26 03:01:29 +0000
400+++ lib/kernel/log2.cl 2011-12-15 04:21:23 +0000
401@@ -24,3 +24,6 @@
402 #include "templates.h"
403
404 DEFINE_BUILTIN_V_V(log2)
405+
406+DEFINE_EXPR_F_F(half_log2, log2(a))
407+DEFINE_EXPR_F_F(native_log2, log2(a))
408
409=== modified file 'lib/kernel/powr.cl'
410--- lib/kernel/powr.cl 2011-10-26 03:01:29 +0000
411+++ lib/kernel/powr.cl 2011-12-15 04:21:23 +0000
412@@ -24,3 +24,6 @@
413 #include "templates.h"
414
415 DEFINE_EXPR_V_VV(powr, pow(a, b))
416+
417+DEFINE_EXPR_F_FF(half_powr, powr(a, b))
418+DEFINE_EXPR_F_FF(native_powr, powr(a, b))
419
420=== added file 'lib/kernel/recip.cl'
421--- lib/kernel/recip.cl 1970-01-01 00:00:00 +0000
422+++ lib/kernel/recip.cl 2011-12-15 04:21:23 +0000
423@@ -0,0 +1,27 @@
424+/* OpenCL built-in library: recip()
425+
426+ Copyright (c) 2011 Universidad Rey Juan Carlos
427+
428+ Permission is hereby granted, free of charge, to any person obtaining a copy
429+ of this software and associated documentation files (the "Software"), to deal
430+ in the Software without restriction, including without limitation the rights
431+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
432+ copies of the Software, and to permit persons to whom the Software is
433+ furnished to do so, subject to the following conditions:
434+
435+ The above copyright notice and this permission notice shall be included in
436+ all copies or substantial portions of the Software.
437+
438+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
439+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
440+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
441+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
442+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
443+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
444+ THE SOFTWARE.
445+*/
446+
447+#include "templates.h"
448+
449+DEFINE_EXPR_F_F(half_recip, (stype)1/a)
450+DEFINE_EXPR_F_F(native_recip, (stype)1/a)
451
452=== modified file 'lib/kernel/rsqrt.cl'
453--- lib/kernel/rsqrt.cl 2011-11-05 00:10:25 +0000
454+++ lib/kernel/rsqrt.cl 2011-12-15 04:21:23 +0000
455@@ -24,3 +24,6 @@
456 #include "templates.h"
457
458 DEFINE_EXPR_V_V(rsqrt, (stype)1/sqrt(a))
459+
460+DEFINE_EXPR_F_F(half_rsqrt, rsqrt(a))
461+DEFINE_EXPR_F_F(native_rsqrt, rsqrt(a))
462
463=== modified file 'lib/kernel/sin.cl'
464--- lib/kernel/sin.cl 2011-10-26 03:01:29 +0000
465+++ lib/kernel/sin.cl 2011-12-15 04:21:23 +0000
466@@ -24,3 +24,6 @@
467 #include "templates.h"
468
469 DEFINE_BUILTIN_V_V(sin)
470+
471+DEFINE_EXPR_F_F(half_sin, sin(a))
472+DEFINE_EXPR_F_F(native_sin, sin(a))
473
474=== modified file 'lib/kernel/sources.mk'
475--- lib/kernel/sources.mk 2011-12-05 22:08:57 +0000
476+++ lib/kernel/sources.mk 2011-12-15 04:21:23 +0000
477@@ -72,6 +72,8 @@
478 tanpi.cl \
479 tgamma.cl \
480 trunc.cl \
481+ divide.cl \
482+ recip.cl \
483 abs.cl \
484 abs_diff.cl \
485 add_sat.cl \
486@@ -123,4 +125,6 @@
487 bitselect.cl \
488 select.cl \
489 vload.cl \
490- vstore.cl
491+ vstore.cl \
492+ vload_half.cl \
493+ vstore_half.cl
494
495=== modified file 'lib/kernel/sqrt.cl'
496--- lib/kernel/sqrt.cl 2011-10-29 14:08:33 +0000
497+++ lib/kernel/sqrt.cl 2011-12-15 04:21:23 +0000
498@@ -24,3 +24,6 @@
499 #include "templates.h"
500
501 DEFINE_BUILTIN_V_V(sqrt)
502+
503+DEFINE_EXPR_F_F(half_sqrt, sqrt(a))
504+DEFINE_EXPR_F_F(native_sqrt, sqrt(a))
505
506=== modified file 'lib/kernel/tan.cl'
507--- lib/kernel/tan.cl 2011-10-26 03:01:29 +0000
508+++ lib/kernel/tan.cl 2011-12-15 04:21:23 +0000
509@@ -24,3 +24,6 @@
510 #include "templates.h"
511
512 DEFINE_BUILTIN_V_V(tan)
513+
514+DEFINE_EXPR_F_F(half_tan, tan(a))
515+DEFINE_EXPR_F_F(native_tan, tan(a))
516
517=== modified file 'lib/kernel/templates.h'
518--- lib/kernel/templates.h 2011-12-14 01:10:26 +0000
519+++ lib/kernel/templates.h 2011-12-15 04:21:23 +0000
520@@ -275,28 +275,30 @@
521
522 /******************************************************************************/
523
524-#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE) \
525- VTYPE __attribute__ ((overloadable)) \
526- NAME(VTYPE a, VTYPE b) \
527- { \
528- typedef VTYPE vtype; \
529- typedef STYPE stype; \
530- return EXPR; \
531+#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE, JTYPE, SJTYPE) \
532+ VTYPE __attribute__ ((overloadable)) \
533+ NAME(VTYPE a, VTYPE b) \
534+ { \
535+ typedef VTYPE vtype; \
536+ typedef STYPE stype; \
537+ typedef JTYPE jtype; \
538+ typedef SJTYPE sjtype; \
539+ return EXPR; \
540 }
541-#define DEFINE_EXPR_V_V(NAME, EXPR) \
542- IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float ) \
543- IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float ) \
544- IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float ) \
545- IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float ) \
546- IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float ) \
547- IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float ) \
548- __IF_FP64( \
549- IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double) \
550- IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double) \
551- IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double) \
552- IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double) \
553- IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double) \
554- IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double))
555+#define DEFINE_EXPR_V_V(NAME, EXPR) \
556+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float , int , int ) \
557+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float , int2 , int ) \
558+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float , int3 , int ) \
559+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float , int4 , int ) \
560+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float , int8 , int ) \
561+ IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float , int16 , int ) \
562+ __IF_FP64( \
563+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double, long , long) \
564+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double, long2 , long) \
565+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double, long3 , long) \
566+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double, long4 , long) \
567+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double, long8 , long) \
568+ IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double, long16, long))
569
570 #define IMPLEMENT_EXPR_V_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \
571 VTYPE __attribute__ ((overloadable)) \
572@@ -608,6 +610,39 @@
573 IMPLEMENT_EXPR_V_SV(NAME, EXPR, double8 , double) \
574 IMPLEMENT_EXPR_V_SV(NAME, EXPR, double16, double))
575
576+#define IMPLEMENT_EXPR_F_F(NAME, EXPR, VTYPE, STYPE) \
577+ VTYPE __attribute__ ((overloadable)) \
578+ NAME(VTYPE a, VTYPE b) \
579+ { \
580+ typedef VTYPE vtype; \
581+ typedef STYPE stype; \
582+ return EXPR; \
583+ }
584+#define DEFINE_EXPR_F_F(NAME, EXPR) \
585+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float , float ) \
586+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float2 , float ) \
587+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float3 , float ) \
588+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float4 , float ) \
589+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float8 , float ) \
590+ IMPLEMENT_EXPR_F_F(NAME, EXPR, float16 , float )
591+
592+#define IMPLEMENT_EXPR_F_FF(NAME, EXPR, VTYPE, STYPE, JTYPE) \
593+ VTYPE __attribute__ ((overloadable)) \
594+ NAME(VTYPE a, VTYPE b) \
595+ { \
596+ typedef VTYPE vtype; \
597+ typedef STYPE stype; \
598+ typedef JTYPE jtype; \
599+ return EXPR; \
600+ }
601+#define DEFINE_EXPR_F_FF(NAME, EXPR) \
602+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float , float , int ) \
603+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float2 , float , int2 ) \
604+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float3 , float , int3 ) \
605+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float4 , float , int4 ) \
606+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float8 , float , int8 ) \
607+ IMPLEMENT_EXPR_F_FF(NAME, EXPR, float16 , float , int16 )
608+
609
610
611 #define IMPLEMENT_BUILTIN_G_G(NAME, GTYPE, UGTYPE, LO, HI) \
612
613=== modified file 'lib/kernel/vload.cl'
614--- lib/kernel/vload.cl 2011-11-25 17:02:42 +0000
615+++ lib/kernel/vload.cl 2011-12-15 04:21:23 +0000
616@@ -1,4 +1,4 @@
617-/* OpenCL built-in library: vloa()
618+/* OpenCL built-in library: vload()
619
620 Copyright (c) 2011 Universidad Rey Juan Carlos
621
622
623=== added file 'lib/kernel/vload_half.cl'
624--- lib/kernel/vload_half.cl 1970-01-01 00:00:00 +0000
625+++ lib/kernel/vload_half.cl 2011-12-15 04:21:23 +0000
626@@ -0,0 +1,113 @@
627+/* OpenCL built-in library: vload_half()
628+
629+ Copyright (c) 2011 Universidad Rey Juan Carlos
630+
631+ Permission is hereby granted, free of charge, to any person obtaining a copy
632+ of this software and associated documentation files (the "Software"), to deal
633+ in the Software without restriction, including without limitation the rights
634+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
635+ copies of the Software, and to permit persons to whom the Software is
636+ furnished to do so, subject to the following conditions:
637+
638+ The above copyright notice and this permission notice shall be included in
639+ all copies or substantial portions of the Software.
640+
641+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
642+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
643+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
644+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
645+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
646+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
647+ THE SOFTWARE.
648+*/
649+
650+
651+
652+#ifdef cl_khr_fp16
653+
654+#define IMPLEMENT_VLOAD_HALF(MOD) \
655+ \
656+ float __attribute__ ((__overloadable__)) \
657+ vload_half(size_t offset, const MOD half *p) \
658+ { \
659+ return (float)p[offset]; \
660+ } \
661+ \
662+ float2 __attribute__ ((__overloadable__)) \
663+ vload_half2(size_t offset, const MOD half *p) \
664+ { \
665+ return (float2)(vload_half(0, &p[offset*2]), \
666+ vload_half(0, &p[offset*2+1])); \
667+ } \
668+ \
669+ float3 __attribute__ ((__overloadable__)) \
670+ vload_half3(size_t offset, const MOD half *p) \
671+ { \
672+ return (float3)(vload_half2(0, &p[offset*3]), \
673+ vload_half(0, &p[offset*3+2])); \
674+ } \
675+ \
676+ float4 __attribute__ ((__overloadable__)) \
677+ vload_half4(size_t offset, const MOD half *p) \
678+ { \
679+ return (float4)(vload_half2(0, &p[offset*4]), \
680+ vload_half2(0, &p[offset*4+2])); \
681+ } \
682+ \
683+ float8 __attribute__ ((__overloadable__)) \
684+ vload_half8(size_t offset, const MOD half *p) \
685+ { \
686+ return (float8)(vload_half4(0, &p[offset*8]), \
687+ vload_half4(0, &p[offset*8+4])); \
688+ } \
689+ \
690+ float16 __attribute__ ((__overloadable__)) \
691+ vload_half16(size_t offset, const MOD half *p) \
692+ { \
693+ return (float16)(vload_half8(0, &p[offset*16]), \
694+ vload_half8(0, &p[offset*16+8])); \
695+ } \
696+ \
697+ float2 __attribute__ ((__overloadable__)) \
698+ vloada_half2(size_t offset, const MOD half *p) \
699+ { \
700+ return (float2)(vload_half(0, &p[offset*2]), \
701+ vload_half(0, &p[offset*2+1])); \
702+ } \
703+ \
704+ float3 __attribute__ ((__overloadable__)) \
705+ vloada_half3(size_t offset, const MOD half *p) \
706+ { \
707+ return (float3)(vloada_half2(0, &p[offset*4]), \
708+ vload_half(0, &p[offset*4+2])); \
709+ } \
710+ \
711+ float4 __attribute__ ((__overloadable__)) \
712+ vloada_half4(size_t offset, const MOD half *p) \
713+ { \
714+ return (float4)(vloada_half2(0, &p[offset*4]), \
715+ vloada_half2(0, &p[offset*4+2])); \
716+ } \
717+ \
718+ float8 __attribute__ ((__overloadable__)) \
719+ vloada_half8(size_t offset, const MOD half *p) \
720+ { \
721+ return (float8)(vloada_half4(0, &p[offset*8]), \
722+ vloada_half4(0, &p[offset*8+4])); \
723+ } \
724+ \
725+ float16 __attribute__ ((__overloadable__)) \
726+ vloada_half16(size_t offset, const MOD half *p) \
727+ { \
728+ return (float16)(vloada_half8(0, &p[offset*16]), \
729+ vloada_half8(0, &p[offset*16+8])); \
730+ }
731+
732+
733+
734+IMPLEMENT_VLOAD_HALF(__global)
735+IMPLEMENT_VLOAD_HALF(__local)
736+IMPLEMENT_VLOAD_HALF(__constant)
737+/* IMPLEMENT_VLOAD_HALF(__private) */
738+
739+#endif
740
741=== added file 'lib/kernel/vstore_half.cl'
742--- lib/kernel/vstore_half.cl 1970-01-01 00:00:00 +0000
743+++ lib/kernel/vstore_half.cl 2011-12-15 04:21:23 +0000
744@@ -0,0 +1,124 @@
745+/* OpenCL built-in library: vstore_half()
746+
747+ Copyright (c) 2011 Universidad Rey Juan Carlos
748+
749+ Permission is hereby granted, free of charge, to any person obtaining a copy
750+ of this software and associated documentation files (the "Software"), to deal
751+ in the Software without restriction, including without limitation the rights
752+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
753+ copies of the Software, and to permit persons to whom the Software is
754+ furnished to do so, subject to the following conditions:
755+
756+ The above copyright notice and this permission notice shall be included in
757+ all copies or substantial portions of the Software.
758+
759+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
760+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
761+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
762+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
763+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
764+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
765+ THE SOFTWARE.
766+*/
767+
768+
769+
770+#ifdef cl_khr_fp16
771+
772+#define IMPLEMENT_VSTORE_HALF(MOD, SUFFIX) \
773+ \
774+ void __attribute__ ((__overloadable__)) \
775+ vstore_half##SUFFIX(float data, size_t offset, MOD half *p) \
776+ { \
777+ p[offset] = data; \
778+ } \
779+ \
780+ void __attribute__ ((__overloadable__)) \
781+ vstore_half2##SUFFIX(float2 data, size_t offset, MOD half *p) \
782+ { \
783+ vstore_half##SUFFIX(data.lo, 0, &p[offset*2]); \
784+ vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]); \
785+ } \
786+ \
787+ void __attribute__ ((__overloadable__)) \
788+ vstore_half3##SUFFIX(float3 data, size_t offset, MOD half *p) \
789+ { \
790+ vstore_half2##SUFFIX(data.lo, 0, &p[offset*3]); \
791+ vstore_half##SUFFIX(data.s2, 0, &p[offset*3+2]); \
792+ } \
793+ \
794+ void __attribute__ ((__overloadable__)) \
795+ vstore_half4##SUFFIX(float4 data, size_t offset, MOD half *p) \
796+ { \
797+ vstore_half2##SUFFIX(data.lo, 0, &p[offset*4]); \
798+ vstore_half2##SUFFIX(data.hi, 0, &p[offset*4+2]); \
799+ } \
800+ \
801+ void __attribute__ ((__overloadable__)) \
802+ vstore_half8##SUFFIX(float8 data, size_t offset, MOD half *p) \
803+ { \
804+ vstore_half4##SUFFIX(data.lo, 0, &p[offset*8]); \
805+ vstore_half4##SUFFIX(data.hi, 0, &p[offset*8+4]); \
806+ } \
807+ \
808+ void __attribute__ ((__overloadable__)) \
809+ vstore_half16##SUFFIX(float16 data, size_t offset, MOD half *p) \
810+ { \
811+ vstore_half8##SUFFIX(data.lo, 0, &p[offset*16]); \
812+ vstore_half8##SUFFIX(data.hi, 0, &p[offset*16+8]); \
813+ } \
814+ \
815+ void __attribute__ ((__overloadable__)) \
816+ vstorea_half2##SUFFIX(float2 data, size_t offset, MOD half *p) \
817+ { \
818+ vstore_half##SUFFIX(data.lo, 0, &p[offset*2]); \
819+ vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]); \
820+ } \
821+ \
822+ void __attribute__ ((__overloadable__)) \
823+ vstorea_half3##SUFFIX(float3 data, size_t offset, MOD half *p) \
824+ { \
825+ vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]); \
826+ vstore_half##SUFFIX(data.s2, 0, &p[offset*4+2]); \
827+ } \
828+ \
829+ void __attribute__ ((__overloadable__)) \
830+ vstorea_half4##SUFFIX(float4 data, size_t offset, MOD half *p) \
831+ { \
832+ vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]); \
833+ vstorea_half2##SUFFIX(data.hi, 0, &p[offset*4+2]); \
834+ } \
835+ \
836+ void __attribute__ ((__overloadable__)) \
837+ vstorea_half8##SUFFIX(float8 data, size_t offset, MOD half *p) \
838+ { \
839+ vstorea_half4##SUFFIX(data.lo, 0, &p[offset*8]); \
840+ vstorea_half4##SUFFIX(data.hi, 0, &p[offset*8+4]); \
841+ } \
842+ \
843+ void __attribute__ ((__overloadable__)) \
844+ vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p) \
845+ { \
846+ vstorea_half8##SUFFIX(data.lo, 0, &p[offset*16]); \
847+ vstorea_half8##SUFFIX(data.hi, 0, &p[offset*16+8]); \
848+ }
849+
850+
851+
852+IMPLEMENT_VSTORE_HALF(__global , )
853+IMPLEMENT_VSTORE_HALF(__global , _rte)
854+IMPLEMENT_VSTORE_HALF(__global , _rtz)
855+IMPLEMENT_VSTORE_HALF(__global , _rtp)
856+IMPLEMENT_VSTORE_HALF(__global , _rtn)
857+IMPLEMENT_VSTORE_HALF(__local , )
858+IMPLEMENT_VSTORE_HALF(__local , _rte)
859+IMPLEMENT_VSTORE_HALF(__local , _rtz)
860+IMPLEMENT_VSTORE_HALF(__local , _rtp)
861+IMPLEMENT_VSTORE_HALF(__local , _rtn)
862+/* IMPLEMENT_VSTORE_HALF(__private , ) */
863+/* IMPLEMENT_VSTORE_HALF(__private , _rte) */
864+/* IMPLEMENT_VSTORE_HALF(__private , _rtz) */
865+/* IMPLEMENT_VSTORE_HALF(__private , _rtp) */
866+/* IMPLEMENT_VSTORE_HALF(__private , _rtn) */
867+
868+#endif
869
870=== modified file 'lib/kernel/x86_64/fabs.cl'
871--- lib/kernel/x86_64/fabs.cl 2011-10-31 17:00:12 +0000
872+++ lib/kernel/x86_64/fabs.cl 2011-12-15 04:21:23 +0000
873@@ -29,8 +29,8 @@
874 DEFINE_EXPR_V_V(fabs,
875 ({
876 int bits = CHAR_BIT * sizeof(stype);
877- jtype sign_mask = (jtype)1 << (jtype)(bits - 1);
878- jtype result = ~sign_mask & *(jtype*)&a;
879+ sjtype sign_mask = (sjtype)1 << (sjtype)(bits - 1);
880+ sjtype result = ~sign_mask & *(jtype*)&a;
881 *(vtype*)&result;
882 }))
883
884@@ -70,7 +70,7 @@
885 uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
886 __asm__ ("andps %[src], %[dst]" : \
887 [dst] "+x" (a) : \
888- [src] "x" (~sign_mask)); \
889+ [src] "xm" (~sign_mask)); \
890 a; \
891 })
892 #define IMPLEMENT_FABS_AVX_FLOAT8 \
893@@ -78,16 +78,16 @@
894 uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \
895 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
896 __asm__ ("andps256 %[src], %[dst]" : \
897- [dst] "=x" (a) : \
898- "[dst]" (a), [src] "x" (~sign_mask)); \
899+ [dst] "+x" (a) : \
900+ [src] "xm" (~sign_mask)); \
901 a; \
902 })
903 #define IMPLEMENT_FABS_SSE2_DOUBLE2 \
904 ({ \
905 ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \
906 __asm__ ("andpd %[src], %[dst]" : \
907- [dst] "=x" (a) : \
908- "[dst]" (a), [src] "x" (~sign_mask)); \
909+ [dst] "+x" (a) : \
910+ [src] "xm" (~sign_mask)); \
911 a; \
912 })
913 #define IMPLEMENT_FABS_AVX_DOUBLE4 \
914@@ -95,8 +95,8 @@
915 ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \
916 0x8000000000000000UL, 0x8000000000000000UL}; \
917 __asm__ ("andpd256 %[src], %[dst]" : \
918- [dst] "=x" (a) : \
919- "[dst]" (a), [src] "x" (~sign_mask)); \
920+ [dst] "+x" (a) : \
921+ [src] "xm" (~sign_mask)); \
922 a; \
923 })
924