Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk
Status: | Merged |
---|---|
Merge reported by: | Pekka Jääskeläinen |
Merged at revision: | not available |
Proposed branch: | lp:~schnetter/pocl/main |
Merge into: | lp:~pocl/pocl/trunk |
Diff against target: | 921 lines (+540/-42), 28 files modified: .bzrignore (+2/-0), clconfig.h.in (+8/-3), configure.ac (+1/-0), include/_kernel.h (+115/-0), include/arm/types.h (+1/-0), include/tce/types.h (+1/-0), include/types.h (+13/-7), include/x86_64/types.h (+1/-0), lib/kernel/cos.cl (+3/-0), lib/kernel/divide.cl (+27/-0), lib/kernel/exp.cl (+3/-0), lib/kernel/exp10.cl (+3/-0), lib/kernel/exp2.cl (+3/-0), lib/kernel/log.cl (+3/-0), lib/kernel/log10.cl (+3/-0), lib/kernel/log2.cl (+3/-0), lib/kernel/powr.cl (+3/-0), lib/kernel/recip.cl (+27/-0), lib/kernel/rsqrt.cl (+3/-0), lib/kernel/sin.cl (+3/-0), lib/kernel/sources.mk (+5/-1), lib/kernel/sqrt.cl (+3/-0), lib/kernel/tan.cl (+3/-0), lib/kernel/templates.h (+56/-21), lib/kernel/vload.cl (+1/-1), lib/kernel/vload_half.cl (+113/-0), lib/kernel/vstore_half.cl (+124/-0), lib/kernel/x86_64/fabs.cl (+9/-9) |
To merge this branch: | bzr merge lp:~schnetter/pocl/main |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
pocl maintaners | Pending | ||
Commit message
Description of the change
I added support for the half datatype, protected by #ifdef cl_khr_fp16, analogous to cl_khr_fp64. I don't know which targets support this datatype (presumably all, since llvm supports them?), so I enabled this for all targets -- this will break things if this is wrong.
- 134. By Erik Schnetter: Correct alternative (unused) fabs implementation
- 135. By Erik Schnetter: Auto-detect whether the half type is supported
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
Erik Schnetter (schnetter) wrote : | # |
OpenCL supports only two operations for halfs: vload_half, converting it to
a float, and vstore_half, converting from a float. Nothing else exists
explicitly, not even vectors of halfs. Essentially the only thing one can
do with the half type is to pass a half* to these load/store routines.
There are routines such as float half_sin(float) that are only required to
have the precision offered by datatype half (allowing optimisations), but
the API is via float. There is text in the standard presumably allowing
this to be optimised to use operations that act directly on half values,
but this is not required.
I added code to detect whether clang supports half (called __fp16 in C),
and if so, these vload_half/
and friends are always available, forwarding to their float counterparts by
default -- I assume that target-specific optimisations can do better.
-erik
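To make the load direction concrete, here is a minimal sketch in plain C of the half-to-float conversion that a software vload_half has to perform. The names half_bits_to_float and check_half_decode are hypothetical and are not part of pocl or of this branch, which instead relies on Clang's __fp16 type and a cast. The sketch only assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not pocl code): decode an IEEE 754 binary16 bit
   pattern into a float, as a software vload_half would have to do. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;              /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;                     /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {            /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {         /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {         /* signed zero */
        bits = sign;
    } else {                       /* subnormal half: renormalise for float */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; --e; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
    return f;
}

/* A few spot checks on well-known encodings. */
void check_half_decode(void)
{
    assert(half_bits_to_float(0x3C00u) == 1.0f);
    assert(half_bits_to_float(0xC000u) == -2.0f);
    assert(half_bits_to_float(0x7BFFu) == 65504.0f);  /* largest finite half */
}
```

Every half value is exactly representable as a float, so this direction never rounds, which is why vload_half takes no rounding-mode suffix.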
2011/12/15 Pekka Jääskeläinen <email address hidden>
> On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> > Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
> >
> > Requested reviews: pocl maintaners (pocl)
> >
> > For more details, see:
> > https:/
> >
> > I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> > analogous to cl_khr_fp64. I don't know which targets support this datatype
> > (presumably all, since llvm supports them?), so I enabled this for all
> > targets -- this will break things if this is wrong.
>
> Just curious...
>
> How does LLVM/Clang support the half by default nowadays? I've heard that for
> NVIDIA GPUs, for example, the half is supported only as a storage format. That
> is, you have the float in 16bit format in memory but whenever you compute
> something with halfs, they are converted to single precision floats to avoid
> the need for separate floating point units for halfs.
>
> Just curious to hear what happens when you use half floats in LLVM/Clang
> now -- do they convert them to single precision fp whenever computation occurs?
> The last time I checked, 'half' was not a datatype in the LLVM IR
> thus they could not be selected (to be implemented with the target ISA) nicely.
>
> It seems there are only two intrinsics for halfs available:
> http://
>
> Does Clang generate those automatically for halfs in OpenCL C now? For example
> if you perform a basic operation halfA + halfB, what happens?
>
> I'm interested in a proper half support as for embedded/mobile it is more
> beneficial than just for saving the memory bandwidth as you can save in the area
> of the FPU, improve the speed, lower the energy consumption, etc. if you
> can do with half floats for your computations. But I think they do not accept
> it as a proper datatype in LLVM before there is a real (read: off-the-shelf)
> target in LLVM that supports it natively.
>
> --
> --Pekka
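The storage-only model described in the quoted message can be sketched in plain C: arithmetic happens in float, and rounding to half precision happens only when the result is stored. The helper below is hypothetical (it is not pocl code and not what Clang emits). It assumes the IEEE 754 binary16 layout and implements the round-to-nearest-even conversion that a software vstore_half_rte would need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not pocl code): round a float to the nearest
   binary16 value, ties to even, and return the 16-bit pattern. */
uint16_t float_to_half_bits_rte(float f)
{
    uint32_t x, mag, mant, q, rem, halfway, base;
    uint16_t sign;
    int exp, shift;

    memcpy(&x, &f, sizeof x);               /* raw float bits */
    sign = (uint16_t)((x >> 16) & 0x8000u);
    mag  = x & 0x7FFFFFFFu;

    if (mag > 0x7F800000u)                  /* NaN stays a quiet NaN */
        return (uint16_t)(sign | 0x7E00u);
    exp = (int)(mag >> 23) - 127;           /* unbiased float exponent */
    if (exp > 15)                           /* infinity or too large for half */
        return (uint16_t)(sign | 0x7C00u);

    /* Normal halfs drop 13 mantissa bits; subnormal halfs drop more. */
    shift = (exp < -14) ? (-1 - exp) : 13;
    base  = (exp < -14) ? 0u : ((uint32_t)(exp + 14) << 10);
    if (shift > 25)                         /* too small: rounds to zero */
        return sign;

    mant    = (mag & 0x7FFFFFu) | 0x800000u;  /* restore the implicit 1 */
    q       = mant >> shift;
    rem     = mant & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1);
    if (rem > halfway || (rem == halfway && (q & 1u)))
        ++q;                                /* ties go to the even mantissa */
    return (uint16_t)(sign | (base + q));   /* a carry bumps the exponent */
}

/* The sum is computed in float; precision is lost only at the store. */
void check_half_store(void)
{
    assert(float_to_half_bits_rte(1.0f + 2.0f) == 0x4200u);     /* 3.0 */
    assert(float_to_half_bits_rte(2048.0f + 1.0f) == 0x6800u);  /* tie -> 2048 */
}
```

Under this model halfA + halfB is evaluated as a float addition, so 2048 + 1 yields 2049.0f in a register and becomes 2048 again only when it is written back to a half.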
Preview Diff
1 | === modified file '.bzrignore' |
2 | --- .bzrignore 2011-12-14 01:58:32 +0000 |
3 | +++ .bzrignore 2011-12-15 04:21:23 +0000 |
4 | @@ -14,6 +14,7 @@ |
5 | # builddir |
6 | |
7 | Makefile |
8 | +pocl.pc |
9 | .deps |
10 | .libs |
11 | *.lo |
12 | @@ -21,6 +22,7 @@ |
13 | *.la |
14 | |
15 | ./libtool |
16 | +./clconfig.h |
17 | ./config.h |
18 | ./config.log |
19 | ./config.status |
20 | |
21 | === modified file 'clconfig.h.in' |
22 | --- clconfig.h.in 2011-11-29 16:59:14 +0000 |
23 | +++ clconfig.h.in 2011-12-15 04:21:23 +0000 |
24 | @@ -1,8 +1,13 @@ |
25 | +/* The size of `long', as computed by sizeof. */ |
26 | +#undef SIZEOF_LONG |
27 | + |
28 | +/* The size of `half', as computed by sizeof. */ |
29 | +#undef SIZEOF___FP16 |
30 | +/* The OpenCL type `half' is called `__fp16' in C */ |
31 | +#define SIZEOF_HALF SIZEOF___FP16 |
32 | + |
33 | /* The size of `double', as computed by sizeof. */ |
34 | #undef SIZEOF_DOUBLE |
35 | |
36 | -/* The size of `long', as computed by sizeof. */ |
37 | -#undef SIZEOF_LONG |
38 | - |
39 | /* The size of `void *', as computed by sizeof. */ |
40 | #undef SIZEOF_VOID_P |
41 | |
42 | === modified file 'configure.ac' |
43 | --- configure.ac 2011-12-14 17:59:11 +0000 |
44 | +++ configure.ac 2011-12-15 04:21:23 +0000 |
45 | @@ -222,6 +222,7 @@ |
46 | |
47 | # Checks for typedefs, structures, and compiler characteristics. |
48 | AC_CHECK_SIZEOF([long]) |
49 | +AC_CHECK_SIZEOF([__fp16]) |
50 | AC_CHECK_SIZEOF([double]) |
51 | AC_CHECK_SIZEOF([void *]) |
52 | AC_CHECK_ALIGNOF([float16], [typedef float float16 __attribute__((__ext_vector_type__(16)));]) |
53 | |
54 | === modified file 'include/_kernel.h' |
55 | --- include/_kernel.h 2011-12-14 01:11:29 +0000 |
56 | +++ include/_kernel.h 2011-12-15 04:21:23 +0000 |
57 | @@ -42,12 +42,22 @@ |
58 | #else |
59 | # define __IF_INT64(x) |
60 | #endif |
61 | +#ifdef cl_khr_fp16 |
62 | +# define __IF_FP16(x) x |
63 | +#else |
64 | +# define __IF_FP16(x) |
65 | +#endif |
66 | #ifdef cl_khr_fp64 |
67 | # define __IF_FP64(x) x |
68 | #else |
69 | # define __IF_FP64(x) |
70 | #endif |
71 | |
72 | +#if defined(cl_khr_fp64) && !defined(cles_khr_int64) |
73 | +# error "cl_khr_fp64 requires cles_khr_int64" |
74 | +#endif |
75 | + |
76 | + |
77 | |
78 | /* A static assert statement to catch inconsistencies at build time */ |
79 | #define _cl_static_assert(_t, _x) typedef int ai##_t[(_x) ? 1 : -1] |
80 | @@ -79,6 +89,10 @@ |
81 | typedef struct error_undefined_type_ulong error_undefined_type_ulong; |
82 | # define ulong error_undefined_type_ulong |
83 | #endif |
84 | +#ifndef cl_khr_fp16 |
85 | +typedef struct error_undefined_type_half error_undefined_type_half; |
86 | +# define half error_undefined_type_half |
87 | +#endif |
88 | #ifndef cl_khr_fp64 |
89 | typedef struct error_undefined_type_double error_undefined_type_double; |
90 | # define double error_undefined_type_double |
91 | @@ -210,6 +224,11 @@ |
92 | _cl_static_assert(ulong16, sizeof(ulong16) == 16*sizeof(ulong)); |
93 | #endif |
94 | |
95 | +#ifdef cl_khr_fp16 |
96 | +_cl_static_assert(half, sizeof(half) == 2); |
97 | +/* There are no vectors of type half */ |
98 | +#endif |
99 | + |
100 | _cl_static_assert(float , sizeof(float ) == 4); |
101 | _cl_static_assert(float2 , sizeof(float2 ) == 2 *sizeof(float)); |
102 | _cl_static_assert(float3 , sizeof(float3 ) == 4 *sizeof(float)); |
103 | @@ -506,6 +525,7 @@ |
104 | * J: vector of int |
105 | * U: vector of uint or ulong |
106 | * S: scalar (float or double) |
107 | + * F: vector of float |
108 | * V: vector of float or double |
109 | */ |
110 | |
111 | @@ -777,6 +797,20 @@ |
112 | double _cl_overloadable NAME(double4 , double4 ); \ |
113 | double _cl_overloadable NAME(double8 , double8 ); \ |
114 | double _cl_overloadable NAME(double16, double16);) |
115 | +#define _CL_DECLARE_FUNC_F_F(NAME) \ |
116 | + float _cl_overloadable NAME(float ); \ |
117 | + float2 _cl_overloadable NAME(float2 ); \ |
118 | + float3 _cl_overloadable NAME(float3 ); \ |
119 | + float4 _cl_overloadable NAME(float4 ); \ |
120 | + float8 _cl_overloadable NAME(float8 ); \ |
121 | + float16 _cl_overloadable NAME(float16 ); |
122 | +#define _CL_DECLARE_FUNC_F_FF(NAME) \ |
123 | + float _cl_overloadable NAME(float , float ); \ |
124 | + float2 _cl_overloadable NAME(float2 , float2 ); \ |
125 | + float3 _cl_overloadable NAME(float3 , float3 ); \ |
126 | + float4 _cl_overloadable NAME(float4 , float4 ); \ |
127 | + float8 _cl_overloadable NAME(float8 , float8 ); \ |
128 | + float16 _cl_overloadable NAME(float16 , float16 ); |
129 | |
130 | /* Move built-in declarations out of the way. (There should be a |
131 | better way of doing so.) These five functions are built-in math |
132 | @@ -877,6 +911,35 @@ |
133 | _CL_DECLARE_FUNC_V_V(tgamma) |
134 | _CL_DECLARE_FUNC_V_V(trunc) |
135 | |
136 | +_CL_DECLARE_FUNC_F_F(half_cos) |
137 | +_CL_DECLARE_FUNC_F_FF(half_divide) |
138 | +_CL_DECLARE_FUNC_F_F(half_exp) |
139 | +_CL_DECLARE_FUNC_F_F(half_exp2) |
140 | +_CL_DECLARE_FUNC_F_F(half_exp10) |
141 | +_CL_DECLARE_FUNC_F_F(half_log) |
142 | +_CL_DECLARE_FUNC_F_F(half_log2) |
143 | +_CL_DECLARE_FUNC_F_F(half_log10) |
144 | +_CL_DECLARE_FUNC_F_FF(half_powr) |
145 | +_CL_DECLARE_FUNC_F_F(half_recip) |
146 | +_CL_DECLARE_FUNC_F_F(half_rsqrt) |
147 | +_CL_DECLARE_FUNC_F_F(half_sin) |
148 | +_CL_DECLARE_FUNC_F_F(half_sqrt) |
149 | +_CL_DECLARE_FUNC_F_F(half_tan) |
150 | +_CL_DECLARE_FUNC_F_F(native_cos) |
151 | +_CL_DECLARE_FUNC_F_FF(native_divide) |
152 | +_CL_DECLARE_FUNC_F_F(native_exp) |
153 | +_CL_DECLARE_FUNC_F_F(native_exp2) |
154 | +_CL_DECLARE_FUNC_F_F(native_exp10) |
155 | +_CL_DECLARE_FUNC_F_F(native_log) |
156 | +_CL_DECLARE_FUNC_F_F(native_log2) |
157 | +_CL_DECLARE_FUNC_F_F(native_log10) |
158 | +_CL_DECLARE_FUNC_F_FF(native_powr) |
159 | +_CL_DECLARE_FUNC_F_F(native_recip) |
160 | +_CL_DECLARE_FUNC_F_F(native_rsqrt) |
161 | +_CL_DECLARE_FUNC_F_F(native_sin) |
162 | +_CL_DECLARE_FUNC_F_F(native_sqrt) |
163 | +_CL_DECLARE_FUNC_F_F(native_tan) |
164 | + |
165 | |
166 | |
167 | /* Integer Constants */ |
168 | |
169 | @@ -1495,6 +1558,58 @@ |
170 | #endif |
171 | */ |
172 | |
173 | +#ifdef cl_khr_fp16 |
174 | + |
175 | +#define _CL_DECLARE_VLOAD_HALF(MOD) \ |
176 | + float _cl_overloadable vload_half (size_t offset, const MOD half *p); \ |
177 | + float2 _cl_overloadable vload_half2 (size_t offset, const MOD half *p); \ |
178 | + float3 _cl_overloadable vload_half3 (size_t offset, const MOD half *p); \ |
179 | + float4 _cl_overloadable vload_half4 (size_t offset, const MOD half *p); \ |
180 | + float8 _cl_overloadable vload_half8 (size_t offset, const MOD half *p); \ |
181 | + float16 _cl_overloadable vload_half16 (size_t offset, const MOD half *p); \ |
182 | + float2 _cl_overloadable vloada_half2 (size_t offset, const MOD half *p); \ |
183 | + float3 _cl_overloadable vloada_half3 (size_t offset, const MOD half *p); \ |
184 | + float4 _cl_overloadable vloada_half4 (size_t offset, const MOD half *p); \ |
185 | + float8 _cl_overloadable vloada_half8 (size_t offset, const MOD half *p); \ |
186 | + float16 _cl_overloadable vloada_half16(size_t offset, const MOD half *p); |
187 | + |
188 | +_CL_DECLARE_VLOAD_HALF(__global) |
189 | +_CL_DECLARE_VLOAD_HALF(__local) |
190 | +_CL_DECLARE_VLOAD_HALF(__constant) |
191 | +/* _CL_DECLARE_VLOAD_HALF(__private) */ |
192 | + |
193 | +/* stores to half may have a suffix: _rte _rtz _rtp _rtn */ |
194 | +#define _CL_DECLARE_VSTORE_HALF(MOD, SUFFIX) \ |
195 | + void _cl_overloadable vstore_half##SUFFIX (float data, size_t offset, MOD half *p); \ |
196 | + void _cl_overloadable vstore_half2##SUFFIX (float2 data, size_t offset, MOD half *p); \ |
197 | + void _cl_overloadable vstore_half3##SUFFIX (float3 data, size_t offset, MOD half *p); \ |
198 | + void _cl_overloadable vstore_half4##SUFFIX (float4 data, size_t offset, MOD half *p); \ |
199 | + void _cl_overloadable vstore_half8##SUFFIX (float8 data, size_t offset, MOD half *p); \ |
200 | + void _cl_overloadable vstore_half16##SUFFIX (float16 data, size_t offset, MOD half *p); \ |
201 | + void _cl_overloadable vstorea_half2##SUFFIX (float2 data, size_t offset, MOD half *p); \ |
202 | + void _cl_overloadable vstorea_half3##SUFFIX (float3 data, size_t offset, MOD half *p); \ |
203 | + void _cl_overloadable vstorea_half4##SUFFIX (float4 data, size_t offset, MOD half *p); \ |
204 | + void _cl_overloadable vstorea_half8##SUFFIX (float8 data, size_t offset, MOD half *p); \ |
205 | + void _cl_overloadable vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p); |
206 | + |
207 | +_CL_DECLARE_VSTORE_HALF(__global , ) |
208 | +_CL_DECLARE_VSTORE_HALF(__global , _rte) |
209 | +_CL_DECLARE_VSTORE_HALF(__global , _rtz) |
210 | +_CL_DECLARE_VSTORE_HALF(__global , _rtp) |
211 | +_CL_DECLARE_VSTORE_HALF(__global , _rtn) |
212 | +_CL_DECLARE_VSTORE_HALF(__local , ) |
213 | +_CL_DECLARE_VSTORE_HALF(__local , _rte) |
214 | +_CL_DECLARE_VSTORE_HALF(__local , _rtz) |
215 | +_CL_DECLARE_VSTORE_HALF(__local , _rtp) |
216 | +_CL_DECLARE_VSTORE_HALF(__local , _rtn) |
217 | +/* _CL_DECLARE_VSTORE_HALF(__private , ) */ |
218 | +/* _CL_DECLARE_VSTORE_HALF(__private , _rte) */ |
219 | +/* _CL_DECLARE_VSTORE_HALF(__private , _rtz) */ |
220 | +/* _CL_DECLARE_VSTORE_HALF(__private , _rtp) */ |
221 | +/* _CL_DECLARE_VSTORE_HALF(__private , _rtn) */ |
222 | + |
223 | +#endif |
224 | + |
225 | |
226 | |
227 | /* Miscellaneous Vector Functions */ |
228 | |
229 | |
230 | === modified file 'include/arm/types.h' |
231 | --- include/arm/types.h 2011-12-01 17:21:52 +0000 |
232 | +++ include/arm/types.h 2011-12-15 04:21:23 +0000 |
233 | @@ -4,6 +4,7 @@ |
234 | |
235 | #define __EMBEDDED_PROFILE__ 1 |
236 | #undef cles_khr_int64 |
237 | +#define cl_khr_fp16 /* ES: is this correct? */ |
238 | #undef cl_khr_fp64 |
239 | |
240 | typedef uint size_t; |
241 | |
242 | === modified file 'include/tce/types.h' |
243 | --- include/tce/types.h 2011-12-01 17:21:52 +0000 |
244 | +++ include/tce/types.h 2011-12-15 04:21:23 +0000 |
245 | @@ -4,6 +4,7 @@ |
246 | |
247 | #define __EMBEDDED_PROFILE__ 1 |
248 | #undef cles_khr_int64 |
249 | +#define cl_khr_fp16 /* ES: is this correct? */ |
250 | #undef cl_khr_fp64 |
251 | |
252 | typedef uint size_t; |
253 | |
254 | === modified file 'include/types.h' |
255 | --- include/types.h 2011-12-01 17:21:52 +0000 |
256 | +++ include/types.h 2011-12-15 04:21:23 +0000 |
257 | @@ -13,16 +13,22 @@ |
258 | |
259 | #if SIZEOF_LONG == 8 |
260 | # define cles_khr_int64 |
261 | -# if SIZEOF_DOUBLE == 8 |
262 | -# define cl_khr_fp64 |
263 | -# else |
264 | -# undef cl_khr_fp64 |
265 | -# endif |
266 | -#else /* SIZEOF_LONG != 8 */ |
267 | +#else |
268 | # define __EMBEDDED_PROFILE__ 1 |
269 | # undef cles_khr_int64 |
270 | +#endif |
271 | + |
272 | +#if SIZEOF_HALF == 2 |
273 | +# define cl_khr_fp16 |
274 | +#else |
275 | +# undef cl_khr_fp16 |
276 | +#endif |
277 | + |
278 | +#if SIZEOF_DOUBLE == 8 |
279 | +# define cl_khr_fp64 |
280 | +#else |
281 | # undef cl_khr_fp64 |
282 | -#endif /* SIZEOF_LONG != 8 */ |
283 | +#endif |
284 | |
285 | #if SIZEOF_VOID_P == 8 |
286 | typedef ulong size_t; |
287 | |
288 | === modified file 'include/x86_64/types.h' |
289 | --- include/x86_64/types.h 2011-12-09 16:01:21 +0000 |
290 | +++ include/x86_64/types.h 2011-12-15 04:21:23 +0000 |
291 | @@ -4,6 +4,7 @@ |
292 | typedef unsigned long ulong; |
293 | |
294 | #define cles_khr_int64 |
295 | +#define cl_khr_fp16 |
296 | #define cl_khr_fp64 |
297 | |
298 | typedef ulong size_t; |
299 | |
300 | === modified file 'lib/kernel/cos.cl' |
301 | --- lib/kernel/cos.cl 2011-10-26 03:01:29 +0000 |
302 | +++ lib/kernel/cos.cl 2011-12-15 04:21:23 +0000 |
303 | @@ -24,3 +24,6 @@ |
304 | #include "templates.h" |
305 | |
306 | DEFINE_BUILTIN_V_V(cos) |
307 | + |
308 | +DEFINE_EXPR_F_F(half_cos, cos(a)) |
309 | +DEFINE_EXPR_F_F(native_cos, cos(a)) |
310 | |
311 | === added file 'lib/kernel/divide.cl' |
312 | --- lib/kernel/divide.cl 1970-01-01 00:00:00 +0000 |
313 | +++ lib/kernel/divide.cl 2011-12-15 04:21:23 +0000 |
314 | @@ -0,0 +1,27 @@ |
315 | +/* OpenCL built-in library: divide() |
316 | + |
317 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
318 | + |
319 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
320 | + of this software and associated documentation files (the "Software"), to deal |
321 | + in the Software without restriction, including without limitation the rights |
322 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
323 | + copies of the Software, and to permit persons to whom the Software is |
324 | + furnished to do so, subject to the following conditions: |
325 | + |
326 | + The above copyright notice and this permission notice shall be included in |
327 | + all copies or substantial portions of the Software. |
328 | + |
329 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
330 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
331 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
332 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
333 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
334 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
335 | + THE SOFTWARE. |
336 | +*/ |
337 | + |
338 | +#include "templates.h" |
339 | + |
340 | +DEFINE_EXPR_F_FF(half_divide, a/b) |
341 | +DEFINE_EXPR_F_FF(native_divide, a/b) |
342 | |
343 | === modified file 'lib/kernel/exp.cl' |
344 | --- lib/kernel/exp.cl 2011-10-26 03:01:29 +0000 |
345 | +++ lib/kernel/exp.cl 2011-12-15 04:21:23 +0000 |
346 | @@ -24,3 +24,6 @@ |
347 | #include "templates.h" |
348 | |
349 | DEFINE_BUILTIN_V_V(exp) |
350 | + |
351 | +DEFINE_EXPR_F_F(half_exp, exp(a)) |
352 | +DEFINE_EXPR_F_F(native_exp, exp(a)) |
353 | |
354 | === modified file 'lib/kernel/exp10.cl' |
355 | --- lib/kernel/exp10.cl 2011-11-05 00:10:25 +0000 |
356 | +++ lib/kernel/exp10.cl 2011-12-15 04:21:23 +0000 |
357 | @@ -28,3 +28,6 @@ |
358 | #else |
359 | DEFINE_EXPR_V_V(exp10, exp(M_LN10_F*a)) |
360 | #endif |
361 | + |
362 | +DEFINE_EXPR_F_F(half_exp10, exp10(a)) |
363 | +DEFINE_EXPR_F_F(native_exp10, exp10(a)) |
364 | |
365 | === modified file 'lib/kernel/exp2.cl' |
366 | --- lib/kernel/exp2.cl 2011-10-26 03:01:29 +0000 |
367 | +++ lib/kernel/exp2.cl 2011-12-15 04:21:23 +0000 |
368 | @@ -24,3 +24,6 @@ |
369 | #include "templates.h" |
370 | |
371 | DEFINE_BUILTIN_V_V(exp2) |
372 | + |
373 | +DEFINE_EXPR_F_F(half_exp2, exp2(a)) |
374 | +DEFINE_EXPR_F_F(native_exp2, exp2(a)) |
375 | |
376 | === modified file 'lib/kernel/log.cl' |
377 | --- lib/kernel/log.cl 2011-10-26 03:01:29 +0000 |
378 | +++ lib/kernel/log.cl 2011-12-15 04:21:23 +0000 |
379 | @@ -24,3 +24,6 @@ |
380 | #include "templates.h" |
381 | |
382 | DEFINE_BUILTIN_V_V(log) |
383 | + |
384 | +DEFINE_EXPR_F_F(half_log, log(a)) |
385 | +DEFINE_EXPR_F_F(native_log, log(a)) |
386 | |
387 | === modified file 'lib/kernel/log10.cl' |
388 | --- lib/kernel/log10.cl 2011-10-26 03:01:29 +0000 |
389 | +++ lib/kernel/log10.cl 2011-12-15 04:21:23 +0000 |
390 | @@ -24,3 +24,6 @@ |
391 | #include "templates.h" |
392 | |
393 | DEFINE_BUILTIN_V_V(log10) |
394 | + |
395 | +DEFINE_EXPR_F_F(half_log10, log10(a)) |
396 | +DEFINE_EXPR_F_F(native_log10, log10(a)) |
397 | |
398 | === modified file 'lib/kernel/log2.cl' |
399 | --- lib/kernel/log2.cl 2011-10-26 03:01:29 +0000 |
400 | +++ lib/kernel/log2.cl 2011-12-15 04:21:23 +0000 |
401 | @@ -24,3 +24,6 @@ |
402 | #include "templates.h" |
403 | |
404 | DEFINE_BUILTIN_V_V(log2) |
405 | + |
406 | +DEFINE_EXPR_F_F(half_log2, log2(a)) |
407 | +DEFINE_EXPR_F_F(native_log2, log2(a)) |
408 | |
409 | === modified file 'lib/kernel/powr.cl' |
410 | --- lib/kernel/powr.cl 2011-10-26 03:01:29 +0000 |
411 | +++ lib/kernel/powr.cl 2011-12-15 04:21:23 +0000 |
412 | @@ -24,3 +24,6 @@ |
413 | #include "templates.h" |
414 | |
415 | DEFINE_EXPR_V_VV(powr, pow(a, b)) |
416 | + |
417 | +DEFINE_EXPR_F_FF(half_powr, powr(a, b)) |
418 | +DEFINE_EXPR_F_FF(native_powr, powr(a, b)) |
419 | |
420 | === added file 'lib/kernel/recip.cl' |
421 | --- lib/kernel/recip.cl 1970-01-01 00:00:00 +0000 |
422 | +++ lib/kernel/recip.cl 2011-12-15 04:21:23 +0000 |
423 | @@ -0,0 +1,27 @@ |
424 | +/* OpenCL built-in library: recip() |
425 | + |
426 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
427 | + |
428 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
429 | + of this software and associated documentation files (the "Software"), to deal |
430 | + in the Software without restriction, including without limitation the rights |
431 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
432 | + copies of the Software, and to permit persons to whom the Software is |
433 | + furnished to do so, subject to the following conditions: |
434 | + |
435 | + The above copyright notice and this permission notice shall be included in |
436 | + all copies or substantial portions of the Software. |
437 | + |
438 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
439 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
440 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
441 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
442 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
443 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
444 | + THE SOFTWARE. |
445 | +*/ |
446 | + |
447 | +#include "templates.h" |
448 | + |
449 | +DEFINE_EXPR_F_F(half_recip, (stype)1/a) |
450 | +DEFINE_EXPR_F_F(native_recip, (stype)1/a) |
451 | |
452 | === modified file 'lib/kernel/rsqrt.cl' |
453 | --- lib/kernel/rsqrt.cl 2011-11-05 00:10:25 +0000 |
454 | +++ lib/kernel/rsqrt.cl 2011-12-15 04:21:23 +0000 |
455 | @@ -24,3 +24,6 @@ |
456 | #include "templates.h" |
457 | |
458 | DEFINE_EXPR_V_V(rsqrt, (stype)1/sqrt(a)) |
459 | + |
460 | +DEFINE_EXPR_F_F(half_rsqrt, rsqrt(a)) |
461 | +DEFINE_EXPR_F_F(native_rsqrt, rsqrt(a)) |
462 | |
463 | === modified file 'lib/kernel/sin.cl' |
464 | --- lib/kernel/sin.cl 2011-10-26 03:01:29 +0000 |
465 | +++ lib/kernel/sin.cl 2011-12-15 04:21:23 +0000 |
466 | @@ -24,3 +24,6 @@ |
467 | #include "templates.h" |
468 | |
469 | DEFINE_BUILTIN_V_V(sin) |
470 | + |
471 | +DEFINE_EXPR_F_F(half_sin, sin(a)) |
472 | +DEFINE_EXPR_F_F(native_sin, sin(a)) |
473 | |
474 | === modified file 'lib/kernel/sources.mk' |
475 | --- lib/kernel/sources.mk 2011-12-05 22:08:57 +0000 |
476 | +++ lib/kernel/sources.mk 2011-12-15 04:21:23 +0000 |
477 | @@ -72,6 +72,8 @@ |
478 | tanpi.cl \ |
479 | tgamma.cl \ |
480 | trunc.cl \ |
481 | + divide.cl \ |
482 | + recip.cl \ |
483 | abs.cl \ |
484 | abs_diff.cl \ |
485 | add_sat.cl \ |
486 | @@ -123,4 +125,6 @@ |
487 | bitselect.cl \ |
488 | select.cl \ |
489 | vload.cl \ |
490 | - vstore.cl |
491 | + vstore.cl \ |
492 | + vload_half.cl \ |
493 | + vstore_half.cl |
494 | |
495 | === modified file 'lib/kernel/sqrt.cl' |
496 | --- lib/kernel/sqrt.cl 2011-10-29 14:08:33 +0000 |
497 | +++ lib/kernel/sqrt.cl 2011-12-15 04:21:23 +0000 |
498 | @@ -24,3 +24,6 @@ |
499 | #include "templates.h" |
500 | |
501 | DEFINE_BUILTIN_V_V(sqrt) |
502 | + |
503 | +DEFINE_EXPR_F_F(half_sqrt, sqrt(a)) |
504 | +DEFINE_EXPR_F_F(native_sqrt, sqrt(a)) |
505 | |
506 | === modified file 'lib/kernel/tan.cl' |
507 | --- lib/kernel/tan.cl 2011-10-26 03:01:29 +0000 |
508 | +++ lib/kernel/tan.cl 2011-12-15 04:21:23 +0000 |
509 | @@ -24,3 +24,6 @@ |
510 | #include "templates.h" |
511 | |
512 | DEFINE_BUILTIN_V_V(tan) |
513 | + |
514 | +DEFINE_EXPR_F_F(half_tan, tan(a)) |
515 | +DEFINE_EXPR_F_F(native_tan, tan(a)) |
516 | |
517 | === modified file 'lib/kernel/templates.h' |
518 | --- lib/kernel/templates.h 2011-12-14 01:10:26 +0000 |
519 | +++ lib/kernel/templates.h 2011-12-15 04:21:23 +0000 |
520 | @@ -275,28 +275,30 @@ |
521 | |
522 | /******************************************************************************/ |
523 | |
524 | -#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE) \ |
525 | - VTYPE __attribute__ ((overloadable)) \ |
526 | - NAME(VTYPE a, VTYPE b) \ |
527 | - { \ |
528 | - typedef VTYPE vtype; \ |
529 | - typedef STYPE stype; \ |
530 | - return EXPR; \ |
531 | +#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE, JTYPE, SJTYPE) \ |
532 | + VTYPE __attribute__ ((overloadable)) \ |
533 | + NAME(VTYPE a) \ |
534 | + { \ |
535 | + typedef VTYPE vtype; \ |
536 | + typedef STYPE stype; \ |
537 | + typedef JTYPE jtype; \ |
538 | + typedef SJTYPE sjtype; \ |
539 | + return EXPR; \ |
540 | } |
541 | -#define DEFINE_EXPR_V_V(NAME, EXPR) \ |
542 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float ) \ |
543 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float ) \ |
544 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float ) \ |
545 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float ) \ |
546 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float ) \ |
547 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float ) \ |
548 | - __IF_FP64( \ |
549 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double) \ |
550 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double) \ |
551 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double) \ |
552 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double) \ |
553 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double) \ |
554 | - IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double)) |
555 | +#define DEFINE_EXPR_V_V(NAME, EXPR) \ |
556 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float , float , int , int ) \ |
557 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float2 , float , int2 , int ) \ |
558 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float3 , float , int3 , int ) \ |
559 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float4 , float , int4 , int ) \ |
560 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float8 , float , int8 , int ) \ |
561 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float , int16 , int ) \ |
562 | + __IF_FP64( \ |
563 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double , double, long , long) \ |
564 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double, long2 , long) \ |
565 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double, long3 , long) \ |
566 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double, long4 , long) \ |
567 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double, long8 , long) \ |
568 | + IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double, long16, long)) |
569 | |
570 | #define IMPLEMENT_EXPR_V_VV(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
571 | VTYPE __attribute__ ((overloadable)) \ |
572 | @@ -608,6 +610,39 @@ |
573 | IMPLEMENT_EXPR_V_SV(NAME, EXPR, double8 , double) \ |
574 | IMPLEMENT_EXPR_V_SV(NAME, EXPR, double16, double)) |
575 | |
576 | +#define IMPLEMENT_EXPR_F_F(NAME, EXPR, VTYPE, STYPE) \ |
577 | + VTYPE __attribute__ ((overloadable)) \ |
578 | + NAME(VTYPE a) \ |
579 | + { \ |
580 | + typedef VTYPE vtype; \ |
581 | + typedef STYPE stype; \ |
582 | + return EXPR; \ |
583 | + } |
584 | +#define DEFINE_EXPR_F_F(NAME, EXPR) \ |
585 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float , float ) \ |
586 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float2 , float ) \ |
587 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float3 , float ) \ |
588 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float4 , float ) \ |
589 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float8 , float ) \ |
590 | + IMPLEMENT_EXPR_F_F(NAME, EXPR, float16 , float ) |
591 | + |
592 | +#define IMPLEMENT_EXPR_F_FF(NAME, EXPR, VTYPE, STYPE, JTYPE) \ |
593 | + VTYPE __attribute__ ((overloadable)) \ |
594 | + NAME(VTYPE a, VTYPE b) \ |
595 | + { \ |
596 | + typedef VTYPE vtype; \ |
597 | + typedef STYPE stype; \ |
598 | + typedef JTYPE jtype; \ |
599 | + return EXPR; \ |
600 | + } |
601 | +#define DEFINE_EXPR_F_FF(NAME, EXPR) \ |
602 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float , float , int ) \ |
603 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float2 , float , int2 ) \ |
604 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float3 , float , int3 ) \ |
605 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float4 , float , int4 ) \ |
606 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float8 , float , int8 ) \ |
607 | + IMPLEMENT_EXPR_F_FF(NAME, EXPR, float16 , float , int16 ) |
608 | + |
609 | |
610 | |
611 | #define IMPLEMENT_BUILTIN_G_G(NAME, GTYPE, UGTYPE, LO, HI) \ |
612 | |
613 | === modified file 'lib/kernel/vload.cl' |
614 | --- lib/kernel/vload.cl 2011-11-25 17:02:42 +0000 |
615 | +++ lib/kernel/vload.cl 2011-12-15 04:21:23 +0000 |
616 | @@ -1,4 +1,4 @@ |
617 | -/* OpenCL built-in library: vloa() |
618 | +/* OpenCL built-in library: vload() |
619 | |
620 | Copyright (c) 2011 Universidad Rey Juan Carlos |
621 | |
622 | |
623 | === added file 'lib/kernel/vload_half.cl' |
624 | --- lib/kernel/vload_half.cl 1970-01-01 00:00:00 +0000 |
625 | +++ lib/kernel/vload_half.cl 2011-12-15 04:21:23 +0000 |
626 | @@ -0,0 +1,113 @@ |
627 | +/* OpenCL built-in library: vload_half() |
628 | + |
629 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
630 | + |
631 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
632 | + of this software and associated documentation files (the "Software"), to deal |
633 | + in the Software without restriction, including without limitation the rights |
634 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
635 | + copies of the Software, and to permit persons to whom the Software is |
636 | + furnished to do so, subject to the following conditions: |
637 | + |
638 | + The above copyright notice and this permission notice shall be included in |
639 | + all copies or substantial portions of the Software. |
640 | + |
641 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
642 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
643 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
644 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
645 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
646 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
647 | + THE SOFTWARE. |
648 | +*/ |
649 | + |
650 | + |
651 | + |
652 | +#ifdef cl_khr_fp16 |
653 | + |
654 | +#define IMPLEMENT_VLOAD_HALF(MOD) \ |
655 | + \ |
656 | + float __attribute__ ((__overloadable__)) \ |
657 | + vload_half(size_t offset, const MOD half *p) \ |
658 | + { \ |
659 | + return (float)p[offset]; \ |
660 | + } \ |
661 | + \ |
662 | + float2 __attribute__ ((__overloadable__)) \ |
663 | + vload_half2(size_t offset, const MOD half *p) \ |
664 | + { \ |
665 | + return (float2)(vload_half(0, &p[offset*2]), \ |
666 | + vload_half(0, &p[offset*2+1])); \ |
667 | + } \ |
668 | + \ |
669 | + float3 __attribute__ ((__overloadable__)) \ |
670 | + vload_half3(size_t offset, const MOD half *p) \ |
671 | + { \ |
672 | + return (float3)(vload_half2(0, &p[offset*3]), \ |
673 | + vload_half(0, &p[offset*3+2])); \ |
674 | + } \ |
675 | + \ |
676 | + float4 __attribute__ ((__overloadable__)) \ |
677 | + vload_half4(size_t offset, const MOD half *p) \ |
678 | + { \ |
679 | + return (float4)(vload_half2(0, &p[offset*4]), \ |
680 | + vload_half2(0, &p[offset*4+2])); \ |
681 | + } \ |
682 | + \ |
683 | + float8 __attribute__ ((__overloadable__)) \ |
684 | + vload_half8(size_t offset, const MOD half *p) \ |
685 | + { \ |
686 | + return (float8)(vload_half4(0, &p[offset*8]), \ |
687 | + vload_half4(0, &p[offset*8+4])); \ |
688 | + } \ |
689 | + \ |
690 | + float16 __attribute__ ((__overloadable__)) \ |
691 | + vload_half16(size_t offset, const MOD half *p) \ |
692 | + { \ |
693 | + return (float16)(vload_half8(0, &p[offset*16]), \ |
694 | + vload_half8(0, &p[offset*16+8])); \ |
695 | + } \ |
696 | + \ |
697 | + float2 __attribute__ ((__overloadable__)) \ |
698 | + vloada_half2(size_t offset, const MOD half *p) \ |
699 | + { \ |
700 | + return (float2)(vload_half(0, &p[offset*2]), \ |
701 | + vload_half(0, &p[offset*2+1])); \ |
702 | + } \ |
703 | + \ |
704 | + float3 __attribute__ ((__overloadable__)) \ |
705 | + vloada_half3(size_t offset, const MOD half *p) \ |
706 | + { \ |
707 | + return (float3)(vloada_half2(0, &p[offset*4]), \ |
708 | + vload_half(0, &p[offset*4+2])); \ |
709 | + } \ |
710 | + \ |
711 | + float4 __attribute__ ((__overloadable__)) \ |
712 | + vloada_half4(size_t offset, const MOD half *p) \ |
713 | + { \ |
714 | + return (float4)(vloada_half2(0, &p[offset*4]), \ |
715 | + vloada_half2(0, &p[offset*4+2])); \ |
716 | + } \ |
717 | + \ |
718 | + float8 __attribute__ ((__overloadable__)) \ |
719 | + vloada_half8(size_t offset, const MOD half *p) \ |
720 | + { \ |
721 | + return (float8)(vloada_half4(0, &p[offset*8]), \ |
722 | + vloada_half4(0, &p[offset*8+4])); \ |
723 | + } \ |
724 | + \ |
725 | + float16 __attribute__ ((__overloadable__)) \ |
726 | + vloada_half16(size_t offset, const MOD half *p) \ |
727 | + { \ |
728 | + return (float16)(vloada_half8(0, &p[offset*16]), \ |
729 | + vloada_half8(0, &p[offset*16+8])); \ |
730 | + } |
731 | + |
732 | + |
733 | + |
734 | +IMPLEMENT_VLOAD_HALF(__global) |
735 | +IMPLEMENT_VLOAD_HALF(__local) |
736 | +IMPLEMENT_VLOAD_HALF(__constant) |
737 | +/* IMPLEMENT_VLOAD_HALF(__private) */ |
738 | + |
739 | +#endif |
740 | |
741 | === added file 'lib/kernel/vstore_half.cl' |
742 | --- lib/kernel/vstore_half.cl 1970-01-01 00:00:00 +0000 |
743 | +++ lib/kernel/vstore_half.cl 2011-12-15 04:21:23 +0000 |
744 | @@ -0,0 +1,124 @@ |
745 | +/* OpenCL built-in library: vstore_half() |
746 | + |
747 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
748 | + |
749 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
750 | + of this software and associated documentation files (the "Software"), to deal |
751 | + in the Software without restriction, including without limitation the rights |
752 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
753 | + copies of the Software, and to permit persons to whom the Software is |
754 | + furnished to do so, subject to the following conditions: |
755 | + |
756 | + The above copyright notice and this permission notice shall be included in |
757 | + all copies or substantial portions of the Software. |
758 | + |
759 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
760 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
761 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
762 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
763 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
764 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
765 | + THE SOFTWARE. |
766 | +*/ |
767 | + |
768 | + |
769 | + |
770 | +#ifdef cl_khr_fp16 |
771 | + |
772 | +#define IMPLEMENT_VSTORE_HALF(MOD, SUFFIX) \ |
773 | + \ |
774 | + void __attribute__ ((__overloadable__)) \ |
775 | + vstore_half##SUFFIX(float data, size_t offset, MOD half *p) \ |
776 | + { \ |
777 | + p[offset] = data; \ |
778 | + } \ |
779 | + \ |
780 | + void __attribute__ ((__overloadable__)) \ |
781 | + vstore_half2##SUFFIX(float2 data, size_t offset, MOD half *p) \ |
782 | + { \ |
783 | + vstore_half##SUFFIX(data.lo, 0, &p[offset*2]); \ |
784 | + vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]); \ |
785 | + } \ |
786 | + \ |
787 | + void __attribute__ ((__overloadable__)) \ |
788 | + vstore_half3##SUFFIX(float3 data, size_t offset, MOD half *p) \ |
789 | + { \ |
790 | + vstore_half2##SUFFIX(data.lo, 0, &p[offset*3]); \ |
791 | + vstore_half##SUFFIX(data.s2, 0, &p[offset*3+2]); \ |
792 | + } \ |
793 | + \ |
794 | + void __attribute__ ((__overloadable__)) \ |
795 | + vstore_half4##SUFFIX(float4 data, size_t offset, MOD half *p) \ |
796 | + { \ |
797 | + vstore_half2##SUFFIX(data.lo, 0, &p[offset*4]); \ |
798 | + vstore_half2##SUFFIX(data.hi, 0, &p[offset*4+2]); \ |
799 | + } \ |
800 | + \ |
801 | + void __attribute__ ((__overloadable__)) \ |
802 | + vstore_half8##SUFFIX(float8 data, size_t offset, MOD half *p) \ |
803 | + { \ |
804 | + vstore_half4##SUFFIX(data.lo, 0, &p[offset*8]); \ |
805 | + vstore_half4##SUFFIX(data.hi, 0, &p[offset*8+4]); \ |
806 | + } \ |
807 | + \ |
808 | + void __attribute__ ((__overloadable__)) \ |
809 | + vstore_half16##SUFFIX(float16 data, size_t offset, MOD half *p) \ |
810 | + { \ |
811 | + vstore_half8##SUFFIX(data.lo, 0, &p[offset*16]); \ |
812 | + vstore_half8##SUFFIX(data.hi, 0, &p[offset*16+8]); \ |
813 | + } \ |
814 | + \ |
815 | + void __attribute__ ((__overloadable__)) \ |
816 | + vstorea_half2##SUFFIX(float2 data, size_t offset, MOD half *p) \ |
817 | + { \ |
818 | + vstore_half##SUFFIX(data.lo, 0, &p[offset*2]); \ |
819 | + vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]); \ |
820 | + } \ |
821 | + \ |
822 | + void __attribute__ ((__overloadable__)) \ |
823 | + vstorea_half3##SUFFIX(float3 data, size_t offset, MOD half *p) \ |
824 | + { \ |
825 | + vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]); \ |
826 | + vstore_half##SUFFIX(data.s2, 0, &p[offset*4+2]); \ |
827 | + } \ |
828 | + \ |
829 | + void __attribute__ ((__overloadable__)) \ |
830 | + vstorea_half4##SUFFIX(float4 data, size_t offset, MOD half *p) \ |
831 | + { \ |
832 | + vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]); \ |
833 | + vstorea_half2##SUFFIX(data.hi, 0, &p[offset*4+2]); \ |
834 | + } \ |
835 | + \ |
836 | + void __attribute__ ((__overloadable__)) \ |
837 | + vstorea_half8##SUFFIX(float8 data, size_t offset, MOD half *p) \ |
838 | + { \ |
839 | + vstorea_half4##SUFFIX(data.lo, 0, &p[offset*8]); \ |
840 | + vstorea_half4##SUFFIX(data.hi, 0, &p[offset*8+4]); \ |
841 | + } \ |
842 | + \ |
843 | + void __attribute__ ((__overloadable__)) \ |
844 | + vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p) \ |
845 | + { \ |
846 | + vstorea_half8##SUFFIX(data.lo, 0, &p[offset*16]); \ |
847 | + vstorea_half8##SUFFIX(data.hi, 0, &p[offset*16+8]); \ |
848 | + } |
849 | + |
850 | + |
851 | + |
852 | +IMPLEMENT_VSTORE_HALF(__global , ) |
853 | +IMPLEMENT_VSTORE_HALF(__global , _rte) |
854 | +IMPLEMENT_VSTORE_HALF(__global , _rtz) |
855 | +IMPLEMENT_VSTORE_HALF(__global , _rtp) |
856 | +IMPLEMENT_VSTORE_HALF(__global , _rtn) |
857 | +IMPLEMENT_VSTORE_HALF(__local , ) |
858 | +IMPLEMENT_VSTORE_HALF(__local , _rte) |
859 | +IMPLEMENT_VSTORE_HALF(__local , _rtz) |
860 | +IMPLEMENT_VSTORE_HALF(__local , _rtp) |
861 | +IMPLEMENT_VSTORE_HALF(__local , _rtn) |
862 | +/* IMPLEMENT_VSTORE_HALF(__private , ) */ |
863 | +/* IMPLEMENT_VSTORE_HALF(__private , _rte) */ |
864 | +/* IMPLEMENT_VSTORE_HALF(__private , _rtz) */ |
865 | +/* IMPLEMENT_VSTORE_HALF(__private , _rtp) */ |
866 | +/* IMPLEMENT_VSTORE_HALF(__private , _rtn) */ |
867 | + |
868 | +#endif |
869 | |
870 | === modified file 'lib/kernel/x86_64/fabs.cl' |
871 | --- lib/kernel/x86_64/fabs.cl 2011-10-31 17:00:12 +0000 |
872 | +++ lib/kernel/x86_64/fabs.cl 2011-12-15 04:21:23 +0000 |
873 | @@ -29,8 +29,8 @@ |
874 | DEFINE_EXPR_V_V(fabs, |
875 | ({ |
876 | int bits = CHAR_BIT * sizeof(stype); |
877 | - jtype sign_mask = (jtype)1 << (jtype)(bits - 1); |
878 | - jtype result = ~sign_mask & *(jtype*)&a; |
879 | + sjtype sign_mask = (sjtype)1 << (sjtype)(bits - 1); |
880 | + sjtype result = ~sign_mask & *(jtype*)&a; |
881 | *(vtype*)&result; |
882 | })) |
883 | |
884 | @@ -70,7 +70,7 @@ |
885 | uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
886 | __asm__ ("andps %[src], %[dst]" : \ |
887 | [dst] "+x" (a) : \ |
888 | - [src] "x" (~sign_mask)); \ |
889 | + [src] "xm" (~sign_mask)); \ |
890 | a; \ |
891 | }) |
892 | #define IMPLEMENT_FABS_AVX_FLOAT8 \ |
893 | @@ -78,16 +78,16 @@ |
894 | uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \ |
895 | 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
896 | __asm__ ("andps256 %[src], %[dst]" : \ |
897 | - [dst] "=x" (a) : \ |
898 | - "[dst]" (a), [src] "x" (~sign_mask)); \ |
899 | + [dst] "+x" (a) : \ |
900 | + [src] "xm" (~sign_mask)); \ |
901 | a; \ |
902 | }) |
903 | #define IMPLEMENT_FABS_SSE2_DOUBLE2 \ |
904 | ({ \ |
905 | ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \ |
906 | __asm__ ("andpd %[src], %[dst]" : \ |
907 | - [dst] "=x" (a) : \ |
908 | - "[dst]" (a), [src] "x" (~sign_mask)); \ |
909 | + [dst] "+x" (a) : \ |
910 | + [src] "xm" (~sign_mask)); \ |
911 | a; \ |
912 | }) |
913 | #define IMPLEMENT_FABS_AVX_DOUBLE4 \ |
914 | @@ -95,8 +95,8 @@ |
915 | ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \ |
916 | 0x8000000000000000UL, 0x8000000000000000UL}; \ |
917 | __asm__ ("andpd256 %[src], %[dst]" : \ |
918 | - [dst] "=x" (a) : \ |
919 | - "[dst]" (a), [src] "x" (~sign_mask)); \ |
920 | + [dst] "+x" (a) : \ |
921 | + [src] "xm" (~sign_mask)); \ |
922 | a; \ |
923 | }) |
924 |
On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
>
> Requested reviews: pocl maintaners (pocl)
>
> For more details, see:
> https://code.launchpad.net/~schnetter/pocl/main/+merge/85761
>
> I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> analogous to cl_khr_fp64. I don't know which targets support this datatype
> (presumably all, since llvm supports them?), so I enabled this for all
> targets -- this will break things if this is wrong.
Just curious...
How does LLVM/Clang support the half type by default nowadays? I've heard that
on NVIDIA GPUs, for example, half is supported only as a storage format. That
is, the value sits in memory in a 16-bit format, but whenever you compute
something with halfs they are converted to single-precision floats, which avoids
the need for separate floating-point units for halfs.
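To make the storage-only model concrete, here is a minimal C sketch. It is not pocl or LLVM code: the helper names `half_bits_to_float` and `add_stored_halfs` are hypothetical, and the half is assumed to be held in memory as raw IEEE 754 binary16 bits in a `uint16_t`. The only half-specific step is the widening load; the addition itself is an ordinary float add.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode IEEE 754 binary16 bits into a float.
   Every binary16 value is exactly representable as a float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;   /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;          /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent in both formats. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: normalise it, since it is a normal float. */
        int shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            ++shift;
        }
        man &= 0x3FFu;
        bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}

/* Storage-only arithmetic: both operands are widened on load and the
   addition is performed in single precision. */
static float add_stored_halfs(const uint16_t *p, size_t i, size_t j)
{
    return half_bits_to_float(p[i]) + half_bits_to_float(p[j]);
}
```

For example, with `p = {0x3C00, 0x3800}` (the encodings of 1.0 and 0.5), `add_stored_halfs(p, 0, 1)` yields `1.5f`. Storing the result back as a half would need a separate narrowing conversion with a chosen rounding mode, which this sketch leaves out.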
Just curious to hear what happens when you use half floats in LLVM/Clang
now -- do they convert them to single precision fp whenever computation occurs?
The last time I checked, 'half' was not a datatype in the LLVM IR, so half
operations could not be selected (that is, mapped onto the target ISA) nicely.
It seems there are only two intrinsics for halfs available:
http://llvm.org/docs/LangRef.html#int_fp16
Does Clang generate those automatically for halfs in OpenCL C now? For example
if you perform a basic operation halfA + halfB, what happens?
I'm interested in proper half support because on embedded/mobile targets it
offers more than memory bandwidth savings: if half floats suffice for your
computations, you can also reduce the area of the FPU, improve the speed, lower
the energy consumption, etc. But I think LLVM will not accept half as a proper
datatype until there is a real (read: off-the-shelf) target in LLVM that
supports it natively.
--
--Pekka