pocl

Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter on 2011-12-14

Status:	Merged
Merge reported by:	Pekka Jääskeläinen
Merged at revision:	not available
Proposed branch:	lp:~schnetter/pocl/main
Merge into:	lp:~pocl/pocl/trunk
Diff against target:	921 lines (+540/-42), 28 files modified
	.bzrignore (+2/-0)
	clconfig.h.in (+8/-3)
	configure.ac (+1/-0)
	include/_kernel.h (+115/-0)
	include/arm/types.h (+1/-0)
	include/tce/types.h (+1/-0)
	include/types.h (+13/-7)
	include/x86_64/types.h (+1/-0)
	lib/kernel/cos.cl (+3/-0)
	lib/kernel/divide.cl (+27/-0)
	lib/kernel/exp.cl (+3/-0)
	lib/kernel/exp10.cl (+3/-0)
	lib/kernel/exp2.cl (+3/-0)
	lib/kernel/log.cl (+3/-0)
	lib/kernel/log10.cl (+3/-0)
	lib/kernel/log2.cl (+3/-0)
	lib/kernel/powr.cl (+3/-0)
	lib/kernel/recip.cl (+27/-0)
	lib/kernel/rsqrt.cl (+3/-0)
	lib/kernel/sin.cl (+3/-0)
	lib/kernel/sources.mk (+5/-1)
	lib/kernel/sqrt.cl (+3/-0)
	lib/kernel/tan.cl (+3/-0)
	lib/kernel/templates.h (+56/-21)
	lib/kernel/vload.cl (+1/-1)
	lib/kernel/vload_half.cl (+113/-0)
	lib/kernel/vstore_half.cl (+124/-0)
	lib/kernel/x86_64/fabs.cl (+9/-9)
To merge this branch:	bzr merge lp:~schnetter/pocl/main

Reviewer	Review Type	Date Requested	Status
pocl maintaners		2011-12-14	Pending
Review via email: mp+85761@code.launchpad.net

Description of the change

I added support for the half datatype, protected by #ifdef cl_khr_fp16, analogous to cl_khr_fp64. I don't know which targets support this datatype (presumably all, since LLVM supports it?), so I enabled it for all targets. This will break things on any target where that assumption is wrong.
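
The guard pattern described above can be sketched in plain C roughly as follows. This is only an illustration: the macro name `IF_FP16` and the two functions are made up for this example, and `cl_khr_fp16` is defined by hand here, whereas in the proposal a target header decides whether it is set.

```c
#include <assert.h>

/* Assumption: defined by hand for this sketch; in the proposal a
   target-specific header decides whether the macro is set. */
#define cl_khr_fp16

/* Expands to its argument only when the half extension is enabled,
   and to nothing otherwise. */
#ifdef cl_khr_fp16
#  define IF_FP16(x) x
#else
#  define IF_FP16(x)
#endif

/* Hypothetical helper: reports whether the guarded code was compiled in. */
int half_code_compiled_in(void)
{
  int compiled_in = 0;
  IF_FP16(compiled_in = 1;)
  return compiled_in;
}

/* Usage: with cl_khr_fp16 defined above, the guarded statement is present. */
void check_fp16_guard(void)
{
  assert(half_code_compiled_in() == 1);
}
```

Removing the `#define cl_khr_fp16` line makes the guarded statement disappear at preprocessing time, which is the same mechanism that keeps the half declarations out of builds for targets without the type.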

lp:~schnetter/pocl/main updated on 2011-12-15

134. By Erik Schnetter on 2011-12-15: Correct alternative (unused) fabs implementation
135. By Erik Schnetter on 2011-12-15: Auto-detect whether the half type is supported

Pekka Jääskeläinen (pekka-jaaskelainen) wrote on 2011-12-15:

On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
>
> Requested reviews: pocl maintaners (pocl)
>
> For more details, see:
> https://code.launchpad.net/~schnetter/pocl/main/+merge/85761
>
> I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> analogous to cl_khr_fp64. I don't know which targets support this datatype
> (presumably all, since llvm supports them?), so I enabled this for all
> targets -- this will break things if this is wrong.

Just curious...

How does LLVM/Clang handle half by default nowadays? I've heard that on
NVIDIA GPUs, for example, half is supported only as a storage format. That
is, the value sits in memory in a 16-bit format, but whenever you compute
with halfs they are converted to single-precision floats, which avoids
the need for separate half-precision floating-point units.
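
To make "storage format" concrete, here is a minimal sketch in plain C of the load-side conversion, assuming the 16 bits follow the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). The function names are hypothetical and this is not code from the proposal; a target with native support would use a conversion instruction instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode IEEE 754 binary16 bits into a float. */
float half_bits_to_float(uint16_t h)
{
  uint32_t sign     = (uint32_t)(h >> 15) & 0x1u;
  uint32_t exponent = (uint32_t)(h >> 10) & 0x1fu;
  uint32_t mantissa = (uint32_t)h & 0x3ffu;
  uint32_t bits;
  float result;

  if (exponent == 0) {
    /* Zero or subnormal: the value is mantissa * 2^-24, exact in float. */
    result = (float)mantissa / 16777216.0f;
    return sign ? -result : result;
  }
  if (exponent == 31)
    /* Infinity or NaN: all-ones float exponent, mantissa carried over. */
    bits = (sign << 31) | 0x7f800000u | (mantissa << 13);
  else
    /* Normal number: rebias the exponent from 15 to 127 (add 112). */
    bits = (sign << 31) | ((exponent + 112u) << 23) | (mantissa << 13);

  memcpy(&result, &bits, sizeof result);
  return result;
}

/* Usage: a few well-known bit patterns. */
void check_half_decoding(void)
{
  assert(half_bits_to_float(0x3c00) == 1.0f);
  assert(half_bits_to_float(0xc000) == -2.0f);
  assert(half_bits_to_float(0x7bff) == 65504.0f);              /* largest finite half */
  assert(half_bits_to_float(0x0001) == 1.0f / 16777216.0f);    /* smallest subnormal */
}
```

All arithmetic after the conversion then happens on ordinary floats, which is the behaviour described for storage-only targets.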

Just curious to hear what happens when you use half floats in LLVM/Clang
now -- are they converted to single-precision fp whenever computation occurs?
The last time I checked, 'half' was not a datatype in the LLVM IR,
so half operations could not be selected (implemented with the target ISA)
nicely.

It seems there are only two intrinsics for halfs available:
http://llvm.org/docs/LangRef.html#int_fp16

Does Clang generate those automatically for halfs in OpenCL C now? For example
if you perform a basic operation halfA + halfB, what happens?

I'm interested in proper half support because on embedded/mobile targets it
buys more than reduced memory bandwidth: if half floats are enough for your
computations, you can save FPU area, improve speed, lower energy
consumption, etc. But I think LLVM will not accept it as a proper datatype
until there is a real (read: off-the-shelf) target in LLVM that supports
it natively.

--
--Pekka

Erik Schnetter (schnetter) wrote on 2011-12-15:

OpenCL supports only two operations for halfs: vload_half, converting it to
a float, and vstore_half, converting from a float. Nothing else exists
explicitly, not even vectors of halfs. Essentially the only thing one can
do with the half type is to pass a half* to these load/store routines.
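
For the store direction, the conversion from float has to round. The following is a rough sketch in plain C of a round-to-nearest-even conversion to IEEE 754 binary16 bits, which is my understanding of what the `_rte` store variants are meant to do. The function names are hypothetical, the code is not taken from the proposal, and it assumes the default floating-point rounding mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a float to IEEE 754 binary16 bits,
   rounding to nearest with ties to even. */
uint16_t float_to_half_bits_rte(float value)
{
  uint32_t bits;
  memcpy(&bits, &value, sizeof bits);

  uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
  uint32_t magnitude = bits & 0x7fffffffu;

  if (magnitude > 0x7f800000u)      /* NaN: return a quiet NaN */
    return (uint16_t)(sign | 0x7e00u);
  if (magnitude >= 0x477ff000u)     /* 65520 and above (incl. infinity) round to infinity */
    return (uint16_t)(sign | 0x7c00u);

  if (magnitude < 0x38800000u) {    /* below 2^-14: half subnormal or zero */
    /* Adding 0.5f makes the FPU round the value to a multiple of 2^-24;
       the low bits of the sum are then the half mantissa. */
    float shifted;
    uint32_t shifted_bits;
    memcpy(&shifted, &magnitude, sizeof shifted);
    shifted += 0.5f;
    memcpy(&shifted_bits, &shifted, sizeof shifted_bits);
    return (uint16_t)(sign | (shifted_bits - 0x3f000000u));
  }

  /* Normal range: rebias the exponent from 127 to 15 and round away the
     13 discarded mantissa bits, ties going to the even result. */
  uint32_t lowest_kept_bit = (magnitude >> 13) & 1u;
  magnitude -= 0x38000000u;
  magnitude += 0x0fffu + lowest_kept_bit;
  return (uint16_t)(sign | (magnitude >> 13));
}

/* Usage: exact values, a tie, overflow, and the smallest subnormal. */
void check_half_encoding(void)
{
  assert(float_to_half_bits_rte(1.0f) == 0x3c00);
  assert(float_to_half_bits_rte(-2.0f) == 0xc000);
  assert(float_to_half_bits_rte(65504.0f) == 0x7bff);
  assert(float_to_half_bits_rte(1.00048828125f) == 0x3c00);  /* tie rounds to even */
  assert(float_to_half_bits_rte(100000.0f) == 0x7c00);       /* overflows to infinity */
  assert(float_to_half_bits_rte(1.0f / 16777216.0f) == 0x0001);
}
```

The other rounding suffixes (`_rtz`, `_rtp`, `_rtn`) would need different handling of the discarded bits; they are not covered by this sketch.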

There are routines such as float half_sin(float) that are only required to
have the precision offered by datatype half (allowing optimisations), but
the API is via float. There is text in the standard presumably allowing
this to be optimised to use operations that act directly on half values,
but this is not required.
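
As an illustration of "the API is via float", here is a scalar-only sketch in plain C of forwarding a reduced-precision routine to ordinary float arithmetic. The `SKETCH_DEFINE_*` macros are hypothetical simplifications made up for this example; the real library generates overloads for every float vector width.

```c
#include <assert.h>

/* Hypothetical scalar-only macros: each defines a function whose body is
   the given expression over the parameter(s) a (and b). */
#define SKETCH_DEFINE_F_F(NAME, EXPR) \
  float NAME(float a) { return EXPR; }
#define SKETCH_DEFINE_F_FF(NAME, EXPR) \
  float NAME(float a, float b) { return EXPR; }

/* The half_ variants take and return float, and here simply reuse
   full-precision float arithmetic. */
SKETCH_DEFINE_F_F(half_recip, 1.0f / a)
SKETCH_DEFINE_F_FF(half_divide, a / b)

/* Usage: the results are exact for these inputs. */
void check_half_forwarding(void)
{
  assert(half_recip(4.0f) == 0.25f);
  assert(half_divide(3.0f, 2.0f) == 1.5f);
}
```

Forwarding is always a valid implementation because full float precision exceeds the minimum the half_ routines must deliver; a target with faster low-precision hardware could substitute its own definitions.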

I added code to detect whether clang supports half (called __fp16 in C),
and if so, these vload_half/vstore_half routines are available. half_sin
and friends are always available, forwarding to their float counterparts by
default -- I assume that target-specific optimisations can do better.
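
The detection step might look roughly like the sketch below in plain C. It assumes a configure-time size check that reports 0 when the compiler does not know the type, which is how I understand autoconf's AC_CHECK_SIZEOF to behave. The value 2 is hard-coded here to stand in for a successful check, and `HAVE_HALF` plus the two functions are hypothetical names.

```c
#include <assert.h>

/* Assumption: normally written by configure; hard-coded here to pretend
   the compiler reported a 2-byte __fp16. A value of 0 would mean the
   type is unknown to the compiler. */
#ifndef SIZEOF___FP16
#  define SIZEOF___FP16 2
#endif

/* The OpenCL type half corresponds to __fp16 in C. */
#define SIZEOF_HALF SIZEOF___FP16

/* Enable the half load/store routines only for a genuine 16-bit type. */
#if SIZEOF_HALF == 2
#  define HAVE_HALF 1
#else
#  define HAVE_HALF 0
#endif

/* Hypothetical helper: reports the outcome of the check at run time. */
int half_routines_available(void)
{
  return HAVE_HALF;
}

/* Usage: with the pretend value above, the routines are enabled. */
void check_half_detection(void)
{
  assert(half_routines_available() == 1);
}
```

Compiling the same file with `-DSIZEOF___FP16=0` flips the result, which mirrors what happens on a compiler without the type.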

-erik

Preview Diff

 === modified file '.bzrignore'
 --- .bzrignore	2011-12-14 01:58:32 +0000
 +++ .bzrignore	2011-12-15 04:21:23 +0000
@@ -14,6 +14,7 @@
  # builddir
  Makefile
++pocl.pc
  .deps
  .libs
  *.lo
@@ -21,6 +22,7 @@
  *.la
  ./libtool
++./clconfig.h
  ./config.h
  ./config.log
  ./config.status
 === modified file 'clconfig.h.in'
 --- clconfig.h.in	2011-11-29 16:59:14 +0000
 +++ clconfig.h.in	2011-12-15 04:21:23 +0000
@@ -1,8 +1,13 @@
++/* The size of `long', as computed by sizeof. */
++#undef SIZEOF_LONG
++
++/* The size of `half', as computed by sizeof. */
++#undef SIZEOF___FP16
++/* The OpenCL type `half' is called `__fp16' in C */
++#define SIZEOF_HALF SIZEOF___FP16
++
  /* The size of `double', as computed by sizeof. */
  #undef SIZEOF_DOUBLE
--/* The size of `long', as computed by sizeof. */
--#undef SIZEOF_LONG
--
  /* The size of `void *', as computed by sizeof. */
  #undef SIZEOF_VOID_P
 === modified file 'configure.ac'
 --- configure.ac	2011-12-14 17:59:11 +0000
 +++ configure.ac	2011-12-15 04:21:23 +0000
@@ -222,6 +222,7 @@
  # Checks for typedefs, structures, and compiler characteristics.
  AC_CHECK_SIZEOF([long])
++AC_CHECK_SIZEOF([__fp16])
  AC_CHECK_SIZEOF([double])
  AC_CHECK_SIZEOF([void *])
  AC_CHECK_ALIGNOF([float16], [typedef float float16  __attribute__((__ext_vector_type__(16)));])
 === modified file 'include/_kernel.h'
 --- include/_kernel.h	2011-12-14 01:11:29 +0000
 +++ include/_kernel.h	2011-12-15 04:21:23 +0000
@@ -42,12 +42,22 @@
  #else
  #  define __IF_INT64(x)
  #endif
++#ifdef cl_khr_fp16
++#  define __IF_FP16(x) x
++#else
++#  define __IF_FP16(x)
++#endif
  #ifdef cl_khr_fp64
  #  define __IF_FP64(x) x
  #else
  #  define __IF_FP64(x)
  #endif
++#if defined(cl_khr_fp64) && !defined(cles_khr_int64)
++#  error "cl_khr_fp64 requires cles_khr_int64"
++#endif
++
++
  /* A static assert statement to catch inconsistencies at build time */
  #define _cl_static_assert(_t, _x) typedef int ai##_t[(_x) ? 1 : -1]
@@ -79,6 +89,10 @@
  typedef struct error_undefined_type_ulong error_undefined_type_ulong;
  #  define ulong error_undefined_type_ulong
  #endif
++#ifndef cl_khr_fp16
++typedef struct error_undefined_type_half error_undefined_type_half;
++#  define half error_undefined_type_half
++#endif
  #ifndef cl_khr_fp64
  typedef struct error_undefined_type_double error_undefined_type_double;
  #  define double error_undefined_type_double
@@ -210,6 +224,11 @@
  _cl_static_assert(ulong16, sizeof(ulong16) == 16*sizeof(ulong));
  #endif
++#ifdef cl_khr_fp16
++_cl_static_assert(half, sizeof(half) == 2);
++/* There are no vectors of type half */
++#endif
++
  _cl_static_assert(float , sizeof(float ) == 4);
  _cl_static_assert(float2 , sizeof(float2 ) == 2 *sizeof(float));
  _cl_static_assert(float3 , sizeof(float3 ) == 4 *sizeof(float));
@@ -506,6 +525,7 @@
   *    J: vector of int
   *    U: vector of uint or ulong
   *    S: scalar (float or double)
++ *    F: vector of float
   *    V: vector of float or double
   */
@@ -777,6 +797,20 @@
    double _cl_overloadable NAME(double4 , double4 );     \
    double _cl_overloadable NAME(double8 , double8 );     \
    double _cl_overloadable NAME(double16, double16);)
++#define _CL_DECLARE_FUNC_F_F(NAME)              \
++  float    _cl_overloadable NAME(float   );     \
++  float2   _cl_overloadable NAME(float2  );     \
++  float3   _cl_overloadable NAME(float3  );     \
++  float4   _cl_overloadable NAME(float4  );     \
++  float8   _cl_overloadable NAME(float8  );     \
++  float16  _cl_overloadable NAME(float16 );
++#define _CL_DECLARE_FUNC_F_FF(NAME)                     \
++  float    _cl_overloadable NAME(float   , float   );   \
++  float2   _cl_overloadable NAME(float2  , float2  );   \
++  float3   _cl_overloadable NAME(float3  , float3  );   \
++  float4   _cl_overloadable NAME(float4  , float4  );   \
++  float8   _cl_overloadable NAME(float8  , float8  );   \
++  float16  _cl_overloadable NAME(float16 , float16 );
  /* Move built-in declarations out of the way. (There should be a
     better way of doing so.) These five functions are built-in math
@@ -877,6 +911,35 @@
  _CL_DECLARE_FUNC_V_V(tgamma)
  _CL_DECLARE_FUNC_V_V(trunc)
++_CL_DECLARE_FUNC_F_F(half_cos)
++_CL_DECLARE_FUNC_F_FF(half_divide)
++_CL_DECLARE_FUNC_F_F(half_exp)
++_CL_DECLARE_FUNC_F_F(half_exp2)
++_CL_DECLARE_FUNC_F_F(half_exp10)
++_CL_DECLARE_FUNC_F_F(half_log)
++_CL_DECLARE_FUNC_F_F(half_log2)
++_CL_DECLARE_FUNC_F_F(half_log10)
++_CL_DECLARE_FUNC_F_FF(half_powr)
++_CL_DECLARE_FUNC_F_F(half_recip)
++_CL_DECLARE_FUNC_F_F(half_rsqrt)
++_CL_DECLARE_FUNC_F_F(half_sin)
++_CL_DECLARE_FUNC_F_F(half_sqrt)
++_CL_DECLARE_FUNC_F_F(half_tan)
++_CL_DECLARE_FUNC_F_F(native_cos)
++_CL_DECLARE_FUNC_F_FF(native_divide)
++_CL_DECLARE_FUNC_F_F(native_exp)
++_CL_DECLARE_FUNC_F_F(native_exp2)
++_CL_DECLARE_FUNC_F_F(native_exp10)
++_CL_DECLARE_FUNC_F_F(native_log)
++_CL_DECLARE_FUNC_F_F(native_log2)
++_CL_DECLARE_FUNC_F_F(native_log10)
++_CL_DECLARE_FUNC_F_FF(native_powr)
++_CL_DECLARE_FUNC_F_F(native_recip)
++_CL_DECLARE_FUNC_F_F(native_rsqrt)
++_CL_DECLARE_FUNC_F_F(native_sin)
++_CL_DECLARE_FUNC_F_F(native_sqrt)
++_CL_DECLARE_FUNC_F_F(native_tan)
++
  /* Integer Constants */
@@ -1495,6 +1558,58 @@
  #endif
  */
++#ifdef cl_khr_fp16
++
++#define _CL_DECLARE_VLOAD_HALF(MOD)                                     \
++  float   _cl_overloadable vload_half   (size_t offset, const MOD half *p); \
++  float2  _cl_overloadable vload_half2  (size_t offset, const MOD half *p); \
++  float3  _cl_overloadable vload_half3  (size_t offset, const MOD half *p); \
++  float4  _cl_overloadable vload_half4  (size_t offset, const MOD half *p); \
++  float8  _cl_overloadable vload_half8  (size_t offset, const MOD half *p); \
++  float16 _cl_overloadable vload_half16 (size_t offset, const MOD half *p); \
++  float2  _cl_overloadable vloada_half2 (size_t offset, const MOD half *p); \
++  float3  _cl_overloadable vloada_half3 (size_t offset, const MOD half *p); \
++  float4  _cl_overloadable vloada_half4 (size_t offset, const MOD half *p); \
++  float8  _cl_overloadable vloada_half8 (size_t offset, const MOD half *p); \
++  float16 _cl_overloadable vloada_half16(size_t offset, const MOD half *p);
++
++_CL_DECLARE_VLOAD_HALF(__global)
++_CL_DECLARE_VLOAD_HALF(__local)
++_CL_DECLARE_VLOAD_HALF(__constant)
++/* _CL_DECLARE_VLOAD_HALF(__private) */
++
++/* stores to half may have a suffix: _rte _rtz _rtp _rtn */
++#define _CL_DECLARE_VSTORE_HALF(MOD, SUFFIX)                            \
++  void _cl_overloadable vstore_half##SUFFIX   (float   data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstore_half2##SUFFIX  (float2  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstore_half3##SUFFIX  (float3  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstore_half4##SUFFIX  (float4  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstore_half8##SUFFIX  (float8  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstore_half16##SUFFIX (float16 data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstorea_half2##SUFFIX (float2  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstorea_half3##SUFFIX (float3  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstorea_half4##SUFFIX (float4  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstorea_half8##SUFFIX (float8  data, size_t offset, MOD half *p); \
++  void _cl_overloadable vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p);
++
++_CL_DECLARE_VSTORE_HALF(__global  ,     )
++_CL_DECLARE_VSTORE_HALF(__global  , _rte)
++_CL_DECLARE_VSTORE_HALF(__global  , _rtz)
++_CL_DECLARE_VSTORE_HALF(__global  , _rtp)
++_CL_DECLARE_VSTORE_HALF(__global  , _rtn)
++_CL_DECLARE_VSTORE_HALF(__local   ,     )
++_CL_DECLARE_VSTORE_HALF(__local   , _rte)
++_CL_DECLARE_VSTORE_HALF(__local   , _rtz)
++_CL_DECLARE_VSTORE_HALF(__local   , _rtp)
++_CL_DECLARE_VSTORE_HALF(__local   , _rtn)
++/* _CL_DECLARE_VSTORE_HALF(__private ,     ) */
++/* _CL_DECLARE_VSTORE_HALF(__private , _rte) */
++/* _CL_DECLARE_VSTORE_HALF(__private , _rtz) */
++/* _CL_DECLARE_VSTORE_HALF(__private , _rtp) */
++/* _CL_DECLARE_VSTORE_HALF(__private , _rtn) */
++
++#endif
++
  /* Miscellaneous Vector Functions */
 === modified file 'include/arm/types.h'
 --- include/arm/types.h	2011-12-01 17:21:52 +0000
 +++ include/arm/types.h	2011-12-15 04:21:23 +0000
@@ -4,6 +4,7 @@
  #define __EMBEDDED_PROFILE__ 1
  #undef cles_khr_int64
++#define cl_khr_fp16             /* ES: is this correct? */
  #undef cl_khr_fp64
  typedef uint size_t;
 === modified file 'include/tce/types.h'
 --- include/tce/types.h	2011-12-01 17:21:52 +0000
 +++ include/tce/types.h	2011-12-15 04:21:23 +0000
@@ -4,6 +4,7 @@
  #define __EMBEDDED_PROFILE__ 1
  #undef cles_khr_int64
++#define cl_khr_fp16             /* ES: is this correct? */
  #undef cl_khr_fp64
  typedef uint size_t;
 === modified file 'include/types.h'
 --- include/types.h	2011-12-01 17:21:52 +0000
 +++ include/types.h	2011-12-15 04:21:23 +0000
@@ -13,16 +13,22 @@
  #if SIZEOF_LONG == 8
  #  define cles_khr_int64
--#  if SIZEOF_DOUBLE == 8
--#    define cl_khr_fp64
--#  else
--#    undef cl_khr_fp64
--#  endif
--#else /* SIZEOF_LONG != 8 */
++#else
  #  define __EMBEDDED_PROFILE__ 1
  #  undef cles_khr_int64
++#endif
++
++#if SIZEOF_HALF == 2
++#  define cl_khr_fp16
++#else
++#  undef cl_khr_fp16
++#endif
++
++#if SIZEOF_DOUBLE == 8
++#  define cl_khr_fp64
++#else
  #  undef cl_khr_fp64
--#endif /* SIZEOF_LONG != 8 */
++#endif
  #if SIZEOF_VOID_P == 8
  typedef ulong size_t;
 === modified file 'include/x86_64/types.h'
 --- include/x86_64/types.h	2011-12-09 16:01:21 +0000
 +++ include/x86_64/types.h	2011-12-15 04:21:23 +0000
@@ -4,6 +4,7 @@
  typedef unsigned long ulong;
  #define cles_khr_int64
++#define cl_khr_fp16
  #define cl_khr_fp64
  typedef ulong size_t;
 === modified file 'lib/kernel/cos.cl'
 --- lib/kernel/cos.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/cos.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(cos)
++
++DEFINE_EXPR_F_F(half_cos, cos(a))
++DEFINE_EXPR_F_F(native_cos, cos(a))
 === added file 'lib/kernel/divide.cl'
 --- lib/kernel/divide.cl	1970-01-01 00:00:00 +0000
 +++ lib/kernel/divide.cl	2011-12-15 04:21:23 +0000
@@ -0,0 +1,27 @@
++/* OpenCL built-in library: divide()
++
++   Copyright (c) 2011 Universidad Rey Juan Carlos
++
++   Permission is hereby granted, free of charge, to any person obtaining a copy
++   of this software and associated documentation files (the "Software"), to deal
++   in the Software without restriction, including without limitation the rights
++   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
++   copies of the Software, and to permit persons to whom the Software is
++   furnished to do so, subject to the following conditions:
++
++   The above copyright notice and this permission notice shall be included in
++   all copies or substantial portions of the Software.
++
++   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
++   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
++   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
++   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
++   THE SOFTWARE.
++*/
++
++#include "templates.h"
++
++DEFINE_EXPR_F_FF(half_divide, a/b)
++DEFINE_EXPR_F_FF(native_divide, a/b)
 === modified file 'lib/kernel/exp.cl'
 --- lib/kernel/exp.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/exp.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(exp)
++
++DEFINE_EXPR_F_F(half_exp, exp(a))
++DEFINE_EXPR_F_F(native_exp, exp(a))
 === modified file 'lib/kernel/exp10.cl'
 --- lib/kernel/exp10.cl	2011-11-05 00:10:25 +0000
 +++ lib/kernel/exp10.cl	2011-12-15 04:21:23 +0000
@@ -28,3 +28,6 @@
  #else
  DEFINE_EXPR_V_V(exp10, exp(M_LN10_F*a))
  #endif
++
++DEFINE_EXPR_F_F(half_exp10, exp10(a))
++DEFINE_EXPR_F_F(native_exp10, exp10(a))
 === modified file 'lib/kernel/exp2.cl'
 --- lib/kernel/exp2.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/exp2.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(exp2)
++
++DEFINE_EXPR_F_F(half_exp2, exp2(a))
++DEFINE_EXPR_F_F(native_exp2, exp2(a))
 === modified file 'lib/kernel/log.cl'
 --- lib/kernel/log.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/log.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(log)
++
++DEFINE_EXPR_F_F(half_log, log(a))
++DEFINE_EXPR_F_F(native_log, log(a))
 === modified file 'lib/kernel/log10.cl'
 --- lib/kernel/log10.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/log10.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(log10)
++
++DEFINE_EXPR_F_F(half_log10, log10(a))
++DEFINE_EXPR_F_F(native_log10, log10(a))
 === modified file 'lib/kernel/log2.cl'
 --- lib/kernel/log2.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/log2.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(log2)
++
++DEFINE_EXPR_F_F(half_log2, log2(a))
++DEFINE_EXPR_F_F(native_log2, log2(a))
 === modified file 'lib/kernel/powr.cl'
 --- lib/kernel/powr.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/powr.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_EXPR_V_VV(powr, pow(a, b))
++
++DEFINE_EXPR_F_FF(half_powr, powr(a, b))
++DEFINE_EXPR_F_FF(native_powr, powr(a, b))
 === added file 'lib/kernel/recip.cl'
 --- lib/kernel/recip.cl	1970-01-01 00:00:00 +0000
 +++ lib/kernel/recip.cl	2011-12-15 04:21:23 +0000
@@ -0,0 +1,27 @@
++/* OpenCL built-in library: recip()
++
++   Copyright (c) 2011 Universidad Rey Juan Carlos
++
++   Permission is hereby granted, free of charge, to any person obtaining a copy
++   of this software and associated documentation files (the "Software"), to deal
++   in the Software without restriction, including without limitation the rights
++   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
++   copies of the Software, and to permit persons to whom the Software is
++   furnished to do so, subject to the following conditions:
++
++   The above copyright notice and this permission notice shall be included in
++   all copies or substantial portions of the Software.
++
++   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
++   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
++   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
++   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
++   THE SOFTWARE.
++*/
++
++#include "templates.h"
++
++DEFINE_EXPR_F_F(half_recip, (stype)1/a)
++DEFINE_EXPR_F_F(native_recip, (stype)1/a)
 === modified file 'lib/kernel/rsqrt.cl'
 --- lib/kernel/rsqrt.cl	2011-11-05 00:10:25 +0000
 +++ lib/kernel/rsqrt.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_EXPR_V_V(rsqrt, (stype)1/sqrt(a))
++
++DEFINE_EXPR_F_F(half_rsqrt, rsqrt(a))
++DEFINE_EXPR_F_F(native_rsqrt, rsqrt(a))
 === modified file 'lib/kernel/sin.cl'
 --- lib/kernel/sin.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/sin.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(sin)
++
++DEFINE_EXPR_F_F(half_sin, sin(a))
++DEFINE_EXPR_F_F(native_sin, sin(a))
 === modified file 'lib/kernel/sources.mk'
 --- lib/kernel/sources.mk	2011-12-05 22:08:57 +0000
 +++ lib/kernel/sources.mk	2011-12-15 04:21:23 +0000
@@ -72,6 +72,8 @@
                        tanpi.cl			\
                        tgamma.cl			\
                        trunc.cl			\
++                      divide.cl			\
++                      recip.cl			\
                        abs.cl			\
                        abs_diff.cl		\
                        add_sat.cl		\
@@ -123,4 +125,6 @@
                        bitselect.cl		\
                        select.cl			\
                        vload.cl			\
--                      vstore.cl
++                      vstore.cl			\
++                      vload_half.cl		\
++                      vstore_half.cl
 === modified file 'lib/kernel/sqrt.cl'
 --- lib/kernel/sqrt.cl	2011-10-29 14:08:33 +0000
 +++ lib/kernel/sqrt.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(sqrt)
++
++DEFINE_EXPR_F_F(half_sqrt, sqrt(a))
++DEFINE_EXPR_F_F(native_sqrt, sqrt(a))
 === modified file 'lib/kernel/tan.cl'
 --- lib/kernel/tan.cl	2011-10-26 03:01:29 +0000
 +++ lib/kernel/tan.cl	2011-12-15 04:21:23 +0000
@@ -24,3 +24,6 @@
  #include "templates.h"
  DEFINE_BUILTIN_V_V(tan)
++
++DEFINE_EXPR_F_F(half_tan, tan(a))
++DEFINE_EXPR_F_F(native_tan, tan(a))
 === modified file 'lib/kernel/templates.h'
 --- lib/kernel/templates.h	2011-12-14 01:10:26 +0000
 +++ lib/kernel/templates.h	2011-12-15 04:21:23 +0000
@@ -275,28 +275,30 @@
  /******************************************************************************/
--#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE)    \
--  VTYPE __attribute__ ((overloadable))                  \
--  NAME(VTYPE a, VTYPE b)                                \
--  {                                                     \
--    typedef VTYPE vtype;                                \
--    typedef STYPE stype;                                \
--    return EXPR;                                        \
++#define IMPLEMENT_EXPR_V_V(NAME, EXPR, VTYPE, STYPE, JTYPE, SJTYPE)     \
++  VTYPE __attribute__ ((overloadable))                                  \
++  NAME(VTYPE a)                                                         \
++  {                                                                     \
++    typedef VTYPE vtype;                                                \
++    typedef STYPE stype;                                                \
++    typedef JTYPE jtype;                                                \
++    typedef SJTYPE sjtype;                                              \
++    return EXPR;                                                        \
+   }
--#define DEFINE_EXPR_V_V(NAME, EXPR)                     \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float   , float )      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float2  , float )      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float3  , float )      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float4  , float )      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float8  , float )      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float )      \
--  __IF_FP64(                                            \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double  , double)      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double)      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double)      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double)      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double)      \
--  IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double))
++#define DEFINE_EXPR_V_V(NAME, EXPR)                                     \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float   , float , int   , int )        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float2  , float , int2  , int )        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float3  , float , int3  , int )        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float4  , float , int4  , int )        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float8  , float , int8  , int )        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, float16 , float , int16 , int )        \
++  __IF_FP64(                                                            \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double  , double, long  , long)        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double2 , double, long2 , long)        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double3 , double, long3 , long)        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double4 , double, long4 , long)        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double8 , double, long8 , long)        \
++  IMPLEMENT_EXPR_V_V(NAME, EXPR, double16, double, long16, long))
  #define IMPLEMENT_EXPR_V_VV(NAME, EXPR, VTYPE, STYPE, JTYPE)    \
    VTYPE __attribute__ ((overloadable))                          \
@@ -608,6 +610,39 @@
    IMPLEMENT_EXPR_V_SV(NAME, EXPR, double8 , double)     \
    IMPLEMENT_EXPR_V_SV(NAME, EXPR, double16, double))
++#define IMPLEMENT_EXPR_F_F(NAME, EXPR, VTYPE, STYPE)    \
++  VTYPE __attribute__ ((overloadable))                  \
++  NAME(VTYPE a)                                         \
++  {                                                     \
++    typedef VTYPE vtype;                                \
++    typedef STYPE stype;                                \
++    return EXPR;                                        \
++  }
++#define DEFINE_EXPR_F_F(NAME, EXPR)                     \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float   , float )      \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float2  , float )      \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float3  , float )      \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float4  , float )      \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float8  , float )      \
++  IMPLEMENT_EXPR_F_F(NAME, EXPR, float16 , float )
++
++#define IMPLEMENT_EXPR_F_FF(NAME, EXPR, VTYPE, STYPE, JTYPE)    \
++  VTYPE __attribute__ ((overloadable))                          \
++  NAME(VTYPE a, VTYPE b)                                        \
++  {                                                             \
++    typedef VTYPE vtype;                                        \
++    typedef STYPE stype;                                        \
++    typedef JTYPE jtype;                                        \
++    return EXPR;                                                \
++  }
++#define DEFINE_EXPR_F_FF(NAME, EXPR)                            \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float   , float , int   )     \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float2  , float , int2  )     \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float3  , float , int3  )     \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float4  , float , int4  )     \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float8  , float , int8  )     \
++  IMPLEMENT_EXPR_F_FF(NAME, EXPR, float16 , float , int16 )
++
  #define IMPLEMENT_BUILTIN_G_G(NAME, GTYPE, UGTYPE, LO, HI)      \
 === modified file 'lib/kernel/vload.cl'
 --- lib/kernel/vload.cl	2011-11-25 17:02:42 +0000
 +++ lib/kernel/vload.cl	2011-12-15 04:21:23 +0000
@@ -1,4 +1,4 @@
--/* OpenCL built-in library: vloa()
++/* OpenCL built-in library: vload()
     Copyright (c) 2011 Universidad Rey Juan Carlos
 === added file 'lib/kernel/vload_half.cl'
 --- lib/kernel/vload_half.cl	1970-01-01 00:00:00 +0000
 +++ lib/kernel/vload_half.cl	2011-12-15 04:21:23 +0000
@@ -0,0 +1,113 @@
++/* OpenCL built-in library: vload_half()
++
++   Copyright (c) 2011 Universidad Rey Juan Carlos
++
++   Permission is hereby granted, free of charge, to any person obtaining a copy
++   of this software and associated documentation files (the "Software"), to deal
++   in the Software without restriction, including without limitation the rights
++   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
++   copies of the Software, and to permit persons to whom the Software is
++   furnished to do so, subject to the following conditions:
++
++   The above copyright notice and this permission notice shall be included in
++   all copies or substantial portions of the Software.
++
++   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
++   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
++   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
++   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
++   THE SOFTWARE.
++*/
++
++
++
++#ifdef cl_khr_fp16
++
++#define IMPLEMENT_VLOAD_HALF(MOD)                       \
++                                                        \
++  float    __attribute__ ((__overloadable__))           \
++  vload_half(size_t offset, const MOD half *p)          \
++  {                                                     \
++    return (float)p[offset];                            \
++  }                                                     \
++                                                        \
++  float2 __attribute__ ((__overloadable__))             \
++  vload_half2(size_t offset, const MOD half *p)         \
++  {                                                     \
++    return (float2)(vload_half(0, &p[offset*2]),        \
++                    vload_half(0, &p[offset*2+1]));     \
++  }                                                     \
++                                                        \
++  float3 __attribute__ ((__overloadable__))             \
++  vload_half3(size_t offset, const MOD half *p)         \
++  {                                                     \
++    return (float3)(vload_half2(0, &p[offset*3]),       \
++                    vload_half(0, &p[offset*3+2]));     \
++  }                                                     \
++                                                        \
++  float4 __attribute__ ((__overloadable__))             \
++  vload_half4(size_t offset, const MOD half *p)         \
++  {                                                     \
++    return (float4)(vload_half2(0, &p[offset*4]),       \
++                    vload_half2(0, &p[offset*4+2]));    \
++  }                                                     \
++                                                        \
++  float8 __attribute__ ((__overloadable__))             \
++  vload_half8(size_t offset, const MOD half *p)         \
++  {                                                     \
++    return (float8)(vload_half4(0, &p[offset*8]),       \
++                    vload_half4(0, &p[offset*8+4]));    \
++  }                                                     \
++                                                        \
++  float16 __attribute__ ((__overloadable__))            \
++  vload_half16(size_t offset, const MOD half *p)        \
++  {                                                     \
++    return (float16)(vload_half8(0, &p[offset*16]),     \
++                     vload_half8(0, &p[offset*16+8]));  \
++  }                                                     \
++                                                        \
++  float2 __attribute__ ((__overloadable__))             \
++  vloada_half2(size_t offset, const MOD half *p)        \
++  {                                                     \
++    return (float2)(vload_half(0, &p[offset*2]),        \
++                    vload_half(0, &p[offset*2+1]));     \
++  }                                                     \
++                                                        \
++  float3 __attribute__ ((__overloadable__))             \
++  vloada_half3(size_t offset, const MOD half *p)        \
++  {                                                     \
++    return (float3)(vloada_half2(0, &p[offset*4]),      \
++                    vload_half(0, &p[offset*4+2]));     \
++  }                                                     \
++                                                        \
++  float4 __attribute__ ((__overloadable__))             \
++  vloada_half4(size_t offset, const MOD half *p)        \
++  {                                                     \
++    return (float4)(vloada_half2(0, &p[offset*4]),      \
++                    vloada_half2(0, &p[offset*4+2]));   \
++  }                                                     \
++                                                        \
++  float8 __attribute__ ((__overloadable__))             \
++  vloada_half8(size_t offset, const MOD half *p)        \
++  {                                                     \
++    return (float8)(vloada_half4(0, &p[offset*8]),      \
++                    vloada_half4(0, &p[offset*8+4]));   \
++  }                                                     \
++                                                        \
++  float16 __attribute__ ((__overloadable__))            \
++  vloada_half16(size_t offset, const MOD half *p)       \
++  {                                                     \
++    return (float16)(vloada_half8(0, &p[offset*16]),    \
++                     vloada_half8(0, &p[offset*16+8])); \
++  }
++
++
++
++IMPLEMENT_VLOAD_HALF(__global)
++IMPLEMENT_VLOAD_HALF(__local)
++IMPLEMENT_VLOAD_HALF(__constant)
++/* IMPLEMENT_VLOAD_HALF(__private) */
++
++#endif
 === added file 'lib/kernel/vstore_half.cl'
 --- lib/kernel/vstore_half.cl	1970-01-01 00:00:00 +0000
 +++ lib/kernel/vstore_half.cl	2011-12-15 04:21:23 +0000
@@ -0,0 +1,124 @@
++/* OpenCL built-in library: vstore_half()
++
++   Copyright (c) 2011 Universidad Rey Juan Carlos
++
++   Permission is hereby granted, free of charge, to any person obtaining a copy
++   of this software and associated documentation files (the "Software"), to deal
++   in the Software without restriction, including without limitation the rights
++   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
++   copies of the Software, and to permit persons to whom the Software is
++   furnished to do so, subject to the following conditions:
++
++   The above copyright notice and this permission notice shall be included in
++   all copies or substantial portions of the Software.
++
++   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
++   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
++   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
++   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
++   THE SOFTWARE.
++*/
++
++
++
++#ifdef cl_khr_fp16
++
++#define IMPLEMENT_VSTORE_HALF(MOD, SUFFIX)                              \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half##SUFFIX(float data, size_t offset, MOD half *p)           \
++  {                                                                     \
++    p[offset] = data;                                                   \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half2##SUFFIX(float2 data, size_t offset, MOD half *p)         \
++  {                                                                     \
++    vstore_half##SUFFIX(data.lo, 0, &p[offset*2]);                      \
++    vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]);                    \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half3##SUFFIX(float3 data, size_t offset, MOD half *p)         \
++  {                                                                     \
++    vstore_half2##SUFFIX(data.lo, 0, &p[offset*3]);                     \
++    vstore_half##SUFFIX(data.s2, 0, &p[offset*3+2]);                    \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half4##SUFFIX(float4 data, size_t offset, MOD half *p)         \
++  {                                                                     \
++    vstore_half2##SUFFIX(data.lo, 0, &p[offset*4]);                     \
++    vstore_half2##SUFFIX(data.hi, 0, &p[offset*4+2]);                   \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half8##SUFFIX(float8 data, size_t offset, MOD half *p)         \
++  {                                                                     \
++    vstore_half4##SUFFIX(data.lo, 0, &p[offset*8]);                     \
++    vstore_half4##SUFFIX(data.hi, 0, &p[offset*8+4]);                   \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstore_half16##SUFFIX(float16 data, size_t offset, MOD half *p)       \
++  {                                                                     \
++    vstore_half8##SUFFIX(data.lo, 0, &p[offset*16]);                    \
++    vstore_half8##SUFFIX(data.hi, 0, &p[offset*16+8]);                  \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstorea_half2##SUFFIX(float2 data, size_t offset, MOD half *p)        \
++  {                                                                     \
++    vstore_half##SUFFIX(data.lo, 0, &p[offset*2]);                      \
++    vstore_half##SUFFIX(data.hi, 0, &p[offset*2+1]);                    \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstorea_half3##SUFFIX(float3 data, size_t offset, MOD half *p)        \
++  {                                                                     \
++    vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]);                    \
++    vstore_half##SUFFIX(data.s2, 0, &p[offset*4+2]);                    \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstorea_half4##SUFFIX(float4 data, size_t offset, MOD half *p)        \
++  {                                                                     \
++    vstorea_half2##SUFFIX(data.lo, 0, &p[offset*4]);                    \
++    vstorea_half2##SUFFIX(data.hi, 0, &p[offset*4+2]);                  \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstorea_half8##SUFFIX(float8 data, size_t offset, MOD half *p)        \
++  {                                                                     \
++    vstorea_half4##SUFFIX(data.lo, 0, &p[offset*8]);                    \
++    vstorea_half4##SUFFIX(data.hi, 0, &p[offset*8+4]);                  \
++  }                                                                     \
++                                                                        \
++  void __attribute__ ((__overloadable__))                               \
++  vstorea_half16##SUFFIX(float16 data, size_t offset, MOD half *p)      \
++  {                                                                     \
++    vstorea_half8##SUFFIX(data.lo, 0, &p[offset*16]);                   \
++    vstorea_half8##SUFFIX(data.hi, 0, &p[offset*16+8]);                 \
++  }
++
++
++
++IMPLEMENT_VSTORE_HALF(__global  ,     )
++IMPLEMENT_VSTORE_HALF(__global  , _rte)
++IMPLEMENT_VSTORE_HALF(__global  , _rtz)
++IMPLEMENT_VSTORE_HALF(__global  , _rtp)
++IMPLEMENT_VSTORE_HALF(__global  , _rtn)
++IMPLEMENT_VSTORE_HALF(__local   ,     )
++IMPLEMENT_VSTORE_HALF(__local   , _rte)
++IMPLEMENT_VSTORE_HALF(__local   , _rtz)
++IMPLEMENT_VSTORE_HALF(__local   , _rtp)
++IMPLEMENT_VSTORE_HALF(__local   , _rtn)
++/* IMPLEMENT_VSTORE_HALF(__private ,     ) */
++/* IMPLEMENT_VSTORE_HALF(__private , _rte) */
++/* IMPLEMENT_VSTORE_HALF(__private , _rtz) */
++/* IMPLEMENT_VSTORE_HALF(__private , _rtp) */
++/* IMPLEMENT_VSTORE_HALF(__private , _rtn) */
++
++#endif
 === modified file 'lib/kernel/x86_64/fabs.cl'
 --- lib/kernel/x86_64/fabs.cl	2011-10-31 17:00:12 +0000
 +++ lib/kernel/x86_64/fabs.cl	2011-12-15 04:21:23 +0000
@@ -29,8 +29,8 @@
  DEFINE_EXPR_V_V(fabs,
                  ({
                    int bits = CHAR_BIT * sizeof(stype);
--                  jtype sign_mask = (jtype)1 << (jtype)(bits - 1);
--                  jtype result = ~sign_mask & *(jtype*)&a;
++                  sjtype sign_mask = (sjtype)1 << (sjtype)(bits - 1);
++                  sjtype result = ~sign_mask & *(jtype*)&a;
                    *(vtype*)&result;
                  }))
@@ -70,7 +70,7 @@
      uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
      __asm__ ("andps %[src], %[dst]" :                                   \
               [dst] "+x" (a) :                                           \
--             [src] "x" (~sign_mask));                                   \
++             [src] "xm" (~sign_mask));                                  \
      a;                                                                  \
    })
  #define IMPLEMENT_FABS_AVX_FLOAT8                                       \
@@ -78,16 +78,16 @@
      uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \
                         0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
      __asm__ ("andps256 %[src], %[dst]" :                                \
--             [dst] "=x" (a) :                                           \
--             "[dst]" (a), [src] "x" (~sign_mask));                      \
++             [dst] "+x" (a) :                                           \
++             [src] "xm" (~sign_mask));                                  \
      a;                                                                  \
    })
  #define IMPLEMENT_FABS_SSE2_DOUBLE2                                     \
    ({                                                                    \
      ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL};    \
      __asm__ ("andpd %[src], %[dst]" :                                   \
--             [dst] "=x" (a) :                                           \
--             "[dst]" (a), [src] "x" (~sign_mask));                      \
++             [dst] "+x" (a) :                                           \
++             [src] "xm" (~sign_mask));                                  \
      a;                                                                  \
    })
  #define IMPLEMENT_FABS_AVX_DOUBLE4                                      \
@@ -95,8 +95,8 @@
      ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL,     \
                          0x8000000000000000UL, 0x8000000000000000UL};    \
      __asm__ ("andpd256 %[src], %[dst]" :                                \
--             [dst] "=x" (a) :                                           \
--             "[dst]" (a), [src] "x" (~sign_mask));                      \
++             [dst] "+x" (a) :                                           \
++             [src] "xm" (~sign_mask));                                  \
      a;                                                                  \
    })
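
Note on the 3-component variants in the diff above: vload_half3 indexes packed data with a stride of three elements (offset*3), while vloada_half3 uses a stride of four (offset*4), since the aligned form treats each 3-vector as occupying four element slots. The following is a minimal host-side C sketch of that index arithmetic only. It assumes float as a stand-in for half (standard C has no portable half type), and the helper names packed3_base, aligned3_base and load3 are hypothetical and not part of pocl.

```c
#include <assert.h>
#include <stddef.h>

/* First element index of 3-vector number `offset` in a packed buffer,
   mirroring the offset*3 indexing used by vload_half3 in the diff. */
static size_t packed3_base(size_t offset)
{
    return offset * 3;
}

/* First element index of 3-vector number `offset` in an aligned buffer,
   mirroring the offset*4 indexing used by vloada_half3 in the diff. */
static size_t aligned3_base(size_t offset)
{
    return offset * 4;
}

/* Copy three consecutive elements starting at `base`.
   float stands in for half here; no half-to-float conversion is modelled. */
static void load3(const float *p, size_t base, float out[3])
{
    assert(p != NULL && out != NULL);
    out[0] = p[base];
    out[1] = p[base + 1];
    out[2] = p[base + 2];
}
```

With a buffer holding 0, 1, 2, ... the packed form reads elements 3, 4, 5 for offset 1, whereas the aligned form reads elements 4, 5, 6 and leaves element 3 as padding.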
