Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merged at revision: 142
Proposed branch: lp:~schnetter/pocl/main
Merge into: lp:~pocl/pocl/trunk
Diff against target: 1717 lines (+1527/-28)
11 files modified
examples/kernel/Makefile.am (+1/-1)
examples/kernel/kernel.c (+28/-11)
examples/kernel/test_bitselect.cl (+1/-0)
examples/kernel/test_fabs.cl (+12/-5)
examples/kernel/test_hadd.cl (+1409/-0)
examples/kernel/test_rotate.cl (+3/-0)
lib/kernel/abs.cl (+7/-8)
lib/kernel/abs_diff.cl (+20/-1)
lib/kernel/add_sat.cl (+1/-2)
lib/kernel/x86_64/sqrt.cl (+9/-0)
tests/testsuite.at (+36/-0)
To merge this branch: bzr merge lp:~schnetter/pocl/main
Reviewer              Review Type   Date Requested   Status
Erik Schnetter                                       Needs Resubmitting
Pekka Jääskeläinen                                   Needs Fixing
Review via email: mp+88508@code.launchpad.net

Description of the change

Add test cases for some kernel functions.
Correct some of these kernel functions.
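
For orientation, here is a minimal sketch of what the integer built-ins exercised by this proposal (abs, abs_diff, add_sat, hadd, rhadd) compute for 32-bit signed inputs, as I understand the OpenCL C specification. It is written in plain C and widens to 64 bits so that intermediates cannot overflow. The `ref_*` and `floor_half` names are hypothetical helpers for illustration only; they are not pocl code and not the method used in test_hadd.cl.

```c
#include <stdint.h>

/* Floor of s/2, written without right-shifting a negative value
   (which is implementation-defined in C). */
static int64_t floor_half(int64_t s)
{
  return s >= 0 ? s / 2 : -((-s + 1) / 2);
}

/* hadd: (x + y) >> 1, computed without overflow */
static int32_t ref_hadd(int32_t x, int32_t y)
{
  return (int32_t)floor_half((int64_t)x + (int64_t)y);
}

/* rhadd: (x + y + 1) >> 1, computed without overflow */
static int32_t ref_rhadd(int32_t x, int32_t y)
{
  return (int32_t)floor_half((int64_t)x + (int64_t)y + 1);
}

/* abs_diff: |x - y| as an unsigned value, computed without overflow */
static uint32_t ref_abs_diff(int32_t x, int32_t y)
{
  int64_t const d = (int64_t)x - (int64_t)y;
  return (uint32_t)(d < 0 ? -d : d);
}

/* add_sat: x + y clamped to the range of the type */
static int32_t ref_add_sat(int32_t x, int32_t y)
{
  int64_t const s = (int64_t)x + (int64_t)y;
  if (s > INT32_MAX) return INT32_MAX;
  if (s < INT32_MIN) return INT32_MIN;
  return (int32_t)s;
}

/* abs: |x| as an unsigned value, so INT32_MIN is representable */
static uint32_t ref_abs(int32_t x)
{
  int64_t const w = x;
  return (uint32_t)(w < 0 ? -w : w);
}
```

Widening only works while a wider type exists; for the 64-bit types a different strategy is needed, which is presumably why the test file carries its own split-word "safe" arithmetic.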

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Some tests fail after merging. Please merge from the latest trunk and resubmit.

## ---------------------- ##
## Detailed failed tests. ##
## ---------------------- ##

# -*- compilation -*-
11. testsuite.at:169: testing Kernel function bitselect ...
./testsuite.at:174: $abs_top_builddir/examples/kernel/kernel test_bitselect
--- expout 2012-01-13 17:33:38.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/11/stdout 2012-01-13 17:33:39.000000000 +0200
@@ -1,2 +1,4 @@
 Running test test_bitselect...
+FAIL: bitselect type=long3 a=0x9652eb2d b=0xa7d81f42 c=0x3710aa8d c=0xa7524b20
+FAIL: bitselect type=ulong3 a=0x9652eb2d b=0xa7d81f42 c=0x3710aa8d c=0xa7524b20
 OK
11. testsuite.at:169: 11. Kernel function bitselect (testsuite.at:169): FAILED (testsuite.at:174)

# -*- compilation -*-
13. testsuite.at:185: testing Kernel functions abs abs_diff add_sat hadd rhadd ...
./testsuite.at:190: $abs_top_builddir/examples/kernel/kernel test_hadd
--- expout 2012-01-13 17:33:40.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/13/stdout 2012-01-13 17:33:44.000000000 +0200
@@ -1,2 +1,12 @@
 Running test test_hadd...
+FAIL: abs type=long3
+ [0] a=923839117 good=923839117 res=923839117
+ [1] a=-1203173478 good=-1203173478 res=-1203173478
+ [2] a=-731904544 good=-731904544 res=0
+ [3] a=1864236072 good=-1864236072 res=1
+FAIL: abs type=ulong3
+ [0] a=923839117 good=923839117 res=923839117
+ [1] a=-1203173478 good=-1203173478 res=-1203173478
+ [2] a=-731904544 good=-731904544 res=0
+ [3] a=1864236072 good=1864236072 res=1
 OK
13. testsuite.at:185: 13. Kernel functions abs abs_diff add_sat hadd rhadd (testsuite.at:185): FAILED (testsuite.at:190)

review: Needs Fixing
Erik Schnetter (schnetter) wrote :

No tests are failing for me, and merging from the trunk only changed text files. Will push my updated tree soon.

Which version of LLVM are you using? 3.0 has a known problem with 3-element vectors; I'm using a later version from the trunk.

lp:~schnetter/pocl/main updated
172. By Erik Schnetter

Merge from trunk

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Yep, I know 3.0 has problems, but so far the tests haven't been affected by them. I'll test with LLVM trunk.

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

One test also fails with LLVM trunk (x86-64/Debian), but the test that uses 3-element vectors (known to be broken in LLVM 3.0) seems to pass.

I'll add a warning to configure, shown when LLVM 3.0 is used, that advises installing LLVM trunk (or 3.1) for more robust OpenCL support.

12. testsuite.at:177: testing Kernel functions fabs signbit copysign ...
./testsuite.at:182: $abs_top_builddir/examples/kernel/kernel test_fabs
--- expout 2012-01-16 13:40:06.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/12/stdout 2012-01-16 13:40:07.000000000 +0200
@@ -1,2 +1,3 @@
 Running test test_fabs...
+FAIL: signbit type=float2 val=-0 res=-1
 OK
12. testsuite.at:177: 12. Kernel functions fabs signbit copysign (testsuite.at:177): FAILED (testsuite.at:182)

review: Needs Fixing
lp:~schnetter/pocl/main updated
173. By Erik Schnetter

Merge from trunk

Erik Schnetter (schnetter) wrote :

This test case used to pass, but now breaks with a more recent version of LLVM. It seems LLVM has a regression in the vector shift operator. I will disable this test until LLVM is correct again.

Erik Schnetter (schnetter) wrote :

I have disabled the signbit test (and submitted a bug report to LLVM).
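
For reference, the expectation that the signbit check in the test_fabs.cl hunk of this proposal encodes can be sketched in plain C: the result is 0 when the sign bit is clear, and otherwise +1 for a scalar or -1 (all bits set) for each element of a vector. The helper name `expected_signbit` and its `vecsize` parameter are hypothetical and only mirror the test's `vecsize==1 ? +1 : -1` expression; this is not pocl's implementation of signbit.

```c
#include <math.h>

/* Hypothetical helper mirroring the expectation in test_fabs.cl:
   0 if the sign bit is clear; otherwise +1 for scalars (vecsize == 1)
   and -1 for every element of a vector type. */
static int expected_signbit(double x, int vecsize)
{
  if (!signbit(x)) {
    return 0;
  }
  return vecsize == 1 ? +1 : -1;
}
```

Note that negative zero has its sign bit set, so under this rule a vector element holding -0 is expected to yield -1.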

review: Needs Resubmitting
lp:~schnetter/pocl/main updated
174. By Erik Schnetter

Disable signbit test

Preview Diff

=== modified file 'examples/kernel/Makefile.am'
--- examples/kernel/Makefile.am 2012-01-16 13:23:47 +0000
+++ examples/kernel/Makefile.am 2012-01-16 18:23:23 +0000
@@ -24,7 +24,7 @@
 
 noinst_PROGRAMS = kernel
 
-kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_rotate.cl
+kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_hadd.cl test_rotate.cl
 kernel_LDADD = ../../lib/CL/libOpenCL.la -lm @PTHREAD_LIBS@
 kernel_CFLAGS = -std=c99 @PTHREAD_CFLAGS@
 
 
=== modified file 'examples/kernel/kernel.c'
--- examples/kernel/kernel.c 2011-12-14 01:12:04 +0000
+++ examples/kernel/kernel.c 2012-01-16 18:23:23 +0000
@@ -70,24 +70,41 @@
 
 
 int
-main(void)
+main(int argc, char **argv)
 {
-  char const *const tests[] = {
-    "test_bitselect",
-    "test_fabs",
-    "test_rotate",
-  };
-  int const ntests = sizeof(tests)/sizeof(*tests);
-  for (int i=0; i<ntests; ++i) {
-    printf("Running test #%d %s...\n", i, tests[i]);
+  if (argc < 2) {
+
+    /* Run all tests */
+    char const *const tests[] = {
+      "test_bitselect",
+      "test_fabs",
+      "test_hadd",
+      //"test_rotate", /* TODO: this test fails; LLVM bug #11555 */
+    };
+    int const ntests = sizeof(tests)/sizeof(*tests);
+    for (int i=0; i<ntests; ++i) {
+      printf("Running test #%d %s...\n", i, tests[i]);
+      int ierr;
+      ierr = call_test(tests[i]);
+      if (ierr) {
+        printf("FAIL\n");
+        return 1;
+      }
+    }
+
+  } else {
+
+    /* Run one test */
+    printf("Running test %s...\n", argv[1]);
     int ierr;
-    ierr = call_test(tests[i]);
+    ierr = call_test(argv[1]);
     if (ierr) {
       printf("FAIL\n");
       return 1;
     }
+
   }
 
-  printf("DONE\n");
+  printf("OK\n");
   return 0;
 }
 
=== modified file 'examples/kernel/test_bitselect.cl'
--- examples/kernel/test_bitselect.cl 2011-12-18 02:58:06 +0000
+++ examples/kernel/test_bitselect.cl 2012-01-16 18:23:23 +0000
@@ -1165,6 +1165,7 @@
                typename,
                (uint)left.s[0], (uint)right.s[0], (uint)sel.s[0],
                (uint)res.s[0]);
+        return;
       }
     }
   })
 
=== modified file 'examples/kernel/test_fabs.cl'
--- examples/kernel/test_fabs.cl 2011-12-18 02:58:06 +0000
+++ examples/kernel/test_fabs.cl 2012-01-16 18:23:23 +0000
@@ -1,6 +1,6 @@
+// TESTING: copysign
 // TESTING: fabs
 // TESTING: signbit
-// TESTING: copysign
 
 #define IMPLEMENT_BODY_V(NAME, BODY, VTYPE, STYPE, JTYPE, SJTYPE) \
   void NAME##_##VTYPE() \
@@ -135,8 +135,11 @@
         equal = equal && r.sj == g.sj;
       }
       if (!equal) {
-        printf("FAIL: fabs type=%s val=%.17g res=%.17g\n",
-               typename, val.s[0], res.s[0]);
+        for (int n=0; n<vecsize; ++n) {
+          printf("FAIL: fabs type=%s val=%.17g res=%.17g\n",
+                 typename, val.s[n], res.s[n]);
+        }
+        return;
       }
       /* signbit */
       Jvec ires;
@@ -146,8 +149,11 @@
         equal = equal && ires.s[n] == (sign>0 ? 0 : vecsize==1 ? +1 : -1);
       }
       if (!equal) {
-        printf("FAIL: signbit type=%s val=%.17g res=%d\n",
-               typename, val.s[0], (int)ires.s[0]);
+        for (int n=0; n<vecsize; ++n) {
+          printf("FAIL: signbit type=%s val=%.17g res=%d\n",
+                 typename, val.s[n], (int)ires.s[n]);
+        }
+        return;
       }
       /* copysign */
       for (int sign2=-1; sign2<=+1; sign2+=2) {
@@ -164,6 +170,7 @@
             printf("FAIL: copysign type=%s val=%.17g sign=%.17g res=%.17g\n",
                    typename, val.s[n], sign2*val2.s[n], res.s[n]);
           }
+          return;
         }
       }
     }
 
=== added file 'examples/kernel/test_hadd.cl'
--- examples/kernel/test_hadd.cl 1970-01-01 00:00:00 +0000
+++ examples/kernel/test_hadd.cl 2012-01-16 18:23:23 +0000
@@ -0,0 +1,1409 @@
1// TESTING: abs
2// TESTING: abs_diff
3// TESTING: add_sat
4// TESTING: hadd
5// TESTING: rhadd
6
7
8
9/* Safe-but-slow arithmetic that can handle larger numbers without
10 overflowing. */
11#define DEFINE_SAFE_1(STYPE) \
12 \
13 STYPE##2 _cl_overloadable safe_normalize(STYPE##2 const a) \
14 { \
15 STYPE const halfbits = 4*sizeof(STYPE); \
16 STYPE const halfmax = (STYPE)1 << halfbits; \
17 STYPE const halfmask = halfmax - (STYPE)1; \
18 STYPE##2 b; \
19 b.s0 = a.s0 & halfmask; \
20 b.s1 = a.s1 + (a.s0 >> halfbits); \
21 return b; \
22 } \
23 \
24 STYPE _cl_overloadable safe_extract(STYPE##2 const a) \
25 { \
26 STYPE const halfbits = 4*sizeof(STYPE); \
27 STYPE const halfmax = (STYPE)1 << halfbits; \
28 STYPE const halfmask = halfmax - (STYPE)1; \
29 STYPE b; \
30 b = a.s0 | a.s1 << halfbits; \
31 return b; \
32 } \
33 \
34 STYPE##2 _cl_overloadable safe_neg(STYPE##2 a) \
35 { \
36 STYPE##2 b; \
37 b.s0 = - a.s0; \
38 b.s1 = - a.s1; \
39 return safe_normalize(b); \
40 } \
41 \
42 STYPE##2 _cl_overloadable safe_abs(STYPE##2 const a) \
43 { \
44 STYPE##2 b; \
45 b = a; \
46 if (b.s1 < (STYPE)0) { \
47 b = safe_neg(b); \
48 } \
49 return b; \
50 } \
51 \
52 STYPE##2 _cl_overloadable safe_add(STYPE##2 const a, STYPE##2 const b) \
53 { \
54 STYPE##2 c; \
55 c.s0 = a.s0 + b.s0; \
56 c.s1 = a.s1 + b.s1; \
57 return safe_normalize(c); \
58 } \
59 \
60 STYPE##2 _cl_overloadable safe_sub(STYPE##2 const a, STYPE##2 const b) \
61 { \
62 STYPE##2 c; \
63 c.s0 = a.s0 - b.s0; \
64 c.s1 = a.s1 - b.s1; \
65 return safe_normalize(c); \
66 } \
67 \
68 STYPE##2 _cl_overloadable safe_max(STYPE##2 const a, STYPE##2 const b) \
69 { \
70 STYPE##2 c; \
71 if (a.s1 > b.s1 || (a.s1 == b.s1 && a.s0 >= b.s0)) { \
72 c = a; \
73 } else { \
74 c = b; \
75 } \
76 return c; \
77 } \
78 \
79 STYPE##2 _cl_overloadable safe_min(STYPE##2 const a, STYPE##2 const b) \
80 { \
81 STYPE##2 c; \
82 if (a.s1 < b.s1 || (a.s1 == b.s1 && a.s0 <= b.s0)) { \
83 c = a; \
84 } else { \
85 c = b; \
86 } \
87 return c; \
88 } \
89 \
90 STYPE##2 _cl_overloadable safe_rshift(STYPE##2 a) \
91 { \
92 STYPE const halfbits = 4*sizeof(STYPE); \
93 STYPE const halfmax = (STYPE)1 << halfbits; \
94 STYPE const halfmask = halfmax - (STYPE)1; \
95 STYPE##2 b; \
96 b.s0 = a.s0 | ((a.s1 & (STYPE)1) << halfbits); \
97 b.s1 = a.s1 & ~(STYPE)1; \
98 b.s0 >>= (STYPE)1; \
99 b.s1 >>= (STYPE)1; \
100 return safe_normalize(b); \
101 }
102
103
104
105#define DEFINE_SAFE_2(TYPE, STYPE) \
106 \
107 STYPE##2 _cl_overloadable safe_create(TYPE const a) \
108 { \
109 STYPE const halfbits = 4*sizeof(STYPE); \
110 STYPE const halfmax = (STYPE)1 << halfbits; \
111 STYPE const halfmask = halfmax - (STYPE)1; \
112 STYPE##2 b; \
113 b.s0 = a & (TYPE)halfmask; \
114 b.s1 = a >> (TYPE)halfbits; \
115 b = safe_normalize(b); \
116 if ((TYPE)safe_extract(b) != a) printf("FAIL: safe_create %d\n", (int)a); \
117 return b; \
118 }
119
120
121
122DEFINE_SAFE_1(char )
123DEFINE_SAFE_1(short)
124DEFINE_SAFE_1(int )
125__IF_INT64(
126DEFINE_SAFE_1(long ))
127
128DEFINE_SAFE_2(char , char )
129DEFINE_SAFE_2(uchar , char )
130DEFINE_SAFE_2(short , short)
131DEFINE_SAFE_2(ushort, short)
132DEFINE_SAFE_2(int , int )
133DEFINE_SAFE_2(uint , int )
134__IF_INT64(
135DEFINE_SAFE_2(long , long )
136DEFINE_SAFE_2(ulong , long ))
137
138
139
140#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
141 void NAME##_##GTYPE() \
142 { \
143 typedef GTYPE gtype; \
144 typedef SGTYPE sgtype; \
145 typedef UGTYPE ugtype; \
146 typedef SUGTYPE sugtype; \
147 char const *const typename = #GTYPE; \
148 BODY; \
149 }
150#define DEFINE_BODY_G(NAME, EXPR) \
151 IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \
152 IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \
153 IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \
154 IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \
155 IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \
156 IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \
157 IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \
158 IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \
159 IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \
160 IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \
161 IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \
162 IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \
163 IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \
164 IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \
165 IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \
166 IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \
167 IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \
168 IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \
169 IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \
170 IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \
171 IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \
172 IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \
173 IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \
174 IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \
175 IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \
176 IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \
177 IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \
178 IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \
179 IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \
180 IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \
181 IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \
182 IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \
183 IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \
184 IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \
185 IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \
186 IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \
187 __IF_INT64( \
188 IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \
189 IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \
190 IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \
191 IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \
192 IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \
193 IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \
194 IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \
195 IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \
196 IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \
197 IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \
198 IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \
199 IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ))
200
201#define CALL_FUNC_G(NAME) \
202 NAME##_char (); \
203 NAME##_char2 (); \
204 NAME##_char3 (); \
205 NAME##_char4 (); \
206 NAME##_char8 (); \
207 NAME##_char16 (); \
208 NAME##_uchar (); \
209 NAME##_uchar2 (); \
210 NAME##_uchar3 (); \
211 NAME##_uchar4 (); \
212 NAME##_uchar8 (); \
213 NAME##_uchar16 (); \
214 NAME##_short (); \
215 NAME##_short2 (); \
216 NAME##_short3 (); \
217 NAME##_short4 (); \
218 NAME##_short8 (); \
219 NAME##_short16 (); \
220 NAME##_ushort (); \
221 NAME##_ushort2 (); \
222 NAME##_ushort3 (); \
223 NAME##_ushort4 (); \
224 NAME##_ushort8 (); \
225 NAME##_ushort16(); \
226 NAME##_int (); \
227 NAME##_int2 (); \
228 NAME##_int3 (); \
229 NAME##_int4 (); \
230 NAME##_int8 (); \
231 NAME##_int16 (); \
232 NAME##_uint (); \
233 NAME##_uint2 (); \
234 NAME##_uint3 (); \
235 NAME##_uint4 (); \
236 NAME##_uint8 (); \
237 NAME##_uint16 (); \
238 __IF_INT64( \
239 NAME##_long (); \
240 NAME##_long2 (); \
241 NAME##_long3 (); \
242 NAME##_long4 (); \
243 NAME##_long8 (); \
244 NAME##_long16 (); \
245 NAME##_ulong (); \
246 NAME##_ulong2 (); \
247 NAME##_ulong3 (); \
248 NAME##_ulong4 (); \
249 NAME##_ulong8 (); \
250 NAME##_ulong16 ();)
251
252
253
254#define is_signed(T) ((T)-1 < (T)+1)
255#define is_floating(T) ((T)0.1 > (T)0.0)
256#define count_bits(T) (CHAR_BIT * sizeof(T))
257
258DEFINE_BODY_G
259(test_hadd,
260 ({
261 _cl_static_assert(sgtype, !is_floating(sgtype));
262 uint const randoms[] = {
263 0x00000000,
264 0x00000001,
265 0x7fffffff,
266 0x80000000,
267 0xfffffffe,
268 0xffffffff,
269 0x01010101,
270 0x80808080,
271 0x55555555,
272 0xaaaaaaaa,
273 116127149,
274 331473970,
275 3314285513,
276 1531519032,
277 3871781304,
278 723260354,
279 3734992454,
280 3048883544,
281 424075405,
282 3760586679,
283 364071113,
284 2212396745,
285 3026460845,
286 2062923368,
287 3945483116,
288 774301702,
289 2010645213,
290 353497300,
291 2240089293,
292 645959945,
293 2929402380,
294 3641106046,
295 3731530029,
296 3788502454,
297 3990366079,
298 3532452335,
299 3231247251,
300 123690193,
301 418692672,
302 4146745661,
303 4170087687,
304 3915754726,
305 2052700648,
306 1748863847,
307 276568793,
308 364266289,
309 24718041,
310 3775186845,
311 935438421,
312 3070232227,
313 558364671,
314 2318351214,
315 17943242,
316 1796864907,
317 727165514,
318 223478118,
319 2448924107,
320 496915291,
321 3372891854,
322 361433487,
323 3273766229,
324 251831411,
325 432661417,
326 772908669,
327 289792578,
328 4150526710,
329 4157662725,
330 2594757327,
331 3052388893,
332 3842089578,
333 3467269013,
334 510187125,
335 2596093643,
336 398042620,
337 4272455984,
338 3711648086,
339 2120827851,
340 77269246,
341 2168059317,
342 2750549452,
343 1712682330,
344 2486520097,
345 625173621,
346 1632501477,
347 2935468416,
348 980045574,
349 3080136685,
350 4291385683,
351 1900746145,
352 3343063222,
353 3737266887,
354 3349055009,
355 3557165116,
356 847440541,
357 1195278641,
358 313889830,
359 622790046,
360 326637691,
361 663570370,
362 662327410,
363 923839117,
364 3091793818,
365 3563062752,
366 1864236072,
367 4251970867,
368 2259486024,
369 2512789432,
370 4278284968,
371 244581614,
372 247706675,
373 3268622648,
374 3758387026,
375 206893256,
376 2892198447,
377 3585538105,
378 2484801188,
379 1063964031,
380 3712657639,
381 23179627,
382 1732005357,
383 2522016557,
384 1058341654,
385 1580368080,
386 1890361257,
387 1167428989,
388 2600065453,
389 1547136389,
390 945856727,
391 2005682606,
392 3399854093,
393 2619154565,
394 2207015138,
395 2836381097,
396 612928932,
397 1537934908,
398 897756908,
399 1142275256,
400 1106163744,
401 3209429231,
402 3317761168,
403 2815958850,
404 1282374282,
405 3861163766,
406 2547903564,
407 3139840265,
408 587243656,
409 3261127556,
410 3955999184,
411 2061849860,
412 3778058575,
413 259659645,
414 935157504,
415 3294850933,
416 2164603733,
417 3772888022,
418 732201413,
419 3677934092,
420 321204420,
421 509807651,
422 3626474557,
423 284622251,
424 3655952885,
425 1512028769,
426 1102588652,
427 2700179235,
428 4167405174,
429 2672050627,
430 3410780487,
431 4153733940,
432 2459759898,
433 568792515,
434 1081882827,
435 3211871042,
436 799411732,
437 2101993855,
438 3415550991,
439 3872737342,
440 4168312654,
441 1889019671,
442 4247531636,
443 2442118552,
444 3024016549,
445 1041817509,
446 141773691,
447 28033810,
448 4034097901,
449 1532981240,
450 2593712697,
451 2751535537,
452 269072724,
453 3363560906,
454 3555817938,
455 611297346,
456 366972507,
457 788151801,
458 3990920857,
459 1611303958,
460 3353102293,
461 1334246396,
462 1114446428,
463 3491128109,
464 2922751152,
465 3053407478,
466 2897830841,
467 176546593,
468 3184221063,
469 37923477,
470 1692128510,
471 165719856,
472 1795746307,
473 2422422413,
474 253227286,
475 2188522595,
476 582156087,
477 2342528685,
478 2080142547,
479 1928462563,
480 2713927482,
481 1944972771,
482 2534268146,
483 830798003,
484 1653357460,
485 291743070,
486 593771532,
487 2941865444,
488 855254640,
489 2401129822,
490 2420945774,
491 2447532144,
492 1137540092,
493 1296659939,
494 3252539825,
495 1165427708,
496 3251476781,
497 2597490804,
498 2518198923,
499 1196242486,
500 3646082981,
501 1347758965,
502 3824891532,
503 2959519286,
504 1523237529,
505 2910666174,
506 3226637035,
507 2116458903,
508 1076998092,
509 4222762545,
510 3061300520,
511 4189298288,
512 3943996060,
513 3129210496,
514 3826669630,
515 4235952488,
516 2624429853,
517 2522766390,
518 4137227001,
519 3846448057,
520 1893377487,
521 3658784739,
522 2368074586,
523 170547540,
524 520741120,
525 2662229630,
526 4265731754,
527 1379762094,
528 3395502906,
529 2242123335,
530 1960965916,
531 561815223,
532 2687853297,
533 4051050259,
534 1845906614,
535 3725623071,
536 1857706909,
537 2487006596,
538 1925919247,
539 2796536825,
540 3499954730,
541 2173320675,
542 3416676849,
543 3637473517,
544 340951464,
545 4152841543,
546 3747544606,
547 2659955417,
548 1695145107,
549 3117280269,
550 826143012,
551 3867179892,
552 4269349771,
553 1002613766,
554 3842086144,
555 1431990957,
556 2466205499,
557 653575141,
558 293530756,
559 2318035308,
560 3728576309,
561 1697894989,
562 2955143882,
563 2109912287,
564 2764187839,
565 1805490664,
566 672567480,
567 1374741155,
568 1662665091,
569 3551530257,
570 350283994,
571 685023916,
572 1887748803,
573 1386316091,
574 185708823,
575 3106823178,
576 3014109065,
577 3823816879,
578 2213358313,
579 2696977340,
580 4075569311,
581 365089277,
582 3466850767,
583 312392153,
584 1065191758,
585 2405243644,
586 3174745999,
587 3617861250,
588 867192904,
589 1046475095,
590 1888985494,
591 1127140157,
592 61671281,
593 128055546,
594 2332619657,
595 993669439,
596 2145370329,
597 1462433204,
598 74990676,
599 2898191247,
600 3601586977,
601 794604597,
602 3597643629,
603 4282141339,
604 251591051,
605 84943504,
606 2016044077,
607 946823499,
608 648214756,
609 2530104367,
610 4254219656,
611 1974542801,
612 53097687,
613 157109688,
614 299310673,
615 2866882336,
616 3335682769,
617 2583612755,
618 4114730718,
619 740387484,
620 986157357,
621 1140355266,
622 2825639379,
623 1198731547,
624 1521261313,
625 1204836445,
626 4294274455,
627 2215732661,
628 1369520150,
629 1515223958,
630 2428295267,
631 1945985266,
632 2168529560,
633 3791933294,
634 4021389338,
635 713695045,
636 4254483898,
637 3795986293,
638 1347498014,
639 1746051095,
640 1364967734,
641 206265390,
642 3940088473,
643 1867270033,
644 3893545471,
645 3545819698,
646 2573105187,
647 3859595967,
648 2823745089,
649 1293424244,
650 3948799370,
651 1524394803,
652 3807487752,
653 4055830971,
654 3124609223,
655 119357574,
656 1490516894,
657 3799908122,
658 1700941394,
659 80878888,
660 2719184407,
661 3603450215,
662 27225525,
663 1413638246,
664 3350206268,
665 2643568519,
666 801305037,
667 1341902999,
668 1420459209,
669 968648411,
670 1826125841,
671 2619721007,
672 537879916,
673 860253620,
674 586683700,
675 827412286,
676 2724526294,
677 1019678576,
678 3998975225,
679 339789397,
680 863181640,
681 970475690,
682 2737385140,
683 322021174,
684 4084948327,
685 80691950,
686 1702782677,
687 1266230197,
688 1100861683,
689 3123418948,
690 258978579,
691 3217833394,
692 1780903315,
693 1345341356,
694 2927579299,
695 931392918,
696 9404798,
697 83278219,
698 2470714323,
699 640357359,
700 2169696414,
701 496463525,
702 4127940882,
703 2965369765,
704 4136333330,
705 1159134689,
706 1798163043,
707 4097403856,
708 4284804850,
709 3165524545,
710 2765224926,
711 931350022,
712 1171636623,
713 845799406,
714 709853915,
715 2348457302,
716 3343956878,
717 2438786363,
718 175730452,
719 598587430,
720 2744955366,
721 447049527,
722 1252796590,
723 3044128900,
724 812683575,
725 3721040746,
726 3404688504,
727 2674021068,
728 959056069,
729 322162714,
730 2008064015,
731 3758321185,
732 2877937989,
733 778007512,
734 3502772435,
735 3084124565,
736 111844966,
737 248248909,
738 22147113,
739 2506501875,
740 1430033847,
741 1690841637,
742 2999017281,
743 3658748205,
744 1632773934,
745 4177069459,
746 3187781304,
747 1182255965,
748 4121685939,
749 300554973,
750 2854502901,
751 642657206,
752 1504346771,
753 128405037,
754 2163092164,
755 1091806675,
756 1144089805,
757 54479906,
758 505543118,
759 2844153548,
760 1010229282,
761 2961721580,
762 4235612700,
763 3508832243,
764 1409461040,
765 2568735295,
766 1191284023,
767 2220949766,
768 2605559386,
769 706551146,
770 3452279268,
771 2372892169,
772 2360210709,
773 3228881405,
774 2987444766,
775 1187314024,
776 908783041,
777 144096950,
778 1915948100,
779 2171208878,
780 420772043,
781 793209353,
782 359527746,
783 625018196,
784 1195796799,
785 2079388581,
786 864869238,
787 765565143,
788 1069647859,
789 3857355469,
790 2436437044,
791 238157644,
792 1612883577,
793 1911189891,
794 2070273440,
795 384222456,
796 1186369477,
797 2844794758,
798 3435869876,
799 1486894286,
800 4062343990,
801 440437688,
802 306253241,
803 3650751868,
804 2695961920,
805 3920128930,
806 3921419250,
807 502951143,
808 311093469,
809 2708936678,
810 36677206,
811 3473343884,
812 577655290,
813 3795127787,
814 1448118037,
815 436359554,
816 2051970204,
817 2644913053,
818 2492587228,
819 3125803824,
820 150160619,
821 1725373463,
822 2221292372,
823 2580064663,
824 1330289179,
825 2700556441,
826 1327212925,
827 651999045,
828 2089310372,
829 3221246949,
830 4148251434,
831 4267892623,
832 897583443,
833 1051813251,
834 2131903377,
835 4121163297,
836 4128279241,
837 1634689556,
838 3369895626,
839 1121895497,
840 3158192590,
841 4290462018,
842 3447288838,
843 4035505534,
844 2945114940,
845 1556028368,
846 4235061319,
847 1535570089,
848 2144940257,
849 1961364931,
850 2509075082,
851 804411045,
852 2290609740,
853 1076471626,
854 3254493188,
855 4284011230,
856 923006875,
857 3722016670,
858 2981439178,
859 2038308778,
860 1755166344,
861 488581856,
862 2624361425,
863 1298790575,
864 3550671725,
865 1845109437,
866 2047411775,
867 2488464246,
868 1391825885,
869 2340290304,
870 3623879917,
871 217171099,
872 3698905333,
873 2718846041,
874 73731529,
875 2053405441,
876 2770197347,
877 2983996080,
878 2612966141,
879 2187183079,
880 2796212469,
881 3797629169,
882 1788932364,
883 17748377,
884 627297271,
885 3689459731,
886 3311799950,
887 4263162298,
888 4016852324,
889 3136750215,
890 1725824049,
891 2844064064,
892 2059159211,
893 3182127070,
894 470655679,
895 1166949584,
896 2425843062,
897 219908183,
898 161770982,
899 2394961157,
900 999226372,
901 2367624166,
902 76287885,
903 1110832227,
904 3358123709,
905 1504127646,
906 49596774,
907 1296560019,
908 2320978173,
909 1163934122,
910 1631947491,
911 2702852639,
912 3856755518,
913 2562943123,
914 991330989,
915 993726248,
916 2133737192,
917 20974150,
918 3808389889,
919 2447868340,
920 2434828629,
921 3344419509,
922 4076789444,
923 1446054487,
924 3815933708,
925 3644670988,
926 3175898122,
927 3057844745,
928 559106380,
929 1840065631,
930 3020573012,
931 3203040371,
932 997381925,
933 2563312032,
934 815510593,
935 121805231,
936 1047507862,
937 1841403695,
938 1563170561,
939 1644198099,
940 3470882735,
941 627296501,
942 3006157508,
943 383648566,
944 3136652449,
945 2252034149,
946 1749861990,
947 956381402,
948 3299624735,
949 2798395931,
950 270054444,
951 3757564211,
952 2933717597,
953 1080178310,
954 1367392714,
955 1135266342,
956 2642448461,
957 1067554284,
958 3694982777,
959 3594374699,
960 4170301369,
961 3593401570,
962 2298071009,
963 1561680798,
964 2788490866,
965 1757829499,
966 8819607,
967 2453686068,
968 3458682663,
969 1614888171,
970 2327536307,
971 13960177,
972 125752716,
973 2312371195,
974 1515197240,
975 189747227,
976 666988376,
977 1401118738,
978 986465965,
979 242793663,
980 1830586663,
981 1603054176,
982 391536104,
983 1403125754,
984 4021998614,
985 157985039,
986 966292223,
987 2476444819,
988 3261614719,
989 3888752449,
990 2300656903,
991 1138839559,
992 1227396086,
993 1029493665,
994 2138482384,
995 2182525175,
996 1437393012,
997 2758514342,
998 1394715363,
999 242430786,
1000 4026759135,
1001 379455166,
1002 3454852592,
1003 1128257576,
1004 513994046,
1005 2437643547,
1006 1851772774,
1007 1096918785,
1008 2537378072,
1009 2020382559,
1010 1306056753,
1011 519939769,
1012 2477462755,
1013 2962076712,
1014 2856059355,
1015 111272034,
1016 2363778749,
1017 3031510224,
1018 297098997,
1019 2716928589,
1020 1988398361,
1021 3715685207,
1022 1158387390,
1023 3239718824,
1024 214276640,
1025 1240159361,
1026 302800084,
1027 258391670,
1028 3118615408,
1029 1789752935,
1030 935790045,
1031 1678444383,
1032 3645357112,
1033 1752731774,
1034 1211889371,
1035 2432949496,
1036 1983838022,
1037 2563701701,
1038 3235972690,
1039 2732559614,
1040 4173627589,
1041 918129740,
1042 3528101943,
1043 945287787,
1044 783593046,
1045 1687101911,
1046 4265659819,
1047 1625936204,
1048 419423123,
1049 404748783,
1050 174814826,
1051 561306387,
1052 441376876,
1053 3649973873,
1054 1191532754,
1055 493829681,
1056 462640703,
1057 3037639795,
1058 4234288143,
1059 787992128,
1060 354556603,
1061 1391557094,
1062 1227150157,
1063 25592400,
1064 3032298621,
1065 1655829692,
1066 1736544192,
1067 2936173068,
1068 1867683432,
1069 3284761215,
1070 2988749127,
1071 62083315,
1072 3675433852,
1073 1134152479,
1074 2537382040,
1075 1147996351,
1076 1287284159,
1077 1889610942,
1078 3549411223,
1079 2634772335,
1080 1621708033,
1081 3268420142,
1082 2635222095,
1083 2856377255,
1084 3703296204,
1085 45831019,
1086 1997278369,
1087 1472530726,
1088 4202051236,
1089 1958581642,
1090 1899513707,
1091 1642075765,
1092 217373156,
1093 1177071505,
1094 2179831909,
1095 1894821896,
1096 375785474,
1097 140181353,
1098 2743987480,
1099 123627609,
1100 3644816362,
1101 4244769687,
1102 4053481902,
1103 4272740073,
1104 1701735471,
1105 1799303028,
1106 2810175160,
1107 1531107068,
1108 3059813822,
1109 4125025775,
1110 1932301928,
1111 358163550,
1112 1246286294,
1113 1901878857,
1114 2449370117,
1115 4061706076,
1116 2875797072,
1117 1661522553,
1118 543545982,
1119 300448222,
1120 4019581644,
1121 3197346443,
1122 731278538,
1123 457112622,
1124 669625172,
1125 2548620393,
1126 2931934447,
1127 2318225955,
1128 427149964,
1129 1097556601,
1130 3585697077,
1131 1901391738,
1132 3019912350,
1133 4193989774,
1134 1411691495,
1135 2549773310,
1136 3130489018,
1137 739444137,
1138 1953561922,
1139 228589899,
1140 974825144,
1141 1873934953,
1142 918502475,
1143 4020302125,
1144 2103082289,
1145 1474428456,
1146 269315616,
1147 3376419786,
1148 2903506696,
1149 169344159,
1150 4151327830,
1151 2861975985,
1152 1583628545,
1153 337656074,
1154 2381206238,
1155 1346357469,
1156 3316549550,
1157 1188140897,
1158 928463634,
1159 120466083,
1160 1048016215,
1161 2053770646,
1162 3729204448,
1163 3630812747,
1164 3421817962,
1165 1471357089,
1166 2971633393,
1167 2721366758,
1168 3977792328,
1169 2771228423,
1170 258029855,
1171 325097628,
1172 2816869331,
1173 228010778,
1174 1815596248,
1175 2677647806,
1176 4069826588,
1177 2009464559,
1178 4003870353,
1179 2558198381,
1180 823508134,
1181 256895388,
1182 130455482,
1183 4107398577,
1184 2446165146,
1185 3086759840,
1186 3128842794,
1187 236454548,
1188 3740649072,
1189 1049081391,
1190 3780795812,
1191 1964380357,
1192 3900635454,
1193 1941196066,
1194 1143285596,
1195 1276856333,
1196 2919547816,
1197 2947639569,
1198 1889305089,
1199 2386910172,
1200 2685680362,
1201 2042792556,
1202 2780968041,
1203 976912013,
1204 3562274424,
1205 2336140155,
1206 3464857244,
1207 1108365812,
1208 1201566469,
1209 707126700,
1210 4047776595,
1211 1289380202,
1212 1231913128,
1213 2819729319,
1214 537908270,
1215 3802355886,
1216 2004615093,
1217 2947614997,
1218 4192189156,
1219 2809733754,
1220 3082820238,
1221 2758499499,
1222 1004612882,
1223 1102702383,
1224 1862546275,
1225 3170345990,
1226 883739952,
1227 1641198615,
1228 957782688,
1229 1503652889,
1230 2210400768,
1231 2002162781,
1232 1553086024,
1233 2591721606,
1234 3830165160,
1235 4181044959,
1236 2735782270,
1237 3825677158,
1238 143739895,
1239 771193452,
1240 35990560,
1241 1014009970,
1242 20768744,
1243 1785268932,
1244 1424740580,
1245 1620237280,
1246 848157259,
1247 3808893671,
1248 2746756110,
1249 3903639825,
1250 1822084165,
1251 2891666588,
1252 3853186896,
1253 4248495212,
1254 1178592425,
1255 455721495,
1256 1848821934,
1257 1558397701,
1258 133397899,
1259 1845531767,
1260 2798312897,
1261 1471176399,
1262 1743248506,
1263 2229972777,
1264 1290369879,
1265 3579075953,
1266 309034994,
1267 929728690,
1268 3841454719,
1269 3031753515,
1270 3606461413,
1271 2412281758,
1272 2993123515,
1273 };
1274 int const nrandoms = sizeof(randoms) / sizeof(*randoms);
1275
1276 int const bits = count_bits(sgtype);
1277 sgtype const tmin =
1278 is_signed(sgtype) ? ((sgtype)1 << (sgtype)(bits-1)) : (sgtype)0;
1279 sgtype const tmax = tmin - (sgtype)1;
1280 for (int iter=0; iter<nrandoms; ++iter) {
1281 typedef union {
1282 gtype v;
1283 ugtype u;
1284 sgtype s[16];
1285 } Tvec;
1286 Tvec x, y, z;
1287 Tvec good_abs;
1288 Tvec good_abs_diff, good_add_sat;
1289 Tvec good_hadd, good_rhadd;
1290 int vecsize = vec_step(gtype);
1291 for (int n=0; n<vecsize; ++n) {
1292 x.s[n] = randoms[(iter+n ) % nrandoms];
1293 y.s[n] = randoms[(iter+n+20) % nrandoms];
1294 z.s[n] = randoms[(iter+n+40) % nrandoms];
1295 if (bits>32) {
1296 x.s[n] = (x.s[n] << (bits/2)) | randoms[(iter+n+100) % nrandoms];
1297 y.s[n] = (y.s[n] << (bits/2)) | randoms[(iter+n+120) % nrandoms];
1298 z.s[n] = (z.s[n] << (bits/2)) | randoms[(iter+n+140) % nrandoms];
1299 }
1300 good_abs.s[n] =
1301 safe_extract(safe_abs(safe_create(x.s[n])));
1302 good_abs_diff.s[n] =
1303 safe_extract(safe_abs(safe_sub(safe_create(x.s[n]),
1304 safe_create(y.s[n]))));
1305 good_add_sat.s[n] =
1306 safe_extract(safe_min(safe_max(safe_add(safe_create(x.s[n]),
1307 safe_create(y.s[n])),
1308 safe_create(tmin)),
1309 safe_create(tmax)));
1310 good_hadd.s[n] =
1311 safe_extract(safe_rshift(safe_add(safe_create(x.s[n]),
1312 safe_create(y.s[n]))));
1313 good_rhadd.s[n] =
1314 safe_extract(safe_rshift(safe_add(safe_add(safe_create(x.s[n]),
1315 safe_create(y.s[n])),
1316 safe_create((sgtype)1))));
1317 }
1318 Tvec res_abs;
1319 Tvec res_abs_diff, res_add_sat;
1320 Tvec res_hadd, res_rhadd;
1321 res_abs.u = abs (x.v);
1322 res_abs_diff.u = abs_diff(x.v, y.v);
1323 res_add_sat.v = add_sat (x.v, y.v);
1324 res_hadd.v = hadd (x.v, y.v);
1325 res_rhadd.v = rhadd (x.v, y.v);
1326 bool equal;
1327 // abs
1328 equal = true;
1329 for (int n=0; n<vecsize; ++n) {
1330 equal = equal && res_abs.s[n] == good_abs.s[n];
1331 }
1332 if (!equal) {
1333 printf("FAIL: abs type=%s\n", typename);
1334 for (int n=0; n<vecsize; ++n) {
1335 printf(" [%d] a=%d good=%d res=%d\n",
1336 n,
1337 (int)x.s[n],
1338 (int)good_abs.s[n], (int)res_abs.s[n]);
1339 }
1340 return;
1341 }
1342 // abs_diff
1343 equal = true;
1344 for (int n=0; n<vecsize; ++n) {
1345 equal = equal && res_abs_diff.s[n] == good_abs_diff.s[n];
1346 }
1347 if (!equal) {
1348 printf("FAIL: abs_diff type=%s\n", typename);
1349 for (int n=0; n<vecsize; ++n) {
1350 printf(" [%d] a=%d b=%d good=%d res=%d\n",
1351 n,
1352 (int)x.s[n], (int)y.s[n],
1353 (int)good_abs_diff.s[n], (int)res_abs_diff.s[n]);
1354 }
1355 return;
1356 }
1357 // add_sat
1358 equal = true;
1359 for (int n=0; n<vecsize; ++n) {
1360 equal = equal && res_add_sat.s[n] == good_add_sat.s[n];
1361 }
1362 if (!equal) {
1363 printf("FAIL: add_sat type=%s\n", typename);
1364 for (int n=0; n<vecsize; ++n) {
1365 printf(" [%d] a=%d b=%d good=%d res=%d\n",
1366 n,
1367 (int)x.s[n], (int)y.s[n],
1368 (int)good_add_sat.s[n], (int)res_add_sat.s[n]);
1369 }
1370 return;
1371 }
1372 // hadd
1373 equal = true;
1374 for (int n=0; n<vecsize; ++n) {
1375 equal = equal && res_hadd.s[n] == good_hadd.s[n];
1376 }
1377 if (!equal) {
1378 printf("FAIL: hadd type=%s\n", typename);
1379 for (int n=0; n<vecsize; ++n) {
1380 printf(" [%d] a=%d b=%d good=%d res=%d\n",
1381 n,
1382 (int)x.s[n], (int)y.s[n],
1383 (int)good_hadd.s[n], (int)res_hadd.s[n]);
1384 }
1385 return;
1386 }
1387 // rhadd
1388 equal = true;
1389 for (int n=0; n<vecsize; ++n) {
1390 equal = equal && res_rhadd.s[n] == good_rhadd.s[n];
1391 }
1392 if (!equal) {
1393 printf("FAIL: rhadd type=%s\n", typename);
1394 for (int n=0; n<vecsize; ++n) {
1395 printf(" [%d] a=%d b=%d good=%d res=%d\n",
1396 n,
1397 (int)x.s[n], (int)y.s[n],
1398 (int)good_rhadd.s[n], (int)res_rhadd.s[n]);
1399 }
1400 return;
1401 }
1402 }
1403 })
1404 )
1405
1406kernel void test_hadd()
1407{
1408 CALL_FUNC_G(test_hadd)
1409}
01410
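Note on the reference values in test_hadd.cl: the test appears to compute `good_hadd` and `good_rhadd` by adding in a widened "safe" representation and then shifting right by one, so the intermediate sum cannot overflow. The following is a minimal C sketch of that idea for 32-bit unsigned values only. The function names are hypothetical, and the second pair of functions uses a common bit trick that is not taken from the patch.

```c
#include <assert.h>  /* only needed for the checks mentioned below */
#include <stdint.h>

/* Reference: widen to 64 bits so a + b cannot overflow, then halve.
   hadd rounds down, rhadd rounds up. */
static uint32_t hadd_ref(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 1);
}

static uint32_t rhadd_ref(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b + 1) >> 1);
}

/* Same results without a wider type (assumption: illustrative only,
   not the implementation used in pocl). */
static uint32_t hadd_u32(uint32_t a, uint32_t b)
{
    /* Halve each operand, then add back the carry lost when both are odd. */
    return (a >> 1) + (b >> 1) + (a & b & 1u);
}

static uint32_t rhadd_u32(uint32_t a, uint32_t b)
{
    /* Round up whenever at least one operand is odd. */
    return (a >> 1) + (b >> 1) + ((a | b) & 1u);
}
```

Comparing `hadd_u32` against `hadd_ref` (and likewise for `rhadd`) on a few of the random inputs from the table above is a quick way to check that both formulations agree.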
=== modified file 'examples/kernel/test_rotate.cl'
--- examples/kernel/test_rotate.cl 2011-12-18 02:58:06 +0000
+++ examples/kernel/test_rotate.cl 2012-01-16 18:23:23 +0000
@@ -169,6 +169,7 @@
 printf("FAIL: shift left (<<) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
 typename, patterns[p], shiftbase, shiftoffset,
 (uint)res.s[0], (uint)shl.s[0]);
+return;
 }
 /* shift right */
 res.v = val.v >> shift.v;
@@ -180,6 +181,7 @@
 printf("FAIL: shift right (>>) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
 typename, patterns[p], shiftbase, shiftoffset,
 (uint)res.s[0], (uint)shr.s[0]);
+return;
 }
 /* rotate */
 res.v = rotate(val.v, shift.v);
@@ -191,6 +193,7 @@
 printf("FAIL: rotate type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
 typename, patterns[p], shiftbase, shiftoffset,
 (uint)res.s[0], (uint)rot.s[0]);
+return;
 }
 }
 }
=== modified file 'lib/kernel/abs.cl'
--- lib/kernel/abs.cl 2011-10-27 00:18:42 +0000
+++ lib/kernel/abs.cl 2012-01-16 18:23:23 +0000
@@ -23,13 +23,12 @@
 
 #include "templates.h"
 
-/* Define "missing" builtins */
-#define __builtin_abshh(a) (uchar )(a>=(char )0 ? a : -a)
-#define __builtin_absh(a) (ushort)(a>=(short)0 ? a : -a)
-#define __builtin_absl(a) (ulong )(a>=(long )0 ? a : -a)
-#define __builtin_absuhh(a) a
-#define __builtin_absuh(a) a
-#define __builtin_absu(a) a
-#define __builtin_absul(a) a
+#define __builtin_abshh(a) ((uchar )(a>=(char )0 ? a : -a))
+#define __builtin_absh(a) ((ushort)(a>=(short)0 ? a : -a))
+#define __builtin_absl(a) ((ulong )(a>=(long )0 ? a : -a))
+#define __builtin_absuhh(a) a
+#define __builtin_absuh(a) a
+#define __builtin_absu(a) a
+#define __builtin_absul(a) a
 
 DEFINE_BUILTIN_UG_G(abs)
=== modified file 'lib/kernel/abs_diff.cl'
--- lib/kernel/abs_diff.cl 2011-10-26 03:01:29 +0000
+++ lib/kernel/abs_diff.cl 2012-01-16 18:23:23 +0000
@@ -23,4 +23,23 @@
 
 #include "templates.h"
 
-DEFINE_EXPR_UG_GG(abs_diff, abs(a-b))
+// DEFINE_EXPR_UG_GG(abs_diff, abs(a-b))
+
+// This could probably also be optimised
+DEFINE_EXPR_UG_GG(abs_diff,
+(sgtype)-1 < (sgtype)0 ?
+/* signed */
+({
+(a^b) >= (gtype)0 ?
+/* same sign: no overflow/underflow */
+abs(a-b) :
+/* different signs */
+abs(a) + abs(b);
+}) :
+/* unsigned */
+({
+/* This abs prevents a type error; it is not
+executed for signed types, and is a no-op for
+unsigned types */
+abs(a > b ? a-b : b-a);
+}))
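Note on the abs_diff change: the old definition `abs(a-b)` can overflow for signed operands of opposite sign, which is what the new sign test avoids. The following is a minimal sketch of the same case split in portable C for 32-bit integers. The helper names are hypothetical, and the unsigned negation in `abs_u32` is an assumption made to keep the sketch well defined in standard C; the patch itself relies on OpenCL's `abs` returning the unsigned type.

```c
#include <assert.h>  /* only needed for the checks mentioned below */
#include <stdint.h>

/* Hypothetical helper: |a| as an unsigned value, defined even for INT32_MIN. */
static uint32_t abs_u32(int32_t a)
{
    /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
    return a >= 0 ? (uint32_t)a : 0u - (uint32_t)a;
}

/* |a - b| as an unsigned value, using the same split as the patch. */
static uint32_t abs_diff_i32(int32_t a, int32_t b)
{
    if ((a ^ b) >= 0) {
        /* Same sign: a - b stays within the range of int32_t. */
        return abs_u32(a - b);
    }
    /* Different signs: |a - b| == |a| + |b|, which fits in uint32_t. */
    return abs_u32(a) + abs_u32(b);
}
```

The extreme case `abs_diff_i32(INT32_MAX, INT32_MIN)` yields `UINT32_MAX`, a value that a signed subtraction followed by `abs` cannot represent.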
=== modified file 'lib/kernel/add_sat.cl'
--- lib/kernel/add_sat.cl 2011-10-26 21:01:40 +0000
+++ lib/kernel/add_sat.cl 2012-01-16 18:23:23 +0000
@@ -30,7 +30,6 @@
 // ushort __builtin_ia32_paddusw128
 // Other types don't seem to be supported.
 
-// This could do with some testing
 // This could probably also be optimised (i.e. the ?: operators eliminated)
 DEFINE_EXPR_G_GG(add_sat,
 (sgtype)-1 < (sgtype)0 ?
@@ -38,7 +37,7 @@
 ({
 int bits = CHAR_BIT * sizeof(sgtype);
 gtype min = (sgtype)1 << (sgtype)(bits-1);
-gtype max = min + (sgtype)1;
+gtype max = min - (sgtype)1;
 (a^b) < (gtype)0 ?
 /* different signs: no overflow/underflow */
 a+b :
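Note on the add_sat fix: with `min` being the value that has only the sign bit set, the largest value has the bit pattern of `min - 1`, not `min + 1`. The following is a minimal C sketch of signed saturating addition for 32-bit integers that makes this relation explicit. The function names are hypothetical, and the overflow checks are written for standard C rather than copied from the patch.

```c
#include <assert.h>  /* only needed for the checks mentioned below */
#include <stdint.h>

/* Largest int32_t, derived from the smallest one as in the corrected patch:
   the subtraction is done in unsigned arithmetic so it wraps to 0x7fffffff. */
static int32_t tmax_from_tmin(int32_t tmin)
{
    return (int32_t)((uint32_t)tmin - 1u);
}

/* Saturating signed addition (illustrative sketch). */
static int32_t add_sat_i32(int32_t a, int32_t b)
{
    const int32_t tmin = INT32_MIN;
    const int32_t tmax = tmax_from_tmin(tmin);
    if ((a ^ b) < 0) {
        /* Different signs: the sum cannot overflow. */
        return a + b;
    }
    if (a >= 0) {
        /* Both non-negative: clamp at tmax. */
        return a > tmax - b ? tmax : a + b;
    }
    /* Both negative: clamp at tmin. */
    return a < tmin - b ? tmin : a + b;
}
```

With the old `min + 1`, the upper bound would have been `INT32_MIN + 1`, so every positive overflow would have saturated to a large negative number.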
=== modified file 'lib/kernel/x86_64/sqrt.cl'
--- lib/kernel/x86_64/sqrt.cl 2011-10-31 16:48:30 +0000
+++ lib/kernel/x86_64/sqrt.cl 2012-01-16 18:23:23 +0000
@@ -21,6 +21,10 @@
 THE SOFTWARE.
 */
 
+#include "../templates.h"
+
+
+
 #define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
 TYPE _cl_overloadable NAME(TYPE a) \
 { \
@@ -120,3 +124,8 @@
 #endif
 IMPLEMENT_SPLIT (sqrt, double8 , lo, hi)
 IMPLEMENT_SPLIT (sqrt, double16, lo, hi)
+
+
+
+DEFINE_EXPR_F_F(half_sqrt, sqrt(a))
+DEFINE_EXPR_F_F(native_sqrt, sqrt(a))
=== modified file 'tests/testsuite.at'
--- tests/testsuite.at 2012-01-04 11:11:09 +0000
+++ tests/testsuite.at 2012-01-16 18:23:23 +0000
@@ -166,6 +166,42 @@
 
 AT_BANNER([Kernel runtime library])
 
+AT_SETUP([Kernel function bitselect])
+AT_DATA([expout],
+[Running test test_bitselect...
+OK
+])
+AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_bitselect], 0, expout)
+AT_CLEANUP
+
+AT_SETUP([Kernel functions fabs signbit copysign])
+AT_DATA([expout],
+[Running test test_fabs...
+OK
+])
+#AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_fabs], 0, expout)
+# Skip this test until >> works correctly again on vectors in clang's OpenCL
+AT_CHECK([exit 77])
+AT_CLEANUP
+
+AT_SETUP([Kernel functions abs abs_diff add_sat hadd rhadd])
+AT_DATA([expout],
+[Running test test_hadd...
+OK
+])
+AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_hadd], 0, expout)
+AT_CLEANUP
+
+AT_SETUP([Kernel functions << >> rotate])
+AT_DATA([expout],
+[Running test test_rotate...
+OK
+])
+#AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_rotate], 0, expout)
+# Skip this test until << and >> work correctly with overflow in clang's OpenCL
+AT_CHECK([exit 77])
+AT_CLEANUP
+
 AT_SETUP([Trigonometric functions])
 AT_DATA([expout],
 [f(0.000000, 0.000000, 0.000000, 0.000000) = (1.000000, 1.000000, 1.000000, 1.000000)