Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merged at revision: 142
Proposed branch: lp:~schnetter/pocl/main
Merge into: lp:~pocl/pocl/trunk
Diff against target: 1717 lines (+1527/-28)
11 files modified
examples/kernel/Makefile.am (+1/-1)
examples/kernel/kernel.c (+28/-11)
examples/kernel/test_bitselect.cl (+1/-0)
examples/kernel/test_fabs.cl (+12/-5)
examples/kernel/test_hadd.cl (+1409/-0)
examples/kernel/test_rotate.cl (+3/-0)
lib/kernel/abs.cl (+7/-8)
lib/kernel/abs_diff.cl (+20/-1)
lib/kernel/add_sat.cl (+1/-2)
lib/kernel/x86_64/sqrt.cl (+9/-0)
tests/testsuite.at (+36/-0)
To merge this branch: bzr merge lp:~schnetter/pocl/main
Reviewer             Review Type   Date Requested   Status
Erik Schnetter                                      Needs Resubmitting
Pekka Jääskeläinen                                  Needs Fixing
Review via email: mp+88508@code.launchpad.net

Description of the change

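Add test cases for the kernel functions abs, abs_diff, add_sat, hadd, and rhadd (examples/kernel/test_hadd.cl).
Correct some of these kernel functions (abs, abs_diff, and add_sat under lib/kernel).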
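
For readers unfamiliar with these built-ins, the following is a minimal C sketch of the reference semantics that a test like test_hadd.cl checks against, based on the OpenCL definitions of hadd ((x + y) >> 1 without overflow), rhadd ((x + y + 1) >> 1 without overflow), add_sat, and abs_diff. It is not code from this branch: the `ref_*` and `floor_half` names are hypothetical, and the sketch covers only the 32-bit signed case by widening to 64 bits, whereas the branch's test uses its own split-word helpers (safe_add, safe_rshift, and so on).

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by two, written without relying on the
   implementation-defined right shift of negative values. */
static int64_t floor_half(int64_t s)
{
  return s / 2 - (s % 2 < 0);
}

/* hadd: (a + b) >> 1, computed without intermediate overflow. */
static int32_t ref_hadd(int32_t a, int32_t b)
{
  return (int32_t)floor_half((int64_t)a + b);
}

/* rhadd: (a + b + 1) >> 1, computed without intermediate overflow. */
static int32_t ref_rhadd(int32_t a, int32_t b)
{
  return (int32_t)floor_half((int64_t)a + b + 1);
}

/* add_sat: a + b, clamped to the range of the result type. */
static int32_t ref_add_sat(int32_t a, int32_t b)
{
  int64_t s = (int64_t)a + b;
  if (s > INT32_MAX) return INT32_MAX;
  if (s < INT32_MIN) return INT32_MIN;
  return (int32_t)s;
}

/* abs_diff: |a - b|, returned as the unsigned type so it cannot overflow. */
static uint32_t ref_abs_diff(int32_t a, int32_t b)
{
  int64_t d = (int64_t)a - b;
  return (uint32_t)(d < 0 ? -d : d);
}

/* Hypothetical self-check covering the overflow-prone corner cases. */
static void ref_selfcheck(void)
{
  assert(ref_hadd(INT32_MAX, INT32_MAX) == INT32_MAX);
  assert(ref_hadd(-3, 0) == -2);   /* rounds toward negative infinity */
  assert(ref_rhadd(-3, 0) == -1);  /* rounds upward */
  assert(ref_add_sat(INT32_MAX, 1) == INT32_MAX);
  assert(ref_abs_diff(INT32_MIN, INT32_MAX) == UINT32_MAX);
}
```

Widening works here only because a 64-bit type is available for 32-bit inputs; for 64-bit inputs a different approach is needed, which is presumably why the test carries its own two-word arithmetic.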

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Some tests fail after merging. Please merge from the latest trunk and resubmit.

## ---------------------- ##
## Detailed failed tests. ##
## ---------------------- ##

# -*- compilation -*-
11. testsuite.at:169: testing Kernel function bitselect ...
./testsuite.at:174: $abs_top_builddir/examples/kernel/kernel test_bitselect
--- expout 2012-01-13 17:33:38.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/11/stdout 2012-01-13 17:33:39.000000000 +0200
@@ -1,2 +1,4 @@
 Running test test_bitselect...
+FAIL: bitselect type=long3 a=0x9652eb2d b=0xa7d81f42 c=0x3710aa8d c=0xa7524b20
+FAIL: bitselect type=ulong3 a=0x9652eb2d b=0xa7d81f42 c=0x3710aa8d c=0xa7524b20
 OK
11. testsuite.at:169: 11. Kernel function bitselect (testsuite.at:169): FAILED (testsuite.at:174)

# -*- compilation -*-
13. testsuite.at:185: testing Kernel functions abs abs_diff add_sat hadd rhadd ...
./testsuite.at:190: $abs_top_builddir/examples/kernel/kernel test_hadd
--- expout 2012-01-13 17:33:40.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/13/stdout 2012-01-13 17:33:44.000000000 +0200
@@ -1,2 +1,12 @@
 Running test test_hadd...
+FAIL: abs type=long3
+ [0] a=923839117 good=923839117 res=923839117
+ [1] a=-1203173478 good=-1203173478 res=-1203173478
+ [2] a=-731904544 good=-731904544 res=0
+ [3] a=1864236072 good=-1864236072 res=1
+FAIL: abs type=ulong3
+ [0] a=923839117 good=923839117 res=923839117
+ [1] a=-1203173478 good=-1203173478 res=-1203173478
+ [2] a=-731904544 good=-731904544 res=0
+ [3] a=1864236072 good=1864236072 res=1
 OK
13. testsuite.at:185: 13. Kernel functions abs abs_diff add_sat hadd rhadd (testsuite.at:185): FAILED (testsuite.at:190)

review: Needs Fixing
Erik Schnetter (schnetter) wrote :

No tests are failing for me, and merging from the trunk only changed text files. I will push my updated tree soon.

Which version of LLVM are you using? LLVM 3.0 has a known problem with 3-element vectors; I'm using a later version from trunk.

lp:~schnetter/pocl/main updated
172. By Erik Schnetter

Merge from trunk

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Yep, I know LLVM 3.0 has problems, but so far the tests haven't been affected by them. I'll test with LLVM trunk.

Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

One test also fails with LLVM trunk (x86-64/Debian), but the test that uses 3-element vectors (which are known to be broken in LLVM 3.0) seems to pass.

I'll add a warning to configure that, when LLVM 3.0 is in use, advises installing LLVM trunk (or 3.1) for more robust OpenCL support.

12. testsuite.at:177: testing Kernel functions fabs signbit copysign ...
./testsuite.at:182: $abs_top_builddir/examples/kernel/kernel test_fabs
--- expout 2012-01-16 13:40:06.000000000 +0200
+++ /home/visit0r/src/pocl/tests/testsuite.dir/at-groups/12/stdout 2012-01-16 13:40:07.000000000 +0200
@@ -1,2 +1,3 @@
 Running test test_fabs...
+FAIL: signbit type=float2 val=-0 res=-1
 OK
12. testsuite.at:177: 12. Kernel functions fabs signbit copysign (testsuite.at:177): FAILED (testsuite.at:182)

review: Needs Fixing
lp:~schnetter/pocl/main updated
173. By Erik Schnetter

Merge from trunk

Erik Schnetter (schnetter) wrote :

This test case used to pass but now breaks with a more recent version of LLVM. It seems LLVM has a regression in the vector shift operator. I will disable this test until LLVM is fixed.

Erik Schnetter (schnetter) wrote :

I have disabled the signbit test (and submitted a bug report to llvm).
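
For context, the signbit check in test_fabs.cl expects 0 when the sign bit is clear and, when it is set, +1 in the scalar case and -1 (all bits set) per element in the vector case. A minimal C sketch of that expectation follows; `sign_bit_set` and `expected_signbit` are hypothetical names used only for illustration, and the sketch models one element at a time rather than real OpenCL vector types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the IEEE 754 sign bit of a float from its bit pattern, so that
   negative zero is detected as well. */
static int sign_bit_set(float x)
{
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  return (int)(bits >> 31);
}

/* Expected per-element result of signbit for a vector of `vecsize`
   elements (vecsize == 1 denotes the scalar overload). */
static int32_t expected_signbit(float x, int vecsize)
{
  if (!sign_bit_set(x)) return 0;
  return vecsize == 1 ? 1 : -1;
}

/* Hypothetical self-check of the convention. */
static void signbit_selfcheck(void)
{
  assert(expected_signbit(-0.0f, 1) == 1);
  assert(expected_signbit(-0.0f, 2) == -1);
  assert(expected_signbit(0.0f, 2) == 0);
}
```

Under this convention, the logged element for float2 (val=-0, res=-1) is itself the expected result, which suggests the mismatch was in another element; the change in this diff that prints every element on failure would make that visible.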

review: Needs Resubmitting
lp:~schnetter/pocl/main updated
174. By Erik Schnetter

Disable signbit test

Preview Diff

=== modified file 'examples/kernel/Makefile.am'
--- examples/kernel/Makefile.am 2012-01-16 13:23:47 +0000
+++ examples/kernel/Makefile.am 2012-01-16 18:23:23 +0000
@@ -24,7 +24,7 @@

 noinst_PROGRAMS = kernel

-kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_rotate.cl
+kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_hadd.cl test_rotate.cl
 kernel_LDADD = ../../lib/CL/libOpenCL.la -lm @PTHREAD_LIBS@
 kernel_CFLAGS = -std=c99 @PTHREAD_CFLAGS@


=== modified file 'examples/kernel/kernel.c'
--- examples/kernel/kernel.c 2011-12-14 01:12:04 +0000
+++ examples/kernel/kernel.c 2012-01-16 18:23:23 +0000
@@ -70,24 +70,41 @@


 int
-main(void)
+main(int argc, char **argv)
 {
- char const *const tests[] = {
- "test_bitselect",
- "test_fabs",
- "test_rotate",
- };
- int const ntests = sizeof(tests)/sizeof(*tests);
- for (int i=0; i<ntests; ++i) {
- printf("Running test #%d %s...\n", i, tests[i]);
+ if (argc < 2) {
+
+ /* Run all tests */
+ char const *const tests[] = {
+ "test_bitselect",
+ "test_fabs",
+ "test_hadd",
+ //"test_rotate", /* TODO: this test fails; LLVM bug #11555 */
+ };
+ int const ntests = sizeof(tests)/sizeof(*tests);
+ for (int i=0; i<ntests; ++i) {
+ printf("Running test #%d %s...\n", i, tests[i]);
+ int ierr;
+ ierr = call_test(tests[i]);
+ if (ierr) {
+ printf("FAIL\n");
+ return 1;
+ }
+ }
+
+ } else {
+
+ /* Run one test */
+ printf("Running test %s...\n", argv[1]);
 int ierr;
- ierr = call_test(tests[i]);
+ ierr = call_test(argv[1]);
 if (ierr) {
 printf("FAIL\n");
 return 1;
 }
+
 }

- printf("DONE\n");
+ printf("OK\n");
 return 0;
 }

=== modified file 'examples/kernel/test_bitselect.cl'
--- examples/kernel/test_bitselect.cl 2011-12-18 02:58:06 +0000
+++ examples/kernel/test_bitselect.cl 2012-01-16 18:23:23 +0000
@@ -1165,6 +1165,7 @@
 typename,
 (uint)left.s[0], (uint)right.s[0], (uint)sel.s[0],
 (uint)res.s[0]);
+ return;
 }
 }
 })

=== modified file 'examples/kernel/test_fabs.cl'
--- examples/kernel/test_fabs.cl 2011-12-18 02:58:06 +0000
+++ examples/kernel/test_fabs.cl 2012-01-16 18:23:23 +0000
@@ -1,6 +1,6 @@
+// TESTING: copysign
 // TESTING: fabs
 // TESTING: signbit
-// TESTING: copysign

 #define IMPLEMENT_BODY_V(NAME, BODY, VTYPE, STYPE, JTYPE, SJTYPE) \
 void NAME##_##VTYPE() \
@@ -135,8 +135,11 @@
 equal = equal && r.sj == g.sj;
 }
 if (!equal) {
- printf("FAIL: fabs type=%s val=%.17g res=%.17g\n",
- typename, val.s[0], res.s[0]);
+ for (int n=0; n<vecsize; ++n) {
+ printf("FAIL: fabs type=%s val=%.17g res=%.17g\n",
+ typename, val.s[n], res.s[n]);
+ }
+ return;
 }
 /* signbit */
 Jvec ires;
@@ -146,8 +149,11 @@
 equal = equal && ires.s[n] == (sign>0 ? 0 : vecsize==1 ? +1 : -1);
 }
 if (!equal) {
- printf("FAIL: signbit type=%s val=%.17g res=%d\n",
- typename, val.s[0], (int)ires.s[0]);
+ for (int n=0; n<vecsize; ++n) {
+ printf("FAIL: signbit type=%s val=%.17g res=%d\n",
+ typename, val.s[n], (int)ires.s[n]);
+ }
+ return;
 }
 /* copysign */
 for (int sign2=-1; sign2<=+1; sign2+=2) {
@@ -164,6 +170,7 @@
 printf("FAIL: copysign type=%s val=%.17g sign=%.17g res=%.17g\n",
 typename, val.s[n], sign2*val2.s[n], res.s[n]);
 }
+ return;
 }
 }
 }

131=== added file 'examples/kernel/test_hadd.cl'
132--- examples/kernel/test_hadd.cl 1970-01-01 00:00:00 +0000
133+++ examples/kernel/test_hadd.cl 2012-01-16 18:23:23 +0000
134@@ -0,0 +1,1409 @@
135+// TESTING: abs
136+// TESTING: abs_diff
137+// TESTING: add_sat
138+// TESTING: hadd
139+// TESTING: rhadd
140+
141+
142+
143+/* Safe-but-slow arithmetic that can handle larger numbers without
144+ overflowing. */
145+#define DEFINE_SAFE_1(STYPE) \
146+ \
147+ STYPE##2 _cl_overloadable safe_normalize(STYPE##2 const a) \
148+ { \
149+ STYPE const halfbits = 4*sizeof(STYPE); \
150+ STYPE const halfmax = (STYPE)1 << halfbits; \
151+ STYPE const halfmask = halfmax - (STYPE)1; \
152+ STYPE##2 b; \
153+ b.s0 = a.s0 & halfmask; \
154+ b.s1 = a.s1 + (a.s0 >> halfbits); \
155+ return b; \
156+ } \
157+ \
158+ STYPE _cl_overloadable safe_extract(STYPE##2 const a) \
159+ { \
160+ STYPE const halfbits = 4*sizeof(STYPE); \
161+ STYPE const halfmax = (STYPE)1 << halfbits; \
162+ STYPE const halfmask = halfmax - (STYPE)1; \
163+ STYPE b; \
164+ b = a.s0 | a.s1 << halfbits; \
165+ return b; \
166+ } \
167+ \
168+ STYPE##2 _cl_overloadable safe_neg(STYPE##2 a) \
169+ { \
170+ STYPE##2 b; \
171+ b.s0 = - a.s0; \
172+ b.s1 = - a.s1; \
173+ return safe_normalize(b); \
174+ } \
175+ \
176+ STYPE##2 _cl_overloadable safe_abs(STYPE##2 const a) \
177+ { \
178+ STYPE##2 b; \
179+ b = a; \
180+ if (b.s1 < (STYPE)0) { \
181+ b = safe_neg(b); \
182+ } \
183+ return b; \
184+ } \
185+ \
186+ STYPE##2 _cl_overloadable safe_add(STYPE##2 const a, STYPE##2 const b) \
187+ { \
188+ STYPE##2 c; \
189+ c.s0 = a.s0 + b.s0; \
190+ c.s1 = a.s1 + b.s1; \
191+ return safe_normalize(c); \
192+ } \
193+ \
194+ STYPE##2 _cl_overloadable safe_sub(STYPE##2 const a, STYPE##2 const b) \
195+ { \
196+ STYPE##2 c; \
197+ c.s0 = a.s0 - b.s0; \
198+ c.s1 = a.s1 - b.s1; \
199+ return safe_normalize(c); \
200+ } \
201+ \
202+ STYPE##2 _cl_overloadable safe_max(STYPE##2 const a, STYPE##2 const b) \
203+ { \
204+ STYPE##2 c; \
205+ if (a.s1 > b.s1 || (a.s1 == b.s1 && a.s0 >= b.s0)) { \
206+ c = a; \
207+ } else { \
208+ c = b; \
209+ } \
210+ return c; \
211+ } \
212+ \
213+ STYPE##2 _cl_overloadable safe_min(STYPE##2 const a, STYPE##2 const b) \
214+ { \
215+ STYPE##2 c; \
216+ if (a.s1 < b.s1 || (a.s1 == b.s1 && a.s0 <= b.s0)) { \
217+ c = a; \
218+ } else { \
219+ c = b; \
220+ } \
221+ return c; \
222+ } \
223+ \
224+ STYPE##2 _cl_overloadable safe_rshift(STYPE##2 a) \
225+ { \
226+ STYPE const halfbits = 4*sizeof(STYPE); \
227+ STYPE const halfmax = (STYPE)1 << halfbits; \
228+ STYPE const halfmask = halfmax - (STYPE)1; \
229+ STYPE##2 b; \
230+ b.s0 = a.s0 | ((a.s1 & (STYPE)1) << halfbits); \
231+ b.s1 = a.s1 & ~(STYPE)1; \
232+ b.s0 >>= (STYPE)1; \
233+ b.s1 >>= (STYPE)1; \
234+ return safe_normalize(b); \
235+ }
236+
237+
238+
239+#define DEFINE_SAFE_2(TYPE, STYPE) \
240+ \
241+ STYPE##2 _cl_overloadable safe_create(TYPE const a) \
242+ { \
243+ STYPE const halfbits = 4*sizeof(STYPE); \
244+ STYPE const halfmax = (STYPE)1 << halfbits; \
245+ STYPE const halfmask = halfmax - (STYPE)1; \
246+ STYPE##2 b; \
247+ b.s0 = a & (TYPE)halfmask; \
248+ b.s1 = a >> (TYPE)halfbits; \
249+ b = safe_normalize(b); \
250+ if ((TYPE)safe_extract(b) != a) printf("FAIL: safe_create %d\n", (int)a); \
251+ return b; \
252+ }
253+
254+
255+
256+DEFINE_SAFE_1(char )
257+DEFINE_SAFE_1(short)
258+DEFINE_SAFE_1(int )
259+__IF_INT64(
260+DEFINE_SAFE_1(long ))
261+
262+DEFINE_SAFE_2(char , char )
263+DEFINE_SAFE_2(uchar , char )
264+DEFINE_SAFE_2(short , short)
265+DEFINE_SAFE_2(ushort, short)
266+DEFINE_SAFE_2(int , int )
267+DEFINE_SAFE_2(uint , int )
268+__IF_INT64(
269+DEFINE_SAFE_2(long , long )
270+DEFINE_SAFE_2(ulong , long ))
271+
272+
273+
274+#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
275+ void NAME##_##GTYPE() \
276+ { \
277+ typedef GTYPE gtype; \
278+ typedef SGTYPE sgtype; \
279+ typedef UGTYPE ugtype; \
280+ typedef SUGTYPE sugtype; \
281+ char const *const typename = #GTYPE; \
282+ BODY; \
283+ }
284+#define DEFINE_BODY_G(NAME, EXPR) \
285+ IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \
286+ IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \
287+ IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \
288+ IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \
289+ IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \
290+ IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \
291+ IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \
292+ IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \
293+ IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \
294+ IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \
295+ IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \
296+ IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \
297+ IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \
298+ IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \
299+ IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \
300+ IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \
301+ IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \
302+ IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \
303+ IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \
304+ IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \
305+ IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \
306+ IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \
307+ IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \
308+ IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \
309+ IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \
310+ IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \
311+ IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \
312+ IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \
313+ IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \
314+ IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \
315+ IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \
316+ IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \
317+ IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \
318+ IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \
319+ IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \
320+ IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \
321+ __IF_INT64( \
322+ IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \
323+ IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \
324+ IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \
325+ IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \
326+ IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \
327+ IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \
328+ IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \
329+ IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \
330+ IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \
331+ IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \
332+ IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \
333+ IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ))
334+
335+#define CALL_FUNC_G(NAME) \
336+ NAME##_char (); \
337+ NAME##_char2 (); \
338+ NAME##_char3 (); \
339+ NAME##_char4 (); \
340+ NAME##_char8 (); \
341+ NAME##_char16 (); \
342+ NAME##_uchar (); \
343+ NAME##_uchar2 (); \
344+ NAME##_uchar3 (); \
345+ NAME##_uchar4 (); \
346+ NAME##_uchar8 (); \
347+ NAME##_uchar16 (); \
348+ NAME##_short (); \
349+ NAME##_short2 (); \
350+ NAME##_short3 (); \
351+ NAME##_short4 (); \
352+ NAME##_short8 (); \
353+ NAME##_short16 (); \
354+ NAME##_ushort (); \
355+ NAME##_ushort2 (); \
356+ NAME##_ushort3 (); \
357+ NAME##_ushort4 (); \
358+ NAME##_ushort8 (); \
359+ NAME##_ushort16(); \
360+ NAME##_int (); \
361+ NAME##_int2 (); \
362+ NAME##_int3 (); \
363+ NAME##_int4 (); \
364+ NAME##_int8 (); \
365+ NAME##_int16 (); \
366+ NAME##_uint (); \
367+ NAME##_uint2 (); \
368+ NAME##_uint3 (); \
369+ NAME##_uint4 (); \
370+ NAME##_uint8 (); \
371+ NAME##_uint16 (); \
372+ __IF_INT64( \
373+ NAME##_long (); \
374+ NAME##_long2 (); \
375+ NAME##_long3 (); \
376+ NAME##_long4 (); \
377+ NAME##_long8 (); \
378+ NAME##_long16 (); \
379+ NAME##_ulong (); \
380+ NAME##_ulong2 (); \
381+ NAME##_ulong3 (); \
382+ NAME##_ulong4 (); \
383+ NAME##_ulong8 (); \
384+ NAME##_ulong16 ();)
385+
386+
387+
388+#define is_signed(T) ((T)-1 < (T)+1)
389+#define is_floating(T) ((T)0.1 > (T)0.0)
390+#define count_bits(T) (CHAR_BIT * sizeof(T))
391+
392+DEFINE_BODY_G
393+(test_hadd,
394+ ({
395+ _cl_static_assert(sgtype, !is_floating(sgtype));
396+ uint const randoms[] = {
397+ 0x00000000,
398+ 0x00000001,
399+ 0x7fffffff,
400+ 0x80000000,
401+ 0xfffffffe,
402+ 0xffffffff,
403+ 0x01010101,
404+ 0x80808080,
405+ 0x55555555,
406+ 0xaaaaaaaa,
407+ 116127149,
408+ 331473970,
409+ 3314285513,
410+ 1531519032,
411+ 3871781304,
412+ 723260354,
413+ 3734992454,
414+ 3048883544,
415+ 424075405,
416+ 3760586679,
417+ 364071113,
418+ 2212396745,
419+ 3026460845,
420+ 2062923368,
421+ 3945483116,
422+ 774301702,
423+ 2010645213,
424+ 353497300,
425+ 2240089293,
426+ 645959945,
427+ 2929402380,
428+ 3641106046,
429+ 3731530029,
430+ 3788502454,
431+ 3990366079,
432+ 3532452335,
433+ 3231247251,
434+ 123690193,
435+ 418692672,
436+ 4146745661,
437+ 4170087687,
438+ 3915754726,
439+ 2052700648,
440+ 1748863847,
441+ 276568793,
442+ 364266289,
443+ 24718041,
444+ 3775186845,
445+ 935438421,
446+ 3070232227,
447+ 558364671,
448+ 2318351214,
449+ 17943242,
450+ 1796864907,
451+ 727165514,
452+ 223478118,
453+ 2448924107,
454+ 496915291,
455+ 3372891854,
456+ 361433487,
457+ 3273766229,
458+ 251831411,
459+ 432661417,
460+ 772908669,
461+ 289792578,
462+ 4150526710,
463+ 4157662725,
464+ 2594757327,
465+ 3052388893,
466+ 3842089578,
467+ 3467269013,
468+ 510187125,
469+ 2596093643,
470+ 398042620,
471+ 4272455984,
472+ 3711648086,
473+ 2120827851,
474+ 77269246,
475+ 2168059317,
476+ 2750549452,
477+ 1712682330,
478+ 2486520097,
479+ 625173621,
480+ 1632501477,
481+ 2935468416,
482+ 980045574,
483+ 3080136685,
484+ 4291385683,
485+ 1900746145,
486+ 3343063222,
487+ 3737266887,
488+ 3349055009,
489+ 3557165116,
490+ 847440541,
491+ 1195278641,
492+ 313889830,
493+ 622790046,
494+ 326637691,
495+ 663570370,
496+ 662327410,
497+ 923839117,
498+ 3091793818,
499+ 3563062752,
500+ 1864236072,
501+ 4251970867,
502+ 2259486024,
503+ 2512789432,
504+ 4278284968,
505+ 244581614,
506+ 247706675,
507+ 3268622648,
508+ 3758387026,
509+ 206893256,
510+ 2892198447,
511+ 3585538105,
512+ 2484801188,
513+ 1063964031,
514+ 3712657639,
515+ 23179627,
516+ 1732005357,
517+ 2522016557,
518+ 1058341654,
519+ 1580368080,
520+ 1890361257,
521+ 1167428989,
522+ 2600065453,
523+ 1547136389,
524+ 945856727,
525+ 2005682606,
526+ 3399854093,
527+ 2619154565,
528+ 2207015138,
529+ 2836381097,
530+ 612928932,
531+ 1537934908,
532+ 897756908,
533+ 1142275256,
534+ 1106163744,
535+ 3209429231,
536+ 3317761168,
537+ 2815958850,
538+ 1282374282,
539+ 3861163766,
540+ 2547903564,
541+ 3139840265,
542+ 587243656,
543+ 3261127556,
544+ 3955999184,
545+ 2061849860,
546+ 3778058575,
547+ 259659645,
548+ 935157504,
549+ 3294850933,
550+ 2164603733,
551+ 3772888022,
552+ 732201413,
553+ 3677934092,
554+ 321204420,
555+ 509807651,
556+ 3626474557,
557+ 284622251,
558+ 3655952885,
559+ 1512028769,
560+ 1102588652,
561+ 2700179235,
562+ 4167405174,
563+ 2672050627,
564+ 3410780487,
565+ 4153733940,
566+ 2459759898,
567+ 568792515,
568+ 1081882827,
569+ 3211871042,
570+ 799411732,
571+ 2101993855,
572+ 3415550991,
573+ 3872737342,
574+ 4168312654,
575+ 1889019671,
576+ 4247531636,
577+ 2442118552,
578+ 3024016549,
579+ 1041817509,
580+ 141773691,
581+ 28033810,
582+ 4034097901,
583+ 1532981240,
584+ 2593712697,
585+ 2751535537,
586+ 269072724,
587+ 3363560906,
588+ 3555817938,
589+ 611297346,
590+ 366972507,
591+ 788151801,
592+ 3990920857,
593+ 1611303958,
594+ 3353102293,
595+ 1334246396,
596+ 1114446428,
597+ 3491128109,
598+ 2922751152,
599+ 3053407478,
600+ 2897830841,
601+ 176546593,
602+ 3184221063,
603+ 37923477,
604+ 1692128510,
605+ 165719856,
606+ 1795746307,
607+ 2422422413,
608+ 253227286,
609+ 2188522595,
610+ 582156087,
611+ 2342528685,
612+ 2080142547,
613+ 1928462563,
614+ 2713927482,
615+ 1944972771,
616+ 2534268146,
617+ 830798003,
618+ 1653357460,
619+ 291743070,
620+ 593771532,
621+ 2941865444,
622+ 855254640,
623+ 2401129822,
624+ 2420945774,
625+ 2447532144,
626+ 1137540092,
627+ 1296659939,
628+ 3252539825,
629+ 1165427708,
630+ 3251476781,
631+ 2597490804,
632+ 2518198923,
633+ 1196242486,
634+ 3646082981,
635+ 1347758965,
636+ 3824891532,
637+ 2959519286,
638+ 1523237529,
639+ 2910666174,
640+ 3226637035,
641+ 2116458903,
642+ 1076998092,
643+ 4222762545,
644+ 3061300520,
645+ 4189298288,
646+ 3943996060,
647+ 3129210496,
648+ 3826669630,
649+ 4235952488,
650+ 2624429853,
651+ 2522766390,
652+ 4137227001,
653+ 3846448057,
654+ 1893377487,
655+ 3658784739,
656+ 2368074586,
657+ 170547540,
658+ 520741120,
659+ 2662229630,
660+ 4265731754,
661+ 1379762094,
662+ 3395502906,
663+ 2242123335,
664+ 1960965916,
665+ 561815223,
666+ 2687853297,
667+ 4051050259,
668+ 1845906614,
669+ 3725623071,
670+ 1857706909,
671+ 2487006596,
672+ 1925919247,
673+ 2796536825,
674+ 3499954730,
675+ 2173320675,
676+ 3416676849,
677+ 3637473517,
678+ 340951464,
679+ 4152841543,
680+ 3747544606,
681+ 2659955417,
682+ 1695145107,
683+ 3117280269,
684+ 826143012,
685+ 3867179892,
686+ 4269349771,
687+ 1002613766,
688+ 3842086144,
689+ 1431990957,
690+ 2466205499,
691+ 653575141,
692+ 293530756,
693+ 2318035308,
694+ 3728576309,
695+ 1697894989,
696+ 2955143882,
697+ 2109912287,
698+ 2764187839,
699+ 1805490664,
700+ 672567480,
701+ 1374741155,
702+ 1662665091,
703+ 3551530257,
704+ 350283994,
705+ 685023916,
706+ 1887748803,
707+ 1386316091,
708+ 185708823,
709+ 3106823178,
710+ 3014109065,
711+ 3823816879,
712+ 2213358313,
713+ 2696977340,
714+ 4075569311,
715+ 365089277,
716+ 3466850767,
717+ 312392153,
718+ 1065191758,
719+ 2405243644,
720+ 3174745999,
721+ 3617861250,
722+ 867192904,
723+ 1046475095,
724+ 1888985494,
725+ 1127140157,
726+ 61671281,
727+ 128055546,
728+ 2332619657,
729+ 993669439,
730+ 2145370329,
731+ 1462433204,
732+ 74990676,
733+ 2898191247,
734+ 3601586977,
735+ 794604597,
736+ 3597643629,
737+ 4282141339,
738+ 251591051,
739+ 84943504,
740+ 2016044077,
741+ 946823499,
742+ 648214756,
743+ 2530104367,
744+ 4254219656,
745+ 1974542801,
746+ 53097687,
747+ 157109688,
748+ 299310673,
749+ 2866882336,
750+ 3335682769,
751+ 2583612755,
752+ 4114730718,
753+ 740387484,
754+ 986157357,
755+ 1140355266,
756+ 2825639379,
757+ 1198731547,
758+ 1521261313,
759+ 1204836445,
760+ 4294274455,
761+ 2215732661,
762+ 1369520150,
763+ 1515223958,
764+ 2428295267,
765+ 1945985266,
766+ 2168529560,
767+ 3791933294,
768+ 4021389338,
769+ 713695045,
770+ 4254483898,
771+ 3795986293,
772+ 1347498014,
773+ 1746051095,
774+ 1364967734,
775+ 206265390,
776+ 3940088473,
777+ 1867270033,
778+ 3893545471,
779+ 3545819698,
780+ 2573105187,
781+ 3859595967,
782+ 2823745089,
783+ 1293424244,
784+ 3948799370,
785+ 1524394803,
786+ 3807487752,
787+ 4055830971,
788+ 3124609223,
789+ 119357574,
790+ 1490516894,
791+ 3799908122,
792+ 1700941394,
793+ 80878888,
794+ 2719184407,
795+ 3603450215,
796+ 27225525,
797+ 1413638246,
798+ 3350206268,
799+ 2643568519,
800+ 801305037,
801+ 1341902999,
802+ 1420459209,
803+ 968648411,
804+ 1826125841,
805+ 2619721007,
806+ 537879916,
807+ 860253620,
808+ 586683700,
809+ 827412286,
810+ 2724526294,
811+ 1019678576,
812+ 3998975225,
813+ 339789397,
814+ 863181640,
815+ 970475690,
816+ 2737385140,
817+ 322021174,
818+ 4084948327,
819+ 80691950,
820+ 1702782677,
821+ 1266230197,
822+ 1100861683,
823+ 3123418948,
824+ 258978579,
825+ 3217833394,
826+ 1780903315,
827+ 1345341356,
828+ 2927579299,
829+ 931392918,
830+ 9404798,
831+ 83278219,
832+ 2470714323,
833+ 640357359,
834+ 2169696414,
835+ 496463525,
836+ 4127940882,
837+ 2965369765,
838+ 4136333330,
839+ 1159134689,
840+ 1798163043,
841+ 4097403856,
842+ 4284804850,
843+ 3165524545,
844+ 2765224926,
845+ 931350022,
846+ 1171636623,
847+ 845799406,
848+ 709853915,
849+ 2348457302,
850+ 3343956878,
851+ 2438786363,
852+ 175730452,
853+ 598587430,
854+ 2744955366,
855+ 447049527,
856+ 1252796590,
857+ 3044128900,
858+ 812683575,
859+ 3721040746,
860+ 3404688504,
861+ 2674021068,
862+ 959056069,
863+ 322162714,
864+ 2008064015,
865+ 3758321185,
866+ 2877937989,
867+ 778007512,
868+ 3502772435,
869+ 3084124565,
870+ 111844966,
871+ 248248909,
872+ 22147113,
873+ 2506501875,
874+ 1430033847,
875+ 1690841637,
876+ 2999017281,
877+ 3658748205,
878+ 1632773934,
879+ 4177069459,
880+ 3187781304,
881+ 1182255965,
882+ 4121685939,
883+ 300554973,
884+ 2854502901,
885+ 642657206,
886+ 1504346771,
887+ 128405037,
888+ 2163092164,
889+ 1091806675,
890+ 1144089805,
891+ 54479906,
892+ 505543118,
893+ 2844153548,
894+ 1010229282,
895+ 2961721580,
896+ 4235612700,
897+ 3508832243,
898+ 1409461040,
899+ 2568735295,
900+ 1191284023,
901+ 2220949766,
902+ 2605559386,
903+ 706551146,
904+ 3452279268,
905+ 2372892169,
906+ 2360210709,
907+ 3228881405,
908+ 2987444766,
909+ 1187314024,
910+ 908783041,
911+ 144096950,
912+ 1915948100,
913+ 2171208878,
914+ 420772043,
915+ 793209353,
916+ 359527746,
917+ 625018196,
918+ 1195796799,
919+ 2079388581,
920+ 864869238,
921+ 765565143,
922+ 1069647859,
923+ 3857355469,
924+ 2436437044,
925+ 238157644,
926+ 1612883577,
927+ 1911189891,
928+ 2070273440,
929+ 384222456,
930+ 1186369477,
931+ 2844794758,
932+ 3435869876,
933+ 1486894286,
934+ 4062343990,
935+ 440437688,
936+ 306253241,
937+ 3650751868,
938+ 2695961920,
939+ 3920128930,
940+ 3921419250,
941+ 502951143,
942+ 311093469,
943+ 2708936678,
944+ 36677206,
945+ 3473343884,
946+ 577655290,
947+ 3795127787,
948+ 1448118037,
949+ 436359554,
950+ 2051970204,
951+ 2644913053,
952+ 2492587228,
953+ 3125803824,
954+ 150160619,
955+ 1725373463,
956+ 2221292372,
957+ 2580064663,
958+ 1330289179,
959+ 2700556441,
960+ 1327212925,
961+ 651999045,
962+ 2089310372,
963+ 3221246949,
964+ 4148251434,
965+ 4267892623,
966+ 897583443,
967+ 1051813251,
968+ 2131903377,
969+ 4121163297,
970+ 4128279241,
971+ 1634689556,
972+ 3369895626,
973+ 1121895497,
974+ 3158192590,
975+ 4290462018,
976+ 3447288838,
977+ 4035505534,
978+ 2945114940,
979+ 1556028368,
980+ 4235061319,
981+ 1535570089,
982+ 2144940257,
983+ 1961364931,
984+ 2509075082,
985+ 804411045,
986+ 2290609740,
987+ 1076471626,
988+ 3254493188,
989+ 4284011230,
990+ 923006875,
991+ 3722016670,
992+ 2981439178,
993+ 2038308778,
994+ 1755166344,
995+ 488581856,
996+ 2624361425,
997+ 1298790575,
998+ 3550671725,
999+ 1845109437,
1000+ 2047411775,
1001+ 2488464246,
1002+ 1391825885,
1003+ 2340290304,
1004+ 3623879917,
1005+ 217171099,
1006+ 3698905333,
1007+ 2718846041,
1008+ 73731529,
1009+ 2053405441,
1010+ 2770197347,
1011+ 2983996080,
1012+ 2612966141,
1013+ 2187183079,
1014+ 2796212469,
1015+ 3797629169,
1016+ 1788932364,
1017+ 17748377,
1018+ 627297271,
1019+ 3689459731,
1020+ 3311799950,
1021+ 4263162298,
1022+ 4016852324,
1023+ 3136750215,
1024+ 1725824049,
1025+ 2844064064,
1026+ 2059159211,
1027+ 3182127070,
1028+ 470655679,
1029+ 1166949584,
1030+ 2425843062,
1031+ 219908183,
1032+ 161770982,
1033+ 2394961157,
1034+ 999226372,
1035+ 2367624166,
1036+ 76287885,
1037+ 1110832227,
1038+ 3358123709,
1039+ 1504127646,
1040+ 49596774,
1041+ 1296560019,
1042+ 2320978173,
1043+ 1163934122,
1044+ 1631947491,
1045+ 2702852639,
1046+ 3856755518,
1047+ 2562943123,
1048+ 991330989,
1049+ 993726248,
1050+ 2133737192,
1051+ 20974150,
1052+ 3808389889,
1053+ 2447868340,
1054+ 2434828629,
1055+ 3344419509,
1056+ 4076789444,
1057+ 1446054487,
1058+ 3815933708,
1059+ 3644670988,
1060+ 3175898122,
1061+ 3057844745,
1062+ 559106380,
1063+ 1840065631,
1064+ 3020573012,
1065+ 3203040371,
1066+ 997381925,
1067+ 2563312032,
1068+ 815510593,
1069+ 121805231,
1070+ 1047507862,
1071+ 1841403695,
1072+ 1563170561,
1073+ 1644198099,
1074+ 3470882735,
1075+ 627296501,
1076+ 3006157508,
1077+ 383648566,
1078+ 3136652449,
1079+ 2252034149,
1080+ 1749861990,
1081+ 956381402,
1082+ 3299624735,
1083+ 2798395931,
1084+ 270054444,
1085+ 3757564211,
1086+ 2933717597,
1087+ 1080178310,
1088+ 1367392714,
1089+ 1135266342,
1090+ 2642448461,
1091+ 1067554284,
1092+ 3694982777,
1093+ 3594374699,
1094+ 4170301369,
1095+ 3593401570,
1096+ 2298071009,
1097+ 1561680798,
1098+ 2788490866,
1099+ 1757829499,
1100+ 8819607,
1101+ 2453686068,
1102+ 3458682663,
1103+ 1614888171,
1104+ 2327536307,
1105+ 13960177,
1106+ 125752716,
1107+ 2312371195,
1108+ 1515197240,
1109+ 189747227,
1110+ 666988376,
1111+ 1401118738,
1112+ 986465965,
1113+ 242793663,
1114+ 1830586663,
1115+ 1603054176,
1116+ 391536104,
1117+ 1403125754,
1118+ 4021998614,
1119+ 157985039,
1120+ 966292223,
1121+ 2476444819,
1122+ 3261614719,
1123+ 3888752449,
1124+ 2300656903,
1125+ 1138839559,
1126+ 1227396086,
1127+ 1029493665,
1128+ 2138482384,
1129+ 2182525175,
1130+ 1437393012,
1131+ 2758514342,
1132+ 1394715363,
1133+ 242430786,
1134+ 4026759135,
1135+ 379455166,
1136+ 3454852592,
1137+ 1128257576,
1138+ 513994046,
1139+ 2437643547,
1140+ 1851772774,
1141+ 1096918785,
1142+ 2537378072,
1143+ 2020382559,
1144+ 1306056753,
1145+ 519939769,
1146+ 2477462755,
1147+ 2962076712,
1148+ 2856059355,
1149+ 111272034,
1150+ 2363778749,
1151+ 3031510224,
1152+ 297098997,
1153+ 2716928589,
1154+ 1988398361,
1155+ 3715685207,
1156+ 1158387390,
1157+ 3239718824,
1158+ 214276640,
1159+ 1240159361,
1160+ 302800084,
1161+ 258391670,
1162+ 3118615408,
1163+ 1789752935,
1164+ 935790045,
1165+ 1678444383,
1166+ 3645357112,
1167+ 1752731774,
1168+ 1211889371,
1169+ 2432949496,
1170+ 1983838022,
1171+ 2563701701,
1172+ 3235972690,
1173+ 2732559614,
1174+ 4173627589,
1175+ 918129740,
1176+ 3528101943,
1177+ 945287787,
1178+ 783593046,
1179+ 1687101911,
1180+ 4265659819,
1181+ 1625936204,
1182+ 419423123,
1183+ 404748783,
1184+ 174814826,
1185+ 561306387,
1186+ 441376876,
1187+ 3649973873,
1188+ 1191532754,
1189+ 493829681,
1190+ 462640703,
1191+ 3037639795,
1192+ 4234288143,
1193+ 787992128,
1194+ 354556603,
1195+ 1391557094,
1196+ 1227150157,
1197+ 25592400,
1198+ 3032298621,
1199+ 1655829692,
1200+ 1736544192,
1201+ 2936173068,
1202+ 1867683432,
1203+ 3284761215,
1204+ 2988749127,
1205+ 62083315,
1206+ 3675433852,
1207+ 1134152479,
1208+ 2537382040,
1209+ 1147996351,
1210+ 1287284159,
1211+ 1889610942,
1212+ 3549411223,
1213+ 2634772335,
1214+ 1621708033,
1215+ 3268420142,
1216+ 2635222095,
1217+ 2856377255,
1218+ 3703296204,
1219+ 45831019,
1220+ 1997278369,
1221+ 1472530726,
1222+ 4202051236,
1223+ 1958581642,
1224+ 1899513707,
1225+ 1642075765,
1226+ 217373156,
1227+ 1177071505,
1228+ 2179831909,
1229+ 1894821896,
1230+ 375785474,
1231+ 140181353,
1232+ 2743987480,
1233+ 123627609,
1234+ 3644816362,
1235+ 4244769687,
1236+ 4053481902,
1237+ 4272740073,
1238+ 1701735471,
1239+ 1799303028,
1240+ 2810175160,
1241+ 1531107068,
1242+ 3059813822,
1243+ 4125025775,
1244+ 1932301928,
1245+ 358163550,
1246+ 1246286294,
1247+ 1901878857,
1248+ 2449370117,
1249+ 4061706076,
1250+ 2875797072,
1251+ 1661522553,
1252+ 543545982,
1253+ 300448222,
1254+ 4019581644,
1255+ 3197346443,
1256+ 731278538,
1257+ 457112622,
1258+ 669625172,
1259+ 2548620393,
1260+ 2931934447,
1261+ 2318225955,
1262+ 427149964,
1263+ 1097556601,
1264+ 3585697077,
1265+ 1901391738,
1266+ 3019912350,
1267+ 4193989774,
1268+ 1411691495,
1269+ 2549773310,
1270+ 3130489018,
1271+ 739444137,
1272+ 1953561922,
1273+ 228589899,
1274+ 974825144,
1275+ 1873934953,
1276+ 918502475,
1277+ 4020302125,
1278+ 2103082289,
1279+ 1474428456,
1280+ 269315616,
1281+ 3376419786,
1282+ 2903506696,
1283+ 169344159,
1284+ 4151327830,
1285+ 2861975985,
1286+ 1583628545,
1287+ 337656074,
1288+ 2381206238,
1289+ 1346357469,
1290+ 3316549550,
1291+ 1188140897,
1292+ 928463634,
1293+ 120466083,
1294+ 1048016215,
1295+ 2053770646,
1296+ 3729204448,
1297+ 3630812747,
1298+ 3421817962,
1299+ 1471357089,
1300+ 2971633393,
1301+ 2721366758,
1302+ 3977792328,
1303+ 2771228423,
1304+ 258029855,
1305+ 325097628,
1306+ 2816869331,
1307+ 228010778,
1308+ 1815596248,
1309+ 2677647806,
1310+ 4069826588,
1311+ 2009464559,
1312+ 4003870353,
1313+ 2558198381,
1314+ 823508134,
1315+ 256895388,
1316+ 130455482,
1317+ 4107398577,
1318+ 2446165146,
1319+ 3086759840,
1320+ 3128842794,
1321+ 236454548,
1322+ 3740649072,
1323+ 1049081391,
1324+ 3780795812,
1325+ 1964380357,
1326+ 3900635454,
1327+ 1941196066,
1328+ 1143285596,
1329+ 1276856333,
1330+ 2919547816,
1331+ 2947639569,
1332+ 1889305089,
1333+ 2386910172,
1334+ 2685680362,
1335+ 2042792556,
1336+ 2780968041,
1337+ 976912013,
1338+ 3562274424,
1339+ 2336140155,
1340+ 3464857244,
1341+ 1108365812,
1342+ 1201566469,
1343+ 707126700,
1344+ 4047776595,
1345+ 1289380202,
1346+ 1231913128,
1347+ 2819729319,
1348+ 537908270,
1349+ 3802355886,
1350+ 2004615093,
1351+ 2947614997,
1352+ 4192189156,
1353+ 2809733754,
1354+ 3082820238,
1355+ 2758499499,
1356+ 1004612882,
1357+ 1102702383,
1358+ 1862546275,
1359+ 3170345990,
1360+ 883739952,
1361+ 1641198615,
1362+ 957782688,
1363+ 1503652889,
1364+ 2210400768,
1365+ 2002162781,
1366+ 1553086024,
1367+ 2591721606,
1368+ 3830165160,
1369+ 4181044959,
1370+ 2735782270,
1371+ 3825677158,
1372+ 143739895,
1373+ 771193452,
1374+ 35990560,
1375+ 1014009970,
1376+ 20768744,
1377+ 1785268932,
1378+ 1424740580,
1379+ 1620237280,
1380+ 848157259,
1381+ 3808893671,
1382+ 2746756110,
1383+ 3903639825,
1384+ 1822084165,
1385+ 2891666588,
1386+ 3853186896,
1387+ 4248495212,
1388+ 1178592425,
1389+ 455721495,
1390+ 1848821934,
1391+ 1558397701,
1392+ 133397899,
1393+ 1845531767,
1394+ 2798312897,
1395+ 1471176399,
1396+ 1743248506,
1397+ 2229972777,
1398+ 1290369879,
1399+ 3579075953,
1400+ 309034994,
1401+ 929728690,
1402+ 3841454719,
1403+ 3031753515,
1404+ 3606461413,
1405+ 2412281758,
1406+ 2993123515,
1407+ };
1408+ int const nrandoms = sizeof(randoms) / sizeof(*randoms);
1409+
1410+ int const bits = count_bits(sgtype);
1411+ sgtype const tmin =
1412+ is_signed(sgtype) ? ((sgtype)1 << (sgtype)(bits-1)) : (sgtype)0;
1413+ sgtype const tmax = tmin - (sgtype)1;
1414+ for (int iter=0; iter<nrandoms; ++iter) {
1415+ typedef union {
1416+ gtype v;
1417+ ugtype u;
1418+ sgtype s[16];
1419+ } Tvec;
1420+ Tvec x, y, z;
1421+ Tvec good_abs;
1422+ Tvec good_abs_diff, good_add_sat;
1423+ Tvec good_hadd, good_rhadd;
1424+ int vecsize = vec_step(gtype);
1425+ for (int n=0; n<vecsize; ++n) {
1426+ x.s[n] = randoms[(iter+n ) % nrandoms];
1427+ y.s[n] = randoms[(iter+n+20) % nrandoms];
1428+ z.s[n] = randoms[(iter+n+40) % nrandoms];
1429+ if (bits>32) {
1430+ x.s[n] = (x.s[n] << (bits/2)) | randoms[(iter+n+100) % nrandoms];
1431+ y.s[n] = (y.s[n] << (bits/2)) | randoms[(iter+n+120) % nrandoms];
1432+ z.s[n] = (z.s[n] << (bits/2)) | randoms[(iter+n+140) % nrandoms];
1433+ }
1434+ good_abs.s[n] =
1435+ safe_extract(safe_abs(safe_create(x.s[n])));
1436+ good_abs_diff.s[n] =
1437+ safe_extract(safe_abs(safe_sub(safe_create(x.s[n]),
1438+ safe_create(y.s[n]))));
1439+ good_add_sat.s[n] =
1440+ safe_extract(safe_min(safe_max(safe_add(safe_create(x.s[n]),
1441+ safe_create(y.s[n])),
1442+ safe_create(tmin)),
1443+ safe_create(tmax)));
1444+ good_hadd.s[n] =
1445+ safe_extract(safe_rshift(safe_add(safe_create(x.s[n]),
1446+ safe_create(y.s[n]))));
1447+ good_rhadd.s[n] =
1448+ safe_extract(safe_rshift(safe_add(safe_add(safe_create(x.s[n]),
1449+ safe_create(y.s[n])),
1450+ safe_create((sgtype)1))));
1451+ }
1452+ Tvec res_abs;
1453+ Tvec res_abs_diff, res_add_sat;
1454+ Tvec res_hadd, res_rhadd;
1455+ res_abs.u = abs (x.v);
1456+ res_abs_diff.u = abs_diff(x.v, y.v);
1457+ res_add_sat.v = add_sat (x.v, y.v);
1458+ res_hadd.v = hadd (x.v, y.v);
1459+ res_rhadd.v = rhadd (x.v, y.v);
1460+ bool equal;
1461+ // abs
1462+ equal = true;
1463+ for (int n=0; n<vecsize; ++n) {
1464+ equal = equal && res_abs.s[n] == good_abs.s[n];
1465+ }
1466+ if (!equal) {
1467+ printf("FAIL: abs type=%s\n", typename);
1468+ for (int n=0; n<vecsize; ++n) {
1469+ printf(" [%d] a=%d good=%d res=%d\n",
1470+ n,
1471+ (int)x.s[n],
1472+ (int)good_abs.s[n], (int)res_abs.s[n]);
1473+ }
1474+ return;
1475+ }
1476+ // abs_diff
1477+ equal = true;
1478+ for (int n=0; n<vecsize; ++n) {
1479+ equal = equal && res_abs_diff.s[n] == good_abs_diff.s[n];
1480+ }
1481+ if (!equal) {
1482+ printf("FAIL: abs_diff type=%s\n", typename);
1483+ for (int n=0; n<vecsize; ++n) {
1484+ printf(" [%d] a=%d b=%d good=%d res=%d\n",
1485+ n,
1486+ (int)x.s[n], (int)y.s[n],
1487+ (int)good_abs_diff.s[n], (int)res_abs_diff.s[n]);
1488+ }
1489+ return;
1490+ }
1491+ // add_sat
1492+ equal = true;
1493+ for (int n=0; n<vecsize; ++n) {
1494+ equal = equal && res_add_sat.s[n] == good_add_sat.s[n];
1495+ }
1496+ if (!equal) {
1497+ printf("FAIL: add_sat type=%s\n", typename);
1498+ for (int n=0; n<vecsize; ++n) {
1499+ printf(" [%d] a=%d b=%d good=%d res=%d\n",
1500+ n,
1501+ (int)x.s[n], (int)y.s[n],
1502+ (int)good_add_sat.s[n], (int)res_add_sat.s[n]);
1503+ }
1504+ return;
1505+ }
1506+ // hadd
1507+ equal = true;
1508+ for (int n=0; n<vecsize; ++n) {
1509+ equal = equal && res_hadd.s[n] == good_hadd.s[n];
1510+ }
1511+ if (!equal) {
1512+ printf("FAIL: hadd type=%s\n", typename);
1513+ for (int n=0; n<vecsize; ++n) {
1514+ printf(" [%d] a=%d b=%d good=%d res=%d\n",
1515+ n,
1516+ (int)x.s[n], (int)y.s[n],
1517+ (int)good_hadd.s[n], (int)res_hadd.s[n]);
1518+ }
1519+ return;
1520+ }
1521+ // rhadd
1522+ equal = true;
1523+ for (int n=0; n<vecsize; ++n) {
1524+ equal = equal && res_rhadd.s[n] == good_rhadd.s[n];
1525+ }
1526+ if (!equal) {
1527+ printf("FAIL: rhadd type=%s\n", typename);
1528+ for (int n=0; n<vecsize; ++n) {
1529+ printf(" [%d] a=%d b=%d good=%d res=%d\n",
1530+ n,
1531+ (int)x.s[n], (int)y.s[n],
1532+ (int)good_rhadd.s[n], (int)res_rhadd.s[n]);
1533+ }
1534+ return;
1535+ }
1536+ }
1537+ })
1538+ )
1539+
1540+kernel void test_hadd()
1541+{
1542+ CALL_FUNC_G(test_hadd)
1543+}
1544
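Reviewer note: the test above checks hadd and rhadd against a wide "safe" reference built from safe_add and safe_rshift. As a rough cross-check, the same results can be computed without a wider type. The sketch below is plain C rather than OpenCL C, covers only 32-bit unsigned values, and uses hypothetical helper names (hadd_u32, rhadd_u32) that are not part of pocl.

```c
#include <stdint.h>

/* Hypothetical helper, not part of pocl: floor((a + b) / 2) without
   overflow. Halve each operand first, then add back the carry that is
   lost only when both low bits are set. */
static uint32_t hadd_u32(uint32_t a, uint32_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1u);
}

/* Hypothetical helper, not part of pocl: floor((a + b + 1) / 2) without
   overflow. The rounding bit is needed when at least one low bit is set. */
static uint32_t rhadd_u32(uint32_t a, uint32_t b)
{
    return (a >> 1) + (b >> 1) + ((a | b) & 1u);
}
```

Signed types would need an arithmetic right shift, which plain C leaves implementation-defined for negative values, so they are left out of this sketch.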
1545=== modified file 'examples/kernel/test_rotate.cl'
1546--- examples/kernel/test_rotate.cl 2011-12-18 02:58:06 +0000
1547+++ examples/kernel/test_rotate.cl 2012-01-16 18:23:23 +0000
1548@@ -169,6 +169,7 @@
1549 printf("FAIL: shift left (<<) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1550 typename, patterns[p], shiftbase, shiftoffset,
1551 (uint)res.s[0], (uint)shl.s[0]);
1552+ return;
1553 }
1554 /* shift right */
1555 res.v = val.v >> shift.v;
1556@@ -180,6 +181,7 @@
1557 printf("FAIL: shift right (>>) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1558 typename, patterns[p], shiftbase, shiftoffset,
1559 (uint)res.s[0], (uint)shr.s[0]);
1560+ return;
1561 }
1562 /* rotate */
1563 res.v = rotate(val.v, shift.v);
1564@@ -191,6 +193,7 @@
1565 printf("FAIL: rotate type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1566 typename, patterns[p], shiftbase, shiftoffset,
1567 (uint)res.s[0], (uint)rot.s[0]);
1568+ return;
1569 }
1570 }
1571 }
1572
1573=== modified file 'lib/kernel/abs.cl'
1574--- lib/kernel/abs.cl 2011-10-27 00:18:42 +0000
1575+++ lib/kernel/abs.cl 2012-01-16 18:23:23 +0000
1576@@ -23,13 +23,12 @@
1577
1578 #include "templates.h"
1579
1580-/* Define "missing" builtins */
1581-#define __builtin_abshh(a) (uchar )(a>=(char )0 ? a : -a)
1582-#define __builtin_absh(a) (ushort)(a>=(short)0 ? a : -a)
1583-#define __builtin_absl(a) (ulong )(a>=(long )0 ? a : -a)
1584-#define __builtin_absuhh(a) a
1585-#define __builtin_absuh(a) a
1586-#define __builtin_absu(a) a
1587-#define __builtin_absul(a) a
1588+#define __builtin_abshh(a) ((uchar )(a>=(char )0 ? a : -a))
1589+#define __builtin_absh(a) ((ushort)(a>=(short)0 ? a : -a))
1590+#define __builtin_absl(a) ((ulong )(a>=(long )0 ? a : -a))
1591+#define __builtin_absuhh(a) a
1592+#define __builtin_absuh(a) a
1593+#define __builtin_absu(a) a
1594+#define __builtin_absul(a) a
1595
1596 DEFINE_BUILTIN_UG_G(abs)
1597
1598=== modified file 'lib/kernel/abs_diff.cl'
1599--- lib/kernel/abs_diff.cl 2011-10-26 03:01:29 +0000
1600+++ lib/kernel/abs_diff.cl 2012-01-16 18:23:23 +0000
1601@@ -23,4 +23,23 @@
1602
1603 #include "templates.h"
1604
1605-DEFINE_EXPR_UG_GG(abs_diff, abs(a-b))
1606+// DEFINE_EXPR_UG_GG(abs_diff, abs(a-b))
1607+
1608+// This could probably also be optimised
1609+DEFINE_EXPR_UG_GG(abs_diff,
1610+ (sgtype)-1 < (sgtype)0 ?
1611+ /* signed */
1612+ ({
1613+ (a^b) >= (gtype)0 ?
1614+ /* same sign: no overflow/underflow */
1615+ abs(a-b) :
1616+ /* different signs */
1617+ abs(a) + abs(b);
1618+ }) :
1619+ /* unsigned */
1620+ ({
1621+ /* This abs prevents a type error; it is not
1622+ exectued for signed types, and is a no-op for
1623+ unsigned types */
1624+ abs(a > b ? a-b : b-a);
1625+ }))
1626
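Reviewer note: the old abs(a-b) form overflows when a and b have different signs and are far apart, which is what the new branch on (a^b) avoids. The sketch below shows a different way to get the same result, by subtracting in unsigned arithmetic. It is plain C for a single 32-bit scalar, not the method used in this patch, and abs_diff_i32 is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper, not part of pocl: |a - b| for int32_t, returned as
   uint32_t so that the full range (up to 2^32 - 1) is representable. */
static uint32_t abs_diff_i32(int32_t a, int32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32, so it never triggers the
       undefined behaviour of signed overflow. */
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    return a > b ? ua - ub : ub - ua;
}
```

For example, abs_diff_i32(INT32_MAX, INT32_MIN) yields UINT32_MAX, a value that a signed a-b cannot represent.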
1627=== modified file 'lib/kernel/add_sat.cl'
1628--- lib/kernel/add_sat.cl 2011-10-26 21:01:40 +0000
1629+++ lib/kernel/add_sat.cl 2012-01-16 18:23:23 +0000
1630@@ -30,7 +30,6 @@
1631 // ushort __builtin_ia32_paddusw128
1632 // Other types don't seem to be supported.
1633
1634-// This could do with some testing
1635 // This could probably also be optimised (i.e. the ?: operators eliminated)
1636 DEFINE_EXPR_G_GG(add_sat,
1637 (sgtype)-1 < (sgtype)0 ?
1638@@ -38,7 +37,7 @@
1639 ({
1640 int bits = CHAR_BIT * sizeof(sgtype);
1641 gtype min = (sgtype)1 << (sgtype)(bits-1);
1642- gtype max = min + (sgtype)1;
1643+ gtype max = min - (sgtype)1;
1644 (a^b) < (gtype)0 ?
1645 /* different signs: no overflow/underflow */
1646 a+b :
1647
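Reviewer note: the one-character fix above matters because, in two's complement, the largest signed value has the bit pattern of min - 1, while min + 1 is merely the second-smallest value. The sketch below illustrates this and a clamp-based saturating add in the spirit of the test's safe_add/safe_min/safe_max reference. It is plain C for 32-bit scalars only, and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, not part of pocl: bit pattern of (min - 1),
   computed in unsigned arithmetic so the wrap-around is well defined. */
static uint32_t max_bits_from_min(void)
{
    return (uint32_t)INT32_MIN - 1u;   /* 0x80000000 - 1 = 0x7fffffff */
}

/* Hypothetical helper, not part of pocl: saturating add via a wider
   intermediate type, clamped to the int32_t range. */
static int32_t add_sat_i32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

The widening approach does not extend to 64-bit types, which is presumably why the library code branches on the signs of a and b instead.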
1648=== modified file 'lib/kernel/x86_64/sqrt.cl'
1649--- lib/kernel/x86_64/sqrt.cl 2011-10-31 16:48:30 +0000
1650+++ lib/kernel/x86_64/sqrt.cl 2012-01-16 18:23:23 +0000
1651@@ -21,6 +21,10 @@
1652 THE SOFTWARE.
1653 */
1654
1655+#include "../templates.h"
1656+
1657+
1658+
1659 #define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
1660 TYPE _cl_overloadable NAME(TYPE a) \
1661 { \
1662@@ -120,3 +124,8 @@
1663 #endif
1664 IMPLEMENT_SPLIT (sqrt, double8 , lo, hi)
1665 IMPLEMENT_SPLIT (sqrt, double16, lo, hi)
1666+
1667+
1668+
1669+DEFINE_EXPR_F_F(half_sqrt, sqrt(a))
1670+DEFINE_EXPR_F_F(native_sqrt, sqrt(a))
1671
1672=== modified file 'tests/testsuite.at'
1673--- tests/testsuite.at 2012-01-04 11:11:09 +0000
1674+++ tests/testsuite.at 2012-01-16 18:23:23 +0000
1675@@ -166,6 +166,42 @@
1676
1677 AT_BANNER([Kernel runtime library])
1678
1679+AT_SETUP([Kernel function bitselect])
1680+AT_DATA([expout],
1681+[Running test test_bitselect...
1682+OK
1683+])
1684+AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_bitselect], 0, expout)
1685+AT_CLEANUP
1686+
1687+AT_SETUP([Kernel functions fabs signbit copysign])
1688+AT_DATA([expout],
1689+[Running test test_fabs...
1690+OK
1691+])
1692+#AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_fabs], 0, expout)
1693+# Skip this test until >> works correctly again on vectors in clang's OpenCL
1694+AT_CHECK([exit 77])
1695+AT_CLEANUP
1696+
1697+AT_SETUP([Kernel functions abs abs_diff add_sat hadd rhadd])
1698+AT_DATA([expout],
1699+[Running test test_hadd...
1700+OK
1701+])
1702+AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_hadd], 0, expout)
1703+AT_CLEANUP
1704+
1705+AT_SETUP([Kernel functions << >> rotate])
1706+AT_DATA([expout],
1707+[Running test test_rotate...
1708+OK
1709+])
1710+#AT_CHECK([$abs_top_builddir/examples/kernel/kernel test_rotate], 0, expout)
1711+# Skip this test until << and >> work correctly with overflow in clang's OpenCL
1712+AT_CHECK([exit 77])
1713+AT_CLEANUP
1714+
1715 AT_SETUP([Trigonometric functions])
1716 AT_DATA([expout],
1717 [f(0.000000, 0.000000, 0.000000, 0.000000) = (1.000000, 1.000000, 1.000000, 1.000000)