Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merge reported by: Pekka Jääskeläinen
Merged at revision: not available
Proposed branch: lp:~schnetter/pocl/main
Merge into: lp:~pocl/pocl/trunk
Diff against target: 3735 lines (+2544/-357)
56 files modified
.bzrignore (+4/-0)
README (+10/-5)
configure.ac (+2/-0)
examples/Makefile.am (+1/-1)
examples/barriers/barriers.c (+1/-1)
examples/example1/example1.c (+3/-1)
examples/example1/example1_exec.c (+1/-1)
examples/example2/example2.c (+1/-1)
examples/example2a/example2a.c (+1/-1)
examples/forloops/forloops.c (+1/-1)
examples/kernel/Makefile.am (+31/-0)
examples/kernel/kernel.c (+93/-0)
examples/kernel/test_bitselect.cl (+1176/-0)
examples/kernel/test_fabs.cl (+178/-0)
examples/kernel/test_rotate.cl (+204/-0)
examples/loopbarriers/loopbarriers.c (+1/-1)
examples/run_all.sh (+1/-1)
examples/scalarwave/Makefile.am (+31/-0)
examples/scalarwave/scalarwave.c (+233/-0)
examples/scalarwave/scalarwave.cl (+94/-0)
examples/trig/trig_exec.c (+1/-1)
include/_kernel.h (+11/-10)
lib/CL/Makefile.am (+1/-0)
lib/CL/clCreateBuffer.c (+7/-2)
lib/CL/clCreateKernel.c (+10/-8)
lib/CL/clEnqueueNDRangeKernel.c (+29/-1)
lib/CL/clFinish.c (+31/-0)
lib/CL/clGetDeviceIDs.c (+4/-5)
lib/CL/clReleaseKernel.c (+6/-1)
lib/CL/clReleaseMemObject.c (+1/-1)
lib/CL/clSetKernelArg.c (+7/-1)
lib/CL/devices/native/native.c (+6/-43)
lib/CL/devices/native/native.h (+3/-3)
lib/CL/devices/pthread/pthread.c (+27/-54)
lib/CL/devices/pthread/pthread.h (+7/-7)
lib/CL/pocl_cl.h (+3/-3)
lib/kernel/Makefile.am (+5/-3)
lib/kernel/arm/Makefile.am (+5/-3)
lib/kernel/clz.cl (+1/-1)
lib/kernel/get_global_id.c (+8/-4)
lib/kernel/get_global_offset.c (+39/-0)
lib/kernel/get_global_size.c (+1/-1)
lib/kernel/get_group_id.c (+1/-1)
lib/kernel/get_local_id.c (+4/-4)
lib/kernel/get_local_size.c (+39/-0)
lib/kernel/get_num_groups.c (+4/-4)
lib/kernel/get_work_dim.c (+31/-0)
lib/kernel/popcount.cl (+35/-0)
lib/kernel/signbit.cl (+34/-3)
lib/kernel/sources.mk (+4/-0)
lib/kernel/tce/Makefile.am (+5/-3)
lib/kernel/templates.h (+1/-2)
lib/kernel/x86_64/Makefile.am (+5/-3)
lib/kernel/x86_64/copysign.cl (+0/-169)
lib/llvmopencl/Workgroup.cc (+70/-2)
tests/testsuite.at (+31/-0)
To merge this branch: bzr merge lp:~schnetter/pocl/main
Reviewer Review Type Date Requested Status
Pekka Jääskeläinen Approve
Erik Schnetter Needs Resubmitting
Review via email: mp+84682@code.launchpad.net

Description of the change

This branch passes the same test cases as the trunk: all on Ubuntu, and four failures on Mac OSX.

This branch also introduces a new test case scalarwave, which fails everywhere because of a real problem (that I have not debugged yet). If you want, I can deactivate this new test case, to not introduce a failing test case into the trunk.

To post a comment you must log in.
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

get_global_offset() implementation seems wrong. I think it needs the separate values from clEnqueueNDRangeKernel() which are not stored yet in the context struct, IIRC.

In my opinion having failing tests in the test suite that reproduce (reported) bugs is fine but they should be marked with XFAIL (with a bug id maybe?) or similar until they are fixed so we won't get confused what is a regression and what a known bug reproduction (or if you are into TDD, just unimplemented features).

Other than these, seems good to merge for me.

review: Needs Fixing
Revision history for this message
Erik Schnetter (schnetter) wrote :

Indeed; I misunderstood the meaning of global_offset.

-erik

2011/12/6 Pekka Jääskeläinen <email address hidden>

> Review: Needs Fixing
>
> get_global_offset() implementation seems wrong. I think it needs the
> separate values from clEnqueueNDRangeKernel() which are not stored yet in
> the context struct, IIRC.
>
> In my opinion having failing tests in the test suite that reproduce
> (reported) bugs is fine but they should be marked with XFAIL (with a bug id
> maybe?) or similar until they are fixed so we won't get confused what is a
> regression and what a known bug reproduction (or if you are into TDD, just
> unimplemented features).
>
> Other than these, seems good to merge for me.
> --
> https://code.launchpad.net/~schnetter/pocl/main/+merge/84682
> You are the owner of lp:~schnetter/pocl/main.
>

--
Erik Schnetter <email address hidden> http://www.cct.lsu.edu/~eschnett/

lp:~schnetter/pocl/main updated
105. By Erik Schnetter

Correct implementation of get_global_offset

Introduce global variables to hold the work_dim and global_offset.
Set these variables.
Correct error in setting num_groups.

Disable scalarwave test case (could have used XFAIL instead, but I
don't know how to do this).

106. By Erik Schnetter

Ignore two auto-generated files

Revision history for this message
Erik Schnetter (schnetter) wrote :

The current head of the branch corrects the global offset issue.

I didn't implement XFAIL since I didn't know how to do this.

review: Needs Resubmitting
lp:~schnetter/pocl/main updated
107. By Erik Schnetter

Implement get_work_dim() properly

108. By Erik Schnetter

More error checking when setting kernel arguments

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

The matrix transpose test case seems to crash here. Some double free? Valgrind should help debugging it.

  6: example2: matrix transpose *** glibc detected *** /home/visit0r/src/pocl/examples/example2/.libs/lt-example2: free(): invalid next size (fast): 0x0000000000f05cc0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x71ad6)[0x2b2e98a19ad6]
/lib/libc.so.6(cfree+0x6c)[0x2b2e98a1e84c]
/home/visit0r/src/pocl/lib/CL/.libs/libCL.so.0(+0x45d7)[0x2b2e985605d7]
/lib/libpthread.so.0(+0x68ba)[0x2b2e987928ba]
/lib/libc.so.6(clone+0x6d)[0x2b2e98a7702d]

lp:~schnetter/pocl/main updated
109. By Erik Schnetter

Add debug output to scalarwave example

110. By Erik Schnetter

Merge from trunk

111. By Erik Schnetter

Properly free partially allocated memory object if there is an error

112. By Erik Schnetter

Properly remove pointers from host_buffers list

Remove pointers from host_buffers list when the buffer is freed.
Insert pointers into host_buffers list more efficiently.

113. By Erik Schnetter

Correct memory allocation in kernel handling

Insert kernels into linked list more efficiently.

Temporarily: Don't deallocate kernels (ever), because this would lead
to dangling pointers. Needs more fixing.

114. By Erik Schnetter

Correct malloc problems in handling memory objects

Don't use a linked list to distinguish between user-supplied and
malloc'd pointers. Instead, use the OpenCL memory flags to distinguish
between these.

Increase size for /proc/cpuinfo buffer (necessary on my system).

115. By Erik Schnetter

Merge from trunk

116. By Erik Schnetter

Merge from trunk

Revision history for this message
Erik Schnetter (schnetter) wrote :

The matrix transpose fails because pocl doesn't recognise the local argument as local, and thus doesn't allocate memory for it. This seems Mac OSX specific, and is now bug 901519.

lp:~schnetter/pocl/main updated
117. By Erik Schnetter

Merge from trunk: Correct many errors on Mac OSX

118. By Erik Schnetter

Merge from trunk

119. By Erik Schnetter

Recommend --enable-debug when configuring

120. By Erik Schnetter

Simplify code

121. By Erik Schnetter

Correct signbit implementation

122. By Erik Schnetter

Add printf prototype

123. By Erik Schnetter

Add beginnings of testsuite

124. By Erik Schnetter

Add missing #ifdef cl_khr_fp64

125. By Erik Schnetter

Ignore autogenerated file

126. By Erik Schnetter

Add all source files

127. By Erik Schnetter

"kernel" example: Small improvements

128. By Erik Schnetter

Correct Scalarwave example. Simplify code.

129. By Erik Schnetter

Activate Scalarwave test case

Revision history for this message
Erik Schnetter (schnetter) wrote :

My branch is now in a good state. All test cases pass, and my new scalar wave example (solving a PDE) is a new test case that also passes.

I also have the beginnings of test cases for the kernel library, but this is still disable because most tests fail (for known reasons, e.g. 3-element vectors don't work, and shift counts are not handled correctly; bug reports have been submitted to llvm).

review: Needs Resubmitting
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

I merged this but please fix the memory leak in clReleaseKernel().

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '.bzrignore'
2--- .bzrignore 2011-10-31 10:43:04 +0000
3+++ .bzrignore 2011-12-14 15:59:07 +0000
4@@ -29,10 +29,14 @@
5 examples/barriers/barriers
6 examples/example1/example1
7 examples/example2/example2
8+examples/example2a/example2a
9 examples/forloops/forloops
10+examples/loopbarriers/loopbarriers
11 examples/standalone/standalone.bc
12 examples/standalone/standalone.h
13 examples/trig/trig
14+examples/scalarwave/scalarwave
15+examples/kernel/kernel
16 scripts/pocl-build
17 scripts/pocl-kernel
18 scripts/pocl-standalone
19
20=== modified file 'README'
21--- README 2011-10-25 18:05:29 +0000
22+++ README 2011-12-14 15:59:07 +0000
23@@ -34,11 +34,15 @@
24 autotools, usually installable from distribution packages 'automake',
25 'autoconf', and 'libtool'.
26
27-Once that is done, the usual GNU build system
28-
29- ./configure && make
30-
31-is enough to build pocl. Builds out of source directory are also supported.
32+Once that is done, the usual GNU build commands build pocl. Builds out
33+of source directory are supported. We recommend using
34+
35+ ./configure --enable-debug
36+ make
37+
38+This will build pocl without optimization, which simplifies debugging.
39+(This does not influence whether pocl will optimize the code that it
40+generates from OpenCL source files.)
41
42 The configure script will use the following special environment variables,
43 if present or passed in the command line:
44@@ -59,6 +63,7 @@
45 forloops Simple example wth for loops inside the kernel
46 standalone Non-executable standalone example (see STANDALONE section)
47 trig Example using various trigonometric functions
48+ scalarwave Example evolving the scalar wave equation
49
50
51 LINKING YOUR PROGRAM WITH POCL
52
53=== modified file 'configure.ac'
54--- configure.ac 2011-12-05 10:43:51 +0000
55+++ configure.ac 2011-12-14 15:59:07 +0000
56@@ -227,7 +227,9 @@
57 examples/forloops/Makefile
58 examples/standalone/Makefile
59 examples/trig/Makefile
60+ examples/scalarwave/Makefile
61 examples/loopbarriers/Makefile
62+ examples/kernel/Makefile
63 scripts/Makefile
64 tests/Makefile
65 tests/atlocal
66
67=== modified file 'examples/Makefile.am'
68--- examples/Makefile.am 2011-12-05 10:43:51 +0000
69+++ examples/Makefile.am 2011-12-14 15:59:07 +0000
70@@ -22,4 +22,4 @@
71 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
72 # THE SOFTWARE.
73
74-SUBDIRS = example1 example2 example2a barriers forloops standalone trig loopbarriers
75+SUBDIRS = example1 example2 example2a barriers forloops standalone trig scalarwave loopbarriers kernel
76
77=== modified file 'examples/barriers/barriers.c'
78--- examples/barriers/barriers.c 2011-11-08 14:24:04 +0000
79+++ examples/barriers/barriers.c 2011-12-14 15:59:07 +0000
80@@ -62,7 +62,7 @@
81
82 fclose(source_file);
83
84- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
85+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
86 NULL, NULL, NULL);
87 if (context == (cl_context)0)
88 return -1;
89
90=== modified file 'examples/example1/example1.c'
91--- examples/example1/example1.c 2011-11-08 14:24:04 +0000
92+++ examples/example1/example1.c 2011-12-14 15:59:07 +0000
93@@ -39,6 +39,7 @@
94 int source_size;
95 cl_float4 *srcA, *srcB;
96 cl_float *dst;
97+ int ierr;
98 int i;
99
100 source_file = fopen("example1.cl", "r");
101@@ -75,7 +76,8 @@
102 srcB[i].w = i;
103 }
104
105- exec_dot_product_kernel (source, N, srcA, srcB, dst);
106+ ierr = exec_dot_product_kernel (source, N, srcA, srcB, dst);
107+ if (ierr) printf ("ERROR\n");
108
109 for (i = 0; i < N; ++i)
110 {
111
112=== modified file 'examples/example1/example1_exec.c'
113--- examples/example1/example1_exec.c 2011-10-07 15:53:02 +0000
114+++ examples/example1/example1_exec.c 2011-12-14 15:59:07 +0000
115@@ -25,7 +25,7 @@
116 cl_int err;
117
118 // create the OpenCL context on a GPU device
119- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
120+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
121 NULL, NULL, NULL);
122 if (context == (cl_context)0)
123 return -1;
124
125=== modified file 'examples/example2/example2.c'
126--- examples/example2/example2.c 2011-11-08 14:24:04 +0000
127+++ examples/example2/example2.c 2011-12-14 15:59:07 +0000
128@@ -80,7 +80,7 @@
129 input[i * WIDTH + j] = drand48();
130 }
131
132- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
133+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
134 NULL, NULL, NULL);
135 if (context == (cl_context)0)
136 return -1;
137
138=== modified file 'examples/example2a/example2a.c'
139--- examples/example2a/example2a.c 2011-12-05 10:43:51 +0000
140+++ examples/example2a/example2a.c 2011-12-14 15:59:07 +0000
141@@ -81,7 +81,7 @@
142 input[i * WIDTH + j] = drand48();
143 }
144
145- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
146+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
147 NULL, NULL, NULL);
148 if (context == (cl_context)0)
149 return -1;
150
151=== modified file 'examples/forloops/forloops.c'
152--- examples/forloops/forloops.c 2011-11-08 14:24:04 +0000
153+++ examples/forloops/forloops.c 2011-12-14 15:59:07 +0000
154@@ -62,7 +62,7 @@
155
156 fclose(source_file);
157
158- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
159+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
160 NULL, NULL, NULL);
161 if (context == (cl_context)0)
162 return -1;
163
164=== added directory 'examples/kernel'
165=== added file 'examples/kernel/Makefile.am'
166--- examples/kernel/Makefile.am 1970-01-01 00:00:00 +0000
167+++ examples/kernel/Makefile.am 2011-12-14 15:59:07 +0000
168@@ -0,0 +1,31 @@
169+# Process this file with automake to produce Makefile.in (in this,
170+# and all subdirectories).
171+# Makefile.am for pocl/examples/kernel.
172+#
173+# Copyright (c) 2011 Universidad Rey Juan Carlos
174+#
175+# Permission is hereby granted, free of charge, to any person obtaining a copy
176+# of this software and associated documentation files (the "Software"), to deal
177+# in the Software without restriction, including without limitation the rights
178+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
179+# copies of the Software, and to permit persons to whom the Software is
180+# furnished to do so, subject to the following conditions:
181+#
182+# The above copyright notice and this permission notice shall be included in
183+# all copies or substantial portions of the Software.
184+#
185+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
186+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
187+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
188+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
189+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
190+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
191+# THE SOFTWARE.
192+
193+noinst_PROGRAMS = kernel
194+
195+kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_rotate.cl
196+kernel_LDADD = ../../lib/CL/libCL.la -lm @PTHREAD_LIBS@
197+kernel_CFLAGS = -std=c99 @PTHREAD_CFLAGS@
198+
199+AM_CPPFLAGS = -I$(top_srcdir)/include -U__APPLE__ -DSRCDIR='"$(abs_srcdir)"'
200
201=== added file 'examples/kernel/kernel.c'
202--- examples/kernel/kernel.c 1970-01-01 00:00:00 +0000
203+++ examples/kernel/kernel.c 2011-12-14 15:59:07 +0000
204@@ -0,0 +1,93 @@
205+#include <assert.h>
206+#include <stdio.h>
207+#include <stdlib.h>
208+
209+#include <CL/opencl.h>
210+
211+
212+#ifndef SRCDIR
213+# define SRCDIR "."
214+#endif
215+
216+
217+
218+int call_test(char const *const name)
219+{
220+ /* read source code */
221+ char filename[1000];
222+ snprintf(filename, sizeof filename, "%s/%s.cl", SRCDIR, name);
223+ FILE *const source_file = fopen(filename, "r");
224+ assert(source_file != NULL && "source file not found");
225+
226+ fseek(source_file, 0, SEEK_END);
227+ long const source_size = ftell(source_file);
228+ fseek(source_file, 0, SEEK_SET);
229+
230+ char source[source_size + 1];
231+ fread(source, source_size, 1, source_file);
232+ source[source_size] = '\0';
233+
234+ fclose(source_file);
235+
236+ /* call OpenCL program */
237+ cl_context const context =
238+ clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, NULL, NULL, NULL);
239+ if (context == 0) return -1;
240+
241+ size_t ndevices;
242+ clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &ndevices);
243+ ndevices /= sizeof(cl_device_id);
244+ cl_device_id devices[ndevices];
245+ clGetContextInfo(context, CL_CONTEXT_DEVICES,
246+ ndevices*sizeof(cl_device_id), devices, NULL);
247+
248+ cl_command_queue const cmd_queue =
249+ clCreateCommandQueue(context, devices[0], 0, NULL);
250+ if (cmd_queue == 0) return -1;
251+
252+ char const *sources[] = {source};
253+ cl_program const program =
254+ clCreateProgramWithSource(context, 1, sources, NULL, NULL);
255+ if (program == 0) return -1;
256+
257+ int ierr;
258+ ierr = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
259+ if (ierr != CL_SUCCESS) return -1;
260+
261+ cl_kernel const kernel = clCreateKernel(program, name, NULL);
262+ if (kernel == 0) return -1;
263+
264+ size_t global_work_size[] = {1};
265+ size_t local_work_size[]= {1};
266+ ierr = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
267+ global_work_size, local_work_size,
268+ 0, NULL, NULL);
269+ if (ierr != CL_SUCCESS) return -1;
270+
271+ return 0;
272+}
273+
274+
275+
276+int
277+main(void)
278+{
279+ char const *const tests[] = {
280+ "test_bitselect",
281+ "test_fabs",
282+ "test_rotate",
283+ };
284+ int const ntests = sizeof(tests)/sizeof(*tests);
285+ for (int i=0; i<ntests; ++i) {
286+ printf("Running test #%d %s...\n", i, tests[i]);
287+ int ierr;
288+ ierr = call_test(tests[i]);
289+ if (ierr) {
290+ printf("FAIL\n");
291+ return 1;
292+ }
293+ }
294+
295+ printf("DONE\n");
296+ return 0;
297+}
298
299=== added file 'examples/kernel/test_bitselect.cl'
300--- examples/kernel/test_bitselect.cl 1970-01-01 00:00:00 +0000
301+++ examples/kernel/test_bitselect.cl 2011-12-14 15:59:07 +0000
302@@ -0,0 +1,1176 @@
303+// TESTING: bitselect
304+
305+#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
306+ void NAME##_##GTYPE() \
307+ { \
308+ typedef GTYPE gtype; \
309+ typedef SGTYPE sgtype; \
310+ typedef UGTYPE ugtype; \
311+ typedef SUGTYPE sugtype; \
312+ char const *const typename = #GTYPE; \
313+ BODY; \
314+ }
315+#define DEFINE_BODY_G(NAME, EXPR) \
316+ IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \
317+ IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \
318+ IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \
319+ IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \
320+ IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \
321+ IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \
322+ IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \
323+ IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \
324+ IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \
325+ IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \
326+ IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \
327+ IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \
328+ IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \
329+ IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \
330+ IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \
331+ IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \
332+ IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \
333+ IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \
334+ IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \
335+ IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \
336+ IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \
337+ IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \
338+ IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \
339+ IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \
340+ IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \
341+ IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \
342+ IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \
343+ IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \
344+ IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \
345+ IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \
346+ IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \
347+ IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \
348+ IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \
349+ IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \
350+ IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \
351+ IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \
352+ __IF_INT64( \
353+ IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \
354+ IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \
355+ IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \
356+ IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \
357+ IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \
358+ IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \
359+ IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \
360+ IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \
361+ IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \
362+ IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \
363+ IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \
364+ IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ))
365+
366+#define CALL_FUNC_G(NAME) \
367+ NAME##_char (); \
368+ NAME##_char2 (); \
369+ NAME##_char3 (); \
370+ NAME##_char4 (); \
371+ NAME##_char8 (); \
372+ NAME##_char16 (); \
373+ NAME##_uchar (); \
374+ NAME##_uchar2 (); \
375+ NAME##_uchar3 (); \
376+ NAME##_uchar4 (); \
377+ NAME##_uchar8 (); \
378+ NAME##_uchar16 (); \
379+ NAME##_short (); \
380+ NAME##_short2 (); \
381+ NAME##_short3 (); \
382+ NAME##_short4 (); \
383+ NAME##_short8 (); \
384+ NAME##_short16 (); \
385+ NAME##_ushort (); \
386+ NAME##_ushort2 (); \
387+ NAME##_ushort3 (); \
388+ NAME##_ushort4 (); \
389+ NAME##_ushort8 (); \
390+ NAME##_ushort16(); \
391+ NAME##_int (); \
392+ NAME##_int2 (); \
393+ NAME##_int3 (); \
394+ NAME##_int4 (); \
395+ NAME##_int8 (); \
396+ NAME##_int16 (); \
397+ NAME##_uint (); \
398+ NAME##_uint2 (); \
399+ NAME##_uint3 (); \
400+ NAME##_uint4 (); \
401+ NAME##_uint8 (); \
402+ NAME##_uint16 (); \
403+ __IF_INT64( \
404+ NAME##_long (); \
405+ NAME##_long2 (); \
406+ NAME##_long3 (); \
407+ NAME##_long4 (); \
408+ NAME##_long8 (); \
409+ NAME##_long16 (); \
410+ NAME##_ulong (); \
411+ NAME##_ulong2 (); \
412+ NAME##_ulong3 (); \
413+ NAME##_ulong4 (); \
414+ NAME##_ulong8 (); \
415+ NAME##_ulong16 ();)
416+
417+
418+
419+#define is_signed(T) ((T)-1 < (T)+1)
420+#define is_floating(T) ((T)0.1 > (T)0.0)
421+#define count_bits(T) (CHAR_BIT * sizeof(T))
422+
423+DEFINE_BODY_G
424+(test_bitselect,
425+ ({
426+ _cl_static_assert(sgtype, !is_floating(sgtype));
427+ uint const randoms[] = {
428+ 0x00000000,
429+ 0x00000001,
430+ 0x7fffffff,
431+ 0x80000000,
432+ 0xfffffffe,
433+ 0xffffffff,
434+ 0x01010101,
435+ 0x80808080,
436+ 0x55555555,
437+ 0xaaaaaaaa,
438+ 116127149,
439+ 331473970,
440+ 3314285513,
441+ 1531519032,
442+ 3871781304,
443+ 723260354,
444+ 3734992454,
445+ 3048883544,
446+ 424075405,
447+ 3760586679,
448+ 364071113,
449+ 2212396745,
450+ 3026460845,
451+ 2062923368,
452+ 3945483116,
453+ 774301702,
454+ 2010645213,
455+ 353497300,
456+ 2240089293,
457+ 645959945,
458+ 2929402380,
459+ 3641106046,
460+ 3731530029,
461+ 3788502454,
462+ 3990366079,
463+ 3532452335,
464+ 3231247251,
465+ 123690193,
466+ 418692672,
467+ 4146745661,
468+ 4170087687,
469+ 3915754726,
470+ 2052700648,
471+ 1748863847,
472+ 276568793,
473+ 364266289,
474+ 24718041,
475+ 3775186845,
476+ 935438421,
477+ 3070232227,
478+ 558364671,
479+ 2318351214,
480+ 17943242,
481+ 1796864907,
482+ 727165514,
483+ 223478118,
484+ 2448924107,
485+ 496915291,
486+ 3372891854,
487+ 361433487,
488+ 3273766229,
489+ 251831411,
490+ 432661417,
491+ 772908669,
492+ 289792578,
493+ 4150526710,
494+ 4157662725,
495+ 2594757327,
496+ 3052388893,
497+ 3842089578,
498+ 3467269013,
499+ 510187125,
500+ 2596093643,
501+ 398042620,
502+ 4272455984,
503+ 3711648086,
504+ 2120827851,
505+ 77269246,
506+ 2168059317,
507+ 2750549452,
508+ 1712682330,
509+ 2486520097,
510+ 625173621,
511+ 1632501477,
512+ 2935468416,
513+ 980045574,
514+ 3080136685,
515+ 4291385683,
516+ 1900746145,
517+ 3343063222,
518+ 3737266887,
519+ 3349055009,
520+ 3557165116,
521+ 847440541,
522+ 1195278641,
523+ 313889830,
524+ 622790046,
525+ 326637691,
526+ 663570370,
527+ 662327410,
528+ 923839117,
529+ 3091793818,
530+ 3563062752,
531+ 1864236072,
532+ 4251970867,
533+ 2259486024,
534+ 2512789432,
535+ 4278284968,
536+ 244581614,
537+ 247706675,
538+ 3268622648,
539+ 3758387026,
540+ 206893256,
541+ 2892198447,
542+ 3585538105,
543+ 2484801188,
544+ 1063964031,
545+ 3712657639,
546+ 23179627,
547+ 1732005357,
548+ 2522016557,
549+ 1058341654,
550+ 1580368080,
551+ 1890361257,
552+ 1167428989,
553+ 2600065453,
554+ 1547136389,
555+ 945856727,
556+ 2005682606,
557+ 3399854093,
558+ 2619154565,
559+ 2207015138,
560+ 2836381097,
561+ 612928932,
562+ 1537934908,
563+ 897756908,
564+ 1142275256,
565+ 1106163744,
566+ 3209429231,
567+ 3317761168,
568+ 2815958850,
569+ 1282374282,
570+ 3861163766,
571+ 2547903564,
572+ 3139840265,
573+ 587243656,
574+ 3261127556,
575+ 3955999184,
576+ 2061849860,
577+ 3778058575,
578+ 259659645,
579+ 935157504,
580+ 3294850933,
581+ 2164603733,
582+ 3772888022,
583+ 732201413,
584+ 3677934092,
585+ 321204420,
586+ 509807651,
587+ 3626474557,
588+ 284622251,
589+ 3655952885,
590+ 1512028769,
591+ 1102588652,
592+ 2700179235,
593+ 4167405174,
594+ 2672050627,
595+ 3410780487,
596+ 4153733940,
597+ 2459759898,
598+ 568792515,
599+ 1081882827,
600+ 3211871042,
601+ 799411732,
602+ 2101993855,
603+ 3415550991,
604+ 3872737342,
605+ 4168312654,
606+ 1889019671,
607+ 4247531636,
608+ 2442118552,
609+ 3024016549,
610+ 1041817509,
611+ 141773691,
612+ 28033810,
613+ 4034097901,
614+ 1532981240,
615+ 2593712697,
616+ 2751535537,
617+ 269072724,
618+ 3363560906,
619+ 3555817938,
620+ 611297346,
621+ 366972507,
622+ 788151801,
623+ 3990920857,
624+ 1611303958,
625+ 3353102293,
626+ 1334246396,
627+ 1114446428,
628+ 3491128109,
629+ 2922751152,
630+ 3053407478,
631+ 2897830841,
632+ 176546593,
633+ 3184221063,
634+ 37923477,
635+ 1692128510,
636+ 165719856,
637+ 1795746307,
638+ 2422422413,
639+ 253227286,
640+ 2188522595,
641+ 582156087,
642+ 2342528685,
643+ 2080142547,
644+ 1928462563,
645+ 2713927482,
646+ 1944972771,
647+ 2534268146,
648+ 830798003,
649+ 1653357460,
650+ 291743070,
651+ 593771532,
652+ 2941865444,
653+ 855254640,
654+ 2401129822,
655+ 2420945774,
656+ 2447532144,
657+ 1137540092,
658+ 1296659939,
659+ 3252539825,
660+ 1165427708,
661+ 3251476781,
662+ 2597490804,
663+ 2518198923,
664+ 1196242486,
665+ 3646082981,
666+ 1347758965,
667+ 3824891532,
668+ 2959519286,
669+ 1523237529,
670+ 2910666174,
671+ 3226637035,
672+ 2116458903,
673+ 1076998092,
674+ 4222762545,
675+ 3061300520,
676+ 4189298288,
677+ 3943996060,
678+ 3129210496,
679+ 3826669630,
680+ 4235952488,
681+ 2624429853,
682+ 2522766390,
683+ 4137227001,
684+ 3846448057,
685+ 1893377487,
686+ 3658784739,
687+ 2368074586,
688+ 170547540,
689+ 520741120,
690+ 2662229630,
691+ 4265731754,
692+ 1379762094,
693+ 3395502906,
694+ 2242123335,
695+ 1960965916,
696+ 561815223,
697+ 2687853297,
698+ 4051050259,
699+ 1845906614,
700+ 3725623071,
701+ 1857706909,
702+ 2487006596,
703+ 1925919247,
704+ 2796536825,
705+ 3499954730,
706+ 2173320675,
707+ 3416676849,
708+ 3637473517,
709+ 340951464,
710+ 4152841543,
711+ 3747544606,
712+ 2659955417,
713+ 1695145107,
714+ 3117280269,
715+ 826143012,
716+ 3867179892,
717+ 4269349771,
718+ 1002613766,
719+ 3842086144,
720+ 1431990957,
721+ 2466205499,
722+ 653575141,
723+ 293530756,
724+ 2318035308,
725+ 3728576309,
726+ 1697894989,
727+ 2955143882,
728+ 2109912287,
729+ 2764187839,
730+ 1805490664,
731+ 672567480,
732+ 1374741155,
733+ 1662665091,
734+ 3551530257,
735+ 350283994,
736+ 685023916,
737+ 1887748803,
738+ 1386316091,
739+ 185708823,
740+ 3106823178,
741+ 3014109065,
742+ 3823816879,
743+ 2213358313,
744+ 2696977340,
745+ 4075569311,
746+ 365089277,
747+ 3466850767,
748+ 312392153,
749+ 1065191758,
750+ 2405243644,
751+ 3174745999,
752+ 3617861250,
753+ 867192904,
754+ 1046475095,
755+ 1888985494,
756+ 1127140157,
757+ 61671281,
758+ 128055546,
759+ 2332619657,
760+ 993669439,
761+ 2145370329,
762+ 1462433204,
763+ 74990676,
764+ 2898191247,
765+ 3601586977,
766+ 794604597,
767+ 3597643629,
768+ 4282141339,
769+ 251591051,
770+ 84943504,
771+ 2016044077,
772+ 946823499,
773+ 648214756,
774+ 2530104367,
775+ 4254219656,
776+ 1974542801,
777+ 53097687,
778+ 157109688,
779+ 299310673,
780+ 2866882336,
781+ 3335682769,
782+ 2583612755,
783+ 4114730718,
784+ 740387484,
785+ 986157357,
786+ 1140355266,
787+ 2825639379,
788+ 1198731547,
789+ 1521261313,
790+ 1204836445,
791+ 4294274455,
792+ 2215732661,
793+ 1369520150,
794+ 1515223958,
795+ 2428295267,
796+ 1945985266,
797+ 2168529560,
798+ 3791933294,
799+ 4021389338,
800+ 713695045,
801+ 4254483898,
802+ 3795986293,
803+ 1347498014,
804+ 1746051095,
805+ 1364967734,
806+ 206265390,
807+ 3940088473,
808+ 1867270033,
809+ 3893545471,
810+ 3545819698,
811+ 2573105187,
812+ 3859595967,
813+ 2823745089,
814+ 1293424244,
815+ 3948799370,
816+ 1524394803,
817+ 3807487752,
818+ 4055830971,
819+ 3124609223,
820+ 119357574,
821+ 1490516894,
822+ 3799908122,
823+ 1700941394,
824+ 80878888,
825+ 2719184407,
826+ 3603450215,
827+ 27225525,
828+ 1413638246,
829+ 3350206268,
830+ 2643568519,
831+ 801305037,
832+ 1341902999,
833+ 1420459209,
834+ 968648411,
835+ 1826125841,
836+ 2619721007,
837+ 537879916,
838+ 860253620,
839+ 586683700,
840+ 827412286,
841+ 2724526294,
842+ 1019678576,
843+ 3998975225,
844+ 339789397,
845+ 863181640,
846+ 970475690,
847+ 2737385140,
848+ 322021174,
849+ 4084948327,
850+ 80691950,
851+ 1702782677,
852+ 1266230197,
853+ 1100861683,
854+ 3123418948,
855+ 258978579,
856+ 3217833394,
857+ 1780903315,
858+ 1345341356,
859+ 2927579299,
860+ 931392918,
861+ 9404798,
862+ 83278219,
863+ 2470714323,
864+ 640357359,
865+ 2169696414,
866+ 496463525,
867+ 4127940882,
868+ 2965369765,
869+ 4136333330,
870+ 1159134689,
871+ 1798163043,
872+ 4097403856,
873+ 4284804850,
874+ 3165524545,
875+ 2765224926,
876+ 931350022,
877+ 1171636623,
878+ 845799406,
879+ 709853915,
880+ 2348457302,
881+ 3343956878,
882+ 2438786363,
883+ 175730452,
884+ 598587430,
885+ 2744955366,
886+ 447049527,
887+ 1252796590,
888+ 3044128900,
889+ 812683575,
890+ 3721040746,
891+ 3404688504,
892+ 2674021068,
893+ 959056069,
894+ 322162714,
895+ 2008064015,
896+ 3758321185,
897+ 2877937989,
898+ 778007512,
899+ 3502772435,
900+ 3084124565,
901+ 111844966,
902+ 248248909,
903+ 22147113,
904+ 2506501875,
905+ 1430033847,
906+ 1690841637,
907+ 2999017281,
908+ 3658748205,
909+ 1632773934,
910+ 4177069459,
911+ 3187781304,
912+ 1182255965,
913+ 4121685939,
914+ 300554973,
915+ 2854502901,
916+ 642657206,
917+ 1504346771,
918+ 128405037,
919+ 2163092164,
920+ 1091806675,
921+ 1144089805,
922+ 54479906,
923+ 505543118,
924+ 2844153548,
925+ 1010229282,
926+ 2961721580,
927+ 4235612700,
928+ 3508832243,
929+ 1409461040,
930+ 2568735295,
931+ 1191284023,
932+ 2220949766,
933+ 2605559386,
934+ 706551146,
935+ 3452279268,
936+ 2372892169,
937+ 2360210709,
938+ 3228881405,
939+ 2987444766,
940+ 1187314024,
941+ 908783041,
942+ 144096950,
943+ 1915948100,
944+ 2171208878,
945+ 420772043,
946+ 793209353,
947+ 359527746,
948+ 625018196,
949+ 1195796799,
950+ 2079388581,
951+ 864869238,
952+ 765565143,
953+ 1069647859,
954+ 3857355469,
955+ 2436437044,
956+ 238157644,
957+ 1612883577,
958+ 1911189891,
959+ 2070273440,
960+ 384222456,
961+ 1186369477,
962+ 2844794758,
963+ 3435869876,
964+ 1486894286,
965+ 4062343990,
966+ 440437688,
967+ 306253241,
968+ 3650751868,
969+ 2695961920,
970+ 3920128930,
971+ 3921419250,
972+ 502951143,
973+ 311093469,
974+ 2708936678,
975+ 36677206,
976+ 3473343884,
977+ 577655290,
978+ 3795127787,
979+ 1448118037,
980+ 436359554,
981+ 2051970204,
982+ 2644913053,
983+ 2492587228,
984+ 3125803824,
985+ 150160619,
986+ 1725373463,
987+ 2221292372,
988+ 2580064663,
989+ 1330289179,
990+ 2700556441,
991+ 1327212925,
992+ 651999045,
993+ 2089310372,
994+ 3221246949,
995+ 4148251434,
996+ 4267892623,
997+ 897583443,
998+ 1051813251,
999+ 2131903377,
1000+ 4121163297,
1001+ 4128279241,
1002+ 1634689556,
1003+ 3369895626,
1004+ 1121895497,
1005+ 3158192590,
1006+ 4290462018,
1007+ 3447288838,
1008+ 4035505534,
1009+ 2945114940,
1010+ 1556028368,
1011+ 4235061319,
1012+ 1535570089,
1013+ 2144940257,
1014+ 1961364931,
1015+ 2509075082,
1016+ 804411045,
1017+ 2290609740,
1018+ 1076471626,
1019+ 3254493188,
1020+ 4284011230,
1021+ 923006875,
1022+ 3722016670,
1023+ 2981439178,
1024+ 2038308778,
1025+ 1755166344,
1026+ 488581856,
1027+ 2624361425,
1028+ 1298790575,
1029+ 3550671725,
1030+ 1845109437,
1031+ 2047411775,
1032+ 2488464246,
1033+ 1391825885,
1034+ 2340290304,
1035+ 3623879917,
1036+ 217171099,
1037+ 3698905333,
1038+ 2718846041,
1039+ 73731529,
1040+ 2053405441,
1041+ 2770197347,
1042+ 2983996080,
1043+ 2612966141,
1044+ 2187183079,
1045+ 2796212469,
1046+ 3797629169,
1047+ 1788932364,
1048+ 17748377,
1049+ 627297271,
1050+ 3689459731,
1051+ 3311799950,
1052+ 4263162298,
1053+ 4016852324,
1054+ 3136750215,
1055+ 1725824049,
1056+ 2844064064,
1057+ 2059159211,
1058+ 3182127070,
1059+ 470655679,
1060+ 1166949584,
1061+ 2425843062,
1062+ 219908183,
1063+ 161770982,
1064+ 2394961157,
1065+ 999226372,
1066+ 2367624166,
1067+ 76287885,
1068+ 1110832227,
1069+ 3358123709,
1070+ 1504127646,
1071+ 49596774,
1072+ 1296560019,
1073+ 2320978173,
1074+ 1163934122,
1075+ 1631947491,
1076+ 2702852639,
1077+ 3856755518,
1078+ 2562943123,
1079+ 991330989,
1080+ 993726248,
1081+ 2133737192,
1082+ 20974150,
1083+ 3808389889,
1084+ 2447868340,
1085+ 2434828629,
1086+ 3344419509,
1087+ 4076789444,
1088+ 1446054487,
1089+ 3815933708,
1090+ 3644670988,
1091+ 3175898122,
1092+ 3057844745,
1093+ 559106380,
1094+ 1840065631,
1095+ 3020573012,
1096+ 3203040371,
1097+ 997381925,
1098+ 2563312032,
1099+ 815510593,
1100+ 121805231,
1101+ 1047507862,
1102+ 1841403695,
1103+ 1563170561,
1104+ 1644198099,
1105+ 3470882735,
1106+ 627296501,
1107+ 3006157508,
1108+ 383648566,
1109+ 3136652449,
1110+ 2252034149,
1111+ 1749861990,
1112+ 956381402,
1113+ 3299624735,
1114+ 2798395931,
1115+ 270054444,
1116+ 3757564211,
1117+ 2933717597,
1118+ 1080178310,
1119+ 1367392714,
1120+ 1135266342,
1121+ 2642448461,
1122+ 1067554284,
1123+ 3694982777,
1124+ 3594374699,
1125+ 4170301369,
1126+ 3593401570,
1127+ 2298071009,
1128+ 1561680798,
1129+ 2788490866,
1130+ 1757829499,
1131+ 8819607,
1132+ 2453686068,
1133+ 3458682663,
1134+ 1614888171,
1135+ 2327536307,
1136+ 13960177,
1137+ 125752716,
1138+ 2312371195,
1139+ 1515197240,
1140+ 189747227,
1141+ 666988376,
1142+ 1401118738,
1143+ 986465965,
1144+ 242793663,
1145+ 1830586663,
1146+ 1603054176,
1147+ 391536104,
1148+ 1403125754,
1149+ 4021998614,
1150+ 157985039,
1151+ 966292223,
1152+ 2476444819,
1153+ 3261614719,
1154+ 3888752449,
1155+ 2300656903,
1156+ 1138839559,
1157+ 1227396086,
1158+ 1029493665,
1159+ 2138482384,
1160+ 2182525175,
1161+ 1437393012,
1162+ 2758514342,
1163+ 1394715363,
1164+ 242430786,
1165+ 4026759135,
1166+ 379455166,
1167+ 3454852592,
1168+ 1128257576,
1169+ 513994046,
1170+ 2437643547,
1171+ 1851772774,
1172+ 1096918785,
1173+ 2537378072,
1174+ 2020382559,
1175+ 1306056753,
1176+ 519939769,
1177+ 2477462755,
1178+ 2962076712,
1179+ 2856059355,
1180+ 111272034,
1181+ 2363778749,
1182+ 3031510224,
1183+ 297098997,
1184+ 2716928589,
1185+ 1988398361,
1186+ 3715685207,
1187+ 1158387390,
1188+ 3239718824,
1189+ 214276640,
1190+ 1240159361,
1191+ 302800084,
1192+ 258391670,
1193+ 3118615408,
1194+ 1789752935,
1195+ 935790045,
1196+ 1678444383,
1197+ 3645357112,
1198+ 1752731774,
1199+ 1211889371,
1200+ 2432949496,
1201+ 1983838022,
1202+ 2563701701,
1203+ 3235972690,
1204+ 2732559614,
1205+ 4173627589,
1206+ 918129740,
1207+ 3528101943,
1208+ 945287787,
1209+ 783593046,
1210+ 1687101911,
1211+ 4265659819,
1212+ 1625936204,
1213+ 419423123,
1214+ 404748783,
1215+ 174814826,
1216+ 561306387,
1217+ 441376876,
1218+ 3649973873,
1219+ 1191532754,
1220+ 493829681,
1221+ 462640703,
1222+ 3037639795,
1223+ 4234288143,
1224+ 787992128,
1225+ 354556603,
1226+ 1391557094,
1227+ 1227150157,
1228+ 25592400,
1229+ 3032298621,
1230+ 1655829692,
1231+ 1736544192,
1232+ 2936173068,
1233+ 1867683432,
1234+ 3284761215,
1235+ 2988749127,
1236+ 62083315,
1237+ 3675433852,
1238+ 1134152479,
1239+ 2537382040,
1240+ 1147996351,
1241+ 1287284159,
1242+ 1889610942,
1243+ 3549411223,
1244+ 2634772335,
1245+ 1621708033,
1246+ 3268420142,
1247+ 2635222095,
1248+ 2856377255,
1249+ 3703296204,
1250+ 45831019,
1251+ 1997278369,
1252+ 1472530726,
1253+ 4202051236,
1254+ 1958581642,
1255+ 1899513707,
1256+ 1642075765,
1257+ 217373156,
1258+ 1177071505,
1259+ 2179831909,
1260+ 1894821896,
1261+ 375785474,
1262+ 140181353,
1263+ 2743987480,
1264+ 123627609,
1265+ 3644816362,
1266+ 4244769687,
1267+ 4053481902,
1268+ 4272740073,
1269+ 1701735471,
1270+ 1799303028,
1271+ 2810175160,
1272+ 1531107068,
1273+ 3059813822,
1274+ 4125025775,
1275+ 1932301928,
1276+ 358163550,
1277+ 1246286294,
1278+ 1901878857,
1279+ 2449370117,
1280+ 4061706076,
1281+ 2875797072,
1282+ 1661522553,
1283+ 543545982,
1284+ 300448222,
1285+ 4019581644,
1286+ 3197346443,
1287+ 731278538,
1288+ 457112622,
1289+ 669625172,
1290+ 2548620393,
1291+ 2931934447,
1292+ 2318225955,
1293+ 427149964,
1294+ 1097556601,
1295+ 3585697077,
1296+ 1901391738,
1297+ 3019912350,
1298+ 4193989774,
1299+ 1411691495,
1300+ 2549773310,
1301+ 3130489018,
1302+ 739444137,
1303+ 1953561922,
1304+ 228589899,
1305+ 974825144,
1306+ 1873934953,
1307+ 918502475,
1308+ 4020302125,
1309+ 2103082289,
1310+ 1474428456,
1311+ 269315616,
1312+ 3376419786,
1313+ 2903506696,
1314+ 169344159,
1315+ 4151327830,
1316+ 2861975985,
1317+ 1583628545,
1318+ 337656074,
1319+ 2381206238,
1320+ 1346357469,
1321+ 3316549550,
1322+ 1188140897,
1323+ 928463634,
1324+ 120466083,
1325+ 1048016215,
1326+ 2053770646,
1327+ 3729204448,
1328+ 3630812747,
1329+ 3421817962,
1330+ 1471357089,
1331+ 2971633393,
1332+ 2721366758,
1333+ 3977792328,
1334+ 2771228423,
1335+ 258029855,
1336+ 325097628,
1337+ 2816869331,
1338+ 228010778,
1339+ 1815596248,
1340+ 2677647806,
1341+ 4069826588,
1342+ 2009464559,
1343+ 4003870353,
1344+ 2558198381,
1345+ 823508134,
1346+ 256895388,
1347+ 130455482,
1348+ 4107398577,
1349+ 2446165146,
1350+ 3086759840,
1351+ 3128842794,
1352+ 236454548,
1353+ 3740649072,
1354+ 1049081391,
1355+ 3780795812,
1356+ 1964380357,
1357+ 3900635454,
1358+ 1941196066,
1359+ 1143285596,
1360+ 1276856333,
1361+ 2919547816,
1362+ 2947639569,
1363+ 1889305089,
1364+ 2386910172,
1365+ 2685680362,
1366+ 2042792556,
1367+ 2780968041,
1368+ 976912013,
1369+ 3562274424,
1370+ 2336140155,
1371+ 3464857244,
1372+ 1108365812,
1373+ 1201566469,
1374+ 707126700,
1375+ 4047776595,
1376+ 1289380202,
1377+ 1231913128,
1378+ 2819729319,
1379+ 537908270,
1380+ 3802355886,
1381+ 2004615093,
1382+ 2947614997,
1383+ 4192189156,
1384+ 2809733754,
1385+ 3082820238,
1386+ 2758499499,
1387+ 1004612882,
1388+ 1102702383,
1389+ 1862546275,
1390+ 3170345990,
1391+ 883739952,
1392+ 1641198615,
1393+ 957782688,
1394+ 1503652889,
1395+ 2210400768,
1396+ 2002162781,
1397+ 1553086024,
1398+ 2591721606,
1399+ 3830165160,
1400+ 4181044959,
1401+ 2735782270,
1402+ 3825677158,
1403+ 143739895,
1404+ 771193452,
1405+ 35990560,
1406+ 1014009970,
1407+ 20768744,
1408+ 1785268932,
1409+ 1424740580,
1410+ 1620237280,
1411+ 848157259,
1412+ 3808893671,
1413+ 2746756110,
1414+ 3903639825,
1415+ 1822084165,
1416+ 2891666588,
1417+ 3853186896,
1418+ 4248495212,
1419+ 1178592425,
1420+ 455721495,
1421+ 1848821934,
1422+ 1558397701,
1423+ 133397899,
1424+ 1845531767,
1425+ 2798312897,
1426+ 1471176399,
1427+ 1743248506,
1428+ 2229972777,
1429+ 1290369879,
1430+ 3579075953,
1431+ 309034994,
1432+ 929728690,
1433+ 3841454719,
1434+ 3031753515,
1435+ 3606461413,
1436+ 2412281758,
1437+ 2993123515,
1438+ };
1439+ int const nrandoms = sizeof(randoms) / sizeof(*randoms);
1440+
1441+ int const bits = count_bits(sgtype);
1442+ for (int iter=0; iter<nrandoms; ++iter) {
1443+ typedef union {
1444+ gtype v;
1445+ sgtype s[16];
1446+ } Tvec;
1447+ Tvec sel, left, right, res;
1448+ int vecsize = vec_step(gtype);
1449+ for (int n=0; n<vecsize; ++n) {
1450+ sel.s[n] = randoms[(iter+n ) % nrandoms];
1451+ left.s[n] = randoms[(iter+n+20) % nrandoms];
1452+ right.s[n] = randoms[(iter+n+40) % nrandoms];
1453+ if (bits>32) {
1454+ sel.s[n] = (sel.s[n] << (bits/2)) | randoms[(iter+n+100) % nrandoms];
1455+ left.s[n] = (left.s[n] << (bits/2)) | randoms[(iter+n+120) % nrandoms];
1456+ right.s[n] = (right.s[n] << (bits/2)) | randoms[(iter+n+140) % nrandoms];
1457+ }
1458+ }
1459+ res.v = bitselect(left.v, right.v, sel.v);
1460+ bool equal = true;
1461+ for (int n=0; n<vecsize; ++n) {
1462+ equal = equal && ((res.s[n] & ~sel.s[n]) == (left.s[n] & ~sel.s[n]));
1463+ equal = equal && ((res.s[n] & sel.s[n]) == (right.s[n] & sel.s[n]));
1464+ }
1465+ if (!equal) {
1466+ printf("FAIL: bitselect type=%s a=0x%08x b=0x%08x c=0x%08x c=0x%08x\n",
1467+ typename,
1468+ (uint)left.s[0], (uint)right.s[0], (uint)sel.s[0],
1469+ (uint)res.s[0]);
1470+ }
1471+ }
1472+ })
1473+ )
1474+
1475+void test_bitselect()
1476+{
1477+ CALL_FUNC_G(test_bitselect)
1478+}
1479
1480=== added file 'examples/kernel/test_fabs.cl'
1481--- examples/kernel/test_fabs.cl 1970-01-01 00:00:00 +0000
1482+++ examples/kernel/test_fabs.cl 2011-12-14 15:59:07 +0000
1483@@ -0,0 +1,178 @@
1484+// TESTING: fabs
1485+// TESTING: signbit
1486+// TESTING: copysign
1487+
1488+#define IMPLEMENT_BODY_V(NAME, BODY, VTYPE, STYPE, JTYPE, SJTYPE) \
1489+ void NAME##_##VTYPE() \
1490+ { \
1491+ typedef VTYPE vtype; \
1492+ typedef STYPE stype; \
1493+ typedef JTYPE jtype; \
1494+ typedef SJTYPE sjtype; \
1495+ char const *const typename = #VTYPE; \
1496+ BODY; \
1497+ }
1498+#define DEFINE_BODY_V(NAME, EXPR) \
1499+ IMPLEMENT_BODY_V(NAME, EXPR, float , float , int , int ) \
1500+ IMPLEMENT_BODY_V(NAME, EXPR, float2 , float , int2 , int ) \
1501+ IMPLEMENT_BODY_V(NAME, EXPR, float3 , float , int3 , int ) \
1502+ IMPLEMENT_BODY_V(NAME, EXPR, float4 , float , int4 , int ) \
1503+ IMPLEMENT_BODY_V(NAME, EXPR, float8 , float , int8 , int ) \
1504+ IMPLEMENT_BODY_V(NAME, EXPR, float16 , float , int16 , int ) \
1505+ __IF_FP64( \
1506+ IMPLEMENT_BODY_V(NAME, EXPR, double , double, long , long) \
1507+ IMPLEMENT_BODY_V(NAME, EXPR, double2 , double, long2 , long) \
1508+ IMPLEMENT_BODY_V(NAME, EXPR, double3 , double, long3 , long) \
1509+ IMPLEMENT_BODY_V(NAME, EXPR, double4 , double, long4 , long) \
1510+ IMPLEMENT_BODY_V(NAME, EXPR, double8 , double, long8 , long) \
1511+ IMPLEMENT_BODY_V(NAME, EXPR, double16, double, long16, long))
1512+
1513+#define CALL_FUNC_V(NAME) \
1514+ NAME##_float (); \
1515+ NAME##_float2 (); \
1516+ NAME##_float3 (); \
1517+ NAME##_float4 (); \
1518+ NAME##_float8 (); \
1519+ NAME##_float16 (); \
1520+ __IF_FP64( \
1521+ NAME##_double (); \
1522+ NAME##_double2 (); \
1523+ NAME##_double3 (); \
1524+ NAME##_double4 (); \
1525+ NAME##_double8 (); \
1526+ NAME##_double16();)
1527+
1528+
1529+
1530+#define is_signed(T) ((T)-1 < (T)+1)
1531+#define is_floating(T) ((T)0.1 > (T)0.0)
1532+#define count_bits(T) (CHAR_BIT * sizeof(T))
1533+
1534+DEFINE_BODY_V
1535+(test_fabs,
1536+ ({
1537+ _cl_static_assert(stype, is_floating(stype));
1538+ float const values[] = {
1539+ 0.0f,
1540+ 0.1f,
1541+ 0.9f,
1542+ 1.0f,
1543+ 1.1f,
1544+ 10.0f,
1545+ 1000000.0f,
1546+ 1000000000000.0f,
1547+ MAXFLOAT,
1548+ HUGE_VALF,
1549+ INFINITY,
1550+ /* NAN, a nan has no specific sign */
1551+ FLT_MAX,
1552+ FLT_MIN,
1553+ FLT_EPSILON,
1554+ };
1555+ int const nvalues = sizeof(values) / sizeof(*values);
1556+ int ninputs = 1;
1557+#ifdef cl_khr_fp64
1558+ double const dvalues[] = {
1559+ 0.0,
1560+ 0.1,
1561+ 0.9,
1562+ 1.0,
1563+ 1.1,
1564+ 10.0,
1565+ 1000000.0,
1566+ 1000000000000.0,
1567+ 1000000000000000000000000.0,
1568+ HUGE_VAL,
1569+ INFINITY,
1570+ /* NAN, a nan has no specific sign */
1571+ DBL_MAX,
1572+ DBL_MIN,
1573+ DBL_EPSILON,
1574+ };
1575+ int const ndvalues = sizeof(dvalues) / sizeof(*dvalues);
1576+ ++ninputs;
1577+#endif
1578+
1579+ for (int input=0; input<ninputs; ++input) {
1580+ for (int iter=0; iter<nvalues; ++iter) {
1581+ for (int sign=-1; sign<=+1; sign+=2) {
1582+ typedef union {
1583+ vtype v;
1584+ stype s[16];
1585+ } Tvec;
1586+ Tvec val, good, val2;
1587+ int vecsize = vec_step(vtype);
1588+ for (int n=0; n<vecsize; ++n) {
1589+ if (input==0) {
1590+ val.s[n] = sign * values[(iter+n) % nvalues];
1591+ good.s[n] = values[(iter+n) % nvalues];
1592+ val2.s[n] = values[(iter+n+1) % nvalues];
1593+ } else {
1594+#ifdef cl_khr_fp64
1595+ val.s[n] = sign * dvalues[(iter+n) % ndvalues];
1596+ good.s[n] = dvalues[(iter+n) % ndvalues];
1597+ val2.s[n] = dvalues[(iter+n+1) % ndvalues];
1598+#endif
1599+ }
1600+ }
1601+ Tvec res;
1602+ bool equal;
1603+ typedef union {
1604+ stype s;
1605+ sjtype sj;
1606+ } S;
1607+ typedef union {
1608+ jtype v;
1609+ sjtype s[16];
1610+ } Jvec;
1611+ /* fabs */
1612+ res.v = fabs(val.v);
1613+ equal = true;
1614+ for (int n=0; n<vecsize; ++n) {
1615+ S r, g;
1616+ r.s = res.s[n];
1617+ g.s = good.s[n];
1618+ equal = equal && r.sj == g.sj;
1619+ }
1620+ if (!equal) {
1621+ printf("FAIL: fabs type=%s val=%.17g res=%.17g\n",
1622+ typename, val.s[0], res.s[0]);
1623+ }
1624+ /* signbit */
1625+ Jvec ires;
1626+ ires.v = signbit(val.v);
1627+ equal = true;
1628+ for (int n=0; n<vecsize; ++n) {
1629+ equal = equal && ires.s[n] == (sign>0 ? 0 : vecsize==1 ? +1 : -1);
1630+ }
1631+ if (!equal) {
1632+ printf("FAIL: signbit type=%s val=%.17g res=%d\n",
1633+ typename, val.s[0], (int)ires.s[0]);
1634+ }
1635+ /* copysign */
1636+ for (int sign2=-1; sign2<=+1; sign2+=2) {
1637+ res.v = copysign(val.v, (stype)sign2*val2.v);
1638+ equal = true;
1639+ for (int n=0; n<vecsize; ++n) {
1640+ S r, g;
1641+ r.s = res.s[n];
1642+ g.s = sign2*good.s[n];
1643+ equal = equal && r.sj == g.sj;
1644+ }
1645+ if (!equal) {
1646+ for (int n=0; n<vecsize; ++n) {
1647+ printf("FAIL: copysign type=%s val=%.17g sign=%.17g res=%.17g\n",
1648+ typename, val.s[n], sign2*val2.s[n], res.s[n]);
1649+ }
1650+ }
1651+ }
1652+ }
1653+ }
1654+ }
1655+ })
1656+ )
1657+
1658+void test_fabs()
1659+{
1660+ CALL_FUNC_V(test_fabs)
1661+}
1662
1663=== added file 'examples/kernel/test_rotate.cl'
1664--- examples/kernel/test_rotate.cl 1970-01-01 00:00:00 +0000
1665+++ examples/kernel/test_rotate.cl 2011-12-14 15:59:07 +0000
1666@@ -0,0 +1,204 @@
1667+// TESTING: <<
1668+// TESTING: >>
1669+// TESTING: rotate
1670+
1671+#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \
1672+ void NAME##_##GTYPE() \
1673+ { \
1674+ typedef GTYPE gtype; \
1675+ typedef SGTYPE sgtype; \
1676+ typedef UGTYPE ugtype; \
1677+ typedef SUGTYPE sugtype; \
1678+ char const *const typename = #GTYPE; \
1679+ BODY; \
1680+ }
1681+#define DEFINE_BODY_G(NAME, EXPR) \
1682+ IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \
1683+ IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \
1684+ IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \
1685+ IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \
1686+ IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \
1687+ IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \
1688+ IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \
1689+ IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \
1690+ IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \
1691+ IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \
1692+ IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \
1693+ IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \
1694+ IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \
1695+ IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \
1696+ IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \
1697+ IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \
1698+ IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \
1699+ IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \
1700+ IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \
1701+ IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \
1702+ IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \
1703+ IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \
1704+ IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \
1705+ IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \
1706+ IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \
1707+ IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \
1708+ IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \
1709+ IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \
1710+ IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \
1711+ IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \
1712+ IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \
1713+ IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \
1714+ IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \
1715+ IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \
1716+ IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \
1717+ IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \
1718+ __IF_INT64( \
1719+ IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \
1720+ IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \
1721+ IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \
1722+ IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \
1723+ IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \
1724+ IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \
1725+ IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \
1726+ IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \
1727+ IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \
1728+ IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \
1729+ IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \
1730+ IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong ))
1731+
1732+#define CALL_FUNC_G(NAME) \
1733+ NAME##_char (); \
1734+ NAME##_char2 (); \
1735+ NAME##_char3 (); \
1736+ NAME##_char4 (); \
1737+ NAME##_char8 (); \
1738+ NAME##_char16 (); \
1739+ NAME##_uchar (); \
1740+ NAME##_uchar2 (); \
1741+ NAME##_uchar3 (); \
1742+ NAME##_uchar4 (); \
1743+ NAME##_uchar8 (); \
1744+ NAME##_uchar16 (); \
1745+ NAME##_short (); \
1746+ NAME##_short2 (); \
1747+ NAME##_short3 (); \
1748+ NAME##_short4 (); \
1749+ NAME##_short8 (); \
1750+ NAME##_short16 (); \
1751+ NAME##_ushort (); \
1752+ NAME##_ushort2 (); \
1753+ NAME##_ushort3 (); \
1754+ NAME##_ushort4 (); \
1755+ NAME##_ushort8 (); \
1756+ NAME##_ushort16(); \
1757+ NAME##_int (); \
1758+ NAME##_int2 (); \
1759+ NAME##_int3 (); \
1760+ NAME##_int4 (); \
1761+ NAME##_int8 (); \
1762+ NAME##_int16 (); \
1763+ NAME##_uint (); \
1764+ NAME##_uint2 (); \
1765+ NAME##_uint3 (); \
1766+ NAME##_uint4 (); \
1767+ NAME##_uint8 (); \
1768+ NAME##_uint16 (); \
1769+ __IF_INT64( \
1770+ NAME##_long (); \
1771+ NAME##_long2 (); \
1772+ NAME##_long3 (); \
1773+ NAME##_long4 (); \
1774+ NAME##_long8 (); \
1775+ NAME##_long16 (); \
1776+ NAME##_ulong (); \
1777+ NAME##_ulong2 (); \
1778+ NAME##_ulong3 (); \
1779+ NAME##_ulong4 (); \
1780+ NAME##_ulong8 (); \
1781+ NAME##_ulong16 ();)
1782+
1783+
1784+
1785+#define is_signed(T) ((T)-1 < (T)+1)
1786+#define is_floating(T) ((T)0.1 > (T)0.0)
1787+#define count_bits(T) (CHAR_BIT * sizeof(T))
1788+
1789+DEFINE_BODY_G
1790+(test_rotate,
1791+ ({
1792+ _cl_static_assert(sgtype, !is_floating(sgtype));
1793+ int patterns[] = {0x01, 0x80, 0x77, 0xee};
1794+ for (int p=0; p<4; ++p) {
1795+ int const bits = count_bits(sgtype);
1796+ int array[bits];
1797+ for (int b=0; b<bits; ++b) {
1798+ array[b] = !!(patterns[p] & (1 << (b & 7)));
1799+ }
1800+ int vecsize = vec_step(gtype);
1801+ typedef union {
1802+ gtype v;
1803+ sgtype s[16];
1804+ } Tvec;
1805+ for (int shiftbase=0; shiftbase<=bits; ++shiftbase) {
1806+ for (int shiftoffset=0; shiftoffset<(vecsize==1?1:4); ++shiftoffset) {
1807+ Tvec shift;
1808+ Tvec val;
1809+ Tvec shl, shr, rot;
1810+ for (int n=0; n<vecsize; ++n) {
1811+ shift.s[n] = shiftbase + n*shiftoffset;
1812+ val.s[n] = 0;
1813+ shl.s[n] = 0;
1814+ shr.s[n] = 0;
1815+ rot.s[n] = 0;
1816+ for (int b=bits; b>=0; --b) {
1817+ int bl = b - (shift.s[n] & (bits-1));
1818+ int br = b + (shift.s[n] & (bits-1));
1819+ int sign = is_signed(sgtype) ? array[bits-1] : 0;
1820+ val.s[n] = (val.s[n] << 1) | array[b];
1821+ shl.s[n] = (shl.s[n] << 1) | (bl < 0 ? 0 : array[bl]);
1822+ shr.s[n] = (shr.s[n] << 1) | (br >= bits ? sign : array[br]);
1823+ rot.s[n] = (rot.s[n] << 1) | array[bl & (bits-1)];
1824+ }
1825+ }
1826+ Tvec res;
1827+ bool equal;
1828+ /* shift left */
1829+ res.v = val.v << shift.v;
1830+ equal = true;
1831+ for (int n=0; n<vecsize; ++n) {
1832+ equal = equal && res.s[n] == shl.s[n];
1833+ }
1834+ if (!equal) {
1835+ printf("FAIL: shift left (<<) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1836+ typename, patterns[p], shiftbase, shiftoffset,
1837+ (uint)res.s[0], (uint)shl.s[0]);
1838+ }
1839+ /* shift right */
1840+ res.v = val.v >> shift.v;
1841+ equal = true;
1842+ for (int n=0; n<vecsize; ++n) {
1843+ equal = equal && res.s[n] == shr.s[n];
1844+ }
1845+ if (!equal) {
1846+ printf("FAIL: shift right (>>) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1847+ typename, patterns[p], shiftbase, shiftoffset,
1848+ (uint)res.s[0], (uint)shr.s[0]);
1849+ }
1850+ /* rotate */
1851+ res.v = rotate(val.v, shift.v);
1852+ equal = true;
1853+ for (int n=0; n<vecsize; ++n) {
1854+ equal = equal && res.s[n] == rot.s[n];
1855+ }
1856+ if (!equal) {
1857+ printf("FAIL: rotate type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n",
1858+ typename, patterns[p], shiftbase, shiftoffset,
1859+ (uint)res.s[0], (uint)rot.s[0]);
1860+ }
1861+ }
1862+ }
1863+ }
1864+ })
1865+ )
1866+
1867+void test_rotate()
1868+{
1869+ CALL_FUNC_G(test_rotate)
1870+}
1871
1872=== modified file 'examples/loopbarriers/loopbarriers.c'
1873--- examples/loopbarriers/loopbarriers.c 2011-11-07 18:21:03 +0000
1874+++ examples/loopbarriers/loopbarriers.c 2011-12-14 15:59:07 +0000
1875@@ -57,7 +57,7 @@
1876
1877 fclose(source_file);
1878
1879- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
1880+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
1881 NULL, NULL, NULL);
1882 if (context == (cl_context)0)
1883 return -1;
1884
1885=== modified file 'examples/run_all.sh'
1886--- examples/run_all.sh 2011-11-08 17:02:15 +0000
1887+++ examples/run_all.sh 2011-12-14 15:59:07 +0000
1888@@ -2,7 +2,7 @@
1889 #
1890 # Executes all the examples in a row.
1891 #
1892-TESTS="example1 example2 barriers forloops trig"
1893+TESTS="example1 example2 barriers forloops trig scalarwave"
1894
1895 for dname in ${TESTS};
1896 do
1897
1898=== added directory 'examples/scalarwave'
1899=== added file 'examples/scalarwave/Makefile.am'
1900--- examples/scalarwave/Makefile.am 1970-01-01 00:00:00 +0000
1901+++ examples/scalarwave/Makefile.am 2011-12-14 15:59:07 +0000
1902@@ -0,0 +1,31 @@
1903+# Process this file with automake to produce Makefile.in (in this,
1904+# and all subdirectories).
1905+# Makefile.am for pocl/examples/scalarwave.
1906+#
1907+# Copyright (c) 2011 Universidad Rey Juan Carlos
1908+#
1909+# Permission is hereby granted, free of charge, to any person obtaining a copy
1910+# of this software and associated documentation files (the "Software"), to deal
1911+# in the Software without restriction, including without limitation the rights
1912+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
1913+# copies of the Software, and to permit persons to whom the Software is
1914+# furnished to do so, subject to the following conditions:
1915+#
1916+# The above copyright notice and this permission notice shall be included in
1917+# all copies or substantial portions of the Software.
1918+#
1919+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
1920+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
1921+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
1922+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
1923+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
1924+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
1925+# THE SOFTWARE.
1926+
1927+noinst_PROGRAMS = scalarwave
1928+
1929+scalarwave_SOURCES = scalarwave.c scalarwave.cl
1930+scalarwave_LDADD = ../../lib/CL/libCL.la -lm @PTHREAD_LIBS@
1931+scalarwave_CFLAGS = -std=c99 @PTHREAD_CFLAGS@
1932+
1933+AM_CPPFLAGS = -I$(top_srcdir)/include -U__APPLE__ -DSRCDIR='"$(abs_srcdir)"'
1934
1935=== added file 'examples/scalarwave/scalarwave.c'
1936--- examples/scalarwave/scalarwave.c 1970-01-01 00:00:00 +0000
1937+++ examples/scalarwave/scalarwave.c 2011-12-14 15:59:07 +0000
1938@@ -0,0 +1,233 @@
1939+/* scalarwave - Scalar wave evolution */
1940+
1941+#define _BSD_SOURCE // define M_PI
1942+
1943+#include <assert.h>
1944+#include <math.h>
1945+#include <stdio.h>
1946+#include <stdlib.h>
1947+
1948+#include <CL/opencl.h>
1949+
1950+
1951+
1952+#define GRID_GRANULARITY 1 // TODO 2
1953+
1954+typedef struct grid_t {
1955+ cl_double dt; // time step
1956+ cl_double dx, dy, dz; // resolution
1957+ cl_int ai, aj, ak; // allocated size
1958+ cl_int ni, nj, nk; // used size
1959+} grid_t;
1960+
1961+
1962+
1963+int
1964+exec_scalarwave_kernel(char const *const program_source,
1965+ cl_double *const phi,
1966+ cl_double const *const phi_p,
1967+ cl_double const *const phi_p_p,
1968+ grid_t const *const grid)
1969+{
1970+ static int initialised = 0;
1971+ static cl_context context;
1972+ static cl_command_queue cmd_queue;
1973+ static cl_program program;
1974+ static cl_kernel kernel;
1975+
1976+ if (!initialised) {
1977+ initialised = 1;
1978+
1979+ context =
1980+ clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, NULL, NULL, NULL);
1981+ if (!context) return -1;
1982+
1983+ size_t ndevices;
1984+ clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &ndevices);
1985+ ndevices /= sizeof(cl_device_id);
1986+ cl_device_id devices[ndevices];
1987+ clGetContextInfo(context, CL_CONTEXT_DEVICES,
1988+ ndevices*sizeof(cl_device_id), devices, NULL);
1989+
1990+ cmd_queue =
1991+ clCreateCommandQueue(context, devices[0], 0, NULL);
1992+ if (!cmd_queue) return -1;
1993+
1994+ program =
1995+ clCreateProgramWithSource(context, 1, (const char**)&program_source,
1996+ NULL, NULL);
1997+ if (!program) return -1;
1998+
1999+ int ierr;
2000+ ierr = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
2001+ if (ierr) return -1;
2002+
2003+ kernel = clCreateKernel(program, "scalarwave", NULL);
2004+ if (!kernel) return -1;
2005+
2006+ }
2007+
2008+ size_t const npoints = grid->ai * grid->aj * grid->ak;
2009+ cl_mem const mem_phi =
2010+ clCreateBuffer(context, 0,
2011+ npoints*sizeof(*phi), NULL, NULL);
2012+ if (!mem_phi) return -1;
2013+ cl_mem const mem_phi_p =
2014+ clCreateBuffer(context, CL_MEM_COPY_HOST_PTR,
2015+ npoints*sizeof(*phi_p), phi_p, NULL);
2016+ if (!mem_phi_p) return -1;
2017+ cl_mem const mem_phi_p_p =
2018+ clCreateBuffer(context, CL_MEM_COPY_HOST_PTR,
2019+ npoints*sizeof(*phi_p_p), phi_p_p, NULL);
2020+ if (!mem_phi_p_p) return -1;
2021+ cl_mem const mem_grid =
2022+ clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
2023+ sizeof(*grid), grid, NULL);
2024+ if (!mem_grid) return -1;
2025+
2026+ int ierr;
2027+ ierr = clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem_phi);
2028+ if (ierr) return -1;
2029+ ierr = clSetKernelArg(kernel, 1, sizeof(cl_mem), &mem_phi_p);
2030+ if (ierr) return -1;
2031+ ierr = clSetKernelArg(kernel, 2, sizeof(cl_mem), &mem_phi_p_p);
2032+ if (ierr) return -1;
2033+ ierr = clSetKernelArg(kernel, 3, sizeof(cl_mem), &mem_grid);
2034+ if (ierr) return -1;
2035+
2036+ size_t const global_work_size[3] =
2037+ {grid->ni, grid->nj, grid->nk};
2038+ size_t const local_work_size[3] =
2039+ {GRID_GRANULARITY, GRID_GRANULARITY, GRID_GRANULARITY};
2040+
2041+ ierr = clEnqueueNDRangeKernel(cmd_queue, kernel, 3, NULL,
2042+ global_work_size, local_work_size,
2043+ 0, NULL, NULL);
2044+ if (ierr) return -1;
2045+
2046+ ierr = clFinish(cmd_queue);
2047+ if (ierr) return -1;
2048+
2049+ ierr = clEnqueueReadBuffer(cmd_queue, mem_phi, CL_TRUE,
2050+ 0, npoints*sizeof(*phi), phi,
2051+ 0, NULL, NULL);
2052+ if (ierr) return -1;
2053+
2054+ clReleaseMemObject(mem_phi);
2055+ clReleaseMemObject(mem_phi_p);
2056+ clReleaseMemObject(mem_phi_p_p);
2057+ clReleaseMemObject(mem_grid);
2058+ /* clReleaseKernel(kernel); */
2059+ /* clReleaseProgram(program); */
2060+ /* clReleaseCommandQueue(cmd_queue); */
2061+ /* clReleaseContext(context); */
2062+
2063+ return 0;
2064+}
2065+
2066+
2067+
2068+#define ALPHA 0.5 // CFL factor
2069+#define NT 4 // time steps
2070+#define NX 17 // grid size
2071+
2072+
2073+
2074+// Round up to next multiple of GRID_GRANULARITY
2075+static int roundup(int const nx)
2076+{
2077+ return (nx + GRID_GRANULARITY-1) / GRID_GRANULARITY * GRID_GRANULARITY;
2078+}
2079+
2080+#ifndef SRCDIR
2081+# define SRCDIR "."
2082+#endif
2083+
2084+int
2085+main(void)
2086+{
2087+ FILE *const source_file = fopen(SRCDIR "/scalarwave.cl", "r");
2088+ assert(source_file != NULL && "scalarwave.cl not found!");
2089+
2090+ fseek(source_file, 0, SEEK_END);
2091+ size_t const source_size = ftell(source_file);
2092+ fseek(source_file, 0, SEEK_SET);
2093+
2094+ char source[source_size + 1];
2095+ fread(source, source_size, 1, source_file);
2096+ source[source_size] = '\0';
2097+
2098+ fclose(source_file);
2099+
2100+
2101+
2102+ grid_t grid;
2103+ grid.dt = ALPHA/(NX-1);
2104+ grid.dx = grid.dy = grid.dz = 1.0/(NX-1);
2105+ grid.ai = grid.aj = grid.ak = roundup(NX);
2106+ grid.ni = grid.nj = grid.nk = NX;
2107+
2108+ cl_double *restrict phi =
2109+ malloc (grid.ai*grid.aj*grid.ak * sizeof *phi );
2110+ cl_double *restrict phi_p =
2111+ malloc (grid.ai*grid.aj*grid.ak * sizeof *phi_p );
2112+ cl_double *restrict phi_p_p =
2113+ malloc (grid.ai*grid.aj*grid.ak * sizeof *phi_p_p);
2114+
2115+ // Set up initial data (TODO: do this on the device instead)
2116+ printf ("Initial condition: t=%g\n", 0.0);
2117+ double const kx = M_PI;
2118+ double const ky = M_PI;
2119+ double const kz = M_PI;
2120+ double const omega = sqrt(pow(kx,2)+pow(ky,2)+pow(kz,2));
2121+ for (int k = 0; k < NX; ++k) {
2122+ for (int j = 0; j < NX; ++j) {
2123+ for (int i = 0; i < NX; ++i) {
2124+ double const t0 = 0.0;
2125+ double const t1 = -grid.dt;
2126+ double const x = i*grid.dx;
2127+ double const y = j*grid.dy;
2128+ double const z = k*grid.dz;
2129+ int const ind3d = i+grid.ai*(j+grid.aj*k);
2130+ phi [ind3d] = sin(kx*x) * sin(ky*y) * sin(kz*z) * cos(omega*t0);
2131+ phi_p[ind3d] = sin(kx*x) * sin(ky*y) * sin(kz*z) * cos(omega*t1);
2132+ }
2133+ }
2134+ }
2135+
2136+ // Take some time steps
2137+ for (int n=0; n<NT; ++n) {
2138+ printf ("Time step %d: t=%g\n", n+1, (n+1)*grid.dt);
2139+
2140+ // Cycle time levels
2141+ {
2142+ cl_double *tmp = phi_p_p;
2143+ phi_p_p = phi_p;
2144+ phi_p = phi;
2145+ phi = tmp;
2146+ }
2147+
2148+ // TODO: We create the program and allocate the buffers each time,
2149+ // which is slow. But then, we only want to test correctness, not
2150+ // performance. (Yet?)
2151+ int const ierr =
2152+ exec_scalarwave_kernel (source, phi, phi_p, phi_p_p, &grid);
2153+ assert(!ierr);
2154+
2155+ } // for n
2156+
2157+ for (int i=0; i<NX; ++i) {
2158+ int const j = i;
2159+ int const k = i;
2160+ double const x = grid.dx*i;
2161+ double const y = grid.dy*j;
2162+ double const z = grid.dz*k;
2163+ int const ind3d = i+grid.ai*(j+grid.aj*k);
2164+
2165+ printf ("phi(%-8g,%-8g,%-8g) = %g\n", x,y,z, phi[ind3d]);
2166+ }
2167+
2168+ printf ("Done.\n");
2169+
2170+ return 0;
2171+}
2172
2173=== added file 'examples/scalarwave/scalarwave.cl'
2174--- examples/scalarwave/scalarwave.cl 1970-01-01 00:00:00 +0000
2175+++ examples/scalarwave/scalarwave.cl 2011-12-14 15:59:07 +0000
2176@@ -0,0 +1,94 @@
2177+// Evolve the scalar wave equation with Dirichlet boundaries
2178+
2179+/* This kernel is very short. To run efficiently, probably the
2180+ following optimizations need to occur:
2181+ - Vectorization (with the device's natural vector length)
2182+ - Maybe: Loop unrolling with small 3D blocks
2183+ - Small explicit 3D loops (to amortize stencil loads, aka "loop
2184+ blocking")
2185+ - Multi-threading (aka parallelization)
2186+ - Hoist setup operations (mostly integer operations) out of the
2187+ kernel loop
2188+ None of these are implemented explicitly here. We could provide
2189+ several optimized versions, and then compare with pocl's
2190+ capabilities.
2191+ */
2192+
2193+typedef struct grid_t {
2194+ double dt; // time step
2195+ double dx, dy, dz; // resolution
2196+ int ai, aj, ak; // allocated size
2197+ int ni, nj, nk; // used size
2198+} grid_t;
2199+
2200+void
2201+scalarwave(global double *restrict const phi,
2202+ global double const *restrict const phi_p,
2203+ global double const *restrict const phi_p_p,
2204+ constant grid_t *restrict const grid)
2205+{
2206+ /* printf("dt=%g\n", grid->dt); */
2207+ /* printf("dxyz=[%g,%g,%g]\n", grid->dx, grid->dy, grid->dz); */
2208+ /* printf("aijk=[%d,%d,%d]\n", grid->ai, grid->aj, grid->ak); */
2209+ /* printf("nijk=[%d,%d,%d]\n", grid->ni, grid->nj, grid->nk); */
2210+
2211+ double const dt = grid->dt;
2212+
2213+ double const dx = grid->dx;
2214+ double const dy = grid->dy;
2215+ double const dz = grid->dz;
2216+
2217+ double const dt2 = pown(dt,2);
2218+
2219+ double const idx2 = pown(dx,-2);
2220+ double const idy2 = pown(dy,-2);
2221+ double const idz2 = pown(dz,-2);
2222+
2223+ size_t const ai = grid->ai;
2224+ size_t const aj = grid->aj;
2225+ size_t const ak = grid->ak;
2226+
2227+ size_t const ni = grid->ni;
2228+ size_t const nj = grid->nj;
2229+ size_t const nk = grid->nk;
2230+
2231+ size_t const di = 1;
2232+ size_t const dj = di * ai;
2233+ size_t const dk = dj * aj;
2234+
2235+#if 0
2236+ printf("work_dim =%u\n", get_work_dim());
2237+ printf("global_size =[%zu,%zu,%zu]\n", get_global_size(0), get_global_size(1), get_global_size(2));
2238+ printf("global_id =[%zu,%zu,%zu]\n", get_global_id(0), get_global_id(1), get_global_id(2));
2239+ printf("local_size =[%zu,%zu,%zu]\n", get_local_size(0), get_local_size(1), get_local_size(2));
2240+ printf("local_id =[%zu,%zu,%zu]\n", get_local_id(0), get_local_id(1), get_local_id(2));
2241+ printf("num_groups =[%zu,%zu,%zu]\n", get_num_groups(0), get_num_groups(1), get_num_groups(2));
2242+ printf("group_id =[%zu,%zu,%zu]\n", get_group_id(0), get_group_id(1), get_group_id(2));
2243+ printf("global_offset=[%zu,%zu,%zu]\n", get_global_offset(0), get_global_offset(1), get_global_offset(2));
2244+#endif
2245+
2246+ size_t const i = get_global_id(0);
2247+ size_t const j = get_global_id(1);
2248+ size_t const k = get_global_id(2);
2249+
2250+ // If outside the domain, do nothing
2251+ if (i>=ni || j>=nj || k>=nk) return;
2252+
2253+ size_t const ind3d = di*i + dj*j + dk*k;
2254+
2255+ if (i==0 || i==ni-1 || j==0 || j==nj-1 || k==0 || k==nk-1) {
2256+ // Boundary condition
2257+
2258+ phi[ind3d] = 0.0;
2259+
2260+ } else {
2261+ // Scalar wave equation
2262+
2263+ phi[ind3d] =
2264+ 2.0 * phi_p[ind3d] - phi_p_p[ind3d] +
2265+ dt2 * ((phi_p[ind3d-di] - 2.0*phi_p[ind3d] + phi_p[ind3d+di]) * idx2 +
2266+ (phi_p[ind3d-dj] - 2.0*phi_p[ind3d] + phi_p[ind3d+dj]) * idy2 +
2267+ (phi_p[ind3d-dk] - 2.0*phi_p[ind3d] + phi_p[ind3d+dk]) * idz2);
2268+
2269+ }
2270+}
2271
2272=== modified file 'examples/trig/trig_exec.c'
2273--- examples/trig/trig_exec.c 2011-10-24 17:55:01 +0000
2274+++ examples/trig/trig_exec.c 2011-12-14 15:59:07 +0000
2275@@ -25,7 +25,7 @@
2276 cl_int err;
2277
2278 // create the OpenCL context on a GPU device
2279- context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU,
2280+ context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU,
2281 NULL, NULL, NULL);
2282 if (context == (cl_context)0)
2283 return -1;
2284
2285=== modified file 'include/_kernel.h'
2286--- include/_kernel.h 2011-12-09 13:35:18 +0000
2287+++ include/_kernel.h 2011-12-14 15:59:07 +0000
2288@@ -424,14 +424,14 @@
2289
2290
2291 /* Work-Item Functions */
2292
2293-// uint get_work_dim();
2294-uint get_global_size(uint); // should return size_t
2295-uint get_global_id(uint); // should return size_t
2296-// size_t get_local_size(uint);
2297-uint get_local_id(uint); // should return size_t
2298-uint get_num_groups(uint); // should return size_t
2299-uint get_group_id(uint); // should return size_t
2300-// size_t get_global_offset(uint);
2301+uint get_work_dim();
2302+size_t get_global_size(uint);
2303+size_t get_global_id(uint);
2304+size_t get_local_size(uint);
2305+size_t get_local_id(uint);
2306+size_t get_num_groups(uint);
2307+size_t get_group_id(uint);
2308+size_t get_global_offset(uint);
2309
2310 __attribute__ ((noinline)) void barrier (cl_mem_fence_flags flags);
2311
2312@@ -1304,6 +1304,7 @@
2313 _CL_DECLARE_FUNC_G_GG(rotate)
2314 _CL_DECLARE_FUNC_G_GG(sub_sat)
2315 _CL_DECLARE_FUNC_LG_GUG(upsample)
2316+_CL_DECLARE_FUNC_G_G(popcount)
2317 _CL_DECLARE_FUNC_J_JJJ(mad24)
2318 _CL_DECLARE_FUNC_J_JJ(mul24)
2319
2320@@ -1613,5 +1614,5 @@
2321 // shuffle2
2322
2323
2324
2325-/* printf */
2326-// int printf(constant char * restrict format, ...);
2327+int printf(const /*constant*/ char * restrict format, ...)
2328+ __attribute__((format(printf, 1, 2)));
2329
2330=== modified file 'lib/CL/Makefile.am'
2331--- lib/CL/Makefile.am 2011-12-07 13:02:21 +0000
2332+++ lib/CL/Makefile.am 2011-12-14 15:59:07 +0000
2333@@ -46,6 +46,7 @@
2334 clCreateContext.c \
2335 clGetProgramBuildInfo.c \
2336 clEnqueueWriteBuffer.c \
2337+ clFinish.c \
2338 clGetKernelWorkGroupInfo.c \
2339 pocl_cl.h \
2340 devices/devices.h \
2341
2342=== modified file 'lib/CL/clCreateBuffer.c'
2343--- lib/CL/clCreateBuffer.c 2011-10-14 10:31:27 +0000
2344+++ lib/CL/clCreateBuffer.c 2011-12-14 15:59:07 +0000
2345@@ -33,7 +33,7 @@
2346 cl_mem mem;
2347 cl_device_id device_id;
2348 void *device_ptr;
2349- unsigned i;
2350+ unsigned i, j;
2351
2352 if (context == NULL)
2353 POCL_ERROR(CL_INVALID_CONTEXT);
2354@@ -54,7 +54,12 @@
2355 device_ptr = device_id->malloc(device_id->data, flags, size, host_ptr);
2356 if (device_ptr == NULL)
2357 {
2358- clReleaseMemObject(mem);
2359+ for (j = 0; j < i; ++j)
2360+ {
2361+ device_id = context->devices[j];
2362+ device_id->free(device_id->data, flags, mem->device_ptrs[j]);
2363+ }
2364+ free(mem);
2365 POCL_ERROR(CL_MEM_OBJECT_ALLOCATION_FAILURE);
2366 }
2367 mem->device_ptrs[i] = device_ptr;
2368
2369=== modified file 'lib/CL/clCreateKernel.c'
2370--- lib/CL/clCreateKernel.c 2011-12-05 11:21:04 +0000
2371+++ lib/CL/clCreateKernel.c 2011-12-14 15:59:07 +0000
2372@@ -125,6 +125,13 @@
2373 sizeof (struct pocl_argument));
2374 kernel->next = NULL;
2375
2376+ /* Initialize kernel arguments (in case the user doesn't). */
2377+ for (i = 0; i < kernel->num_args; ++i)
2378+ {
2379+ kernel->arguments[i].value = NULL;
2380+ kernel->arguments[i].size = 0;
2381+ }
2382+
2383 /* Fill up automatic local arguments. */
2384 for (i = 0; i < kernel->num_locals; ++i)
2385 {
2386@@ -133,14 +140,9 @@
2387 ((size_t *) lt_dlsym(dlhandle, "_local_size"))[i];
2388 }
2389
2390- if (program->kernels == NULL)
2391- program->kernels = kernel;
2392- else {
2393- cl_kernel k = program->kernels;
2394- while (k->next != NULL)
2395- k = k->next;
2396- k->next = kernel;
2397- }
2398+ cl_kernel k = program->kernels;
2399+ program->kernels = kernel;
2400+ kernel->next = k;
2401
2402 return kernel;
2403 }
2404
2405=== modified file 'lib/CL/clEnqueueNDRangeKernel.c'
2406--- lib/CL/clEnqueueNDRangeKernel.c 2011-11-29 17:55:10 +0000
2407+++ lib/CL/clEnqueueNDRangeKernel.c 2011-12-14 15:59:07 +0000
2408@@ -22,6 +22,7 @@
2409 */
2410
2411 #include "pocl_cl.h"
2412+#include <assert.h>
2413 #include <sys/stat.h>
2414 #include <unistd.h>
2415
2416@@ -62,8 +63,10 @@
2417 if (command_queue->context != kernel->context)
2418 return CL_INVALID_CONTEXT;
2419
2420- if (work_dim < 1 || work_dim > 3)
2421+ if (work_dim < 1 ||
2422+ work_dim > command_queue->device->max_work_item_dimensions)
2423 return CL_INVALID_WORK_DIMENSION;
2424+ assert(command_queue->device->max_work_item_dimensions <= 3);
2425
2426 if (global_work_offset != NULL)
2427 {
2428@@ -82,14 +85,36 @@
2429 global_y = work_dim > 1 ? global_work_size[1] : 1;
2430 global_z = work_dim > 2 ? global_work_size[2] : 1;
2431
2432+ if (global_x ==0 || global_y == 0 || global_z == 0)
2433+ return CL_INVALID_GLOBAL_WORK_SIZE;
2434+
2435 local_x = local_work_size[0];
2436 local_y = work_dim > 1 ? local_work_size[1] : 1;
2437 local_z = work_dim > 2 ? local_work_size[2] : 1;
2438
2439+ if (local_x * local_y * local_z > command_queue->device->max_work_group_size)
2440+ return CL_INVALID_WORK_GROUP_SIZE;
2441+
2442+ if (local_x > command_queue->device->max_work_item_sizes[0] ||
2443+ (work_dim > 1 &&
2444+ local_y > command_queue->device->max_work_item_sizes[1]) ||
2445+ (work_dim > 2 &&
2446+ local_z > command_queue->device->max_work_item_sizes[2]))
2447+ return CL_INVALID_WORK_ITEM_SIZE;
2448+
2449+ if (global_x % local_x != 0 ||
2450+ global_y % local_y != 0 ||
2451+ global_z % local_z != 0)
2452+ return CL_INVALID_WORK_GROUP_SIZE;
2453+
2454 tmpdir = mkdtemp(template);
2455 if (tmpdir == NULL)
2456 return CL_OUT_OF_HOST_MEMORY;
2457
2458+ if ((event_wait_list == NULL && num_events_in_wait_list > 0) ||
2459+ (event_wait_list != NULL && num_events_in_wait_list == 0))
2460+ return CL_INVALID_EVENT_WAIT_LIST;
2461+
2462 error = snprintf(kernel_filename, POCL_FILENAME_LENGTH,
2463 "%s/kernel.bc",
2464 tmpdir);
2465@@ -136,6 +161,9 @@
2466 pc.num_groups[0] = global_x / local_x;
2467 pc.num_groups[1] = global_y / local_y;
2468 pc.num_groups[2] = global_z / local_z;
2469+ pc.global_offset[0] = offset_x;
2470+ pc.global_offset[1] = offset_y;
2471+ pc.global_offset[2] = offset_z;
2472
2473 command_queue->device->run(command_queue->device->data,
2474 parallel_filename,
2475
2476=== added file 'lib/CL/clFinish.c'
2477--- lib/CL/clFinish.c 1970-01-01 00:00:00 +0000
2478+++ lib/CL/clFinish.c 2011-12-14 15:59:07 +0000
2479@@ -0,0 +1,31 @@
2480+/* OpenCL runtime library: clFinish()
2481+
2482+ Copyright (c) 2011 Erik Schnetter
2483+
2484+ Permission is hereby granted, free of charge, to any person obtaining a copy
2485+ of this software and associated documentation files (the "Software"), to deal
2486+ in the Software without restriction, including without limitation the rights
2487+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
2488+ copies of the Software, and to permit persons to whom the Software is
2489+ furnished to do so, subject to the following conditions:
2490+
2491+ The above copyright notice and this permission notice shall be included in
2492+ all copies or substantial portions of the Software.
2493+
2494+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2495+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2496+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2497+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2498+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2499+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
2500+ THE SOFTWARE.
2501+*/
2502+
2503+#include "pocl_cl.h"
2504+
2505+CL_API_ENTRY cl_int CL_API_CALL
2506+clFinish(cl_command_queue command_queue)
2507+{
2508+ /* All operations are serialised, so we never have to wait */
2509+ return CL_SUCCESS;
2510+}
2511
2512=== modified file 'lib/CL/clGetDeviceIDs.c'
2513--- lib/CL/clGetDeviceIDs.c 2011-12-02 13:09:02 +0000
2514+++ lib/CL/clGetDeviceIDs.c 2011-12-14 15:59:07 +0000
2515@@ -42,12 +42,11 @@
2516 return CL_INVALID_PLATFORM;
2517
2518 // Currently - POCL supports only the host device - i.e. a CPU
2519- if (device_type & CL_DEVICE_TYPE_CPU ||
2520- device_type & CL_DEVICE_TYPE_DEFAULT ||
2521- device_type & CL_DEVICE_TYPE_ALL)
2522+ if ((device_type & CL_DEVICE_TYPE_CPU) ||
2523+ (device_type & CL_DEVICE_TYPE_DEFAULT))
2524 num = 1;
2525- else if (device_type == CL_DEVICE_TYPE_GPU ||
2526- device_type == CL_DEVICE_TYPE_ACCELERATOR )
2527+ else if ((device_type | CL_DEVICE_TYPE_GPU) ||
2528+ (device_type | CL_DEVICE_TYPE_ACCELERATOR))
2529 num = 0;
2530 else
2531 return CL_INVALID_DEVICE_TYPE;
2532
2533=== modified file 'lib/CL/clReleaseKernel.c'
2534--- lib/CL/clReleaseKernel.c 2011-10-14 10:31:27 +0000
2535+++ lib/CL/clReleaseKernel.c 2011-12-14 15:59:07 +0000
2536@@ -26,6 +26,11 @@
2537 CL_API_ENTRY cl_int CL_API_CALL
2538 clReleaseKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0
2539 {
2540- free(kernel);
2541+ /* TODO: Remove kernel from the program's linked list of kernels! */
2542+ /* TODO: Deallocate kernel arguments! */
2543+ /* In the mean time, we better don't free the kernel, but just make
2544+ it unusable... */
2545+ /*free(kernel);*/
2546+ memset(kernel, -1, sizeof *kernel);
2547 return CL_SUCCESS;
2548 }
2549
2550=== modified file 'lib/CL/clReleaseMemObject.c'
2551--- lib/CL/clReleaseMemObject.c 2011-10-14 10:31:27 +0000
2552+++ lib/CL/clReleaseMemObject.c 2011-12-14 15:59:07 +0000
2553@@ -32,7 +32,7 @@
2554 for (i = 0; i < memobj->context->num_devices; ++i)
2555 {
2556 device_id = memobj->context->devices[i];
2557- device_id->free(device_id->data, memobj->device_ptrs[i]);
2558+ device_id->free(device_id->data, memobj->flags, memobj->device_ptrs[i]);
2559 }
2560
2561 free(memobj->device_ptrs);
2562
2563=== modified file 'lib/CL/clSetKernelArg.c'
2564--- lib/CL/clSetKernelArg.c 2011-12-05 10:43:51 +0000
2565+++ lib/CL/clSetKernelArg.c 2011-12-14 15:59:07 +0000
2566@@ -45,9 +45,10 @@
2567
2568 p = &(kernel->arguments[arg_index]);
2569
2570- p->value = NULL;
2571 if (arg_value != NULL)
2572 {
2573+ free (p->value);
2574+
2575 value = malloc (arg_size);
2576 if (value == NULL)
2577 return CL_OUT_OF_HOST_MEMORY;
2578@@ -56,6 +57,11 @@
2579
2580 p->value = value;
2581 }
2582+ else
2583+ {
2584+ free (p->value);
2585+ p->value = NULL;
2586+ }
2587
2588 p->size = arg_size;
2589
2590
2591=== modified file 'lib/CL/devices/native/native.c'
2592--- lib/CL/devices/native/native.c 2011-12-05 10:43:51 +0000
2593+++ lib/CL/devices/native/native.c 2011-12-14 15:59:07 +0000
2594@@ -30,15 +30,7 @@
2595 #define COMMAND_LENGTH 256
2596 #define WORKGROUP_STRING_LENGTH 128
2597
2598-struct pointer_list {
2599- void *pointer;
2600- struct pointer_list *next;
2601-};
2602-
2603 struct data {
2604- /* Buffers where host pointer is used, and thus
2605- should not be deallocated on free. */
2606- struct pointer_list *host_buffers;
2607 /* Currently loaded kernel. */
2608 cl_kernel current_kernel;
2609 /* Loaded kernel dynamic library handle. */
2610@@ -54,7 +46,6 @@
2611
2612 d = (struct data *) malloc (sizeof (struct data));
2613
2614- d->host_buffers = NULL;
2615 d->current_kernel = NULL;
2616 d->current_dlhandle = 0;
2617
2618@@ -65,11 +56,7 @@
2619 pocl_native_malloc (void *data, cl_mem_flags flags,
2620 size_t size, void *host_ptr)
2621 {
2622- struct data *d;
2623 void *b;
2624- struct pointer_list *p;
2625-
2626- d = (struct data *) data;
2627
2628 if (flags & CL_MEM_COPY_HOST_PTR)
2629 {
2630@@ -84,19 +71,6 @@
2631
2632 if (host_ptr != NULL)
2633 {
2634- if (d->host_buffers == NULL)
2635- d->host_buffers = malloc (sizeof (struct pointer_list));
2636-
2637- p = d->host_buffers;
2638- while (p->next != NULL)
2639- p = p->next;
2640-
2641- p->next = malloc (sizeof (struct pointer_list));
2642- p = p->next;
2643-
2644- p->pointer = host_ptr;
2645- p->next = NULL;
2646-
2647 return host_ptr;
2648 }
2649
2650@@ -107,27 +81,16 @@
2651 }
2652
2653 void
2654-pocl_native_free (void *data, void *ptr)
2655+pocl_native_free (void *data, cl_mem_flags flags, void *ptr)
2656 {
2657- struct data *d;
2658- struct pointer_list *p;
2659-
2660- d = (struct data *) data;
2661-
2662- p = d->host_buffers;
2663- while (p != NULL)
2664- {
2665- if (p->pointer == ptr)
2666- return;
2667-
2668- p = p->next;
2669- }
2670+ if (flags & CL_MEM_COPY_HOST_PTR)
2671+ return;
2672
2673 free (ptr);
2674 }
2675
2676 void
2677-pocl_native_read (void *data, void *host_ptr, void *device_ptr, size_t cb)
2678+pocl_native_read (void *data, void *host_ptr, const void *device_ptr, size_t cb)
2679 {
2680 if (host_ptr == device_ptr)
2681 return;
2682@@ -284,12 +247,12 @@
2683 for (i = 0; i < kernel->num_args; ++i)
2684 {
2685 if (kernel->arg_is_local[i])
2686- pocl_native_free(data, *(void **)(arguments[i]));
2687+ pocl_native_free(data, 0, *(void **)(arguments[i]));
2688 }
2689 for (i = kernel->num_args;
2690 i < kernel->num_args + kernel->num_locals;
2691 ++i)
2692- pocl_native_free(data, *(void **)(arguments[i]));
2693+ pocl_native_free(data, 0, *(void **)(arguments[i]));
2694 }
2695 }
2696 }
2697
2698=== modified file 'lib/CL/devices/native/native.h'
2699--- lib/CL/devices/native/native.h 2011-12-05 10:43:51 +0000
2700+++ lib/CL/devices/native/native.h 2011-12-14 15:59:07 +0000
2701@@ -29,8 +29,8 @@
2702 void pocl_native_init (cl_device_id device);
2703 void *pocl_native_malloc (void *data, cl_mem_flags flags,
2704 size_t size, void *host_ptr);
2705-void pocl_native_free (void *data, void *ptr);
2706-void pocl_native_read (void *data, void *host_ptr, void *device_ptr, size_t cb);
2707+void pocl_native_free (void *data, cl_mem_flags flags, void *ptr);
2708+void pocl_native_read (void *data, void *host_ptr, const void *device_ptr, size_t cb);
2709 void pocl_native_write (void *data, const void *host_ptr, void *device_ptr, size_t cb);
2710 void pocl_native_run (void *data, const char *bytecode,
2711 cl_kernel kernel,
2712@@ -39,7 +39,7 @@
2713 extern size_t pocl_native_max_work_item_sizes[];
2714
2715 #define POCL_DEVICES_NATIVE { \
2716- CL_DEVICE_TYPE_GPU, /* type */ \
2717+ CL_DEVICE_TYPE_CPU, /* type */ \
2718 0, /* vendor_id */ \
2719 0, /* max_compute_units */ \
2720 1, /* max_work_item_dimensions */ \
2721
2722=== modified file 'lib/CL/devices/pthread/pthread.c'
2723--- lib/CL/devices/pthread/pthread.c 2011-12-05 10:43:51 +0000
2724+++ lib/CL/devices/pthread/pthread.c 2011-12-14 15:59:07 +0000
2725@@ -37,11 +37,6 @@
2726 for the thread execution. */
2727 #define THREAD_COUNT_ENV "POCL_MAX_PTHREAD_COUNT"
2728
2729-struct pointer_list {
2730- void *pointer;
2731- struct pointer_list *next;
2732-};
2733-
2734 struct thread_arguments {
2735 void *data;
2736 cl_kernel kernel;
2737@@ -52,9 +47,6 @@
2738 };
2739
2740 struct data {
2741- /* Buffers where host pointer is used, and thus
2742- should not be deallocated on free. */
2743- struct pointer_list *host_buffers;
2744 /* Currently loaded kernel. */
2745 cl_kernel current_kernel;
2746 /* Loaded kernel dynamic library handle. */
2747@@ -63,7 +55,10 @@
2748
2749 static void * workgroup_thread (void *p);
2750
2751-size_t pocl_pthread_max_work_item_sizes[] = {1};
2752+/* This could be SIZE_T_MAX, but setting it to INT_MAX should suffice,
2753+ and may avoid errors in user code that uses int instead of
2754+ size_t */
2755+size_t pocl_pthread_max_work_item_sizes[] = {CL_INT_MAX,CL_INT_MAX,CL_INT_MAX};
2756
2757 void
2758 pocl_pthread_init (cl_device_id device)
2759@@ -72,7 +67,6 @@
2760
2761 d = (struct data *) malloc (sizeof (struct data));
2762
2763- d->host_buffers = NULL;
2764 d->current_kernel = NULL;
2765 d->current_dlhandle = 0;
2766
2767@@ -80,14 +74,9 @@
2768 }
2769
2770 void *
2771-pocl_pthread_malloc (void *data, cl_mem_flags flags,
2772- size_t size, void *host_ptr)
2773+pocl_pthread_malloc (void *data, cl_mem_flags flags, size_t size, void *host_ptr)
2774 {
2775- struct data *d;
2776 void *b;
2777- struct pointer_list *p;
2778-
2779- d = (struct data *) data;
2780
2781 if (flags & CL_MEM_COPY_HOST_PTR)
2782 {
2783@@ -102,19 +91,6 @@
2784
2785 if (host_ptr != NULL)
2786 {
2787- if (d->host_buffers == NULL)
2788- d->host_buffers = malloc (sizeof (struct pointer_list));
2789-
2790- p = d->host_buffers;
2791- while (p->next != NULL)
2792- p = p->next;
2793-
2794- p->next = malloc (sizeof (struct pointer_list));
2795- p = p->next;
2796-
2797- p->pointer = host_ptr;
2798- p->next = NULL;
2799-
2800 return host_ptr;
2801 }
2802
2803@@ -125,27 +101,16 @@
2804 }
2805
2806 void
2807-pocl_pthread_free (void *data, void *ptr)
2808+pocl_pthread_free (void *data, cl_mem_flags flags, void *ptr)
2809 {
2810- struct data *d;
2811- struct pointer_list *p;
2812-
2813- d = (struct data *) data;
2814-
2815- p = d->host_buffers;
2816- while (p != NULL)
2817- {
2818- if (p->pointer == ptr)
2819- return;
2820-
2821- p = p->next;
2822- }
2823+ if (flags & CL_MEM_COPY_HOST_PTR)
2824+ return;
2825
2826 free (ptr);
2827 }
2828
2829 void
2830-pocl_pthread_read (void *data, void *host_ptr, void *device_ptr, size_t cb)
2831+pocl_pthread_read (void *data, void *host_ptr, const void *device_ptr, size_t cb)
2832 {
2833 if (host_ptr == device_ptr)
2834 return;
2835@@ -188,7 +153,7 @@
2836 if (access (cpuinfo, R_OK) == 0)
2837 {
2838 FILE *f = fopen (cpuinfo, "r");
2839-# define MAX_CPUINFO_SIZE 16*1024
2840+# define MAX_CPUINFO_SIZE 64*1024
2841 char contents[MAX_CPUINFO_SIZE];
2842 int num_read = fread (contents, 1, MAX_CPUINFO_SIZE - 1, f);
2843 fclose (f);
2844@@ -373,6 +338,8 @@
2845 /* In case the work group count is not divisible by the
2846 number of threads, we have to execute the remaining
2847 workgroups in one of the threads. */
2848+ /* TODO: This is inefficient; it is better to round up when
2849+ calculating wgs_per_thread */
2850 int leftover_wgs = num_groups_x - (num_threads*wgs_per_thread);
2851
2852 #ifdef DEBUG_MT
2853@@ -392,25 +359,25 @@
2854 first_gid_x, last_gid_x);
2855 #endif
2856
2857- pc->group_id[0] = first_gid_x;
2858-
2859 arguments[i].data = data;
2860 arguments[i].kernel = kernel;
2861 arguments[i].device = device;
2862 arguments[i].pc = *pc;
2863+ arguments[i].pc.group_id[0] = first_gid_x;
2864 arguments[i].workgroup = w;
2865 arguments[i].last_gid_x = last_gid_x;
2866
2867- pthread_create (&threads[i],
2868- NULL,
2869- workgroup_thread,
2870- &arguments[i]);
2871+ error = pthread_create (&threads[i],
2872+ NULL,
2873+ workgroup_thread,
2874+ &arguments[i]);
2875+ assert(!error);
2876 }
2877
2878 for (i = 0; i < num_threads; ++i) {
2879 pthread_join(threads[i], NULL);
2880 #ifdef DEBUG_MT
2881- printf("### thread %x finished\n", (unsigned)threads[i]);
2882+ printf("### thread %u finished\n", (unsigned)threads[i]);
2883 #endif
2884 }
2885
2886@@ -468,12 +435,18 @@
2887 for (i = 0; i < kernel->num_args; ++i)
2888 {
2889 if (kernel->arg_is_local[i])
2890- pocl_native_free(ta->data, *(void **)(arguments[i]));
2891+ {
2892+ pocl_pthread_free(ta->data, 0, *(void **)(arguments[i]));
2893+ free(arguments[i]);
2894+ }
2895 }
2896 for (i = kernel->num_args;
2897 i < kernel->num_args + kernel->num_locals;
2898 ++i)
2899- pocl_native_free(ta->data, *(void **)(arguments[i]));
2900+ {
2901+ pocl_pthread_free(ta->data, 0, *(void **)(arguments[i]));
2902+ free(arguments[i]);
2903+ }
2904
2905 return NULL;
2906 }
2907
2908=== modified file 'lib/CL/devices/pthread/pthread.h'
2909--- lib/CL/devices/pthread/pthread.h 2011-12-05 10:43:51 +0000
2910+++ lib/CL/devices/pthread/pthread.h 2011-12-14 15:59:07 +0000
2911@@ -29,8 +29,8 @@
2912 void pocl_pthread_init (cl_device_id device);
2913 void *pocl_pthread_malloc (void *data, cl_mem_flags flags,
2914 size_t size, void *host_ptr);
2915-void pocl_pthread_free (void *data, void *ptr);
2916-void pocl_pthread_read (void *data, void *host_ptr, void *device_ptr, size_t cb);
2917+void pocl_pthread_free (void *data, cl_mem_flags flags, void *ptr);
2918+void pocl_pthread_read (void *data, void *host_ptr, const void *device_ptr, size_t cb);
2919 void pocl_pthread_write (void *data, const void *host_ptr, void *device_ptr, size_t cb);
2920 void pocl_pthread_run (void *data, const char *bytecode,
2921 cl_kernel kernel,
2922@@ -39,14 +39,14 @@
2923 extern size_t pocl_pthread_max_work_item_sizes[];
2924
2925 #define POCL_DEVICES_PTHREAD { \
2926- CL_DEVICE_TYPE_GPU, /* type */ \
2927+ CL_DEVICE_TYPE_CPU, /* type */ \
2928 0, /* vendor_id */ \
2929 0, /* max_compute_units */ \
2930- 1, /* max_work_item_dimensions */ \
2931+ 3, /* max_work_item_dimensions */ \
2932 pocl_pthread_max_work_item_sizes, /* max_work_item_sizes */ \
2933- 1, /*max_work_group_size */ \
2934+ CL_INT_MAX, /* max_work_group_size */ \
2935 0, /* preferred_vector_width_char */ \
2936- 0, /* preferred_vector_width_shortr */ \
2937+ 0, /* preferred_vector_width_short */ \
2938 0, /* preferred_vector_width_int */ \
2939 0, /* preferred_vector_width_long */ \
2940 0, /* preferred_vector_width_float */ \
2941@@ -57,7 +57,7 @@
2942 CL_FALSE, /* image_support */ \
2943 0, /* max_read_image_args */ \
2944 0, /* max_write_image_args */ \
2945- 0, /*image2d_max_width */ \
2946+ 0, /* image2d_max_width */ \
2947 0, /* image2d_max_height */ \
2948 0, /* image3d_max_width */ \
2949 0, /* image3d_max_height */ \
2950
2951=== modified file 'lib/CL/pocl_cl.h'
2952--- lib/CL/pocl_cl.h 2011-12-05 10:43:51 +0000
2953+++ lib/CL/pocl_cl.h 2011-12-14 15:59:07 +0000
2954@@ -98,7 +98,7 @@
2955 void (*init) (cl_device_id device);
2956 void *(*malloc) (void *data, cl_mem_flags flags,
2957 size_t size, void *host_ptr);
2958- void (*free) (void *data, void *ptr);
2959+ void (*free) (void *data, cl_mem_flags flags, void *ptr);
2960 void (*read) (void *data, void *host_ptr, void *device_ptr, size_t cb);
2961 void (*write) (void *data, const void *host_ptr, void *device_ptr, size_t cb);
2962 void (*run) (void *data, const char *bytecode,
2963@@ -164,8 +164,8 @@
2964 cl_program program;
2965 /* implementation */
2966 lt_dlhandle dlhandle;
2967- int *arg_is_pointer;
2968- int *arg_is_local;
2969+ cl_int *arg_is_pointer;
2970+ cl_int *arg_is_local;
2971 cl_uint num_locals;
2972 struct pocl_argument *arguments;
2973 struct _cl_kernel *next;
2974
2975=== modified file 'lib/kernel/Makefile.am'
2976--- lib/kernel/Makefile.am 2011-12-01 16:44:06 +0000
2977+++ lib/kernel/Makefile.am 2011-12-14 15:59:07 +0000
2978@@ -37,10 +37,12 @@
2979 $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -include $(top_srcdir)/include/types.h -include $(top_srcdir)/include/_kernel.h -o $@ $<
2980
2981 .c.o:
2982- $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -o $@ $<
2983+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -include $(top_srcdir)/include/types.h -o $@ $<
2984
2985 .ll.o:
2986 $(LLVM_AS) -o $@ $<
2987
2988-$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
2989-$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h templates.h
2990+# TODO: This misses the dependency on the device-specific include files
2991+# TODO: Dependencies should be generated automatically
2992+$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h
2993+$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/_kernel.h templates.h
2994
2995=== modified file 'lib/kernel/arm/Makefile.am'
2996--- lib/kernel/arm/Makefile.am 2011-12-01 16:44:06 +0000
2997+++ lib/kernel/arm/Makefile.am 2011-12-14 15:59:07 +0000
2998@@ -58,7 +58,9 @@
2999 endif
3000
3001 .c.o:
3002- $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $<
3003+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/arm/types.h -o $@ $<
3004
3005-$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
3006-$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h
3007+# TODO: This misses the dependency on the device-specific include files
3008+# TODO: Dependencies should be generated automatically
3009+$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/arm/types.h
3010+$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/arm/types.h $(top_srcdir)/include/_kernel.h ../templates.h
3011
3012=== modified file 'lib/kernel/clz.cl'
3013--- lib/kernel/clz.cl 2011-10-27 00:18:42 +0000
3014+++ lib/kernel/clz.cl 2011-12-14 15:59:07 +0000
3015@@ -23,7 +23,7 @@
3016
3017 #include "templates.h"
3018
3019-// Intel: LZCNT (and POPCNT)
3020+// Intel: LZCNT
3021
3022 #define __builtin_clzhh __builtin_clz
3023 #define __builtin_clzh __builtin_clz
3024
3025=== modified file 'lib/kernel/get_global_id.c'
3026--- lib/kernel/get_global_id.c 2011-12-01 16:44:06 +0000
3027+++ lib/kernel/get_global_id.c 2011-12-14 15:59:07 +0000
3028@@ -33,14 +33,18 @@
3029 extern unsigned int _local_id_y;
3030 extern unsigned int _local_id_z;
3031
3032-unsigned int
3033+extern unsigned int _global_offset_x;
3034+extern unsigned int _global_offset_y;
3035+extern unsigned int _global_offset_z;
3036+
3037+size_t
3038 get_global_id(unsigned int dimindx)
3039 {
3040 switch(dimindx)
3041 {
3042- case 0: return _local_size_x * _group_id_x + _local_id_x;
3043- case 1: return _local_size_y * _group_id_y + _local_id_y;
3044- case 2: return _local_size_z * _group_id_z + _local_id_z;
3045+ case 0: return _global_offset_x + _local_size_x * _group_id_x + _local_id_x;
3046+ case 1: return _global_offset_y + _local_size_y * _group_id_y + _local_id_y;
3047+ case 2: return _global_offset_z + _local_size_z * _group_id_z + _local_id_z;
3048 default: return 0;
3049 }
3050 }
3051
3052=== added file 'lib/kernel/get_global_offset.c'
3053--- lib/kernel/get_global_offset.c 1970-01-01 00:00:00 +0000
3054+++ lib/kernel/get_global_offset.c 2011-12-14 15:59:07 +0000
3055@@ -0,0 +1,39 @@
3056+/* OpenCL built-in library: get_global_offset()
3057+
3058+ Copyright (c) 2011 Universidad Rey Juan Carlos
3059+
3060+ Permission is hereby granted, free of charge, to any person obtaining a copy
3061+ of this software and associated documentation files (the "Software"), to deal
3062+ in the Software without restriction, including without limitation the rights
3063+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
3064+ copies of the Software, and to permit persons to whom the Software is
3065+ furnished to do so, subject to the following conditions:
3066+
3067+ The above copyright notice and this permission notice shall be included in
3068+ all copies or substantial portions of the Software.
3069+
3070+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
3071+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3072+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
3073+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
3074+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
3075+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
3076+ THE SOFTWARE.
3077+*/
3078+
3079+extern unsigned int _global_offset_x;
3080+extern unsigned int _global_offset_y;
3081+extern unsigned int _global_offset_z;
3082+
3083+size_t
3084+get_global_offset(unsigned int dimindx)
3085+{
3086+ switch(dimindx)
3087+ {
3088+ case 0: return _global_offset_x;
3089+ case 1: return _global_offset_y;
3090+ case 2: return _global_offset_z;
3091+ default: return 0;
3092+ }
3093+}
3094+
3095
3096=== modified file 'lib/kernel/get_global_size.c'
3097--- lib/kernel/get_global_size.c 2011-12-01 16:44:06 +0000
3098+++ lib/kernel/get_global_size.c 2011-12-14 15:59:07 +0000
3099@@ -29,7 +29,7 @@
3100 extern unsigned int _num_groups_y;
3101 extern unsigned int _num_groups_z;
3102
3103-unsigned int
3104+size_t
3105 get_global_size(unsigned int dimindx)
3106 {
3107 switch(dimindx)
3108
3109=== modified file 'lib/kernel/get_group_id.c'
3110--- lib/kernel/get_group_id.c 2011-12-01 16:44:06 +0000
3111+++ lib/kernel/get_group_id.c 2011-12-14 15:59:07 +0000
3112@@ -25,7 +25,7 @@
3113 extern unsigned int _group_id_y;
3114 extern unsigned int _group_id_z;
3115
3116-unsigned int
3117+size_t
3118 get_group_id(unsigned int dimindx)
3119 {
3120 switch(dimindx)
3121
3122=== modified file 'lib/kernel/get_local_id.c'
3123--- lib/kernel/get_local_id.c 2011-12-01 16:44:06 +0000
3124+++ lib/kernel/get_local_id.c 2011-12-14 15:59:07 +0000
3125@@ -25,14 +25,14 @@
3126 extern unsigned int _local_id_y;
3127 extern unsigned int _local_id_z;
3128
3129-unsigned int
3130+size_t
3131 get_local_id(unsigned int dimindx)
3132 {
3133 switch(dimindx)
3134 {
3135- case 0: return _local_id_x;
3136- case 1: return _local_id_y;
3137- case 2: return _local_id_z;
3138+ case 0: return _local_id_x;
3139+ case 1: return _local_id_y;
3140+ case 2: return _local_id_z;
3141 default: return 0;
3142 }
3143 }
3144
3145=== added file 'lib/kernel/get_local_size.c'
3146--- lib/kernel/get_local_size.c 1970-01-01 00:00:00 +0000
3147+++ lib/kernel/get_local_size.c 2011-12-14 15:59:07 +0000
3148@@ -0,0 +1,39 @@
3149+/* OpenCL built-in library: get_local_size()
3150+
3151+ Copyright (c) 2011 Universidad Rey Juan Carlos
3152+
3153+ Permission is hereby granted, free of charge, to any person obtaining a copy
3154+ of this software and associated documentation files (the "Software"), to deal
3155+ in the Software without restriction, including without limitation the rights
3156+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
3157+ copies of the Software, and to permit persons to whom the Software is
3158+ furnished to do so, subject to the following conditions:
3159+
3160+ The above copyright notice and this permission notice shall be included in
3161+ all copies or substantial portions of the Software.
3162+
3163+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
3164+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3165+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
3166+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
3167+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
3168+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
3169+ THE SOFTWARE.
3170+*/
3171+
3172+extern unsigned int _local_size_x;
3173+extern unsigned int _local_size_y;
3174+extern unsigned int _local_size_z;
3175+
3176+size_t
3177+get_local_size(unsigned int dimindx)
3178+{
3179+ switch(dimindx)
3180+ {
3181+ case 0: return _local_size_x;
3182+ case 1: return _local_size_y;
3183+ case 2: return _local_size_z;
3184+ default: return 0;
3185+ }
3186+}
3187+
3188
3189=== modified file 'lib/kernel/get_num_groups.c'
3190--- lib/kernel/get_num_groups.c 2011-12-01 16:44:06 +0000
3191+++ lib/kernel/get_num_groups.c 2011-12-14 15:59:07 +0000
3192@@ -25,14 +25,14 @@
3193 extern unsigned int _num_groups_y;
3194 extern unsigned int _num_groups_z;
3195
3196-unsigned int
3197+size_t
3198 get_num_groups(unsigned int dimindx)
3199 {
3200 switch(dimindx)
3201 {
3202- case 0: return _num_groups_x;
3203- case 1: return _num_groups_y;
3204- case 2: return _num_groups_z;
3205+ case 0: return _num_groups_x;
3206+ case 1: return _num_groups_y;
3207+ case 2: return _num_groups_z;
3208 default: return 0;
3209 }
3210 }
3211
3212=== added file 'lib/kernel/get_work_dim.c'
3213--- lib/kernel/get_work_dim.c 1970-01-01 00:00:00 +0000
3214+++ lib/kernel/get_work_dim.c 2011-12-14 15:59:07 +0000
3215@@ -0,0 +1,31 @@
3216+/* OpenCL built-in library: get_work_dim()
3217+
3218+ Copyright (c) 2011 Universidad Rey Juan Carlos
3219+
3220+ Permission is hereby granted, free of charge, to any person obtaining a copy
3221+ of this software and associated documentation files (the "Software"), to deal
3222+ in the Software without restriction, including without limitation the rights
3223+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
3224+ copies of the Software, and to permit persons to whom the Software is
3225+ furnished to do so, subject to the following conditions:
3226+
3227+ The above copyright notice and this permission notice shall be included in
3228+ all copies or substantial portions of the Software.
3229+
3230+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
3231+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3232+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
3233+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
3234+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
3235+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
3236+ THE SOFTWARE.
3237+*/
3238+
3239+extern unsigned int _work_dim;
3240+
3241+unsigned int
3242+get_work_dim()
3243+{
3244+ return _work_dim;
3245+}
3246+
3247
3248=== added file 'lib/kernel/popcount.cl'
3249--- lib/kernel/popcount.cl 1970-01-01 00:00:00 +0000
3250+++ lib/kernel/popcount.cl 2011-12-14 15:59:07 +0000
3251@@ -0,0 +1,35 @@
3252+/* OpenCL built-in library: popcount()
3253+
3254+ Copyright (c) 2011 Universidad Rey Juan Carlos
3255+
3256+ Permission is hereby granted, free of charge, to any person obtaining a copy
3257+ of this software and associated documentation files (the "Software"), to deal
3258+ in the Software without restriction, including without limitation the rights
3259+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
3260+ copies of the Software, and to permit persons to whom the Software is
3261+ furnished to do so, subject to the following conditions:
3262+
3263+ The above copyright notice and this permission notice shall be included in
3264+ all copies or substantial portions of the Software.
3265+
3266+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
3267+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3268+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
3269+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
3270+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
3271+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
3272+ THE SOFTWARE.
3273+*/
3274+
3275+#include "templates.h"
3276+
3277+// Intel: POPCNT
3278+
3279+#define __builtin_popcounthh __builtin_popcount
3280+#define __builtin_popcounth __builtin_popcount
3281+#define __builtin_popcountuhh __builtin_popcount
3282+#define __builtin_popcountuh __builtin_popcount
3283+#define __builtin_popcountu __builtin_popcount
3284+#define __builtin_popcountul __builtin_popcountl
3285+
3286+DEFINE_BUILTIN_G_G(popcount)
3287
3288=== modified file 'lib/kernel/signbit.cl'
3289--- lib/kernel/signbit.cl 2011-11-05 00:10:25 +0000
3290+++ lib/kernel/signbit.cl 2011-12-14 15:59:07 +0000
3291@@ -21,6 +21,37 @@
3292 THE SOFTWARE.
3293 */
3294
3295-#include "templates.h"
3296-
3297-DEFINE_BUILTIN_K_V(signbit)
3298+#define IMPLEMENT_SIGNBIT_BUILTIN_FLOAT __builtin_signbitf(a)
3299+#define IMPLEMENT_SIGNBIT_BUILTIN_DOUBLE __builtin_signbit(a)
3300+#define IMPLEMENT_SIGNBIT_DIRECT \
3301+ ({ \
3302+ int bits = CHAR_BIT * sizeof(stype); \
3303+ *(jtype*)&a >> (jtype)(bits-1); \
3304+ })
3305+
3306+#define IMPLEMENT_DIRECT(NAME, VTYPE, STYPE, JTYPE, EXPR) \
3307+ JTYPE _cl_overloadable NAME(VTYPE a) \
3308+ { \
3309+ typedef VTYPE vtype; \
3310+ typedef STYPE stype; \
3311+ typedef JTYPE jtype; \
3312+ return EXPR; \
3313+ }
3314+
3315+
3316+
3317+IMPLEMENT_DIRECT(signbit, float , float, int , IMPLEMENT_SIGNBIT_BUILTIN_FLOAT)
3318+IMPLEMENT_DIRECT(signbit, float2 , float, int2 , IMPLEMENT_SIGNBIT_DIRECT)
3319+IMPLEMENT_DIRECT(signbit, float3 , float, int3 , IMPLEMENT_SIGNBIT_DIRECT)
3320+IMPLEMENT_DIRECT(signbit, float4 , float, int4 , IMPLEMENT_SIGNBIT_DIRECT)
3321+IMPLEMENT_DIRECT(signbit, float8 , float, int8 , IMPLEMENT_SIGNBIT_DIRECT)
3322+IMPLEMENT_DIRECT(signbit, float16, float, int16, IMPLEMENT_SIGNBIT_DIRECT)
3323+
3324+#ifdef cl_khr_fp64
3325+IMPLEMENT_DIRECT(signbit, double , double, int , IMPLEMENT_SIGNBIT_BUILTIN_DOUBLE)
3326+IMPLEMENT_DIRECT(signbit, double2 , double, long2 , IMPLEMENT_SIGNBIT_DIRECT)
3327+IMPLEMENT_DIRECT(signbit, double3 , double, long3 , IMPLEMENT_SIGNBIT_DIRECT)
3328+IMPLEMENT_DIRECT(signbit, double4 , double, long4 , IMPLEMENT_SIGNBIT_DIRECT)
3329+IMPLEMENT_DIRECT(signbit, double8 , double, long8 , IMPLEMENT_SIGNBIT_DIRECT)
3330+IMPLEMENT_DIRECT(signbit, double16, double, long16, IMPLEMENT_SIGNBIT_DIRECT)
3331+#endif
3332
3333=== modified file 'lib/kernel/sources.mk'
3334--- lib/kernel/sources.mk 2011-11-14 19:14:01 +0000
3335+++ lib/kernel/sources.mk 2011-12-14 15:59:07 +0000
3336@@ -1,10 +1,13 @@
3337 libkernel_a_SOURCES = templates.h \
3338 barrier.c \
3339+ get_work_dim.c \
3340 get_global_size.c \
3341 get_global_id.c \
3342+ get_local_size.c \
3343 get_local_id.c \
3344 get_num_groups.c \
3345 get_group_id.c \
3346+ get_global_offset.c \
3347 as_type.cl \
3348 convert_type.cl \
3349 acos.cl \
3350@@ -84,6 +87,7 @@
3351 rotate.cl \
3352 sub_sat.cl \
3353 upsample.cl \
3354+ popcount.cl \
3355 mad24.cl \
3356 mul24.cl \
3357 degrees.cl \
3358
3359=== modified file 'lib/kernel/tce/Makefile.am'
3360--- lib/kernel/tce/Makefile.am 2011-12-01 16:44:06 +0000
3361+++ lib/kernel/tce/Makefile.am 2011-12-14 15:59:07 +0000
3362@@ -53,7 +53,9 @@
3363 endif
3364
3365 .c.o:
3366- $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $<
3367+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/tce/types.h -o $@ $<
3368
3369-$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
3370-$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h
3371+# TODO: This misses the dependency on the device-specific include files
3372+# TODO: Dependencies should be generated automatically
3373+$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/tce/types.h
3374+$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/tce/types.h $(top_srcdir)/include/_kernel.h ../templates.h
3375
3376=== modified file 'lib/kernel/templates.h'
3377--- lib/kernel/templates.h 2011-11-08 15:13:28 +0000
3378+++ lib/kernel/templates.h 2011-12-14 15:59:07 +0000
3379@@ -267,12 +267,11 @@
3380 { \
3381 return __builtin_##NAME(a); \
3382 } \
3383- __IF_INT64( \
3384 IMPLEMENT_BUILTIN_K_V(NAME, long2 , double2 , lo, hi) \
3385 IMPLEMENT_BUILTIN_K_V(NAME, long3 , double3 , lo, s2) \
3386 IMPLEMENT_BUILTIN_K_V(NAME, long4 , double4 , lo, hi) \
3387 IMPLEMENT_BUILTIN_K_V(NAME, long8 , double8 , lo, hi) \
3388- IMPLEMENT_BUILTIN_K_V(NAME, long16, double16, lo, hi)))
3389+ IMPLEMENT_BUILTIN_K_V(NAME, long16, double16, lo, hi))
3390
3391 /******************************************************************************/
3392
3393
3394=== modified file 'lib/kernel/x86_64/Makefile.am'
3395--- lib/kernel/x86_64/Makefile.am 2011-12-01 16:44:06 +0000
3396+++ lib/kernel/x86_64/Makefile.am 2011-12-14 15:59:07 +0000
3397@@ -51,10 +51,12 @@
3398 endif
3399
3400 .c.o:
3401- $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $<
3402+ $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/x86_64/types.h -o $@ $<
3403
3404 .ll.o:
3405 $(LLVM_AS) -o $@ $<
3406
3407-$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h
3408-$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h
3409+# TODO: This misses the dependency on the device-specific include files
3410+# TODO: Dependencies should be generated automatically
3411+$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/x86_64/types.h
3412+$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/x86_64/types.h $(top_srcdir)/include/_kernel.h ../templates.h
3413
3414=== removed file 'lib/kernel/x86_64/copysign.cl'
3415--- lib/kernel/x86_64/copysign.cl 2011-10-31 17:00:12 +0000
3416+++ lib/kernel/x86_64/copysign.cl 1970-01-01 00:00:00 +0000
3417@@ -1,169 +0,0 @@
3418-/* OpenCL built-in library: copysign()
3419-
3420- Copyright (c) 2011 Universidad Rey Juan Carlos
3421-
3422- Permission is hereby granted, free of charge, to any person obtaining a copy
3423- of this software and associated documentation files (the "Software"), to deal
3424- in the Software without restriction, including without limitation the rights
3425- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
3426- copies of the Software, and to permit persons to whom the Software is
3427- furnished to do so, subject to the following conditions:
3428-
3429- The above copyright notice and this permission notice shall be included in
3430- all copies or substantial portions of the Software.
3431-
3432- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
3433- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3434- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
3435- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
3436- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
3437- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
3438- THE SOFTWARE.
3439-*/
3440-
3441-#if 0
3442-
3443-#include "../templates.h"
3444-
3445-// LLVM generates non-optimal code for this implementation
3446-DEFINE_EXPR_V_VV(copysign,
3447- ({
3448- int bits = CHAR_BIT * sizeof(stype);
3449- jtype sign_mask = (jtype)1 << (jtype)(bits - 1);
3450- jtype result = ((~sign_mask & *(jtype*)&a) |
3451- ( sign_mask & *(jtype*)&b));
3452- *(vtype*)&result;
3453- }))
3454-
3455-#endif
3456-
3457-
3458-
3459-#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \
3460- TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
3461- { \
3462- return EXPR; \
3463- }
3464-
3465-#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \
3466- TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
3467- { \
3468- return NAME(*(UPTYPE*)&a, *(UPTYPE*)&b).LO; \
3469- }
3470-
3471-#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \
3472- TYPE _cl_overloadable NAME(TYPE a, TYPE b) \
3473- { \
3474- return (TYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \
3475- }
3476-
3477-
3478-
3479-#define IMPLEMENT_COPYSIGN_DIRECT \
3480- ({ \
3481- int bits = CHAR_BIT * sizeof(stype); \
3482- jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \
3483- jtype result = (~sign_mask & *(jtype*)&a) | (sign_mask & *(jtype*)&b); \
3484- *(vtype*)&result; \
3485- })
3486-#define IMPLEMENT_COPYSIGN_SSE_FLOAT4 \
3487- ({ \
3488- uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
3489- __asm__ ("andps %[src], %[dst]" : \
3490- [dst] "+x" (a) : \
3491- [src] "x" (~sign_mask)); \
3492- __asm__ ("andps %[src], %[dst]" : \
3493- [dst] "+x" (b) : \
3494- [src] "x" (sign_mask)); \
3495- __asm__ ("orps %[src], %[dst]" : \
3496- [dst] "+x" (a) : \
3497- [src] "x" (b)); \
3498- a; \
3499- })
3500-#define IMPLEMENT_COPYSIGN_AVX_FLOAT8 \
3501- ({ \
3502- uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \
3503- 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \
3504- __asm__ ("andps256 %[src], %[dst]" : \
3505- [dst] "+x" (a) : \
3506- [src] "x" (~sign_mask)); \
3507- __asm__ ("andps256 %[src], %[dst]" : \
3508- [dst] "+x" (b) : \
3509- [src] "x" (sign_mask)); \
3510- __asm__ ("orps256 %[src], %[dst]" : \
3511- [dst] "+x" (a) : \
3512- [b] "x" (b)); \
3513- a; \
3514- })
3515-#define IMPLEMENT_COPYSIGN_SSE2_DOUBLE2 \
3516- ({ \
3517- ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \
3518- __asm__ ("andpd %[src], %[dst]" : \
3519- [dst] "+x" (a) : \
3520- [src] "x" (~sign_mask)); \
3521- __asm__ ("andpd %[src], %[dst]" : \
3522- [dst] "+x" (b) : \
3523- [src] "x" (sign_mask)); \
3524- __asm__ ("orpd %[src], %[dst]" : \
3525- [dst] "+x" (a) : \
3526- [src] "x" (b)); \
3527- a; \
3528- })
3529-#define IMPLEMENT_COPYSIGN_AVX_DOUBLE4 \
3530- ({ \
3531- ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \
3532- 0x8000000000000000UL, 0x8000000000000000UL}; \
3533- __asm__ ("andpd256 %[src], %[dst]" : \
3534- [dst] "+x" (a) : \
3535- [src] "x" (~sign_mask)); \
3536- __asm__ ("andpd256 %[src], %[dst]" : \
3537- [dst] "+x" (b) : \
3538- [src] "x" (sign_mask)); \
3539- __asm__ ("orpd256 %[src], %[dst]" : \
3540- [dst] "+x" (a) : \
3541- [src] "x" (b)); \
3542- a; \
3543- })
3544-
3545-
3546-
3547-#ifdef __SSE__
3548-IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_SSE_FLOAT4)
3549-IMPLEMENT_UPCAST(copysign, float2 , float4, lo)
3550-IMPLEMENT_UPCAST(copysign, float3 , float4, s012)
3551-IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_SSE_FLOAT4)
3552-# ifdef __AVX__
3553-IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_AVX_FLOAT8)
3554-# else
3555-IMPLEMENT_SPLIT (copysign, float8 , lo, hi)
3556-# endif
3557-IMPLEMENT_SPLIT (copysign, float16, lo, hi)
3558-#else
3559-IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_DIRECT)
3560-IMPLEMENT_DIRECT(copysign, float2 , IMPLEMENT_COPYSIGN_DIRECT)
3561-IMPLEMENT_DIRECT(copysign, float3 , IMPLEMENT_COPYSIGN_DIRECT)
3562-IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_DIRECT)
3563-IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_DIRECT)
3564-IMPLEMENT_DIRECT(copysign, float16, IMPLEMENT_COPYSIGN_DIRECT)
3565-#endif
3566-
3567-#ifdef __SSE2__
3568-IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2)
3569-IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2)
3570-# ifdef __AVX__
3571-IMPLEMENT_UPCAST(copysign, double3 , double4, s012)
3572-IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_AVX_DOUBLE4)
3573-# else
3574-IMPLEMENT_SPLIT (copysign, double3 , lo, s2)
3575-IMPLEMENT_SPLIT (copysign, double4 , lo, hi)
3576-# endif
3577-IMPLEMENT_SPLIT (copysign, double8 , lo, hi)
3578-IMPLEMENT_SPLIT (copysign, double16, lo, hi)
3579-#else
3580-IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_DIRECT)
3581-IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_DIRECT)
3582-IMPLEMENT_DIRECT(copysign, double3 , IMPLEMENT_COPYSIGN_DIRECT)
3583-IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_DIRECT)
3584-IMPLEMENT_DIRECT(copysign, double8 , IMPLEMENT_COPYSIGN_DIRECT)
3585-IMPLEMENT_DIRECT(copysign, double16, IMPLEMENT_COPYSIGN_DIRECT)
3586-#endif
3587
3588=== modified file 'lib/llvmopencl/Workgroup.cc'
3589--- lib/llvmopencl/Workgroup.cc 2011-12-02 15:11:21 +0000
3590+++ lib/llvmopencl/Workgroup.cc 2011-12-14 15:59:07 +0000
3591@@ -184,6 +184,23 @@
3592
3593 IRBuilder<> builder(BasicBlock::Create(M.getContext(), "", L));
3594
3595+ // TODO: _num_groups_%c and friends should probably have type size_t
3596+ // instead of unsigned int, because this may avoid integer
3597+ // conversions when accessing these variables
3598+
3599+ // TODO: _num_groups_%c and friends should probably be stored as
3600+ // arrays instead of as 3 independent variables, because this may
3601+ // lead to better code when the respective get_* functions are
3602+ // called in a loop (array access instead of switch statement)
3603+
3604+ ptr = builder.CreateStructGEP(ai,
3605+ TypeBuilder<PoclContext, true>::WORK_DIM);
3606+ gv = M.getGlobalVariable("_work_dim");
3607+ if (gv != NULL) {
3608+ v = builder.CreateLoad(builder.CreateConstGEP1_32(ptr, 0));
3609+ builder.CreateStore(v, gv);
3610+ }
3611+
3612 ptr = builder.CreateStructGEP(ai,
3613 TypeBuilder<PoclContext, true>::GROUP_ID);
3614 for (int i = 0; i < 3; ++i) {
3615@@ -195,6 +212,8 @@
3616 }
3617 }
3618
3619+ ptr = builder.CreateStructGEP(ai,
3620+ TypeBuilder<PoclContext, true>::NUM_GROUPS);
3621 for (int i = 0; i < 3; ++i) {
3622 snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i);
3623 gv = M.getGlobalVariable(s);
3624@@ -204,6 +223,17 @@
3625 }
3626 }
3627
3628+ ptr = builder.CreateStructGEP(ai,
3629+ TypeBuilder<PoclContext, true>::GLOBAL_OFFSET);
3630+ for (int i = 0; i < 3; ++i) {
3631+ snprintf(s, STRING_LENGTH, "_global_offset_%c", 'x' + i);
3632+ gv = M.getGlobalVariable(s);
3633+ if (gv != NULL) {
3634+ v = builder.CreateLoad(builder.CreateConstGEP2_32(ptr, 0, i));
3635+ builder.CreateStore(v, gv);
3636+ }
3637+ }
3638+
3639 CallInst *c = builder.CreateCall(F, ArrayRef<Value*>(arguments));
3640 builder.CreateRetVoid();
3641
3642@@ -264,6 +294,44 @@
3643 }
3644 }
3645
3646+ // Privatize _work_dim
3647+ gv[0] = M.getGlobalVariable("_work_dim");
3648+ if (gv[0] != NULL) {
3649+ ai[0] = builder.CreateAlloca(gv[0]->getType()->getElementType(),
3650+ 0, "_work_dim");
3651+ if(gv[0]->hasInitializer()) {
3652+ Constant *c = gv[0]->getInitializer();
3653+ builder.CreateStore(c, ai[0]);
3654+ }
3655+ }
3656+ for (Function::iterator i = F->begin(), e = F->end(); i != e; ++i) {
3657+ for (BasicBlock::iterator ii = i->begin(), ee = i->end();
3658+ ii != ee; ++ii) {
3659+ ii->replaceUsesOfWith(gv[0], ai[0]);
3660+ }
3661+ }
3662+
3663+ // Privatize _num_groups
3664+ for (int i = 0; i < 3; ++i) {
3665+ snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i);
3666+ gv[i] = M.getGlobalVariable(s);
3667+ if (gv[i] != NULL) {
3668+ ai[i] = builder.CreateAlloca(gv[i]->getType()->getElementType(),
3669+ 0, s);
3670+ if(gv[i]->hasInitializer()) {
3671+ Constant *c = gv[i]->getInitializer();
3672+ builder.CreateStore(c, ai[i]);
3673+ }
3674+ }
3675+ }
3676+ for (Function::iterator i = F->begin(), e = F->end(); i != e; ++i) {
3677+ for (BasicBlock::iterator ii = i->begin(), ee = i->end();
3678+ ii != ee; ++ii) {
3679+ for (int j = 0; j < 3; ++j)
3680+ ii->replaceUsesOfWith(gv[j], ai[j]);
3681+ }
3682+ }
3683+
3684 // Privatize _group_id
3685 for (int i = 0; i < 3; ++i) {
3686 snprintf(s, STRING_LENGTH, "_group_id_%c", 'x' + i);
3687@@ -285,9 +353,9 @@
3688 }
3689 }
3690
3691- // Privatize _num_groups
3692+ // Privatize _global_offset
3693 for (int i = 0; i < 3; ++i) {
3694- snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i);
3695+ snprintf(s, STRING_LENGTH, "_global_offset_%c", 'x' + i);
3696 gv[i] = M.getGlobalVariable(s);
3697 if (gv[i] != NULL) {
3698 ai[i] = builder.CreateAlloca(gv[i]->getType()->getElementType(),
3699
3700=== modified file 'tests/testsuite.at'
3701--- tests/testsuite.at 2011-12-09 13:35:18 +0000
3702+++ tests/testsuite.at 2011-12-14 15:59:07 +0000
3703@@ -172,3 +172,34 @@
3704 ])
3705 AT_CHECK([$abs_top_builddir/examples/trig/trig], 0, expout)
3706 AT_CLEANUP
3707+
3708+AT_BANNER([Full applications])
3709+
3710+AT_SETUP([Scalar wave equation])
3711+AT_DATA([expout],
3712+[Initial condition: t=0
3713+Time step 1: t=0.03125
3714+Time step 2: t=0.0625
3715+Time step 3: t=0.09375
3716+Time step 4: t=0.125
3717+phi(0 ,0 ,0 ) = 0
3718+phi(0.0625 ,0.0625 ,0.0625 ) = 0.0057744
3719+phi(0.125 ,0.125 ,0.125 ) = 0.0435832
3720+phi(0.1875 ,0.1875 ,0.1875 ) = 0.133357
3721+phi(0.25 ,0.25 ,0.25 ) = 0.274951
3722+phi(0.3125 ,0.3125 ,0.3125 ) = 0.447032
3723+phi(0.375 ,0.375 ,0.375 ) = 0.613262
3724+phi(0.4375 ,0.4375 ,0.4375 ) = 0.733705
3725+phi(0.5 ,0.5 ,0.5 ) = 0.777678
3726+phi(0.5625 ,0.5625 ,0.5625 ) = 0.733705
3727+phi(0.625 ,0.625 ,0.625 ) = 0.613262
3728+phi(0.6875 ,0.6875 ,0.6875 ) = 0.447032
3729+phi(0.75 ,0.75 ,0.75 ) = 0.274951
3730+phi(0.8125 ,0.8125 ,0.8125 ) = 0.133357
3731+phi(0.875 ,0.875 ,0.875 ) = 0.0435832
3732+phi(0.9375 ,0.9375 ,0.9375 ) = 0.0057744
3733+phi(1 ,1 ,1 ) = 0
3734+Done.
3735+])
3736+AT_CHECK([$abs_top_builddir/examples/scalarwave/scalarwave], 0, expout)
3737+AT_CLEANUP